Sanjeev Mohan, SanjMo | MongoDB World 2022
>>Hello, everybody. Welcome to theCUBE's coverage of MongoDB World 2022. This is the first live MongoDB World since 2019. theCUBE has covered a number of Mongo shows, actually going back to when the company was called 10gen, some of you may recall. Mongo since then has done an IPO, in 2017, and it's been a rocket ship company. It'll probably do $1.2 billion in revenue this year. It's got a billion dollars in cash on the balance sheet. Despite the tech crash, it's still got a $19 or $20 billion valuation, growing above 50% a year. The company just had a really strong quarter, and it seems to be hitting on all cylinders. My name is Dave Vellante, and here to kick it off with me is Sanjeev Mohan, who is the principal at SanjMo. So great to see you. You've become a wonderful CUBE contributor. Former Gartner analyst, really sharp, knows the database space and the data space generally really well, so thanks for coming back on. >>Thank you. You know, it's just amazing how exciting the entire data space is. Like they used to say, all companies are software companies. All companies are data >>companies, >>so data has become the foundation. >>They say software is eating the world; data is eating software. A couple of little quips here. But this is a good-size show, 4,000 or 5,000 people? I don't really know exactly. You know the numbers, but it's exciting. And of course, a lot of financial services here at the Javits Centre. Let's lay down the basics for people. MongoDB is a document database, but they've been advancing. So, a document database as an alternative to RDBMS: explain that, but explain also how Mongo has broadened its capabilities and is serving a lot more use cases. >>So that's my forte, database technology.
But before I even talk about that, I have to say I am blown away by this MongoDB World, because MongoDB, during the pandemic, has really come of age, and it's a billion-dollar company. Now we are in this brand-new Javits Centre that's been built during the pandemic, and now the company is holding this event; they had 1,000 people last year. So I think this company has really grown. And why has it grown? Because its offerings have grown beyond just a document database, to reach more developers. Document databases revolutionised the whole DBMS space when NoSQL came up, because for a change, you don't need a structured schema. You could start bringing data into this document model with a varying schema. But since then, they've added things like search. So they have search; they added geospatial; they added time series last year, and this year they keep adding more and more. For example, they are going to add some column store indexes. So from being purely transactional, they are now starting to address analytical workloads, and they're starting to address more use cases. Like, what was announced this morning at the keynote was faceted search. So they're expanding, going deeper and deeper into these other data >>structures. Taking Lucene, making search a first-class citizen. But I want to ask you some basic questions about document databases. So there's no fixed schema; you can put anything in there, so it's more data friendly. They're trying to simplify the use of data. Okay, that's pretty clear. >>What are the >>trade-offs of a document database? >>So it's not like, you know, one technology has solved every problem. Every technology comes with its own trade-offs. So in a document, you basically get rid of joining tables with primary and foreign keys, because you can have a flexible schema and put everything in a single document. So it's very easy to write and search.
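To make the trade-off Sanjeev describes concrete (joins versus one flexible document, and documents that grow as repeated elements are added), here is a minimal sketch in plain Python. It is not MongoDB driver code; the `customers`/`orders` data and both helper functions are hypothetical and only illustrate the two data-modeling styles.

```python
# Relational style (hypothetical data): two "tables" linked by a foreign key.
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def join_customer_orders(customer_id):
    """Rebuild one customer's view at read time by matching the foreign key."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    matched = [o for o in orders if o["customer_id"] == customer_id]
    return {**customer, "orders": matched}


# Document style: the same orders embedded in one customer document,
# so a single read returns everything without a join.
customer_doc = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}


def add_order(doc, order):
    """Append a repeated element; the single document grows with each call."""
    doc["orders"].append(order)
    return len(doc["orders"])


print(join_customer_orders(1)["orders"][1]["total"])  # 40.0
print(add_order(customer_doc, {"order_id": 12, "total": 5.0}))  # 3
```

Reading the embedded document needs no join, but every appended order makes that one document larger. In MongoDB a single BSON document is capped at 16 MB, which is one reason unbounded repeated elements are usually moved into their own collection.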
But when you have a lot of repeated elements and you start getting more and more complex, your document size can start expanding quite a bit, because you're trying to club everything into a single space. So that is where the complexity goes >>up. So what does that mean for the practitioner? It means they have to think about how they're ultimately going to structure it and how they're going to query, so they can get the best performance. Is that right? So they're going to put some time in up front in order to make it pay back at the tail end, but clearly it's working. But is that the correct way of thinking about >>100%. In the SQL world, you didn't care about the analytical queries. You just cared about how your data model was structured, and then SQL would basically serve any model. But in the NoSQL world, you have to know your access patterns before you invest in the database, so it's changed that equation: you come in knowing what you are signing up for. >>So a couple of questions, if I can, kind of Columbo questions. So Mongo talks about how it's really supporting mission-critical applications, and at the same time, my understanding is the architecture of Mongo specifically, or a document database in general, but specifically, you've got a primary database, and that is sort of the master, if you will, right? And then you can create secondaries. So help me square the circle between mission critical and maybe more of a focus on, say, consistency versus availability. Do customers have to sort of think about and design in that availability? How do they do that? How are Mongo customers handling that? >>So I have to say, my experience of MongoDB was that the whole company, the whole ethos, was developer friendly. So, to be honest, I don't think MongoDB was as much focused on high availability, disaster recovery, even security to some extent. They were more focused on developer productivity.
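Dave's question about a primary with secondaries comes down to quorum arithmetic. MongoDB replica sets elect a primary by majority vote of the voting members, and a `"majority"` write concern waits until a majority of members has acknowledged a write. The sketch below uses hypothetical helper names and only computes that general majority math; it does not talk to a database.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Members that can be lost while the rest can still elect a primary."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5, 7):
    print(f"{n} members: majority={majority(n)}, survives {tolerated_failures(n)} failure(s)")
```

The numbers show why odd member counts are the usual advice: four members tolerate one failure, the same as three, while five tolerate two.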
>>And developer experience. >>Simplicity. Make it simple, make the developers productive as fast as you can. What was really an inflexion point for MongoDB was the launch of Atlas, because with Atlas they were able to introduce all of these management features and hide them, abstracted from the end users. So now, you know, 2014 is when Atlas came out, and it was in four regions, but today they're in 100 regions. So they keep expanding across every hyperscale cloud provider, and they've abstracted that whole management layer. >>So Atlas, of course, is the managed database as a service in the cloud. And so it's those clouds, cloud infrastructure and cloud tooling, that have allowed them to go after those highly available applications. My other question is, when you talk about adding search, geospatial, time series, there are a lot of specialised databases that take time series, for instance. You have time series specialists that go deep into time series. Can a company like Mongo, with an all-in-one strategy, how close can they get to that functionality? Do they have to be, you know, kind of a classic Microsoft: maybe not perfect, but good enough? I mean, can they compete with those other specialists? And what happens to those specialists if the answer is yes? What's your take on that, if that question >>makes sense? So, David, this is not a MongoDB-only issue. This is an issue with, you know, any time series database, any graph database: should I put in a graph database, or should I put in a multifunctional, multidimensional database? And I really think there is no right or wrong answer. It just really comes down to your use case. If you have an extremely, let's say, complex graph, then maybe you should go with a best-of-breed, purpose-built database.
But more and more, we're starting to see that organisations are looking to simplify their environment by going in for maybe a unified database that has multiple data structures. Yeah, well, >>it's certainly interesting. When you hear Mongo speak, they don't call out Oracle specifically, but when they talk about legacy RDBMSs that don't scale and are complex and expensive, they're talking about Oracle first. And of course, there are others. And then when they talk about bespoke, horses-for-courses databases, the picture they show is like the poster child for Amazon. Of course, they don't call out Amazon; they're a great partner of Amazon's. But those are really the sort of two areas that Mongo's going after. Now Oracle, of course, will talk about their converged strategy, and they're taking a similar approach. So help us understand the difference there. Is it just because they're Oracle, a traditional RDBMS, and they have all the drawbacks associated with that? But by the way, there are some benefits as well. So how do you see that all playing >>out? So, you know, it really comes down to the origins of these databases. I think they're converging to a point where they are offering similar services. And if you look at some of the benchmark numbers, or you talk to users, from a business point of view I don't think there's too much of a difference. Technology-wise, the difference is that MongoDB started in the document space. They were more interested in availability rather than consistency. Oracle started in the relational database space with a focus on financial services, so ACID compliance is what they're based on. And since then they've been adding other pieces, so they differ in where they started. Oracle has been in the industry since the 1970s, so they have that maturity, but then they have that legacy. >>You know, I love it.
Recently, Oracle announced the MongoDB API. So basically saying, why leave Oracle when you can just, you know, use the Mongo API right on Oracle? So that, to me, is a sign that MongoDB is doing well, because when Oracle calls you out, whether you're Workday or Snowflake or Mongo, whoever, that's a sign to me that you've got momentum and you're stealing share in that marketplace, and clearly Mongo is: they're growing at 50-plus percent per year. So thinking about the early days, I mentioned 10gen. Early on, I remember one of the first Mongo conferences I went to, it was all developers. A lot of developers here as well. But they have really, since 2014, expanded the capabilities. You talked about Atlas, you talked about all these other, you know, types of databases that they've added. It seems like Mongo is becoming a platform company. What are your thoughts on that, in terms of them sort of up-levelling the message there, now a billion-dollar-plus company? What's the next, you know, wave for Mongo? >>So, Oracle announced MongoDB APIs, AWS has DocumentDB, Azure has Cosmos DB, so they all have compatible APIs, not the source code, because, you know, MongoDB has its own SSPL licence, so they have written their own layer on top. But at the end of the day, these companies have to keep innovating to catch up with MongoDB, because MongoDB can announce a brand-new capability, and then all these other players have to catch up. So other cloud providers have 80% or so of the capabilities, but they'll never have 100% of what MongoDB has. So people who are diehard MongoDB fans prefer to stay on MongoDB. They are now able to write more applications. Like, you know, MongoDB bought Realm, which is their front end. So if you're building, say, a social media kind of app, you can build your application and sync it with Atlas.
So MongoDB is now at a point where they are adding more capabilities that developers like. You know, 5G is coming, autonomous cars are coming, so now they can address IoT kinds of use cases. So that's why it's becoming such a juggernaut: because it's becoming a platform rather than a single document database. >>So Atlas is the near- and mid-term future. Today it's about 60% of revenues, but they have what we call self-serve, which is really the traditional on-premises stuff. They're connecting those worlds. You're bringing up the point that, of course, they go across clouds. You also bring up the point that they've got edge plays. We're going to talk to Verizon later on today, and they've got edge activity going on with developers. I call it supercloud, right? This layer that floats above. Now, of course, a lot of the supercloud concept says we're going to hide the underlying complexity, but developers might want to tap those primitives, so presumably they'll let them do that. But that hybrid, what we call supercloud, that is a new wave of innovation, is it not? Do you agree with that? And do you see that as a real opportunity for Mongo in terms of penetrating a new TAM? >>Yes. So I see this as a new opportunity. In fact, one of the reasons MongoDB has grown so quickly is because they are addressing more markets than they did pre-pandemic. Also, there are all gradations of users. Some users want full control; they want an IaaS kind of thing. Some want PaaS. And some businesses are like, you know, we don't care, we don't want to deal with the database. So today we heard MongoDB Serverless went GA. So now they have serverless capability, their PaaS. But if you're more into Kubernetes, they have a Kubernetes Operator. So they're addressing the full stack: different types of developers, different workloads, different geographical regions.
So that's why the market is expanding. >>We're seeing abstraction layers, you know, throughout the stack: physical, virtual, containers, serverless, and eventually supercloud. Sanjeev, great analysis. Thanks so much for taking your time to come on theCUBE. All right, keep it right there; we'll be right back after this short break. This is Dave Vellante from the Javits Centre, MongoDB World 2022. Thank you.
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Verizon | ORGANIZATION | 0.99+ |
Four | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
1.2 billion | QUANTITY | 0.99+ |
2017 | DATE | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
80% | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
$20 billion | QUANTITY | 0.99+ |
Mongo | ORGANIZATION | 0.99+ |
100% | QUANTITY | 0.99+ |
Lucene | TITLE | 0.99+ |
2014 | DATE | 0.99+ |
this year | DATE | 0.99+ |
19 | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
5000 people | QUANTITY | 0.99+ |
100 regions | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
four regions | QUANTITY | 0.98+ |
pandemic | EVENT | 0.98+ |
Today | DATE | 0.97+ |
first | QUANTITY | 0.97+ |
1000 people | QUANTITY | 0.97+ |
about 60% | QUANTITY | 0.97+ |
one technology | QUANTITY | 0.97+ |
2019 | DATE | 0.95+ |
first conferences | QUANTITY | 0.95+ |
above 50% a year | QUANTITY | 0.94+ |
single space | QUANTITY | 0.94+ |
Atlas | TITLE | 0.94+ |
mongo DB | TITLE | 0.93+ |
two areas | QUANTITY | 0.93+ |
single document | QUANTITY | 0.93+ |
atlases | TITLE | 0.92+ |
1970s | DATE | 0.92+ |
this morning | DATE | 0.91+ |
Atlas | ORGANIZATION | 0.9+ |
Mongo DB | TITLE | 0.89+ |
billion dollar | QUANTITY | 0.86+ |
one strategy | QUANTITY | 0.85+ |
50 plus percent per year | QUANTITY | 0.84+ |
Javits Centre | LOCATION | 0.83+ |
couple | QUANTITY | 0.81+ |
Mongo db World 2022 | EVENT | 0.81+ |
single document database | QUANTITY | 0.79+ |
Gartner | ORGANIZATION | 0.77+ |
mongo db | TITLE | 0.77+ |
10gen | ORGANIZATION | 0.77+ |
three | QUANTITY | 0.77+ |
Mongo DB | ORGANIZATION | 0.74+ |
billion dollars | QUANTITY | 0.74+ |
mongo db | TITLE | 0.72+ |
SanjMo | ORGANIZATION | 0.72+ |
MongoDB World 2022 | EVENT | 0.69+ |
Sanjeev Mohan, SanjMo & Nong Li, Okera | AWS Startup Showcase
(cheerful music) >> Hello everyone, welcome to today's session of theCUBE's presentation of AWS Startup Showcase, New Breakthroughs in DevOps, Data Analytics, Cloud Management Tools, featuring Okera from the cloud management migration track. I'm John Furrier, your host. We've got two great special guests today, Nong Li, founder and CTO of Okera, and Sanjeev Mohan, principal @SanjMo, and former research vice president of big data and advanced analytics at Gartner. He's a legend, been around the industry for a long time, seen the big data trends from the past, present, and knows the future. Got a great lineup here. Gentlemen, thank you for this. So, life in the trenches: lessons learned across compliance, cloud migration, analytics, and use cases for Fortune 1000s. Thanks for joining us. >> Thanks for having us. >> So Sanjeev, great to see you. I know you've seen this movie; I was saying that in the open. You've seen, at Gartner, all the visionaries, the leaders; you know everything about this space. It's changing extremely fast, and one of the big topics right out of the gate is not just innovation, we'll get to that, that's the fun part, but the regulatory compliance and audit piece of it. It's keeping people up at night, and frankly, if not done right, it slows things down. This is a big part of the showcase here: solving these problems. Share your thoughts with us. What's your take on this wide-ranging issue? >> So, thank you, John, for bringing this up, and I'm so happy you mentioned the fact that there's this notion that it can slow things down. Well, I have to say that the old way of doing governance slowed things down, because it was very much about command and control. But the new approach to data governance is actually, in my opinion, liberating data.
If you want to democratize or monetize, whatever you want to call it, you cannot do it until you know you can trust that data and it's governed in some way. So data governance has actually become very interesting, and today, if you want to talk about it, there are three different areas within compliance. Regulatory, for example: we all know about the EU GDPR, we know California has CCPA, and in fact California is now getting an even more stringent version called CPRA in a couple of years, which is more aligned to GDPR. That is the first area; we know we need to comply with that, we don't have any way out. But then there are other areas: there is insider trading, there is how you secure the data that comes from third parties, you know, vendors, partners, suppliers. So Nong, I'd love to hand it over to you and see if you can maybe throw some light on how our customers are handling these use cases. >> Yeah, absolutely, and I love what you said about balancing agility and liberating, in the face of what may be seen as things that slow you down. So we work with customers across verticals with old and new regulations. So you know, you brought up GDPR. One of our clients is using this to great effect to power their ecosystem. They are a very large retail company that has operations and customers across the world. Obviously the importance of GDPR, and the regulations it imposes on them, are very top of mind, and at the same time, being able to do effective targeting analytics on customer information is equally critical, right? So they're exactly at that spot where they need this customer insight for powering their business, and the regulatory concerns are extremely prevalent for them. So in the context of GDPR, you'll hear about things like consent management and the right to be forgotten, right? I, as a customer of that retailer, should be able to say "I don't want my information used for this purpose," right? "Use it for this, but not this."
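The consent-management and right-to-be-forgotten requirements Nong mentions amount to a per-customer, per-purpose filter applied before data reaches analysts. A minimal sketch, assuming a hypothetical consent registry schema (`allowed` purposes per customer plus a set of erased customers); this is not Okera's API or any product's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ConsentRegistry:
    """Hypothetical consent store: which purposes each customer allowed."""
    allowed: Dict[int, Set[str]] = field(default_factory=dict)
    erased: Set[int] = field(default_factory=set)  # right-to-be-forgotten requests

    def permits(self, customer_id: int, purpose: str) -> bool:
        # An erasure request overrides any consent previously given.
        if customer_id in self.erased:
            return False
        return purpose in self.allowed.get(customer_id, set())


def rows_for_purpose(rows: List[dict], registry: ConsentRegistry, purpose: str) -> List[dict]:
    """Return only rows whose customer consented to this purpose."""
    return [r for r in rows if registry.permits(r["customer_id"], purpose)]


registry = ConsentRegistry(
    allowed={1: {"analytics", "marketing"}, 2: {"analytics"}, 3: {"marketing"}},
    erased={3},
)
rows = [{"customer_id": c, "spend": 10 * c} for c in (1, 2, 3)]
print([r["customer_id"] for r in rows_for_purpose(rows, registry, "marketing")])  # [1]
```

The analyst-facing function takes only a purpose; the per-customer filtering happens underneath, so the people running the analysis never handle consent rules directly.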
And you can imagine, at a very, very large scale, when you have a billion customers, managing that, all the data you've collected over time through all of your devices, all of your telemetry, is really, really challenging. And they're leveraging Okera embedded into their analytics platform so they can do both, right? Their data scientists and analysts, who need to do everything they're doing to power the business, don't have to think about these kinds of very granular customer filtering requirements that need to happen; they leverage us to do that. So that's kind of new, right? GDPR is relatively new stuff at this point, but we obviously also work with customers that have regulations from a long, long time ago, right? So I think you also mentioned insider trading and the supply chain. So we'll talk to customers, and they want really data-driven decisions on their supply chain, everything about their production pipeline, right? They want to understand all of that, and of course that makes sense. Whether you're the CFO or not, if you're going to make business decisions, you need that information readily available, and supply chains, as we know, get more and more complex as they get more and more integrated into manufacturing and other verticals. So you're a little bit stuck, right? You want to be data-driven on those supply chain analytics, but at the same time, knowing the details of all the supply chain across all of your dependencies exposes your internal team to very high blackout periods or insider trading concerns, right? For example, if you knew Apple was buying a bunch of something, that's maybe information that only a select few people can have. And the way that manifests in data policies is that you need the ability to have very, very scalable, per-employee data restriction policies, so they can do their job more easily, right?
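Per-employee restriction policies that scale are commonly built as attribute-based access control: one rule evaluated against each person's attributes, rather than one hand-written rule per employee. A minimal sketch under that assumption; the `Employee` fields, segment names and restricted list are hypothetical, and the code does not reproduce any vendor's policy engine.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical list of segments treated as material, non-public information.
RESTRICTED_SEGMENTS = frozenset({"key-customer-orders"})


@dataclass(frozen=True)
class Employee:
    name: str
    segments: FrozenSet[str]       # supply-chain segments this person works on
    insider_cleared: bool = False  # cleared to see restricted segments


def can_see(emp: Employee, row: dict) -> bool:
    """One attribute-based rule, evaluated per employee and per row."""
    segment = row["segment"]
    if segment not in emp.segments:
        return False
    if segment in RESTRICTED_SEGMENTS and not emp.insider_cleared:
        return False
    return True


def visible_rows(emp: Employee, rows: List[dict]) -> List[dict]:
    return [r for r in rows if can_see(emp, r)]


rows = [
    {"segment": "packaging", "qty": 5},
    {"segment": "key-customer-orders", "qty": 9},
    {"segment": "logistics", "qty": 2},
]
analyst = Employee("analyst", frozenset({"packaging", "key-customer-orders"}))
print([r["segment"] for r in visible_rows(analyst, rows)])  # ['packaging']
```

Onboarding another employee means assigning attributes, not writing another rule, so the rule count stays flat as headcount grows.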
If we talk about speeding things up, instead of a very complex process for them to get approved, and approved on SEC regulations, all that kind of stuff, you can now go give them access to the part of the supply chain that they need, and no more, and limit their exposure and the company's exposure and all of that kind of stuff. So one of our customers was able to do this, getting two orders of magnitude, a 100x reduction, in the policies needed to manage a system like that. >> When I hear you talking like that, I think of the old days of "Oh yeah, regulatory, it kind of slows down innovation, got to go faster": pretty basic variables, not a lot of combinations of things to check. Now with cloud, there seem to be combinations. Sanjeev, how complicated has the regulatory compliance and audit environment gotten in the past few years? Because I hear security in the supply chain, I hear insider threats; I mean, these are security channels, not just compliance department G&A kinds of functions. You're talking about large-scale, potential combinations of access, distribution. I mean, it seems complicated. How much more complicated is it now than it was a few years ago? >> So, you know, the way I look at it is, and I'm just mentioning these companies as an example, when PayPal or eBay, all these companies started, they started in California. Anybody who ever did business on eBay or PayPal, guess where that data was? In the US, in some data center. Today you cannot do it. Today, data residency laws are really tough, and so now these organizations have to really understand what data needs to remain where. On top of that, we now have so many regulations. You know, earlier on, if you were healthcare, you needed to be HIPAA compliant, or banking, PCI DSS. But today, in the cloud, you really need to know what data I have, what sensitive data I have, and how do I discover it? So that data discovery becomes really important.
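Knowing "what sensitive data I have" is usually addressed with automated discovery that samples column values and tags likely sensitive types. A minimal sketch, assuming two hypothetical regex detectors and an 80% match threshold; production discovery tools use many more signals than this.

```python
import re
from typing import Dict, List

# Hypothetical detectors: tag name -> pattern a whole value must match.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values: List[str], threshold: float = 0.8) -> List[str]:
    """Tag a column with every pattern that matches at least `threshold`
    of its non-empty sampled values."""
    sample = [v for v in values if v]
    if not sample:
        return []
    tags = []
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            tags.append(tag)
    return tags


def discover(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each column name to the sensitive-data tags detected in it."""
    return {column: classify_column(values) for column, values in table.items()}


sample_table = {
    "contact": ["a@example.com", "b@example.org", ""],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Berlin", "Austin"],
}
print(discover(sample_table))
# {'contact': ['email'], 'tax_id': ['us_ssn'], 'city': []}
```

The resulting tags are what a policy layer can then key on, instead of someone maintaining a hand-curated list of sensitive columns.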
What roles I have: so for example, let's say I work for a bank in the US, and I decide to move to Germany. Now, the old school is that a new rule will be created for me, because of German... >> John: New email address, all these new things happen, right? >> Right, exactly. So you end up with really a mass of rules, and... and these are all static. >> Rules and tools, oh my god. >> Yeah. So Okera actually makes a lot of this dynamic, which reduces your cloud migration overhead, and Nong used some great examples. In fact, sorry if I take just a second: without mentioning any names, one of the largest banks in the world is going global in the digital space for the first time, and they're taking Okera with them. So... >> But that's the point. This is my next topic, cloud migration. I want to bring this up because of complexity. When you're in that old-school kind of data center, waterfall, these old rules and tools, you have to roll this out, and it's a pain in the butt for everybody; it's a hassle, a huge hassle. Cloud gives the agility, we know that, and cloud's becoming more secure. I think now people see on-premises, certainly things that would be on-premises for secure things, I get that. But when you start getting into agility, and you now have cloud regions, you can start being more programmatic. So I want to get you guys' thoughts on cloud migration: companies who are now lifting and shifting, replatforming, and what's the refactoring beyond that, because you can replatform in the cloud and still some are kind of holding back on that. Then when you're in the cloud, the companies that are winning are the ones that are refactoring in the cloud, doing things differently with new services. Sanjeev, you start. >> Yeah, so you know, in fact a lot of people tell me, "You know, we are just going to lift and shift into the cloud." But you're literally using cloud as a data center.
You still have all the, if I may say, junk you had on-prem; you just moved it into the cloud, and now you're paying for it. In cloud, nothing is free. Every storage, every processing, you're going to pay for it. The most successful companies are the ones that are replatforming; they are taking advantage of platform as a service or software as a service. So that includes things like, you pay as you go, you pay for exactly the amount you use, so you scale up and scale down, or scale out and scale in, pretty quickly, you know? So you're handling that demand. So without replatforming, you are not really utilizing your- >> John: It's just hosting. >> Yeah, you're just hosting. >> It's basically hosting if you're not doing anything right there. >> Right. The reason why people sometimes resist replatforming is because there's a hidden cost that we don't really talk about: PaaS adds 3x to IaaS cost. So some organizations that are very mature, and have a few thousand people in the IT department, for them, they're like, "No, we just want to run it in the cloud; we have the expertise, and it's cheaper for us." But in the long run, to get the most benefit, people should think of using cloud as a service. >> Nong, what's your take? Because you see examples of companies, I'll just call one out, Snowflake for instance: they're essentially a data warehouse in the cloud, they refactored and they replatformed, and they have a competitive advantage with the scale, so they have things that others that are just hosting don't have. Or even on-premises. There's a new model developing where there are real advantages. How should companies think about this when they have to manage these data lakes, and they have to manage all these new access methods, but they want to maintain that operational stability and control and growth? >> Yeah, so. No? Yeah. >> There's a few topics that are all (indistinct) this topic.
(indistinct) enterprises moving to the cloud, they do this maybe for some cost savings, but a ton of it is agility, right? The pace the business can run at is just so much faster. So we'll work with companies in the context of cloud migration for data, where they might have a data warehouse they've been using for 20 years and building policies over that time, right? And taking a long time to go approve access and those kinds of things made more sense, right? If it took you months to procure physical infrastructure, get machines shipped to your data center, then data access taking so long feels okay, right? That's kind of the same rate that everything is moving. In the cloud, you can spin up new infrastructure instantly, so you don't want approvals for getting policies, creating rules, all that stuff that Sanjeev was talking about; that being slow is a huge, huge problem. So this is a very common environment that we see, where they're trying to do that kind of thing. And then, for replatforming, again, they've been building these roles and processes and policies for 20 years. What they don't want to do is take 20 years to go migrate all that stuff into the cloud, right? That's probably an experience nobody wants to repeat, and frankly, for many of them, the people who did it originally may or may not be involved in this kind of effort. So we work with a lot of companies like that. They want stability; they've got to have the business running as normal; they've got to get moving onto the new infrastructure, doing it in a new way, with all the lessons learned. So, as Sanjeev said, one of these big banks that we work with: that classic story of on-premises data warehousing, maybe a little bit of Hadoop, moved onto AWS, S3, Snowflake, that kind of setup, with extremely intricate policies. But let's go reimagine how we can do this faster, right?
What we like to talk about is, you're an organization; you need a design where, if you onboarded 1,000 more data users, that's got to be way, way easier than the first 10 you onboarded, right? You've got to get it to be easier over time, in a really, really significant way. >> Talk about the data authorization safety factor, because I can almost imagine all the intricacies of these different tools creating specialism among the people who operate them. And each one might have its own little authorization nuance. The trend is not to have that siloed mentality. What's your take on clients that just want, "Hey, you know what? I want to have maximum agility, but I don't want to get caught in the weeds on some of these tripwires around access and authorization"? >> Yeah, absolutely, I think it's really important to get the balance right. Because if you are an enterprise, or if you have diverse teams, you want them to have the ability to use best-of-breed tools for their purpose, right? But you don't want it to be that every tool has its own access and provisioning and whatever; that's definitely going to be a security risk, or at least a lot of friction for you to get things going. So we think about that really hard. I think we've seen great success with things like SSO and Okta, right? Unifying authentication. We think a very, very similar thing is about to happen with authorization. You want that single control plane that can integrate with all the tools, and still get the best of what you need, but it's much, much easier (indistinct). >> Okta's a great example: people don't want to build their own thing, they just go with that, same with what you guys are doing. That seems to be where the dots are connecting, Sanjeev: the ease of use, but yet the stability factor. >> Right. Yeah, because, John, today I may want to bring up a SQL editor to go into Snowflake, just as an example. Tomorrow, I may want to use the Azure Bot, you know?
I may not even want to go to Snowflake, I may want to go to an underlying piece of data, or I may use Power BI, you know, for some reason, and come from the Azure side. So the point is that, unless we are able to control this in some sort of centralized manner, we will not get that consistency. And security, you know, is all or nothing. You cannot say, "Well, I secured my Snowflake, but if you come through HDFS, Hadoop, or something, you know, that is outside of my realm, or my scope." What's the point? So that is why it is really important to have a watertight way. In fact, I'm using just a few examples; maybe tomorrow I decide to use a data catalog, or I use Denodo as my data virtualization and I run a query. I'm the same identity, but I'm using different tools. I may use it from home, over VPN, or I may use it from the office, so you want this kind of flexibility, all encompassed in a policy, rather than a separate rule for if you do this and this, or if you do that, because then you end up with literally thousands of rules. >> And it's never going to stop, either. It's like fashion: the next tool's going to come out, it's going to be cool, and people are going to want to use it. Again, you don't want to have to then move the train from the compliance side this way or that way; it's a lot of hassle, right? So if we have that one capability, you can bring on new things pretty quickly. Nong, am I getting it right? This is kind of like the trend, that you're going to see more and more tools and/or things that are relevant, or certain use cases that might justify them, but yet, AppSec review, compliance review, I mean, good luck with that, right? >> Yeah, absolutely, I mean we certainly expect tools to continue to get more and more diverse, and better, right? Most innovation in the data space, and I think we... This is a great time for that, a lot of things that need to happen, and so on and so forth.
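Sanjeev's contrast between one policy per identity and "literally thousands of rules" can be illustrated with a small attribute-based access check. This is a minimal sketch under stated assumptions, not Okera's product or API: the user attributes (`role`, `region`), the dataset tag `pii`, and the tool names are all hypothetical. The point it shows is that the policy is written once against attributes, and the tool issuing the query does not change the answer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    user: str
    role: str    # hypothetical attribute, e.g. "analyst" or "steward"
    region: str  # hypothetical attribute, e.g. "EU" or "US"


@dataclass(frozen=True)
class Dataset:
    name: str
    region: str
    tags: frozenset = field(default_factory=frozenset)  # e.g. {"pii"}


def can_read(identity: Identity, dataset: Dataset) -> bool:
    """A single central policy, written against attributes, not against tools."""
    if identity.region != dataset.region:
        return False  # residency rule: only read data from your own region
    if "pii" in dataset.tags:
        return identity.role == "steward"  # only stewards may read PII
    return True


def handle_query(tool: str, identity: Identity, dataset: Dataset) -> str:
    # The tool name is accepted but deliberately ignored by the policy, so a
    # SQL editor, a BI tool and a raw file reader all get the same answer.
    return "allow" if can_read(identity, dataset) else "deny"


if __name__ == "__main__":
    alice = Identity("alice", role="analyst", region="EU")
    orders = Dataset("orders", region="EU")
    customers = Dataset("customers", region="EU", tags=frozenset({"pii"}))
    for tool in ("sql_editor", "bi_dashboard", "file_reader"):
        print(tool, handle_query(tool, alice, orders), handle_query(tool, alice, customers))
```

Each printed line ends in `allow deny`: the analyst can read `orders` but not the PII-tagged `customers`, whichever tool she uses. Adding another tool adds no new rule, which is the property both speakers are after.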
So I think one of the early goals of the company, when we were just brainstorming, was that we don't want data teams to be unable to use tools because they don't have the right security (indistinct), right? Often those tools may not be focused on that particular area. They're great at what they do, but we want to make sure they're enabled, so enterprises can make those investments and see broader adoption much more easily. A lot of those things. >> And I can hear the sirens in the background; that's someone who's not using your platform, they need some help there. But that's the case: if you don't get this right, there are some consequences. And one of the things I would like to bring up next, to talk through with you guys, is the persona pigeonhole: "Oh yeah, a data person, the developer, the DevOps, the SRE." You start to see now developers, cloud developers, and data folks, however they get pigeonholed, kind of blending in, okay? You've got data services, you've got analytics, you've got data scientists, you've got more democratization, all these things are being kicked around, but the notion of a developer now is a data developer, because cloud is about DevOps, and data is now a big part of it. It's not just some department, it's actually blending in. It's a cultural shift. Can you guys share your thoughts on this trend of data people and developers now becoming kind of one? Do you guys see this happening, and if so, how? >> So, John, when I started my career, I was a DBA, and then a data architect. Today, I think you cannot have a DBA who's not a developer. That's just my opinion. Because there is so much CI/CD and DevOps that happens today, and you know, you write your code in Python, you put it in version control, you deploy using Jenkins, you roll back if there's a problem. And then you are interacting, you're building your data to be consumed as a service.
In the past, people would have a thick client that would connect to the database over TCP/IP. Today, people don't necessarily want to connect over TCP/IP; they want to go over HTTP. And they want an API gateway in the middle. So, if you're a data architect or DBA, now you have to worry about, "I have a REST API call that's coming in; how am I going to secure that, and make sure that people are allowed to see that?" And that was just yesterday. >> Exactly. You've got to build an abstraction layer. In the old days, you had to worry about schema and do all that; it was hard work back then, but now it's much different. You've got serverless, functions are going to show way... It's happening. >> Correct, GraphQL, and the semantic layer, that just blows me away, because it used to be all in the database, then we took it out of the database and we put it in a BI tool. BusinessObjects started this whole trend. So we said, "Let's put the semantic layer there." Well, okay, great, but that was when everything was centered on BusinessObjects and Oracle Database, or some other database. But today, what if somebody brings Power BI or Tableau or Qlik, you know? Now you don't have access to that semantic layer. So you cannot have it in the BI layer, so you move it down to its own layer. So now you've got a semantic layer; then where do you store your metrics? The same story repeats: you have a metrics layer. Then the data scientists want to do feature engineering; where do you store your features? You have a feature store.
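The idea of moving metric definitions out of any one BI tool into a shared metrics layer can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `METRICS` dictionary, the `orders` table, its columns, and the `metric_sql` helper are all hypothetical, and this is not the syntax of any real semantic-layer product. It only shows that when the definition lives in one place, every front end that asks for `revenue` gets the same logic.

```python
from typing import List, Optional

# Hypothetical metric definitions: one place, shared by every consuming tool.
METRICS = {
    "revenue": {
        "expr": "SUM(amount)",
        "table": "orders",
        "filters": ["status = 'complete'"],
    },
    "order_count": {
        "expr": "COUNT(*)",
        "table": "orders",
        "filters": [],
    },
}


def metric_sql(name: str, group_by: Optional[List[str]] = None) -> str:
    """Render one shared metric definition as SQL for any consuming tool."""
    metric = METRICS[name]
    dims = group_by or []
    select = ", ".join([*dims, f"{metric['expr']} AS {name}"])
    sql = f"SELECT {select} FROM {metric['table']}"
    if metric["filters"]:
        sql += " WHERE " + " AND ".join(metric["filters"])
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql
```

For example, `metric_sql("revenue", ["region"])` yields `SELECT region, SUM(amount) AS revenue FROM orders WHERE status = 'complete' GROUP BY region`, so the "complete orders only" rule cannot silently differ between dashboards.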
And before you know it, this stack has disaggregated over and over and over, and then you've got layers and layers of specialization; there are query accelerators like Dremio or Trino, so you've got your data here, which Nong is trying really hard to protect, and then you've got layers and layers and layers of abstraction. Networks are fast, so the end user gets great service, but it's a nightmare for architects to bring all these things together. >> How do you tame the complexity? What's the bottom line? >> Nong? >> Yeah, so, I think... So there are a few things you need to do, right? So, we need to rethink how we express security permissions, right? I think you guys just, maybe in passing, (indistinct) talked about creating all these rules and all that kind of stuff; that's been the way we've done things forever. We've got to think about policies and mechanisms that are much more dynamic, right? You really need to think about not having to do any additional work for the new things you add to the system. That's really, really core to solving the complexity problem, right? 'Cause that gets you those orders-of-magnitude reductions; the system's got to be more expressive and map to those policies. That's one. And then second, it's got to be implemented at the right layer, right, to Sanjeev's point, close to the data, so it can service all of those applications and use cases at the same time, and have that uniformity and breadth of support. So those two things have to happen. >> Love this universal data authorization vision that you guys have. Super impressive. We had a CUBE Conversation earlier with Nick Halsey, who's a veteran in the industry, and he likes it. That's a good sign, 'cause he's seen a lot of stuff, too, Sanjeev, like yourself.
This is a new thing: you're seeing compliance being addressed programmatically, and I'm imagining there are going to be bots someday, very quickly, with AI that's going to scale that up, so they kind of don't get in the way of innovation; people can still get what they need, and it enables innovation. You've got cloud migration, which is only going faster and faster. Nong, you mentioned speed; that's what CloudOps is all about. Developers want speed, not things in days or hours; they want it in minutes and seconds. And then finally, ultimately, how does it scale up for the people operating and/or programming? These are three major pieces. What happens next? Where do we go from here? The customer's sitting there saying, "I need help, I need trust, I need scale, I need security." >> So, I just wrote a blog, if I may diverge a bit, on data observability. And you know, there are a lot of these little topics that are critical, DataOps is one of them. So to me, data observability is really having a transparent view of the state of your data in the pipeline, anywhere in the pipeline. So you know, when we talk to these large banks, these banks have like 1,000, over 1,000 data pipelines running every night, because they've got that hundred, 200 data sources from which they're bringing data in. Then they're doing all kinds of data integration, they have, you know, we talked about Python or Informatica, or whatever data integration or data transformation product you're using, so you're combining this data, writing it into an analytical data store, and something's going to break. So, to me, data observability becomes a very critical thing, because it shows me something broke and walks me down the pipeline, so I know where it broke. Maybe the data drifted. And I know Okera does a lot of work in data drift, you know? So this is... Nong, jump in any time, because I know we have use cases for that.
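The "walk me down the pipeline, so I know where it broke" idea can be sketched as a loop over per-stage snapshots. This is a minimal Python illustration under assumptions, not any vendor's observability API: the stage names, the snapshot fields (row count and column names), the expected-schema dictionary, and the 50% row-loss threshold are all hypothetical. Real tools track far more signals, such as freshness, distributions, and null rates.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StageSnapshot:
    stage: str
    row_count: int
    columns: Tuple[str, ...]  # column names observed in this stage's output


def first_broken_stage(
    snapshots: List[StageSnapshot],
    expected_columns: Dict[str, Tuple[str, ...]],
    max_row_loss: float = 0.5,
) -> Optional[Tuple[str, str]]:
    """Walk the pipeline in order and return (stage, reason) for the first problem.

    Two illustrative checks:
      * schema drift: the stage's columns differ from the expected set
      * row loss: the stage keeps less than (1 - max_row_loss) of its input rows
    Returns None if every stage passes.
    """
    for i, snap in enumerate(snapshots):
        expected = expected_columns.get(snap.stage)
        if expected is not None and set(snap.columns) != set(expected):
            return snap.stage, "schema drift"
        if i > 0:
            prev = snapshots[i - 1]
            if prev.row_count > 0 and snap.row_count < prev.row_count * (1 - max_row_loss):
                return snap.stage, "row loss"
    return None
```

Checking stages in pipeline order is the design choice that matters here: the first failing stage, not the last one, is where someone should start looking, which is the kind of fast feedback loop the speakers describe.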
>> Nong, before you get in there, I just want to highlight a quick point. I think you're onto something there, Sanjeev, because we've been reporting, and we believe, that data workflows are intellectual property, and they have to be protected. Nong, go ahead, your thoughts. >> Yeah, I mean, the observability thing is critically important. I would say, when you want to think about what's next, I think it's really effectively bridging the tools and processes and systems and teams that are focused on data production with the data analysts and data scientists that are focused on data consumption, right? I think bridging those two, which covers a lot of the topics we talked about, that's kind of where security almost meets; that's kind of where you've got to draw it. I think for observability and pipelines and data movement, understanding that is essential. And I think broadly, on all of these topics, where all of us can be better is if we're able to close the loop, get the feedback loop of success. Data drift is an example of the loop rarely being closed. It drifts upstream, and downstream users can take forever to figure out what's going on. And we'll have similar examples related to bias, or data quality, all those kinds of things, so I think that's really a problem that a lot of us should think about. How do we make sure that loop is closed as quickly as possible? >> Great insight. Quick aside, as the founder and CTO, how's life going for you, you feel good? I mean, you started a company, it's doing great, it's not drifting, it's right in the stream, mainstream, right in the wheelhouse of where the trends are. You guys really have the crosshairs on the real issues. How are you feeling? Tell us a little bit about how you see the vision.
>> Yeah, I obviously feel really good. I mean, we started the company a little over five years ago; there were kind of a few things that we bet would happen, and I think those things were out of our control. I don't think we would've predicted GDPR, security, and those kinds of things being as prominent as they are. Those things have really matured, probably as well as we could've hoped, so that feels awesome. Yeah, (indistinct) really expanded in these years, and it feels good. Feels like we're in the right spot. >> Yeah, it's great. Data's a competitive advantage, and it certainly has a lot of issues. It could be a blocker if not done properly, and you're doing great work. Congratulations on your company. Sanjeev, thanks for being my cohost in this segment, great to have you on. I've been following your work, and you continue to unpack it at the new firm you started, SanjMo. Good to see your Twitter handle taking on the name of your new firm, congratulations. Thanks for coming on. >> Thank you so much, such a pleasure. >> Appreciate it. Okay, I'm John Furrier with theCUBE, you're watching today's session presentation of the AWS Startup Showcase, featuring Okera, a hot startup. Check 'em out, great solution, with a really great concept. Thanks for watching. (calm music)
Analyst Predictions 2023: The Future of Data Management
(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is that I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts: Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you guys? A data pack, like the Rat Pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave speak faintly) >> All right, before we get into 2023 predictions, we thought it'd be good to look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022. They're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system.
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double-click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we graded ourselves. I could have very easily made my prediction from last year green, but I'll mention why I left it as yellow. I fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog, called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks go GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, I work closely with Alation, Informatica, and a bunch of other companies; they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would IPO, and it did not. And I don't think anyone is going to IPO right now. The market is really, really down, both VC funding and the IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check.
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I wrote about and just tried to get away from, and this topic just won't go away. I did speak with a number of folks, early adopters and non-adopters, during the year. And I did find that it pretty much validated what I was expecting, which was that this has now become a front-burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing a Google search on data mesh. And I happened to trip across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly hopped on that, maybe three sentences, and wrote it in about a couple of minutes, saying this is hogwash, essentially. (laughs) And that just... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw 15,000 hits on that post, which was the most hits of any single post I put up all year. And the responses were wildly pro and con. So, it pretty much validates my expectation that data mesh really did come under a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then recently, (Tony laughing) I talked to Walmart, and they actually invoked Martin Fowler, and they said that they're working through their data mesh.
So, it takes a lot of thought, and it really, as we've talked about, is as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow on the prediction that graph databases take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three- to five-year time period in which graph databases will become significant, because they still need accepted methodologies that can be applied in a business context, as well as proper tools, in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments, like AWS with Neptune, and Oracle providing graph support in Oracle Database this past year. Those things are, as I said, growing gradually. There are other companies, like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases. I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have, and maybe out to the edge.
>> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. Stream processing is another type of specialized data processing, like the graph databases Carl was talking about, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, and continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect that, perhaps in the five-year timeframe, we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there are still some complications in the data pipeline. And so, I think you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant.
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this, in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, and IBM all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2022, in that we saw a big embrace of Apache Iceberg. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability, and it also ensures performance. And that's a structured table that helps with the warehouse-side performance. Among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, and IBM all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating it to end users. It's very cutting edge. I'd say the top, leading-edge 5% of companies have really embraced the lakehouse. I think we're now seeing the fast followers, the next 20 to 25% of firms, embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands from anyone in the audience at the keynote: have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there.
It was you, me, and I think two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub-predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space. Actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you will, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like DataOps, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality checks, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security and privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy; I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, limited in its scope.
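Sanjeev's contrast between orchestrating on metadata and hand-coding "if this fails, then do this" branches can be sketched as a planner that reads catalog metadata for every job's inputs. This is a minimal illustration under assumptions: the `DatasetMeta` fields (`hours_since_update`, `quality_score`), the 24-hour and 0.95 thresholds, and the `plan` function are hypothetical, not the API of any catalog or orchestrator. What it shows is that no job carries its own branching logic; one generic check over metadata decides whether each job runs or is held.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DatasetMeta:
    name: str
    hours_since_update: float
    quality_score: float  # fraction of data quality checks passed, 0.0 to 1.0


@dataclass
class Job:
    name: str
    inputs: List[str]  # names of datasets the job reads


def plan(
    jobs: List[Job],
    catalog: Dict[str, DatasetMeta],
    max_age_hours: float = 24.0,
    min_quality: float = 0.95,
) -> Dict[str, Tuple[str, List[str]]]:
    """Decide 'run' or 'hold' for every job by reading catalog metadata.

    No job carries its own if-this-fails-do-that branch; the same generic
    check is applied to whatever inputs a job declares.
    """
    decisions = {}
    for job in jobs:
        problems = []
        for ds in job.inputs:
            meta = catalog.get(ds)
            if meta is None:
                problems.append(f"{ds}: no metadata")
            elif meta.hours_since_update > max_age_hours:
                problems.append(f"{ds}: stale")
            elif meta.quality_score < min_quality:
                problems.append(f"{ds}: low quality")
        decisions[job.name] = ("hold", problems) if problems else ("run", [])
    return decisions
```

Adding a new job here means declaring its inputs, not writing new orchestration branches; the reasons returned for held jobs are also the kind of signal a unified metadata layer could feed back to data producers.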
It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind-the-scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest being data observability and customer data platforms, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things as it takes to avoid getting breached, ending up in headlines, getting fired, or going to jail. So, vendors, bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalogs and the BI vendors is because, very soon, data catalogs are not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what it is that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic?
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics, for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, whether it's embedded or separate, we expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You've got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's gotten almost too modern. I don't know if it's getting long in the tooth, but it is getting long. The modern data stack has traditionally been defined as basically the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm, or the streaming realm for that matter, into basically the data warehouse, or, as we might be seeing more and more, the data lakehouse. And I think what's important here is that we have seen a lot of progress, and this would be in the cloud, with the SaaS services.
And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons with their database platforms. You see the Informaticas, the Fivetrans, and all the other players there with their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is that it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to, unfortunately, is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here is that we have too many pieces. And going back to the discussion of catalogs, we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers, or all the SaaS service providers, to do is to literally get it together and essentially make this modern data stack less of a stack and more of a blending into an end-to-end solution. And that can come in a number of different ways. Part of it is that data platform providers have been adding services that are adjacent. And there are some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. Search is a very common tool that the applications developed on MongoDB use, so MongoDB built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process of going from Aurora to Redshift. You've seen the same thing with Google BigQuery integrating streaming pipelines. And you're also seeing a lot of movement in in-database machine learning.
So, there are some good moves in this direction. I expect to see more of this this year. Part of it is the SaaS platforms basically adding some functionality. But I also see, more importantly, because you're never going to get... Asking your data team and your developers to standardize on the same tool is like herding cats. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for one, but in other words, get two products or two services for the price of one and a half. I see a lot of potential for this. And to me, if the cloud was supposed to simplify things, this is the next logical step, and I expect to see more of this here. >> Yeah, and you see in Oracle MySQL HeatWave yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think that the... I really like Tony's phrase, islands of simplicity. It really says (Tony chuckles) what's going on here, which is that you ask all these different vendors about how these stacks work, and all these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, like when you go to an Informatica conference, they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is, how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system.
So, the alternative is to have metadata that can be shared amongst those systems, and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active, critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically, where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data, with data in different formats, the whole thing has... It's like what a customer of mine used to say: "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, the Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us on theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up, is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left.
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing and retrieving data, whether it's in key-value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. And for ordinary, conventional business analytics, Spark is an over-engineered solution to the problem. SQL works just great. What's happened in the past couple of years, and what's going to continue to happen, is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL. Even MongoDB. We were all, I think, at the MongoDB conference where one day we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple of days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You were debating in class and you were forced to take one side and defend it.
So, I was at a Vertica conference one time, up on stage with Curt Monash, and I had to take the NoSQL, the-world-is-changing, paradigm-shift side. And so, just to be controversial, I said to him, Curt Monash, who really needs ACID compliance anyway? And so, (chuckles) of course, his head exploded. Tony Baer, what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance, and if there's any proof of the pudding here, I see the lakehouse as being the icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think Doug was about a year ahead of time in his predictions, this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I'm actually on the home stretch of doing a market landscape on the lakehouse. And the lakehouse will not replace data lakes in terms of that. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is a huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. What came with the lakehouse, and why this really helps to fuel a SQL revival, is that the core need in the data lake, what brought on the lakehouse, was not so much SQL; it was a need for ACID. And what was the best way to do it? It was through a relational table structure.
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to the column and row level, which you really could not do in a data lake or a file system. So, while a lakehouse can be queried in other ways, you can go in there with Python or whatever, it's built on a relational table structure. And so, to that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure, as I learned the hard way during my research. So, the bottom line I'd say here is that the lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as, I would say, the facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why it has happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity.
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming a much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. Those are designed to promote reuse and consistency, across AI and ML initiatives, of the elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that: any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric... Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring data from, and publishing data to, external third parties, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier, in my opinion, in this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone.
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward what the income from that property might look like, and the expenses, and we can plan and purchase things appropriately. And that's a very common thing to do. So, I think we need this broader purview, and I'm beginning to see some of those things happen. And the evidence today, I would say, is more focused around metric stores and feature stores; we're starting to see vendors offer those capabilities. And we're starting to see the MLOps elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps that orchestrate people and places and things to optimize around a set of KPIs. It sounds like more of a metadata challenge... Somebody once predicted we'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is, in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used, and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes.
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used, or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think that "starting to expand" is probably the right way to put it. It's going to be expanding for some time. I think we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing, what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but these are data products. All this about vacation rentals and how they're doing: that data is coming from different sources. I'm packaging it into a data product. And to Carl's point, there's a whole set of operational metadata associated with it. The idea is for organizations to see things like developer productivity: how many releases am I doing of this? What data products are most popular?
I'm actually right now in the process of formulating this concept that, just like we had data catalogs, we are very soon going to be requiring a data products catalog, so I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit, called data contracts. And a data contract to me is literally just the formalization of all these aspects of a product. How do you use it? What is the SLA on it? What is the quality that I am prescribing? So, data products, in my opinion, shift the conversation to the consumers or to the business people. Up to this point, Dave, when you're talking about data, all of data discovery and curation is very data producer-centric. So, I think we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there very quickly, which is that what Sanjeev has been saying there is really central to what Zhamak has been talking about. It's basically about making... One, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also, Tony, to your last year's prediction, you were really saying it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touché on that one. Okay, last prediction.
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding and reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. It's always been that 25% of employees really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision, or better still, using analytics as triggers for automation and workflows, not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next-generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app, so the analytics are right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, are all building smart apps with the analytics, predictions, even recommendations built into these applications. And I think the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We've got to move beyond what I call swivel-chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and take action. >> And Dave Menninger, today, if you want analytics or you want to absorb what's happening in the business, you typically have to go ask an expert and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively, as an industry, we made a mistake.
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems so that the analytics didn't impact the operations. You don't want the analytics preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together, and organizations recognize that need for change. As Doug said, the majority of the workforce and the majority of organizations don't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. Two-thirds of organizations recognize that embedded analytics are important, and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service, compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our Future of Intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically.
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations, as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, consistent with what I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline in their data lakes to do all that very exploratory and deep modeling. But clearly, it just makes sense to bring operational analytics to where people work, into their workspace, and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications. It does seem like we're entering not only a new era of data, but a new era of apps. Today, most applications are about filling forms out or codifying processes, and they require human input.
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, and present it to humans. Do you guys see this as a new frontier? >> I think that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, and the analytics at the point of decision have to be relevant to that decision point. >> And I also recall that a lot of the ERP vendors, back like 10 years ago, were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven, rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end, (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform or database? Today, data's all far-flung. We see that it's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing it into memory. I mentioned MySQL HeatWave.
There are other examples where it's a brute-force approach, but maybe we need new ways of laying data out on disk and new database architectures, just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind is almost like talking about cave painting, I think that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board, and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and in real time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com and theCUBE.net. Thank you for watching. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
Dave Menninger | PERSON | 0.99+ |
Doug | PERSON | 0.99+ |
Carl | PERSON | 0.99+ |
Carl Olofson | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Curt Monash | PERSON | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Christian Kleinerman | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Sanjeev | PERSON | 0.99+ |
Constellation Research | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Ventana Research | ORGANIZATION | 0.99+ |
2022 | DATE | 0.99+ |
Hazelcast | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
25% | QUANTITY | 0.99+ |
2021 | DATE | 0.99+ |
last year | DATE | 0.99+ |
65% | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
today | DATE | 0.99+ |
five-year | QUANTITY | 0.99+ |
TigerGraph | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two services | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
David | PERSON | 0.99+ |
RisingWave Labs | ORGANIZATION | 0.99+ |
Breaking Analysis: What we hope to learn at Supercloud22
>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The term Supercloud is somewhat new, but the concepts behind it have been bubbling for years. Early last decade, when NIST put forth a definition of cloud computing, it said services had to be accessible over a public network, essentially cutting the on-prem crowd out of the cloud conversation. Now, a guy named Chuck Hollis, who was a field CTO at EMC at the time and a prolific blogger, objected to that criterion and laid out his vision for what he termed a private cloud. In that post, he showed a workload running both on premises and in a public cloud, sharing the underlying resources in an automated and seamless manner, what later became known more broadly as hybrid cloud. That vision, as we now know, really never materialized, and we were left with multi-cloud: sets of largely incompatible and disconnected cloud services running in separate silos. The point is that what Hollis laid out, i.e., the ability to abstract underlying infrastructure complexity and run workloads across multiple heterogeneous estates with an identical experience, is what Supercloud is all about. Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we share what we hope to learn from Supercloud22 next week. Next Tuesday at 9:00 AM Pacific, the community is gathering for Supercloud22, an inclusive pilot symposium hosted by theCUBE and made possible by VMware and other founding partners. It's a one-day, single-track event with more than 25 speakers digging into the architectural, technical, structural, and business aspects of Supercloud. This is a hybrid event, with a live program in the morning running out of our Palo Alto studio and pre-recorded content in the afternoon featuring industry leaders, technologists, analysts, and investors up and down the technology stack.
Now, as I said up front, the seeds of Supercloud were sown early last decade. After the very first re:Invent, we published our Amazon Gorilla post, that's seen in the upper right corner here. We talked about how to differentiate from Amazon and form ecosystems around industries and data, and how the cloud would change IT permanently. Then, up in the upper left, we put up a post on the old Wikibon wiki. Yeah, it used to be a wiki. Check out my hair, by the way: no gray. That's how long ago this was. In that post we talked about how to compete in the Amazon economy, and we showed a graph of how IT economics were changing: at scale, cloud services had marginal economics that looked more like software than hardware. And this would reset, we said, opportunities for both technology sellers and buyers for the next 20 years. This came into sharper focus in the ensuing years, culminating in a milestone post by Greylock's Jerry Chen called Castles in the Cloud. It was an inspiration and catalyst for us using the term Supercloud in John Furrier's post prior to re:Invent 2021. So we started to flesh out this idea of Supercloud, where companies of all types build services on top of hyperscale infrastructure and across multiple clouds, going beyond multicloud 1.0, if you will, which was really a symptom, as we said many times, of multi-vendor. At least that's what we argued. And despite its fuzzy definition, it resonated with people because they knew something was brewing. Keith Townsend, The CTO Advisor, even though he frankly wasn't a big fan of the buzzy nature of the term Supercloud, posted this awesome blackboard on Twitter. Take a listen to how he framed it. Please play the clip. >> Is VMware the right company to make the super cloud work? That's a term Wikibon came up with to describe taking discrete services.
So, say, RDS from AWS, compute engines from GCP and authentication from Azure, to build SaaS applications or enterprise applications that connect back to your data center. Is VMware's cross-cloud vision, 'cause it is just a vision today, the right approach? Or should you be looking towards companies like HashiCorp to provide this overall capability that we all agree, or maybe you don't, that we need in an enterprise? Comment below with your thoughts. >> So I really like that Keith has deep practitioner knowledge and lays out a couple of options. I especially like the examples he uses of cloud services. He recognizes the need for cross-cloud services, and he notes this capability is aspirational today. Remember, this was eight or nine months ago. He brings HashiCorp into the conversation, as they're one of the speakers at Supercloud 22, and he asks the community what they think. The thing is, we're trying to really test out this concept, and people like Keith are instrumental as collaborators. Now, I'm sure you're not surprised to hear that not everyone is on board with the Supercloud meme. In particular, Charles Fitzgerald has been a wonderful collaborator, just by his hilarious criticisms of the concept. After a couple of Supercloud posts, Charles put up his second rendition of "Supercloudifragilisticexpialidocious." I mean, it's just beautiful. But to boot, he put up this picture of Baghdad Bob asking us to just stop. Bob's real name is Mohamed Said al-Sahaf. He was the minister of propaganda for Saddam Hussein during the 2003 invasion of Iraq, and he made these outrageous claims of, you know, US troops running in fear and putting down their arms and so forth. So anyway, Charles laid out several, frankly very helpful, critiques of Supercloud, which have led us to really advance the definition and catalyze the community's thinking on the topic. Now, one of his issues, and there are many, is that we said a prerequisite of Supercloud was a super PaaS layer.
Gartner's Lydia Leong chimed in, saying there were many examples of successful PaaS vendors built on top of a hyperscaler, some having the option to run in more than one cloud provider. But the key point we're trying to explore is the degree to which that PaaS layer is purpose-built for a specific Supercloud function, and not only runs in more than one cloud provider, Lydia, but runs across multiple clouds simultaneously, creating an identical developer experience irrespective of estate. Now, maybe that's what Lydia meant. It's hard to say from just a tweet, and she's a sharp lady who knows more about that market, the PaaS market, than I do. But to the former point, at Supercloud 22 we have several examples we're going to test. One is Oracle and Microsoft's recent announcement to run database services on OCI and Azure, making them appear as one rather than using an off-the-shelf platform. Oracle claims to have developed a capability for developers specifically built to ensure high performance, low latency, and a common experience for developers across clouds. Another example we're going to test is Snowflake. I'll be interviewing Benoit Dageville, co-founder of Snowflake, to understand the degree to which Snowflake's recent announcement of an application development platform is purpose-built for the Snowflake Data Cloud. Is it just a plain old PaaS, big whoop, as Lydia claims, or is it something new and innovative? By the way, we invited Charles Fitz to participate in Supercloud 22, and he declined, saying, in addition to a few other somewhat insulting things, "there's definitely interesting new stuff brewing that isn't traditional cloud or SaaS, but branding it all super cloud doesn't help either." Well, indeed, we agree with part of that, and we'll see if it helps advance thinking and helps customers really plan for the future. And that's why Supercloud 22 is going to feature some of the best analysts in the business in The Great Supercloud Debate.
Keith Townsend, along with Maribel Lopez of Lopez Research and Sanjeev Mohan, former Gartner analyst and principal at SanjMo, participated in this session. Now, we don't want to mislead you. We don't want to imply that these analysts are hopping on the Supercloud bandwagon, but they're more than willing to go through the thought experiment and mental exercise. And we had a great conversation that you don't want to miss. Maribel Lopez had what I thought was a really excellent way to think about this. She used TCP/IP as a historical example. Listen to what she said. >> And Sanjeev Mohan has some excellent thoughts on the feasibility of an open versus de facto standard getting us to the vision of Supercloud: what's possible and what's likely. Now, again, I don't want to imply that these analysts are out banging the Supercloud drum. They're not necessarily doing that, but I think it's fair to say they do believe that something new is bubbling. And whether it's called Supercloud or multicloud 2.0 or cross-cloud services or whatever name you choose, it's not the multicloud of the 2010s, and we chose Supercloud. So our goal here is to advance the discussion on what's next in cloud, and Supercloud is meant to be a term to describe that future of cloud, specifically the cloud opportunities that can be built on top of hyperscale compute, storage, networking, machine learning and other services at scale. That is why we posted this piece on answering the top 10 questions about Supercloud, many of which were floated by Charles Fitzgerald and others in the community. Why does the industry need another term? What's really new and different, and what is hype? What specific problems does Supercloud solve? What are the salient characteristics of Supercloud? What's different beyond multicloud? What is a super PaaS? Is it necessary to have a Supercloud? How will applications evolve on superclouds? What workloads will run?
All these questions will be addressed in detail as a way to advance the discussion and help practitioners and business people understand what's real today and what's possible with cloud in the near future. One other question we'll address is: who will build superclouds, and what new entrants can we expect? This is an ETR graphic that we showed in a previous episode of Breaking Analysis, and it lays out some of the companies we think are building superclouds or are in a position to do so. By the way, the Y axis shows Net Score, or spending velocity, and the X axis depicts presence in the ETR survey of more than 1,200 respondents. The key callouts on this slide, in addition to some of the smaller firms that aren't yet showing up in the ETR data, like ChaosSearch, Starburst, Aviatrix and Clumio, are the really interesting industry players: Walmart with Azure, Capital One and Goldman Sachs with AWS, and Oracle with Cerner. These, we think, are early examples, bubbling up, of industry clouds that will eventually become superclouds. So we'll explore these and other trends to get the community's input on how this will all play out. These are the things we hope you'll take away from Supercloud 22, and we have an amazing lineup of experts to answer your questions: technologists like Kit Colbert, Adrian Cockcroft, Mariana Tessel, Chris Hoff, Will DeForest, Ali Ghodsi, Benoit Dageville, Muddu Sudhakar and many other tech athletes; investors like Jerry Chen and In Sik Rhee; the analysts we featured earlier; Paula Hansen talking about go-to-market in a multi-cloud world; Gee Rittenhouse talking about cloud security; David McJannet; Bhaskar Gorti of Platform9; and many, many more. And of course you. So please go to theCUBE.net and register for Supercloud 22. It's a really lightweight registration; we're not doing this for lead gen, we're doing it for collaboration. If you sign in, you can get into the chat and ask questions in real time.
So don't miss this inaugural event, Supercloud 22, on August 9th at 9:00 AM Pacific. We'll see you there. Okay, that's it for today. Thanks for watching. Thank you to Alex Myerson, who's on production and manages the podcast, and to Kristen Martin and Cheryl Knight, who help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE, who does some really wonderful editing. Thank you to all. Remember, these episodes are all available as podcasts wherever you listen; just search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com, and you can email me at david.vellante@siliconangle.com, DM me @dvellante, or comment on my LinkedIn posts. Please do check out ETR.AI for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next week in Palo Alto at Supercloud 22, or next time on Breaking Analysis. (calm music)
SUMMARY :
This is breaking analysis and buyers for the next 20 years. Is VMware the right company is the degree to which that PaaS layer and specifically the cloud opportunities
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Alex Myerson | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
David McJannet | PERSON | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
Paula Hansen | PERSON | 0.99+ |
Jerry Chen | PERSON | 0.99+ |
Adrian Cockcroft | PERSON | 0.99+ |
Maribel Lopez | PERSON | 0.99+ |
Keith Townsend | PERSON | 0.99+ |
Kristen Martin | PERSON | 0.99+ |
Chuck Hollis | PERSON | 0.99+ |
Charles Fitz | PERSON | 0.99+ |
Charles | PERSON | 0.99+ |
Chris Hoff | PERSON | 0.99+ |
Keith | PERSON | 0.99+ |
Mariana Tessel | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Ali Ghodsi | PERSON | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Charles Fitzgerald | PERSON | 0.99+ |
Mohamed Said al-Sahaf | PERSON | 0.99+ |
Kit Colbert | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Rob Hof | PERSON | 0.99+ |
Clumio | ORGANIZATION | 0.99+ |
Goldman Sachs | ORGANIZATION | 0.99+ |
Gee Rittenhouse | PERSON | 0.99+ |
Aviatrix | ORGANIZATION | 0.99+ |
Chaossearch | ORGANIZATION | 0.99+ |
Benoit Dageville | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
NIST | ORGANIZATION | 0.99+ |
Lydia Leong | PERSON | 0.99+ |
Muddu Sudhakar | PERSON | 0.99+ |
Bob | PERSON | 0.99+ |
Cerner | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Capital One | ORGANIZATION | 0.99+ |
david.vellante@siliconangle.com | OTHER | 0.99+ |
Starburst | ORGANIZATION | 0.99+ |
EMC | ORGANIZATION | 0.99+ |
2010s | DATE | 0.99+ |
Will DeForest | PERSON | 0.99+ |
more than 1200 respondents | QUANTITY | 0.99+ |
one day | QUANTITY | 0.99+ |
VMware | ORGANIZATION | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
2021 | DATE | 0.99+ |
next week | DATE | 0.99+ |
Supercloud 22 | EVENT | 0.99+ |
theCUBE.net | OTHER | 0.99+ |
Bhaskar Gorti | PERSON | 0.99+ |
Supercloud | ORGANIZATION | 0.98+ |
each week | QUANTITY | 0.98+ |
eight | DATE | 0.98+ |
SanjMo | ORGANIZATION | 0.98+ |
Lydia | PERSON | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
PaaS | TITLE | 0.98+ |
more than 25 speakers | QUANTITY | 0.98+ |
Snowflake | ORGANIZATION | 0.98+ |
Platform9 | ORGANIZATION | 0.97+ |
first | QUANTITY | 0.97+ |
one | QUANTITY | 0.97+ |
today | DATE | 0.97+ |
Hollis | PERSON | 0.97+ |
Saddam Hussein | PERSON | 0.97+ |
second rendition | QUANTITY | 0.97+ |
Boston | LOCATION | 0.97+ |
SiliconANGLE | ORGANIZATION | 0.96+ |
more than one cloud provider | QUANTITY | 0.96+ |
both | QUANTITY | 0.95+ |
super cloud 22 | EVENT | 0.95+ |
The Great Supercloud Debate | Supercloud22
[Music] >> Welcome to The Great Supercloud Debate, a power panel of three top technology industry analysts. Maribel Lopez is here; she's the founder and principal analyst at Lopez Research. Keith Townsend is CEO and founder of The CTO Advisor, and Sanjeev Mohan is principal at SanjMo. Supercloud is a term that we've used to describe the future of cloud architectures. The idea is that superclouds are built on top of hyperscaler capex infrastructure, and that they go beyond multi-cloud, the premise being that multi-cloud is primarily a symptom of multi-vendor or M&A, or both, and results in more stovepipes. We're going to talk about that. Supercloud is meant to connote a new architecture that leverages the underlying primitives of hyperscale clouds but hides and abstracts the complexity of each of their respective clouds, and adds new value on top of that with services and a continuous experience, a similar or identical experience, across more than one cloud. People may say, "Hey, that's multi-cloud." We're going to talk about that as well. So with that as brief background, I'd like to first welcome our panelists. Guys, thanks so much for coming on theCUBE. It's great to see you all again. >> Great to be here. >> Thank you. Good to be here. >> So I'm going to start with Maribel. You know, what I just described, what's your reaction to that? Is it just what cloud is supposed to be? Is that really what multi-cloud is? Do you agree with the premise that multi-cloud has really been, you know, what Chuck Whitten from Dell calls multi-cloud by default? I call it a symptom of multi-vendor. What's your take on what this is? >> Oh, wow, Dave, another term. Here we go, right? More to define for people. But okay, the reality is I agree that it's time for something new, something evolved, right? Whether we call that Supercloud or something else, you know, I don't want to really debate the term, but we need to move beyond where we are today in multi-cloud and into, if we want to call it, cloud 5, multi-cloud 2,
whatever we want to call it. I believe that we're at the next generation, and we have to define what that next generation is. But if you think about it, we went from public to private to hybrid to multi, and every time you have a discussion with somebody about cloud, you spend 10 minutes defining what you're talking about. So this doesn't seem any different to me. Let's just go with Supercloud for the moment and see where we go. And, you know, if you're interested, after everybody else makes their comments, I've got a few thoughts about what Supercloud might mean as well. >> Yeah, great. And I agree with you. Like I said in a recent post, you could call it, you know, multi-cloud 2.0, but something different is happening. And Sanjeev, I know you're not a big fan of buzzwords either, but I wonder if you could weigh in on this topic. By the way, Sanjeev is at the MIT CDOIQ conference, a great conference in Boston, and it's a public place, so I think he'll mute his line when he's not speaking. Please go ahead. >> Yeah, so, you know, I come from a pedigree of being an analyst at firms that love inventing new terms. I am not a big fan of inventing new terms. I feel that when we come up with a new term, I spend all my time standing on a stage trying to define what it is, and it takes me away from trying to solve the problem. So I find these terms to be words of convenience. For example, big data. You know, big data to me may not mean anything, but big data connotes some of this modern way of handling vast volumes of data that traditional systems could not handle. So from that point of view, I'm completely okay with Supercloud. But just inventing a new term is what I have called, in my previous sessions, the tyranny of jargons, where we have just too many jargons, and they resonate with IT people. They do not resonate with the business people. Business people care about the problem; they don't care about
what we in IT call them. >> Yeah, and I think this is a really important point that you make. And by the way, we're not trying to create a new industry category per se. We leave that to Gartner. That's actually why I like Supercloud: no vendor's going to use the term Supercloud. It's just too buzzy. But it brings up the point about practitioners. So, Keith, I want to bring you in. I'll just share some thoughts on the problems that we see and get your practitioner view. Most companies use multiple clouds; we all kind of agree on that, I think. And largely these clouds operate in silos. They have their own development environment, their own operating environment, different APIs, different primitives, and the functionality of a particular cloud doesn't necessarily extend to other clouds. So the problem is that this increases friction for customers, increases cost and increases security risk. And so there's this promise, Maribel, of multi-cloud 2.0 that's going to solve that problem. So, Keith, my question to you is: is that an accurate description of the problem that practitioners face today? What did I miss? I wonder if you could elaborate. >> So I think we'll get into some of the detail later on why this is a problem, specifically around technologies. But if we think about it in the abstract, most customers have their hands full dealing with one cloud. When you zoom in and look at companies that have multiple clouds, or multi-cloud as a result of M&A activity, you'll see that most of that is in silos. So organizationally the customer may have multiple clouds, but the sub-organizations, the silos, are generally a single silo in a single cloud. So as you think about being able to take advantage of tooling across the multicloud, of what, Dave, you guys are calling the Supercloud, this becomes a serious problem. It's just a skill problem. It's
too much capability across too many things that look completely different from one another. >> Okay, so- >> Dave, can I pick up on that, please? >> I'd love- I was going to just go to you, Maribel. Please chime in here. >> Okay, so if we think about what we're talking about with Supercloud and what Keith just mentioned, remember when we went to TCP/IP, and the whole idea was: how do we get computers to talk to each other in a more standardized way? How do we get data to move in a more standardized way? I think the problem we have with multi-cloud right now is that we don't have that. So I think that's sort of the ground level for getting us to your Supercloud premise. And, you know, Google's tried it with Anthos; every hyperscaler has tried their write once, run anywhere. But that abstraction layer you talk about, whatever we want to call it, is super necessary, and it's sort of the foundation. So if you really think about it, we've spent 15 years or so building out all the various components of cloud, and now's the time to take it so that cloud is actually more of an operating model versus a place. There's at least a base level of it that is vendor neutral, and then, to your point, the value that's going to be built on top of that. People have been trying to commoditize the basic infrastructure for a while now, and I think that's what you're seeing in your Supercloud, multi-cloud, whatever you want to call it. The infrastructure is the infrastructure, and what would traditionally have been that PaaS layer and above is where we're going to start to see some real innovation. But we still haven't gotten to the point where you can do visibility, observability and manageability across that really complex cloud stack that we have. >> The reason I love that TCP/IP example is because it changed the industry and it had an ecosystem effect. And Sanjeev, the first example that I used was Snowflake, a company that you're very familiar
with, that is sort of hiding all that complexity. Right, so we're not there yet, but please chime in on this topic. You've got to unmute again. >> Building upon what Maribel said, you know, to me this sounds like a multi-cloud operating system, where you need that kind of a common set of primitives and layers. Because if you go with the typical multi-cloud process, you've got multiple identities, and you can't have that. How can you govern if I have multiple identities? I don't have observability; I don't know what's going on across my different stacks. So to me, Supercloud is, call it, a single pane of glass, or one way through which I'm unifying my experience, my technology interfaces, my integration. And I as an end user don't even care which cloud I'm in; it makes no difference to me. It makes a difference to the vendor. The vendor may say this is coming from AWS and this is coming from GCP or Azure, but to the end user it is a consistent experience, with consistent ID, observability and governance. So that, to me, makes a big difference. >> And so one of Floyer's contributions to the conversation was that in order to have a Supercloud, you've got to have a super PaaS. I'm like, oh boy, people are going to love that. But the point being that it allows a consistent developer experience, and, to Maribel's earlier point about TCP, it explodes the ecosystem, because the ecosystem can now write to that super PaaS, if you will, to those APIs. So, Keith, do you buy that, number one? And number two, do you see that industries, financial services and healthcare, are actually going to be on clouds, or what we call superclouds? >> So Sanjeev hit on a really key aspect of this, which is identity. Let's make this real. You love to talk about data collaboration. I love Sanjeev's point about the business user kind of not caring whether this is AWS versus Supercloud versus etc. I was collaborating with a client, and he wanted to send a video file, and the video
file... his organization's access control policy didn't allow him to upload or share the file from their preferred platform. So he had to go out to another cloud provider and create yet another identity for that data on the cloud. Same data, different identity. A proper Supercloud will enable me to simply say, as an end user, here's a set of data, or data sets, and I want to share it with a collaborator. And that requires cross-identity across multiple clouds. So even before we get to the PaaS layer and the APIs, we have to solve the most basic problem, which is data. How do we stop data scientists from shipping Snowballs to a location because we can't figure out the identity? We're duplicating the same data within the same cloud because we can't share identity across customer accounts, etc. We have to solve these basic things before we get to Supercloud; otherwise we get to a turtles-all-the-way-down thing. So we'll get into Snowflake and what Snowflake can do, but that's what happens when I want to share my Snowflake data across multiple clouds to a different platform. >> Yeah, you have to go inside the Snowflake cloud, which leads- right. So I would say, to Keith's question, Sanjeev, Snowflake, I think, is solving that problem, but then he brings up the other problem, which is: what if I want to share data outside the Snowflake cloud? So that gets to the point of: is it open, or is it closed? So, Sanjeev, chime in on the Snowflake example. And Maribel, I wonder if there are networking examples, because that's what Keith is saying: you've got to fix the plumbing before you get to these higher-level abstractions. But Sanjeev first. >> Yeah, so I actually want to talk a little bit about the network, but from a data and analytics point of view, building upon what Keith said. I want to give an example. Let's say I am getting fantastic web logs, and I know how much time people are spending on my web pages and which pages they're looking at. So I have all of
that. Now, all of that is going into cloud A. Then it turns out that I use Google Analytics, or maybe I use Adobe's, you know, analytics suite. That is giving me the business view, and I'm trying to do customer journey analytics, and guess what? I now have two separate identities, two separate products, two separate clouds. As an IT person, no problem; I can solve any problem by writing tons of code. But why would I do that if I can have that super PaaS, or a multi-cloud layer, where I've got a single way of looking at my network traffic and my customer metrics, and I can do my customer journey analytics? It solves a huge problem. And then I can share that data with my partners, so they can see data about their products, which is a combination of data from different clouds. >> Great, thank you. Maribel, please. >> I think we're having a Lord of the Rings moment here, with the one ring to rule them all concept, and I'm not sure that anybody's actually incented to do that, right? So I think there are two levels of the stack. At the basic level, we're talking a lot about how we don't have the fundamentals of how you move data, authenticate data, secure data, do data lineage, all that stuff, across different clouds, right? And right now I feel like we're really just talking about the public cloud venue; we haven't even pulled in the fact that people are doing hybrid cloud, right? With hybrid cloud, you're talking about hardware vendors and hyperscaler vendors, and there are two or three different ways of doing things. So I honestly think that something will emerge. If we think about where we are in technology today, it's almost like we need to go back to that operating system that Sanjeev was talking about. We need a next-generation operating system. Nobody wants to build the cloud mouse driver of the 21st century over and over again, right? We need something like that as a foundation layer, but then on top of it, you know,
there's obviously a lot of opportunity to build differentiation. When I think back on what happened with cloud, AWS remained very powerful and popular because people invested in building things on Amazon, right? They created a platform, and it took a while for anybody else to catch up to that or to have that kind of presence, and I still feel that way when I talk to companies. Having said that, I talked to a retailer the other day, and they were like, "Hey, we spent a long time building an abstraction layer on top of the clouds so that our developers could basically write once and run anywhere." But they were a massive global retailer; that's not something that everybody can do. So I think that we still have a gap. I don't know if that exactly answers your question, but I do feel like we're in this chicken-and-egg thing: which comes first? And nobody necessarily wants to say, "Oh, well, you know, Amazon has built a way to do this, so we're all just going to do it the Amazon way," right? It seems like that's not going to work either. >> But I think you bring up a really important point, which is that there is going to be no one ring to rule them all. You know, VMware is going to solve its multi-cloud problem. Snowflake has a very specific, purpose-built system for itself. Databricks is going to do its thing, and it's going to be, you know, more open source. I would say companies like Aviatrix, even Cisco, are going to go out and solve this problem. Dell showed, at Dell Technologies World, a thing called Project Alpine, which is basically storage across clouds. There are going to be many superclouds; maybe we're going to get Supercloud stovepipes. But the point is, for a specific problem and a set of use cases, they will be addressing those and delivering incremental value. So, Keith, maybe we won't have that single cloud operating system, but we'll have multiple ones. What are your thoughts on
that? >> Yeah, we're definitely going to have multiple ones. There is no community large enough or influential enough to push a design. Take Maribel's example of the mega retailer. They've solved it, but that's their competitive advantage. They're not going to share that with the rest of us, open source it, and force it upon the industry via agreement from everyone else. So we're not going to get the level of collaboration, either originated by the cloud provider or originated from user groups, that solves this problem broadly for us. We will get silos in which this problem is solved. We'll get groups working together, maybe inside of an industry or subgroups within the industry, to say, "Hey, we're going to share or federate identity across our three or four or five or a dozen organizations." We'll be able to share data; we're going to solve that data problem. But those same individual organizations, in another part of the Supercloud problem, are again just going to be silos. I can't run machine learning against my web assets for the community group that I run, because that's not part of the working group that solved a different data science problem. So yes, we're going to have these bifurcations and forks within the Supercloud. The question is, where is the focus for each individual organization? Where do I point my smart people, and what problems do they solve? >> Okay, I want to throw out a premise and get your reaction to it, because, again, I go back to Maribel's TCP/IP example. It changed the industry; it opened up an ecosystem. And to me, this is what digital transformation is all about. You've now got industry participants. Marc Andreessen says every company is a software company. You've now got industry participants, and here are some examples. I wouldn't call them true superclouds yet, but Walmart's doing their hybrid thing with Azure. You've got Goldman Sachs, which announced at the last
re:Invent that it's going to take its tools, its software and its data, which is on-prem, connect that to the AWS cloud and actually deliver a service. Capital One, we saw, Sanjeev, at the Snowflake Summit, is taking their tooling and doing it now, granted just within Snowflake and AWS, but I fully expect them to expand that across other clouds. These are industry examples; Capital One Software is the name of the division. That's the reason why I don't get so worried that we're not solving the Lord of the Rings problem that Maribel mentioned: it opens up tremendous opportunities for companies. We've got just under five minutes left; I want to throw that out there and see what you guys think. >> Yeah, I just want to build upon what Maribel said. I love what she said: you're not going to build a mouse driver. So if Supercloud is a multi-cloud OS, the mouse driver would be identity, or maybe it's data quality. And to Keith's point, that data quality is not going to come from a single vendor. It is going to come from a different vendor whose job is to harmonize data, because the data might be for the same identity but at a different granularity level, so you cannot just mix and match. You need to have some sort of resolution, and that is an example of a driver for multi-cloud. >> Interesting. Okay, so, you know, Okta might be the identity cloud, or Zscaler might be the security cloud, or Collibra has its cloud, etc. Any thoughts on that, Keith or Maribel? >> Yeah, so let's talk about where the practical challenges run into this. We did some really great research, sponsored by one of the large cloud providers, in which we looked at all the VMware cloud solutions. When I say VMware cloud, VMware has a lot of products across multi-cloud now in its broad cross-cloud portfolio, but we're talking about the OG solution, VMware vSphere. It would seem like, on paper, if I put VMware vSphere in each cloud, that is therefore a super
cloud. I think we would all agree to that in principle. What we found in our research was that when we put hands on keyboard, the differences between the clouds showed themselves in the training gap and the skills gap. If I needed to expose our favorite friend, a TCP/IP address, to the public internet, that is a different process that needs to be done on each one of the clouds, and it's not abstracted in VMware vSphere. So as we look at the nuance, yes, we can get the big controls, but where the Capital Ones and the JPMorgan Chases (JPMorgan Chase just spent two billion dollars on this type of capability) spend the effort is in taking it from that 80 percent to that 90, 95 percent experience. That's where the effort and money is spent: on that last mile. >> Maribel, we're out of time, but please, bring us home. Give us your closing thoughts. >> Hey, I think we're still going to be working on what the multi-cloud thing is for a while. Super cloud, I think, is a direction for the future of cloud computing, but we've got some real problems to solve around authentication, identity, data lineage, and data security. I think those are going to be the tactical things we're working on for the next couple of years. >> All right, guys, always a pleasure having you on theCUBE. I hope we see you around. Keith, I understand you're bringing your Airstream to VMworld, or VMware Explore, putting it on the floor. I can't wait to see that, and Mrs. CTO Advisor, I'm sure, will be by your side, so looking forward to that. Hopefully, Sanjeev and Maribel, we'll see you on the circuit as well. >> Yes, hope to see you there. >> Right, looking forward to hopefully even doing some content with you guys at VMware Explore too. >> Awesome, looking forward to it. All right, keep it right there for more content from Supercloud22. We'll be right back. [Music]
Data Power Panel V3
(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object-store-based data lakes, and with them, two important trends, actually three. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. This novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The second major trend we've seen is that query engines and broader data fabric and virtualization platforms have embraced next-gen data lakes as platforms for SQL-centric business intelligence workloads, reducing, or some even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And third, many, if not most, customers that are embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in the lakehouse. But at the same time, they don't want to, or can't, abandon their data warehouse estate. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante, and welcome to the data platforms power panel on theCUBE, the next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics: data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, and Tony Baer, principal at dbInsight.
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June, and we're gearing up for two major conferences. There are several database conferences, but two in particular that we're very interested in: Snowflake Summit and the Databricks Data and AI Summit. Doug, let's start off with you, and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug, the notion of the lakehouse? And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well, you nailed it in your intro. One platform to address BI, data science, and data engineering: fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed, by the middle of that last decade, there were several SQL-on-Hadoop products and open standards like Apache Drill. At the same time, the database vendors were trying to respond to this interest in machine learning and data science, so the likes of Hudi and Vertica were adding SQL extensions to support data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So in the database camp, you have Snowflake, which introduced Snowpark to try to address the data science needs. They introduced that in 2020, and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and the warehouse from a single vendor, though not necessarily quite a single platform. Google very recently also jumped on the bandwagon.
And then, as you also mentioned, the SQL engine camp, the Dremios, the Ahanas, the Starbursts, are really doing two things: providing a fabric for distributed access to many data sources, but also very firmly planting the idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp, with the Databricks and Clouderas, is providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss if I didn't say, for those of you who know me, that I typically write my own intros. This time my colleagues fed me a lot of that material, so thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in that it has been an evolution, just like Doug said. For instance, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was all those Aster functions that made a lot of big data analytics accessible to SQL. (clears throat) So in a simpler, more functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory for the SQL folks, and also for all the data stewards, who are concerned about the sprawl and the lack of control and governance in the data lake. So it's really a continuation of an ongoing trend. That being said, there's no action without counteraction, and at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight.
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years. We first saw it in the Hadoop years, with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and therefore trying to break the silos down even further. >> Thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say these are excellent points that Doug and Tony have brought to light. The concept of the lakehouse, to your point, Dave, was going on a long time ago, long before the term was invented. For example, at Uber, they were trying to do a mix of Hadoop and Vertica, because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it a lakehouse; they were using multiple technologies. But now they're able to collapse it into a single data store that we call a lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have real-time capabilities such as change data capture and doing inserts and updates. This is why the lakehouse has become so important: it gives us these transactional capabilities. >> Great. So I'm interested. The name is great, lakehouse, and the concept is powerful, but I get concerned that there's a lot of marketing hype behind it, so I want to examine that a bit deeper. How mature is the concept of the lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses have each had to extend themselves.
To believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar story with the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake to make the table format more performant, for instance. But I think the most dramatic change in all of that is in their SQL engine. They had to pretty much abandon Spark SQL, because, in and of itself, Spark SQL is essentially a stopgap solution, and if they really wanted to address that crowd, they had to totally reinvent SQL, or at least their SQL engine. So Databricks SQL is not Spark SQL. It is not Spark; it's SQL that's adapted to run in a Spark environment, but the underlying engine is C++, not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature. Even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud-native architecture, where you're essentially abstracting compute from data, there is no reason why, if you're dealing with the same data targets, say cloud object storage, you couldn't apportion the task to different compute engines.
And so, for instance, let's say you're Google: you could have BigQuery perform the types of SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML do some in-database machine learning. At the same time, for another part of the query, which might involve, let's say, some deep learning, you might go out to the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's all triggered through microservices. I just gave Google as an example, but you could generalize that to all the other cloud or third-party vendors. So I think we're still very early in the game in terms of the maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree: it's not mature yet. Lakehouses still have a lot of work to do. What I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones going for BigQuery, Redshift, Snowflake, Synapse, and so on, because they want the platform to handle all the data modeling, access control, and performance enhancements. But there are trade-offs: if you go with these platforms, you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want independence; in other words, they don't want vendor lock-in. They want to transform their data for any number of use cases, especially data science and machine learning use cases. What they want is agility via open file formats, using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off.
If you have thousands of tables, it takes minutes to get them uploaded into your warehouse and start experimenting. Table formats are resonating far more with the community than file formats. But once the cost of a cloud data warehouse goes up, organizations start exploring lakehouses. The problem is that lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it, and even today Apache Hive is still very strong, but it's all technical metadata, and it has so many different restrictions. That's why we see Databricks investing in something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is the lack of standards. All these open source vendors are running what I call ego projects. You see them on LinkedIn constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right, thanks. So Doug, I worry sometimes. I look at the space, and we've debated for years best of breed versus the full suite. You see AWS with, whatever, 12-plus different data stores and different APIs and primitives. You've got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there are proof points there. But Snowflake is really good at data warehousing, at simplifying the data warehouse, and Databricks is really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot of that cogent analysis from Sanjeev is that the vendors coming out of the database tradition excel at SQL.
They're extending it into data science, but when it comes to unstructured data, data science, ML, and AI, it's often a compromise. The data lake crowd, the Databricks and such, have struggled to completely displace the data warehouse when it really gets to the tough SLAs; they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and some of these SQL engines are good for ad hoc queries and for minimizing data movement. But really, when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You've got Apache projects like Hudi and Iceberg. Where do they fit, Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. I feel that table formats are starting to mature. There is a lot of work being done. We will not have a single product or single platform; we'll have a mixture. I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating; their focus is on the table format. But Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, or how do you do your upserts, basically? So they have different focuses. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg, in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but open for access. I don't hear about Delta Lake in any world but Databricks, and I'm hearing a lot of buzz about Apache Iceberg.
End users want an open, performant standard. And most recently, Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the big data movement. You had MapR, the most closed, Hortonworks, the most open, and Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, and Doug was very much referring to this. I would call it kind of like the MongoDB syndrome, and I'm talking about MongoDB before they changed their license: an open source project, but very much associated with MongoDB, which pretty much controlled most of the contributions and made the decisions. I think Databricks has the same iron-clad hold on Delta Lake, and the market still pretty much associates Delta Lake with Databricks as its open source project. Iceberg is probably further advanced than Hudi in terms of mind share. So what I see this breaking down to is essentially Databricks open source versus everything-else open source, the community open source. It's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week. It's another data platform, kind of not totally relevant to this discussion, but in a sense it is, because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up a little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend.
And Frank basically said, look, we're not discretionary. We are deeply operationalized. He kind of pooh-poohed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them, but that's really not our business. Do any of you have comments on that? Help us sort through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars, so it really comes down to that type of perception. The fact is that, traditionally, analytics was very SQL-oriented, and the quants were kind of off in their corner using SAS or Teradata. There's a really great leveler today: Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. So I see this breaking down to each group having its own natural preferences for its home turf. And the fact that, let's say, the Python and Scala folks are using Databricks does not make them any less operational or mission-critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and to make data science possible. And it's very early days. I think the bottom line is that each of these camps is going to keep working on doing better at the thing that they don't do today, or that they're new to, but they're not going to nail it. They're not going to be best of breed on both sides.
So the SQL-centric companies and shops are going to do more data science on their database-centric platforms. The data-science-driven companies might be doing more BI on their lakes with those vendors. And the companies that have highly distributed data are going to add fabrics and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, Sanjeev, but I'll ask you again, because Snowflake and Databricks are such great examples: you have the data engineering crowd trying to go into data warehousing, and you have the data warehousing guys trying to go into lake territory. Snowflake has $5 billion on the balance sheet. Doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a band-aid? What are your thoughts on that, Sanjeev? >> I think the semantic layer is the metadata, and the business metadata is extremely important. At the end of the day, business folks would rather go to the business metadata than have to figure things out. For example, let's say I want to update somebody's email address, and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer; it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool, and make them fungible? So it's more disaggregation of the stack, but it gives us more best-of-breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will.
We always talk about injecting machine intelligence, AI, into applications, making them more data-driven. But when you look at the application development stack, it's separate; the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> Organizationally, and even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and data worlds together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. That's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning the way we write business logic. Business logic used to be highly embedded in our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps toolchain. So data is catching up to the way applications are. >> We also have translytical databases; that's a little bit of what the story is with MongoDB next week, adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not warehouse-level workloads. So we're making progress, but I think there will long be a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, take your best of breed, to Doug's earlier point?
You can't be best of breed at everything. Wouldn't data mesh advocate: data lakes, do your data lake thing; data warehouses, do your data warehouse thing; and then you're just a node on the mesh? (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. For me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world. And that's why I want to go back to something Sanjeev said, which is that it's going to raise the importance of the semantic layer. Now, that opens a couple of Pandora's boxes here. One, does Snowflake dare go into that space, or do they risk alienating their partner ecosystem, which is a key part of their whole best-of-breed appeal? They're kind of in the same situation Informatica was in during the early 2000s, when Informatica briefly flirted with analytic applications, realized that was not a good idea, and had to double down on their core, which was data integration. The other thing this raises the importance of, and this is where best of breed comes in, is the data fabric. My contention is that, whether or not you employ data mesh practices, if you employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh.
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, is about a common metadata backplane, something that we used to talk about with master data management. This would be something more mutable, more evolving, using, let's say, machine learning, so that we don't have to predefine rules or predefine what the world looks like. So I think in the long run, what this really means is that whichever way we implement, on whichever physical platform, we all need to be speaking the same metadata language. At the end of the day, regardless of whether it's a lake, a warehouse, or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about it. It's got no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it, and I'm not going to mock him. >> I wonder how real that is, or if it's just Oracle marketing. Anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB.
And I think Doug and Sanjeev were kind of referring to this before: operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, or some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of whether this customer is likely to churn? I think we're going to be seeing a lot of that, and I think that's a lot of what MySQL HeatWave is all about. As for whether Oracle has any presence in the market now, it's still a pretty new announcement. But the other thing that goes against Oracle, (laughs) that they have to battle, is that even though they own MySQL and run the open source project, in terms of actual commercial implementations, it's associated with everybody else. And the popular perception has been that MySQL has been kind of a sidelight for Oracle. So it's on Oracle's shoulders to prove that they're damn serious about it. >> It's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to. Of course, the Street freaks out. What are your thoughts on pricing, the consumption model? What's the right model for companies, for customers? >> The consumption model is here to stay.
What I would like to see, and I think it's an ideal situation that actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON or Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much. But it could be that one business unit wants to use Athena, and another business unit wants to use some other engine, let's say Trino or Dremio. So every business unit is working on the same data set, see, that's critical, but that data set is maybe in their VPC and they bring any compute engine, pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> Snowflake has been sort of a victim of its own success in some ways. They made it so easy to spin up single-node instances, multi-node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi-node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, and dashboards around the utilization of all the nodes and multi-node instances that have been spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that freedom. And I think they're going to make that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances.
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around, and Tony, I know you talk to them quite frequently. I think they've had quite a comprehensive offering for a long time, actually. They've created Kudu, so they've got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in a hybrid multi-cloud environment. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think where Cloudera has been most successful is in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as being the most price friendly, in terms of, we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, I think Cloudera's most powerful appeal is its installed base. And in a way, I don't want to cast them as a legacy system.
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database, or an operational data store that's kind of the outgrowth of HBase, but Cloudera is still primarily known for deep analytics. Nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata to do some machine learning, or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base, because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course, being on the quarterly shot clock, under the microscope, was not a good place to be for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month do a deal with Dell whereby Snowflake could access non-native data in on-prem object stores from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data regulatory and residency requirements. There are desires to have these platforms in your own data center.
And finally they capitulated. I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction. But clearly there's enough demand, and certainly for government contracts and any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still sort of preserves his original intent, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it. They just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that.
And props to Andy Jassy, who many times actually told us, never say never when it comes to AWS. So guys, I know we've got to run. We've got some hard stops. Maybe you could each give us your final thoughts. Doug, start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space: 14 vendors, eight of them claiming to support lakehouse, both sides of the camp. Databricks' top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good, but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. When Snowflake was asked for their biggest data science customer, they cited Kabura: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually, of all things... And certainly, I'll also be looking for similar things to what Doug is saying, but I think, sort of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause, I mean, they're into this conquer-the-world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with putting in some inline analytics? What are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev.
>> So I'll be at MongoDB World, Snowflake and Databricks, and very interested in seeing... But since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing the HTAP use case, both online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational, or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one. And it's likely Mongo's path, or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right, and thank you for watching. This is Dave Vellante for theCUBE and these excellent analysts. We'll see you next time. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Doug | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
Frank | PERSON | 0.99+ |
Frank Slootman | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Mars | LOCATION | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Venus | LOCATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
2012 | DATE | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Holger Mueller | PERSON | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
last year | DATE | 0.99+ |
$5 billion | QUANTITY | 0.99+ |
$10,000 | QUANTITY | 0.99+ |
14 vendors | QUANTITY | 0.99+ |
Last year | DATE | 0.99+ |
last week | DATE | 0.99+ |
San Francisco | LOCATION | 0.99+ |
SanjMo | ORGANIZATION | 0.99+ |
8,500 users | QUANTITY | 0.99+ |
Sanjeev | PERSON | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
32 concurrent users | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
Constellation Research | ORGANIZATION | 0.99+ |
Mongo | ORGANIZATION | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Ahana | ORGANIZATION | 0.99+ |
DaaS | ORGANIZATION | 0.99+ |
EMR | ORGANIZATION | 0.99+ |
32 | QUANTITY | 0.99+ |
Atlas | ORGANIZATION | 0.99+ |
Delta | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Python | TITLE | 0.99+ |
each | QUANTITY | 0.99+ |
Athena | ORGANIZATION | 0.99+ |
next week | DATE | 0.99+ |
Tony Baer, Doug Henschen and Sanjeev Mohan, Couchbase | Couchbase Application Modernization
(upbeat music) >> Welcome to this CUBE Power Panel, where we're going to talk about application modernization and success templates, and take a look at some new survey data to see how CIOs are thinking about digital transformation as we get deeper into the post-isolation economy. And with me are three VIP guests familiar to CUBE audiences: Tony Baer, principal at dbInsight; Doug Henschen, VP and principal analyst at Constellation Research; and Sanjeev Mohan, principal at SanjMo. Guys, good to see you again, welcome back. >> Thank you. >> Glad to be here. >> Thanks for having us. >> Glad to be here. >> All right, Doug. Let's get started with you. This recent survey, which was commissioned by Couchbase, covered 650 CIOs, CTOs, and IT practitioners, so obviously very IT heavy. They responded to the following statement: "In response to the pandemic, my organization accelerated our application modernization strategy," and of course, an overwhelming majority, 94%, agreed or strongly agreed. So I'm sure, Doug, that you're not shocked by that, but in the same survey, modernizing existing technologies was second only to cybersecurity as the top investment priority this year. Doug, bring us into your world and tell us the trends that you're seeing with the clients and customers you work with in their modernization initiatives. >> Well, the survey, of course, is spot on. You know, any Constellation Research analyst, any systems integrator will tell you that we saw more transformation work in the last two years than in the prior six to eight years. A lot of it was forced, you know, a lot of movement to the cloud, a lot of process improvement, a lot of automation work, but transformational is aspirational and not every company can be a leader.
You know, at Constellation, we focus our research on those market leaders, and that's only, you know, the top 5% of companies that are really innovating, that are really disrupting their markets. And we try to share that with companies that want to be fast followers, the next 20 to 25% of companies that don't want to get left behind, but don't want to hit some of the same roadblocks and, you know, pioneering pitfalls that the real leaders are encountering when they're harnessing new technologies. As for the rest of the companies, you know, the cautious adopters, the laggards, many of them fall by the wayside; that's certainly what we saw during the pandemic. Who are these leaders? You know, the old, familiar examples that people cite are the Amazons, the Teslas, the Airbnbs, the Ubers and Lyfts, but new examples are emerging every year. And as a consumer, you immediately recognize these transformed experiences. One of my favorite examples from the pandemic is Rocket Mortgage. No disclaimer required, I don't own stock and they're not a client, but when I wanted to take advantage of those record-low mortgage interest rates, I called my current bank and some, you know, stalwart, very established conventional banks, I'm talking to you, Bank of America, Citibank, and they were taking days and weeks to get back to me. Rocket Mortgage had the locked-in commitment that day, with very proactive, consistent communications across web, mobile, email, all customer touchpoints. I closed in a matter of weeks in an entirely digital, seamless process. This is back in the gloves-and-masks days, and the loan officer came, parked in our driveway, wiped down an iPad, handed us that iPad, and we signed all those documents digitally, a completely electronic workflow. The only wet signatures required were those demanded by the state. So it's easy to spot these transformed experiences.
You know, Rocket had most of that in place before the pandemic, and that's why they captured 8% of the national mortgage market by 2020, and they're on track to hit 10% here in 2022. >> Yeah, those are great examples. I mean, I'm not a shareholder either, but I am a customer. I even went through the same thing in the pandemic. It was all done digitally, it was a piece of cake. And I happened to have to do another one with a different firm, stuck with that firm for a variety of reasons, and it was night and day. So to your point, it was a forced march to digital. If you were there beforehand, you had a real advantage; it could accelerate your lead during the pandemic. Okay, now, Tony Baer. Mr. Baer, I understand you're skeptical about all this buzz around digital transformation. In that same survey, the data shows that the majority of respondents said that their digital initiatives were largely reactive to outside forces: the pandemic, compliance changes, et cetera. But at the same time, they indicated that the results, while somewhat mixed, were generally positive. So why are you skeptical? >> The reason being, and by the way, I have nothing against application modernization. The problem, as I've always said, is that it often gets conflated with digital transformation, and digital transformation itself has become such a buzzword and so overused that it's really hard, if not impossible, to pin down (coughs) what digital transformation actually means. And very often what you'll hear from, let's say, a C-level is, you know, (mumbles) we want to run like Google, regardless of whether or not that goal is realistic, you know, for that organization. (coughs) The thing is that we've been using, you know, businesses have been using digital data since the days of the mainframe, since the... Sorry, that data has been digital.
What really has changed, though, is just the degree of how businesses interact with their customers, their partners, with the whole rest of the ecosystem, and how their business... And how, in many cases, and take a look at the auto industry, the nature of the business, you know, is changing. So there is real change afoot. The question is, I think we need to get more specific in our goals. And when you look at it, if we boil it down, like really oversimplistically, it's really all about connectedness. Now, I'm not saying connectivity, 'cause that's more of a physical thing, but connectedness. Being connected to your customer, being connected to your supplier, being connected to, you know, the whole landscape that you operate in. And of course today we have many more channels with which we operate, you know, with customers. And in fact, take a look at what's happening in the automotive industry, for instance. I was just reading an interview with Bill Ford. Ford is now rapidly ramping up their electric vehicle strategy. And what they realize is it's not just a change of technology, you know, it is a change in their business, a change in terms of the relationship they have with their customer. Their customers have traditionally been automotive dealers, and the automotive dealers have, you know, traditionally, and in many cases by state law, been the ones who own the relationship with the end customer. But when you go to an electric vehicle, the product becomes a lot more of a software product. And in turn, that means that Ford would have much more direct interaction with its end customers. So that's really what it's all about. It's about, you know, connectedness. It's also about the ability to act, you know, we can say agility; it's about the ability not just to react, but to anticipate and act. And so...
And of course, with all the proliferation, you know, the explosion of data sources and connectivity out there, and the cloud, which allows much more, you know, access to compute, it changes the whole nature of the ball game. The fact is that we have to avoid being overwhelmed by this and make our goals more, I guess, tangible, more strictly defined. >> Yeah, now... You know, great points there. And I want to just bring in some survey data again: two thirds of the respondents said their digital strategies were set by IT, and only 26% by the C-suite, 8% by the line of business. Now, this was largely a survey of CIOs and CTOs, but, wow, that doesn't seem like the right mix. To Doug's point about, you know, leaders and laggards, my guess is that at Rocket Mortgage, the digital strategy was potentially led by the chief digital officer. But at the same time, you would think, Tony, that application modernization is a prerequisite for digital transformation. But I want to go to Sanjeev on this one. In the survey, respondents said that on average, they want 58% of their IT spend to be in the public cloud three years down the road. Now, again, this is CIOs and CTOs, but (mumbles), that's a big number. And there was no ambiguity, because the question wasn't worded as cloud, it was worded as public cloud. So Sanjeev, what do you make of that? What's your feeling on cloud as flexible architecture? What does this all mean to you? >> Dave, 58% of IT spend in the cloud is a huge change from today. Today, most estimates peg cloud IT spend to be somewhere around five to 15%. So what this number tells us is that the cloud journey is still in its early days, so we should buckle up. We ain't seen nothing yet, but let me add some color to this. CIOs and CTOs may be ramping up their cloud deployment, but they still have a lot of problems to solve.
I can tell you from my previous experience, for example, when I was at Gartner, I used to talk to a lot of customers who were in a rush to move into the cloud. So if we were to plot, let's say, a maturity model, typically a maturity model in any discipline in IT would have something like crawl, walk, run. What I was noticing was that these organizations were jumping straight to run, because in the pandemic, they were under the gun to quickly deploy into the cloud. So now they're kind of coming back down to, you know, crawl, walk, run. Basically they did what they had to do under the circumstances, but now they're starting to resolve some of the very, very important issues, for example, security, data privacy, governance, observability. These are all very big ticket items. Another huge problem that we are now noticing more than we've ever seen is rising costs. Cloud makes it so easy to onboard new use cases, but it leads to all kinds of unexpected increases and spikes in your operating expenses. So what we are seeing is that organizations are now getting smarter about where workloads should be deployed. And sometimes it may be in more than one cloud. Multi-cloud is no longer an aspirational thing. So that is a huge trend that we are seeing, and that's why you see so much increased planning to spend money in public cloud. We do have some issues that we still need to resolve. For example, multi-cloud sounds great, but we still need some sort of single pane of glass, a control plane, so we can have some fungibility and move workloads around. And some of this may also not be in public cloud; some workloads may actually be done in a more hybrid environment. >> Yeah, definitely. I call it Supercloud. People wince sometimes-- >> Supercloud. >> At that term, but it's above multi-cloud, it floats, you know, on top. But you clearly identified some potholes.
So I want to talk about the evolution of the application experience, 'cause there are some potholes there too. 81% of the respondents in that survey said, "Our development teams are embracing the cloud and other technologies faster than the rest of the organization can adopt and manage them." And that was an interesting finding to me, because you'd think that infrastructure as code and designing in security and containers and Kubernetes would be a great thing for organizations, and I'm sure it is in terms of developer productivity. But what do you make of this? Does the modernization path also have some potholes, Sanjeev? What are those? >> So, first of all, Dave, you mentioned in your previous question there's no ambiguity, it's public cloud. This one, I feel, has quite a bit of ambiguity because it talks about cloud and other technologies; that sort of opens up the kimono, it's like, that's everything. Also, it says that the rest of the organization is not able to adopt and manage. Adoption is a business function, management is an IT function. So I feel this question is a bit loaded. We know that app modernization is here to stay. Developing in the cloud removes a lot of traditional barriers of procuring and instantiating infrastructure. In addition, developers today have so many more advanced tools. So they're able to develop applications faster because they have low-code/no-code options, they have notebooks to write machine learning code, they have the entire DevOps CI/CD tool chain that makes it easy to version control and push changes. But there are potholes. For example, are developers really interested in fixing data quality problems, or data privacy, data access, data governance? How about monitoring? I doubt developers want to get encumbered with all of these operationalization and management pieces. Developers are very keen to deliver new functionality.
So what we are now seeing is that it is left to the data team to figure out all of these operationalization and productionization things that the developers, you know, are not truly interested in. Which actually takes me to this topic that, Dave, you've been quite actively covering and we've been talking about: the whole data mesh. >> Yeah, I was going to say, it's going to solve all those data quality problems, Sanjeev. You know, I'm a sucker for data mesh. (laughing) >> Yeah, I know, but see, what's going to happen with data mesh is that developers are now going to have more domain-resident power to develop these applications. What happens to all of the data curation, governance, and quality that, you know, a central team used to do? So there are a lot of open-ended questions that still need to be answered. >> Yeah, that gets automated, Tony, right? With computational governance. So-- >> Of course. >> It's not trivial, it's not trivial, but I'm still an optimist that by the end of the decade we'll start to get there. Doug, I want to go to you again and talk about the business case. We all remember, you know, the business case for modernization that is... We remember Y2K. There was a big IT spending binge, and this was before the (mumbles) of the enterprise, right? CIOs would be asked to develop new applications, and the business maybe helped pay for it or offset the cost of the initial work and deployment; then IT got stuck managing the sprawling portfolio for years. And a lot of the apps had limited adoption or only served a few users, so there were big pushes toward rationalizing the portfolio at that time, you know? Do I modernize, do I consolidate, do I sunset? They had to make a decision, and it was all based on value. So what's happening today, and how are businesses making the case to modernize? Are they going through a similar rationalization exercise, Doug?
>> Well, the Y2K-era experience that you talked about was back in the days of, you know, throw the requirements over the wall, and then we had waterfall development that lasted months, in some cases years. We see today's most successful companies building cross-functional teams. You know, the C-suite, the line of business, the operations, the data and analytics teams, the IT, everybody has a seat at the table to lead innovation and modernization initiatives. And the most successful companies don't start by talking about technology; they start by envisioning a business outcome, by envisioning a transformed customer experience. You hear the example of Amazon writing the press release for the product or service it wants to deliver and then working backwards to create it. You've got to work backwards to determine the tech that will get you there. What's very clear, though, is that you can't transform or modernize by lifting and shifting the legacy mess into the cloud. That doesn't give you seamless processes, it doesn't give you data-driven personalization, it doesn't give you a connected and consistent customer experience, whether it's online or mobile, you know, bots, chat, phone. Everything that we have today requires a modern, scalable, cloud-native approach and agile, iterative delivery, where you're collaborating with this cross-functional team and course-correcting, again, making sure you're on track to what's needed. >> Yeah. Now, Tony, both Doug and Sanjeev have been, you know, talking about what I'm going to call this IT and business schism, and we've all done surveys. One of the things I'd love to see Couchbase do in future surveys is not only survey the IT heavy, but also survey the business heavy, and see what they say about who's leading the digital transformation and who's in charge of the customer experience. Do you have any thoughts on that, Tony? >> Well, there's no question...
I mean, it's kind of like, you know, the more things change... I mean, we've been talking about how IT and the business have to get together. We talked about this back during, and Doug, you probably remember this, back during the Y2K ERP days: you need these cross-functional teams; we've been seeing this. I think what's happening today, though, is that, you know, back in the Y2K era, we were basically going into our bedrock systems and having to totally re-engineer them. And today what we're looking at is that, okay, those bedrock systems, the ones that are keeping the lights on, okay, those are there, we're not going to mess with them, but on top of that, that's where we're going to innovate. And that gives us a chance to be more, you know, more directed, and therefore we can bring these related domains together. I mean, that's why, you know, where Sanjeev brought up the term data mesh, I've been a bit of a cynic about data mesh, but I do think where it works is where we bring a bunch of these connected teams together, teams that have some sort of shared context. So it's every team that's working, let's say, around the customer, for instance, which could be, you know, in marketing, it could be in sales, order processing in some cases, you know, logistics and delivery. So I think that's where, you know, there's some hope. And the fact is that with all the advanced, you know, low-code/no-code tools, there are ways to bring some of these other players into the process, who previously were sort of, you know, more at the end of a throw-it-over-the-wall type process. So despite all my cynicism, I do believe there's some hope. >> Thank you. Okay, last question. And maybe all of you could answer this. Maybe, Sanjeev, you can start it off and then Doug and Tony can chime in.
In the survey, nearly half of the 650 respondents said they could tangibly show improvements to their organizations' customer experiences that were realized from digital projects in the last 12 months. Now, again, that's not surprising; we've been talking about digital experiences, but there's a long way to go, judging from our pandemic customer experiences. And, again, you know, some were great, some were terrible. And some actually got worse, right? Will that improve? When and how will it improve? Where do 5G and things like that fit in, in terms of improving customer outcomes? Maybe, Sanjeev, you could start us off here. And by the way, plug any research that you're working on in this sort of area, please do. >> Thank you, Dave. As a resident optimist on this call, I'll get us started, and then I'm sure Doug and Tony will have interesting counterpoints. So I'm a technology fanboy, I have to admit. I am in awe of all these new companies and how they have been able to rise up and handle extreme scale. In this time that we are speaking on this show, these food delivery companies would have probably handled tens of thousands of orders in minutes. So these concurrent orders, delivery, customer support, geospatial location intelligence, all of this has really become commonplace now. It used to be that, you know, only large companies like Apple would be able to handle all of these supply chain issues, the disruptions that we've been facing. But now, in my opinion, we are seeing this more widely; Doug mentioned Rocket Mortgage. So we've seen it in FinTech and shopping apps. So we've seen the same scale, and it's more than 5G. It includes things like... Even in the public cloud, we have much more efficient, better hardware, which can run deep learning networks much more efficiently. So machine learning, a lot of natural language processing, being able to handle unstructured data.
So in my opinion, it's quite phenomenal to see how technology has actually come to rescue and as, you know, billions of us have gone online over the last two years. >> Yeah, so, Doug, so Sanjeev's point, he's saying, basically, you ain't seen nothing yet. What are your thoughts here, your final thoughts. >> Well, yeah, I mean, there's some incredible technologies coming including 5G, but you know, it's only going to pave the cow path if the underlying app, if the underlying process is clunky. You have to modernize, take advantage of, you know, serverless scalability, autonomous optimization, advanced data science. There's lots of cutting edge capabilities out there today, but you know, lifting and shifting you got to get your hands dirty and actually modernize on that data front. I mentioned my research this year, I'm doing a lot of in depth looks at some of the analytical data platforms. You know, these lake houses we've had some conversations about that and helping companies to harness their data, to have a more personalized and predictive and proactive experience. So, you know, we're talking about the Snowflakes and Databricks and Googles and Teradata and Vertica and Yellowbrick and that's the research I'm focusing on this year. >> Yeah, your point about paving the cow path is right on, especially over the pandemic, a lot of the processes were unknown. But you saw this with RPA, paving the cow path only got you so far. And so, you know, great points there. Tony, you get the last word, bring us home. >> Well, I'll put it this way. I think there's a lot of hope in terms of that the new generation of developers that are coming in are a lot more savvy about things like data. And I think also the new generation of people in the business are realizing that we need to have data as a core competence. So I do have optimism there that the fact is, I think there is a much greater consciousness within both the business side and the technical. 
In the technology side, the organization of the importance of data and how to approach that. And so I'd like to just end on that note. >> Yeah, excellent. And I think you're right. Putting data at the core is critical data mesh I think very well describes the problem and (mumbles) credit lays out a solution, just the technology's not there yet, nor are the standards. Anyway, I want to thank the panelists here. Amazing. You guys are always so much fun to work with and love to have you back in the future. And thank you for joining today's broadcast brought to you by Couchbase. By the way, check out Couchbase on the road this summer at their application modernization summits, they're making up for two years of shut in and coming to you. So you got to go to couchbase.com/roadshow to find a city near you where you can meet face to face. In a moment. Ravi Mayuram, the chief technology officer of Couchbase will join me. You're watching theCUBE, the leader in high tech enterprise coverage. (bright music)
SUMMARY :
Guys, good to see you again, welcome back. but in the same survey, So the rest of the companies, you know, and I happened to have to do another one it's also about the ability to act, So Sanjeev, what do you make of that? Dave, 58% of IT spend in the cloud I call it Supercloud. it floats, you know, on topic. Also, it says that the say, it's going to solve that still need to be answered. Yeah, That gets automated, Tony, right? And a lot of the apps had limited adoption is that you can't transform or modernize One of the things I'd love to see and the business has to get together, nearly half of the 650 respondents and how they have been able to rise up you ain't seen nothing yet. and that's the research paving the cow path only got you so far. in terms of that the new and love to have you back in the future.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Doug | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Ravi Mayuram | PERSON | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
Tony Bear | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
Bank of America | ORGANIZATION | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Ford | ORGANIZATION | 0.99+ |
iPad | COMMERCIAL_ITEM | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Sanjeev | PERSON | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
94% | QUANTITY | 0.99+ |
Vertica | ORGANIZATION | 0.99+ |
58% | QUANTITY | 0.99+ |
Constellation Research | ORGANIZATION | 0.99+ |
Yellowbrick | ORGANIZATION | 0.99+ |
8% | QUANTITY | 0.99+ |
2022 | DATE | 0.99+ |
today | DATE | 0.99+ |
City Bank | ORGANIZATION | 0.99+ |
Bill Ford | PERSON | 0.99+ |
two years | QUANTITY | 0.99+ |
Googles | ORGANIZATION | 0.99+ |
81% | QUANTITY | 0.99+ |
10% | QUANTITY | 0.99+ |
DB InSight | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Today | DATE | 0.99+ |
2020 | DATE | 0.99+ |
Couchbase | ORGANIZATION | 0.99+ |
Snowflakes | ORGANIZATION | 0.99+ |
5% | QUANTITY | 0.98+ |
650 CIOs | QUANTITY | 0.98+ |
Amazons | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
Lyfts | ORGANIZATION | 0.98+ |
second | QUANTITY | 0.98+ |
SanjMo | ORGANIZATION | 0.98+ |
26% | QUANTITY | 0.98+ |
Ubers | ORGANIZATION | 0.98+ |
three years | QUANTITY | 0.98+ |
650 respondents | QUANTITY | 0.98+ |
pandemic | EVENT | 0.97+ |
this year | DATE | 0.97+ |
15% | QUANTITY | 0.97+ |
Rocket | ORGANIZATION | 0.97+ |
more than one cloud | QUANTITY | 0.97+ |
25% | QUANTITY | 0.97+ |
Tony bear | PERSON | 0.97+ |
around five | QUANTITY | 0.96+ |
two thirds | QUANTITY | 0.96+ |
about a half | QUANTITY | 0.96+ |
Predictions 2022: Top Analysts See the Future of Data
(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner analyst and Principal at SanjMo. Tony Baer is Principal at dbInsight. Carl Olofson is the well-known Research Vice President with IDC. Dave Menninger is Senior Vice President and Research Director at Ventana Research. Brad Shimmin is Chief Analyst, AI Platforms, Analytics and Data Management at Omdia, and Doug Henschen is Vice President and Principal Analyst at Constellation Research.
Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. As moderator, I'm going to call on each analyst separately, who will then deliver their prediction or mega trend, and then, in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance; go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And in all the things that you mentioned, you know, data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and how we are governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive: very detailed lineage, impact analysis, and then even expanding into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, and also data access governance.
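Sanjeev's point that governance will stretch to cover streams, Spark jobs and AI models, with lineage and impact analysis, is easiest to picture as a graph walk. The following is a minimal Python sketch, not any vendor's implementation: it assumes lineage is recorded as a hypothetical mapping from each upstream asset to the assets derived from it (all asset names are invented), and treats impact analysis as a breadth-first traversal that lists everything downstream of a changed asset.

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> assets derived directly from it.
# The names mix a stream, a Spark job, a table, a report and an ML model,
# mirroring the kinds of assets mentioned above; they are invented for illustration.
lineage = {
    "kafka.orders": ["spark_job.clean_orders"],
    "spark_job.clean_orders": ["table.orders_clean"],
    "table.orders_clean": ["report.daily_revenue", "model.churn_v2"],
}


def impact_analysis(asset):
    """Breadth-first walk of lineage edges: every asset downstream of `asset`."""
    impacted, queue, seen = [], deque([asset]), {asset}
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # skip assets already reached by another path
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(impact_analysis("kafka.orders"))
# ['spark_job.clean_orders', 'table.orders_clean', 'report.daily_revenue', 'model.churn_v2']
```

A catalog that records these edges can answer "what breaks if this stream's schema changes?" without moving any data, which is the kind of metadata-driven question the panel returns to throughout.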
So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog, is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now, if you look at BI tools, I would say there are a hundred BI tool users to one data catalog user. And I see that evening out over a period of time, and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments? Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on about a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years; it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines; these are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed and solid reporting, tighter governance. But we're still far from mainstream adoption.
We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so, to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this, fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or like data governance and cybersecurity. And instead we have some tooling that can actually be adaptive to gather metadata to create something. And I know it's important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot. That would help with the governance that Doug is talking about. >> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was like, you know, pocket change for them, instead of doing the right thing. But I think the stakes are much higher now.
With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being superseded by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements; data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take; even though there's some skepticism there, that's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and, you know, coming off of governance, I mean, wow, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> Well, put it this way: data mesh, you know, the idea, at least as proposed by ThoughtWorks, was basically proposed at least a couple of years ago, and the press has been almost uniformly uncritical.
A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had Hadoop data clusters, and it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, is not only in S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea being, you know, who are the folks that really know best about governance? It's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality. Because if you do a Google search, basically the published work, the articles on data mesh, have been largely, you know, pretty uncritical so far, basically lauding it as a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this. Brad, you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought: if we're deconstructing apps in cloud native into microservices, why don't we think of data in the same way? My sense this year is that, you know, this has been a very active search if you look at Google search trends, and now companies, like enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny; it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously.
The reason why I think that you'll start to see basically the cold, hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance, we've been trying to figure out, like, how can we basically strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, and the groups that understand the data and know how to, you know... How do we basically balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full life cycle: from selecting the data, from, you know, building the pipelines, from, you know, determining your access control, looking at quality, looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive its first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data-mesh-wash their products: anybody in the data management space, whether it's basically a pipelining tool, whether it's basically ELT, whether it's a catalog or a federated query tool, they are all going to be, you know, basically promoting how they support this. Hopefully nobody's going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this.
And this harks back to the metadata that Sanjeev was talking about, and the catalog he was just talking about, which is that there's going to be a new focus, a renewed focus, on metadata. And I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because we all, at the end of the day, need to speak, you know, need to read from the same sheet of music. >> So thank you, Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold, hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust.
Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. The point is this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double: 68% of organizations, I'm sorry, 79% of organizations that were using virtualized access expressed satisfaction with their access to the data lake. Only 39% expressed satisfaction if they weren't using virtualized access. >> Thank you, Dave. Sanjeev, we've just got a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with (indistinct), who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not define it. We spent a whole year defining it; there are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh. And I'm like, I can't compare the two, because data mesh is a business concept and data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like? How do we handle shared data across domains, and governance? And I think we are going to see more of that in 2022, the "operationalization" of data mesh.
>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah. Okay, thanks. So I regard graph database as basically the next truly revolutionary database management technology. Looking forward, the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say, will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, and the nodes are connected by edges; an edge connects one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic graphs, which are used to break down human language texts into semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough?
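Carl's description of nodes, edges and properties can be made concrete with a few lines of Python. This is a minimal in-memory sketch for illustration only. The `PropertyGraph` class, its methods, and the customer, card and merchant data are hypothetical and do not correspond to any graph database's API. Nodes and edges both carry properties, which is what makes it a property graph rather than a bare graph.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Minimal in-memory property graph: nodes and edges can both carry properties."""
    nodes: dict = field(default_factory=dict)  # node_id -> {property: value}
    edges: list = field(default_factory=list)  # (source_id, label, target_id, {property: value})

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        # An edge connects one node to another and may carry its own properties.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, label, target, props))

    def neighbors(self, node_id, label=None):
        """Return target ids one edge away from node_id, optionally filtered by edge label."""
        return [t for s, l, t, _ in self.edges if s == node_id and (label is None or l == label)]


# Hypothetical example data: a customer, two cards, and a merchant.
g = PropertyGraph()
g.add_node("alice", kind="customer")
g.add_node("card1", kind="card")
g.add_node("card2", kind="card")
g.add_node("shopX", kind="merchant")
g.add_edge("alice", "OWNS", "card1", since=2019)
g.add_edge("alice", "OWNS", "card2", since=2021)
g.add_edge("card1", "PAID", "shopX", amount=42.0)

print(g.neighbors("alice", "OWNS"))  # ['card1', 'card2']
```

A real graph database would add indexing, a query language and persistence; the point here is only that relationships are stored as first-class edges with their own properties, rather than being reconstructed through foreign-key joins.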
So a relational database defines... It supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure; there's a value, a foreign key value, that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable; it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity; supply chain is a big one, actually. There is explainable AI, and this is going to become important too, because a lot of people are adopting AI, but they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social network, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti-money laundering, that's another big one, identity and access management. Network and IT operations is already becoming a key one, where you actually have mapped out your operation, you know, whatever it is, your data center, and you can track what's going on as things happen there; root cause analysis; fraud detection is a huge one. A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what-if analysis, impact analysis, entity resolution, and I would add just a few other things to this list: metadata management. So Sanjeev, here you go, this is your engine.
Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also, because it handles recursive relationships, and by recursive relationships I mean objects that own other objects of the same type, you can do things like bill of materials, you know, so like parts explosion. Or you can do an HR analysis: who reports to whom, how many levels up the chain, and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and a replacement? What are your thoughts on this space? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg.
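Carl's recursive-relationship examples, a reporting chain and a parts explosion, come down to following edges of the same type repeatedly. In a relational database this is typically written as a recursive SQL query (a `WITH RECURSIVE` common table expression) or as application code. The Python sketch below shows only the traversal logic. It assumes hypothetical in-memory dictionaries for the org chart and the bill of materials, and it is not how any particular graph database executes such queries.

```python
from collections import defaultdict

# Hypothetical "reports to" relationships: employee -> manager.
# Both ends are employees, so this is a recursive (same-type) relationship.
reports_to = {"dana": "carol", "carol": "bob", "bob": "alice"}


def management_chain(employee):
    """Walk the reports-to edges upward and return the chain of managers."""
    chain = []
    seen = {employee}
    while employee in reports_to:
        employee = reports_to[employee]
        if employee in seen:  # guard against accidental cycles in the data
            raise ValueError("cycle in reporting structure")
        seen.add(employee)
        chain.append(employee)
    return chain


# Hypothetical bill of materials: assembly -> list of (component, quantity per assembly).
bom = {
    "bike": [("wheel", 2), ("frame", 1)],
    "wheel": [("spoke", 32), ("rim", 1)],
}


def explode(part, qty=1, totals=None):
    """Recursively expand an assembly into total quantities of leaf parts (a parts explosion)."""
    if totals is None:
        totals = defaultdict(int)
    for component, per_unit in bom.get(part, []):
        if component in bom:
            # Sub-assembly: recurse, multiplying the quantity down the tree.
            explode(component, qty * per_unit, totals)
        else:
            totals[component] += qty * per_unit
    return dict(totals)


print(management_chain("dana"))  # ['carol', 'bob', 'alice'], i.e. three levels up
print(explode("bike"))           # {'spoke': 64, 'rim': 2, 'frame': 1}
```

The traversal itself is short; Carl's point is that when it lives in application code rather than in the database, it is harder to trace, publish and maintain.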
You know, it's always going to kind of be tied to those use cases because it's really special and it's really unique, and because it's special and it's unique, it still, unfortunately, kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another; they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl? (indistinct) Yeah, thank you, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency, and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here?
>> I want to (indistinct) so people don't associate me with only metadata, so I want to talk about something slightly different. db-engines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two, property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about Achilles' heel, this is a problem. The problem is that there are so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there are so many query languages. In RDBMS, it's SQL; I know the story. But here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I may say so. And that is a problem, that the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of a saying I learned a bunch of years ago. Somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we've got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data.
So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, will incorporate these streaming capabilities. We're going to process data as it streams into an organization and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean, we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches. But we processed it in batches because that was the best we could do. And it wasn't bad, and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so my prediction is that this is going to change; we're going to have streaming data be the default way in which we deal with data, however you label it and whatever you call it. You know, maybe these databases and data platforms just evolve to be able to handle it. But we're going to deal with data in a different way. And our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow.
If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and can go in and pick it up right now. You know, that's the world we live in, and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, and streaming data becomes the default way in which we operate with data. >> All right, thank you, David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually: every database is historical, because as soon as you put data in it, it's now history. It no longer reflects the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this, Dave, 'cause you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications, rethinking the way applications work, in order to truly be manifested. Saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default.
And I agree 100% with you that we do need historical databases; you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example, where you're using the data warehouse as the context and the streaming data as the present, and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? >> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real-time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it. You know, you hit it right on the head there. Which is that what I'm seeing, essentially, and I'm going to split this one down the middle, is that I don't see that streaming is the default. What I see is streaming, transaction databases and analytics data, you know, data warehouses, data lakes, whatever, converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing... and this is where it leads in, maybe doing some of that real-time predictive analytics to take a look at, well, look, we're looking at this customer journey, what the customer is doing right now, and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this, and because of the speed of the infrastructure, you can bring these together and orchestrate them in a sort of loosely coupled manner.
The other part is that the use cases are demanding it, and this goes back to what Dave is saying. Is that, you know, when you look at Customer 360, when you look at, let's say, smart utility products, when you look at any type of operational problem, it has a real-time component and it has a historical component. And having predictive... so, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real-time sort of predictive analytics on these streams and feed this into the transactions, so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that to Dave's point, you know, we have to think of streaming very differently, because in the historical databases, we used to bring the data in and store the data, and then we used to run rules on top, aggregations and all. But in the case of streaming, the mindset changes, because the rules, the inference, all of that is normally fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seems to be some disagreement about the default. What kind of timeframe are you thinking about? Is this end of decade it becomes the default? What would you pin? >> I think around, you know, between five and 10 years, I think this becomes the reality. >> I think it's... >> It'll be more and more common between now and then, but it becomes the default. And Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole other set of challenges.
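Sanjeev's point, that streaming inverts the historical pattern (store the data, then run rules over it) so that the rules stay fixed while the data flows through them, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any panelist's or vendor's design: the rolling-average rule, window size and threshold are hypothetical, and a production system would run on a stream processor rather than a Python generator.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Historical style: store the data first, then run the rule (an aggregate) over it."""
    return sum(readings) / len(readings)


def streaming_alerts(
    events: Iterable[float], window: int, threshold: float
) -> Iterator[tuple[int, float]]:
    """Streaming style: the rule is fixed up front and each event is evaluated as it arrives.

    Yields (event_index, rolling_average) whenever the average of the last `window`
    events exceeds `threshold`. Only the window is kept in memory, not the full history.
    """
    recent: deque[float] = deque(maxlen=window)  # oldest event drops off automatically
    for i, value in enumerate(events):
        recent.append(value)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > threshold:
                yield i, avg
```

With a window of 3 and a threshold of 5, the events `[1, 2, 3, 10, 12, 2]` trigger alerts at positions 4 and 5 as they arrive, while the batch version gives a single answer (an average of 5.0) only after all six readings are stored.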
>> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro batch, sub-second, that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications. (indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think that we've been seeing automation at play within AI for some time now. And it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development and it's helped them to actually make AI better. 'Cause it, you know, in some ways provides some swim lanes, and for example, with technologies like AutoML, it can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that the automation that started happening for practitioners is trying to move outside of the traditional bounds of things like, I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model, and it's expanding across that full life cycle of building an AI outcome, to start at the very beginning with data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that.
And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others, is that they're starting to actually use these same ideas and a lot of deep learning (chuckles) to basically stand up these out-of-the-box, flip-a-switch offerings, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go, and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, for example, SAP is going to roll out this quarter this thing called adaptive recommendation services, which basically is a cold start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need to do in the line of business. So basically, you're an SAP user, you go to turn on your software one day, you're a sales professional, let's say, and suddenly you have a recommendation for customer churn. Boom! It's going, that's great. Well, I don't know, I think that's terrifying.
In some ways I think it is the future that AI is going to disappear like that, but I'm absolutely terrified of it, because I think that what it really does is call attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera, et cetera, et cetera. That takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out, and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappears, it becomes invisible, maybe if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. People are going to say, oh, slow down. Let's automate what we can. Those attributes that you talked about are nontrivial to achieve; is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand.
And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand when they flip that switch for an automated AI outcome that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices that every company that's going to consume this invisible AI can make use of. And one of the things that Google kicked off a few years back, that's picking up some momentum and that the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. You know, so for the SAP example, we know, for example, if it's a convolutional neural network with a long short-term memory model that it's using, we know that it only works on Roman English, and therefore I as a consumer can say, "Oh, well I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC and our Future of Intelligence group, regarding in particular the moral and legal implications of having a fully automated, you know, AI-driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction... I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and more productive than Black people, but it had no context as to why, which is, you know, because Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that.
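Brad's model-card idea can be made concrete with a minimal, hypothetical sketch: a small record describing what a model is and where it is meant to work, plus the kind of check a consumer might run before turning an automated AI feature on. The field names, the `unsupported_languages` helper and the example values are illustrative assumptions, not Google's model card schema or any vendor's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """A hypothetical, pared-down model card; real model cards document much more
    (training data, evaluation results, ethical considerations, and so on)."""

    name: str
    architecture: str
    intended_use: str
    supported_languages: frozenset[str]


def unsupported_languages(card: ModelCard, required: set[str]) -> set[str]:
    """Return the languages a deployment needs that the card does not claim to support."""
    return set(required) - card.supported_languages


# Illustrative card loosely echoing the example in the discussion; every value is made up.
churn_card = ModelCard(
    name="churn-recommender",
    architecture="CNN with an LSTM component",
    intended_use="customer churn recommendations for sales teams",
    supported_languages=frozenset({"en"}),
)

gaps = unsupported_languages(churn_card, {"en", "de", "ja"})
if gaps:
    print(f"Do not just flip the switch: no documented support for {sorted(gaps)}")
```

The design choice is simply that the check reads only what the card declares, so a consumer can decide before deployment, as Brad describes, rather than discover the gap in production.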
And I think that at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases. To some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely, Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all of the really fantastic, magical-looking supermodels we see, like GPT-3 and four that's coming out, they're xenophobic and hateful, because the data that they're built upon, and the algorithms, and the people that build them are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, where the AI is biased 'cause humans are biased. All right, great. All right, let's move on. Doug, you mentioned, you know, a lot of people said that the term data lake is not going to live on, but here we are, we have some lakes here. You want to talk about lake house? Bring it on. >> Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering; that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now, heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric, virtualization and mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information.
And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity, or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed and you'd have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where, if you have this highly distributed situation, that might be a better path forward. The second question: if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform. You have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse, really new to the data warehouse space, and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, meet those tight SLAs. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you, Doug.
Well Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does there need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean, I remember as far back as, I think it was like 2014, I was doing a study. I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we'd consider your data lake. Fast forward, and the thing is, what I was seeing at the time is that you were seeing them sort of blending at the edges. That was, like, about five to six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument: do we centralize this all, you know, in a single place, or do we virtualize? And I think it's always going to be a union, yeah, and there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. That, you know, what do you need for your performance there, or for your query performance characteristics? Do you need, for instance, high concurrency?
Do you need the ability to do some very sophisticated joins, or is your requirement more to be able to distribute your processing, you know, as far as possible, to essentially do a kind of brute force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of... It's nothing new. It's a relatively new term introduced by Databricks a couple of years ago, but this is the culmination of basically what's been a long-time trend. And what we see in the cloud is that we start seeing data warehouses, as a checkbox item, say, "Hey, we can basically source data in cloud storage, in S3, "Azure Blob Store, you know, whatever, "as long as it's in certain formats, "like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already a reality. And in some cases, maybe it's new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean, a lot of these, thank you, Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term; we talked about, you know, data mesh washing. What are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it.
So they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here; database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> (indistinct) So just a follow-up on that: are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners would say, or advocates would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we going to see those all morphed together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse.
It's just that, you know, the difference being that if we use the same physical data store, everybody's logically, you know, basically governing it differently, you know? Data mesh, in essence, is not a technology; it's processes, it's governance process. So essentially, you know, I basically see that, as I was saying before, this is basically the culmination of a long-time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, upserts, or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake, where, you know, I'm doing a system where I'm just doing really brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements, that we need to essentially have, basically, the abilities of a data lake and the data warehouse; these need to come together, so I think. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases "of the stuff that isn't in that center of data gravity "but that is distributed in a cloud "or at a remote location." So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualize data. >> I think we have at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data. So obviously we're literally splitting hairs here.
Now what Databricks is saying is that "aha, but it's easier to go from data lake to data warehouse "than it is from databases to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS, which has got what, 15 data stores? Are they going to have 15 converged data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do, like, one word each, and you guys, each of the analysts, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff; we really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check. You want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and calls their offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high-growth metrics. I know it's early days, but magic is what I took away from that, so magic database. >> Yeah, I would actually, I've said this to people too. I kind of look at it as a Swiss Army knife of data because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in fixed schematic relationships, probably a relational database is a better choice. There are times when the document database is a better choice. It can handle those things, but maybe not. It may not be the best choice for that use case.
But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you, by the way, for bringing the data in. I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say think fast, right? That's the world we live in; you've got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great. I'm afraid I might get disrupted by one of these internet giants who are AI experts. I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction, with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts? We'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point for whether it's a lake or a warehouse.
And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues on data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Menninger | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
David | PERSON | 0.99+ |
Brad Shimmin | PERSON | 0.99+ |
Doug | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Carl | PERSON | 0.99+ |
Brad | PERSON | 0.99+ |
Carl Olofson | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
2014 | DATE | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Ventana Research | ORGANIZATION | 0.99+ |
2022 | DATE | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
January of 2022 | DATE | 0.99+ |
three | QUANTITY | 0.99+ |
381 databases | QUANTITY | 0.99+ |
IDC | ORGANIZATION | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Sanjeev | PERSON | 0.99+ |
2021 | DATE | 0.99+ |
Omdia | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
SanjMo | ORGANIZATION | 0.99+ |
79% | QUANTITY | 0.99+ |
second question | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
15 data stores | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
SAP | ORGANIZATION | 0.99+ |