Manish Devgan, Hazelcast | Kubecon + Cloudnativecon Europe 2022
>>The cube presents, Coon and cloud native con Europe, 2022. Brought to you by red hat, the cloud native computing foundation and its ecosystem partners. >>Welcome to Licia Spain and cube con cloud native con 2022 Europe. I'm Keith Townsend, along with Paul Gillon senior editor, enterprise architecture for Silicon angle. We're gonna talk to some amazing folks. Day two coverage of Q con cloud native con Paul. We did the wrap up yesterday. Great. A great back and forth about what en Rico about yesterday's, uh, session. What are you looking for to today? >>I'm looking for, uh, to understand better, uh, how Kubernetes is being put into production, the types of applications that are being built on top of it. Yesterday, we talked a lot about infrastructure today. I think we're gonna talk a little bit more about applications, including with our first guest. >>Yeah, I was speaking our first guest. We have ish Degan CPO chief product officer at Hazelcast Hazelcast has been on the program before, but you, this is your first time in the queue, correct? >>It, it is Keith. Yeah. Well, >>Welcome to been Cuban. So we're talking data, which is always a fascinating topic. Containers are, have been known for not being supportive of stateful applications. At least you shouldn't hold the traditional thought. You shouldn't hold stateful data in containers. Tell me about the relationship between Hazel cast and containers we're at Cuan. >>Yeah, so a little bit about, uh, Hazelcast. We are a real time data platform and, uh, we are not a database, but a data platform because we basically allow, uh, data at rest as well as data in motion. So you can imagine that if you're writing an application, you can basically query and join a data coming in events, as well as data, which might have been persisted. So you can do both stream processing as well as, you know, low latency data access. And, and this platform of course, is supported on all the clouds. And we kind of delegate the orchestration of this kind of scale out system to Kubernetes. Um, and you know, that provides a resiliency and many things which go along with that. >>So you say you don't, you're not a database platform. What are you used for to manage the data? >>So we are, uh, we are memory first. So we are, you know, we started with low latency applications, but then we realized that real time has really become a business term. It's it's more of a business SLA mm-hmm, <affirmative>, it's really the, we see the opportunity, the punctuated change, which is happening in the market today is about real time data access to real time. I mean, there are real time applications. Our customers are building around real time offers, um, realtime thread detection. I mean, just imagine, you know, one of our customers like B and P par bars, they have, they basically originate a loan while the customer is banking. So you are in an ATM machine and you swipe your card and you are asking for, you know, taking 50 euros out. And at that point they can actually originate a custom loan offer based on your existing balance you're existing request and your credit score in that moment. So that's a value moment for them and they actually saw 400% loan origination go up because of that, because nobody's gonna be thinking about a credit, uh, line of credit after they're done banking. So it's in that value moment and we allow basically our data platform allows you to have fast access to data and also process incoming streams. So not before they get stored, but as they're coming in. >>So if I'm a developer and cuon is definitely a conference for developer and I, I come to the booth and I hear <inaudible>, that's the end value. I, I hear what I can do with my application. I guess the question is, how do I get there? I mean, uh, if it's not a database, how do I make a call from a container to, from my microservice to Hazel cath? Like, do I think of this as a, uh, a CNI or, or C CSI? How do I access >>PA care? Yeah. So, so we, uh, you know, we are, our server is actually built in Java. So a lot of the application which get written on top of the data platform are basically accessing through Java APIs. Or as you have a.net shop, you can actually use.net API. So we are basically an API first platform and SQL is basically the polyglot way of accessing data, both streaming data, as well as it store data. So most of the application developers, a lot of it is run done in microservices, and they're doing these fast get inputs for data. So they, they have a key, they want to get to a customer, they give a customer ID. And the beauty is that, um, while they're processing the events, they can actually enrich it because you need contextual information as well. So going back to the ATM example, you know, at that event happened, somebody swiped the card and ask for 50 euros, and now you want more information like credit score information, all that needs to be combined in that, in that value moment. >>So we allow you to do those joins and, you know, the contextual information is very important. So you see a lot of streaming platform out there, which just do streaming, but if you're an application developer, like you asked, you have to basically do call out to a streaming platform to get, um, to do streaming analytics and then do another call to get the context of that. You know, what is the credit score for this customer? But whereas in our case, because the data platform supports both streaming as well as data at rest, you can do that in one call and, you know, you don't want to have the operational complexity to stand out. Two different scale out servers is, is, is, is humongous, right? I mean, you want to build your business application. So, >>So you are querying data streaming data and data rest yes. In the same query >>Yes. In the same query. And we are memory first. So what happens is that we store a lot of the hot data in memory. So we have a scale out Ram based server. So that's where you get the low latency from. In fact, last year we did a benchmark. We were able to process a billion events a second, uh, with 99% of the latency under 30 milliseconds. So that kind of processing and that kind of power is, and, and the most important thing is determinism. I mean, you know, there's a lot of, um, if you look at real time, what real time is, is about this predictable latency at scale, because ultimately your, your adhering to a business SLA is not about milliseconds or microsecond. It's what your business needs. If your business needs that you need to deny or, uh, approve a credit credit card transaction in 50 milliseconds, that's your business SLA, and you need that predictability for every transaction. >>So talk to us about how how's this packaged in consumed. Cause I'm hearing a, a bunch of server Ram I'm hearing numbers that we're trying to adapt away from at this conference. We don't wanna see the onlay. We just want to use it. >>Yeah. So, so we kind of take a bit that, that complexity of managing this scale out, um, uh, uh, cluster, which actually utilizes Rams from each server. And then, you know, if you, you can configure it so that the hard set of data is in Ram, but the data, which is, you know, not so hard can actually go into a tiered storage model. So we are memory first. So, but what you are doing is you're doing simple, it's an API. So you do basically a crud, right? You create records, you read them through SQL. So for you, it's, it's, it's kind of like how you access that database. And we also provide you, you know, real time is also a journey. I mean, a lot of customers, you know, you don't want to rip their existing system and deploy another kind of scale out platform. Right? So we, we see a lot of these use cases where they have a database and we can sit in between the database, a system of record and the application. So we are kind of in between there. So that's, that's the journey you can take to real time. >>How does Kubernetes, uh, containers and Kubernetes change the game for real time analytics? >>Yeah. So, uh, Kubernetes does change it because what's hap first of all, we service most of the operational workloads. So it's, it's more on the, a lot of our customers. We have most, most of the big banks credit card companies in financial services and retail. Those are the two big sectors for us. And first of all, you know, a lot of these operational workloads are moving to the cloud and with move to the cloud, they're actually taking their existing applications and, and moving to, you know, one of the providers and to kind of orchestrate this scale out platform, which does auto scaling, that's where the benefit comes from mm-hmm <affirmative>. And it also gives them the freedom of choice. So, you know, the Kubernetes is, you know, a standard which goes across cloud providers. So that gives them the benefit that they can actually take their application. And if they want, they can actually move it to a different, a different cloud provider because we take away the orchestration complexity, you know, in that abstraction layer. >>So what happens when I need to go really fast? I mean, I, I, I need, uh, I'm looking at bare metal and I'm looking at really scaling a, a, a homogeneous application in a single data center set of data centers. Is there a bare metal play here? >>Yes. There, there, there are some very, very, uh, like if you want microsecond latency, mm-hmm, <affirmative>, um, you know, we have customers who actually store two to four terabytes in Ram and, and they can actually stand up. Um, you know, again, it depends on what kind of deployment you want. You can either scale up or scale out, scaling up is expensive, you know, because those boxes are not cheap, but if you have a requirement like that, where there is sub millisecond or microphone latency requirement, you could actually store the entire data set. I mean, a lot of the operational data sets are under four terabytes. So it's not uncommon that you could actually take the entire operational transactional data set, actually move, move that to a pure Ram. But, uh, I think now we, we also see that these operational workloads are also, there's a need for analytics to be done on top as well. >>I mean, we, going back to the example I gave you, so this, this, uh, customer is not only doing stream crossing, they're also influencing a machine learning algorithm in that same, in the same kind of cycle in the life cycle. So they might have trained a machine learning or algorithm on a data lake somewhere, but once they're ready, they're actually influencing the ML algorithm in our kind of life cycle right there. So, you know, that that really brings analytics and transactions kind of together because after all transactions are where the real, you know, insights are. >>Yeah. I'm, I'm struggling a little bit with this, with these two different use cases where I have transactional basically a transactional database or transactional data platform alongside a analytics platform. Those are two, like they're two different things. I have a, you know, I, I have spinning rust for one, and then I have memory and, and MBME for another. Uh, and that requires tuning requires DBAs. It requires a lot of overhead, there seems to be some type of secret sauce going on here. >>Yeah. Yeah. So, I mean, you know, we, we basically say that if you are, if you have a business case where you want to make a decision, you know, you, the only chance to succeed is where you are not making a decision tomorrow based on today's data. Right? I mean, the only way to act on that data is today. So the act is a keyword here. We actually let you generate a realtime offer. We, we let you do credit card fraud detection. In that moment, the analytics is about knowing less about acting on it. Right? Most of our applications are machine critical. They're acting on real time. I think when you talk about like the data lakes there, there's actually a real time there as well, but it's about knowing, and we believe that the operational side is where, you know, that value moment is there, you know, what good is, is to know about something tomorrow, you know, if something wrong happened, I mean, it, yeah, so there's a latency squeeze there as well, but we are on, on more on the kind of transaction and operational side. >>I gotcha. Yeah. So help me understand, like integrations. A lot of the, the, when I think of transactions, I'm thinking of SAP, Oracle, where the process is done, or some legacy banking or not legacy or new modern banking app, how does the data get from one platform to a, to Hazel cast so I can make those >>Decisions? Yeah. So we have, uh, this, the streaming engine, we have has a whole bunch of connectors to a lot of data sources. So in fact, most of our use cases already have data sources underneath there, their databases there's KA connectors, you know, joining us because if you look at it, events is, are comprised of transactions. So something, a customer did, uh, a credit card swipe, right. And also events events could be machine or IOT. So it's really unique connectivity and data ingestion before you can process that. So we have, uh, a whole suite of connectors to kind of bring data in, in our platform. >>We've been talking a lot, these last couple of days about, uh, about the edge and about moving processing capability closer to the edge. How do you enable that? >>Yeah. So edge is actually very, very relevant because of what's happening is that, um, you know, if you, if you look at like a edge deployment use case, um, you know, we have a use case where data is being pushed from these different edge devices to cloud data warehouse. Right. But just imagine that you want to be filtering data at the, at, at where it is being originated from, and you wanna push only relevant data to, to maybe a central data lake where you might want to do, you know, train your machine learning models. Mm-hmm <affirmative> so that at the edge, we are actually able to process that data. So Hazel cast will allow you to actually write a data pipeline and do stream processing so that you might want to just push, you know, a part or a subset of data, which applies by the rules. Uh, so there's, there's a big, um, uh, I think edge is, you know, there's a lot of data being generated and you don't want like garbage and garbage out there's there's, there is there's filtration done at the edge. So that only the relevant data lands in a data, data lake or something like that. >>Well, Monash, we really appreciate you stopping by realtime data is an exciting area of coverage for the queue overall from Valencia Spain, I'm Keith Townsend, along with Paul Gillon, and you're watching the queue, the leader in high tech coverage.
SUMMARY :
Brought to you by red hat, What are you looking for to today? the types of applications that are being built on top of it. product officer at Hazelcast Hazelcast has been on the program before, It, it is Keith. At least you shouldn't hold the traditional thought. So you can imagine that if you're writing an application, So you say you don't, you're not a database platform. So we are, you know, we started with low So if I'm a developer and cuon is definitely a conference for developer So a lot of the application which get written on top of the data platform are basically accessing through Java So we allow you to do those joins and, you know, the contextual information is very important. So you are querying data streaming data and data rest yes. I mean, you know, So talk to us about how how's this packaged in consumed. I mean, a lot of customers, you know, you don't want to rip their existing system and deploy another a different cloud provider because we take away the orchestration complexity, you know, So what happens when I need to go really fast? So it's not uncommon that you could after all transactions are where the real, you know, insights are. I have a, you know, I, I have spinning rust for one, you know, that value moment is there, you know, what good is, is to know about something tomorrow, not legacy or new modern banking app, how does the data get from one platform to a, you know, joining us because if you look at it, events is, are comprised of transactions. How do you enable that? um, you know, if you, if you look at like a edge deployment use Well, Monash, we really appreciate you stopping by realtime data is an
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Keith Townsend | PERSON | 0.99+ |
Paul Gillon | PERSON | 0.99+ |
99% | QUANTITY | 0.99+ |
400% | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
Hazel cast | ORGANIZATION | 0.99+ |
Java | TITLE | 0.99+ |
Hazelcast | ORGANIZATION | 0.99+ |
50 milliseconds | QUANTITY | 0.99+ |
50 euros | QUANTITY | 0.99+ |
Keith | PERSON | 0.99+ |
Manish Devgan | PERSON | 0.99+ |
yesterday | DATE | 0.99+ |
today | DATE | 0.99+ |
Yesterday | DATE | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.99+ |
first guest | QUANTITY | 0.99+ |
first time | QUANTITY | 0.99+ |
Valencia Spain | LOCATION | 0.99+ |
50 euros | QUANTITY | 0.99+ |
SQL | TITLE | 0.99+ |
one call | QUANTITY | 0.99+ |
four terabytes | QUANTITY | 0.98+ |
both | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
each server | QUANTITY | 0.98+ |
one platform | QUANTITY | 0.98+ |
SAP | ORGANIZATION | 0.98+ |
first | QUANTITY | 0.97+ |
under 30 milliseconds | QUANTITY | 0.97+ |
first platform | QUANTITY | 0.97+ |
a billion events | QUANTITY | 0.95+ |
Coon | ORGANIZATION | 0.94+ |
2022 | DATE | 0.94+ |
single | QUANTITY | 0.94+ |
two different things | QUANTITY | 0.94+ |
Kubecon | ORGANIZATION | 0.93+ |
Cloudnativecon | ORGANIZATION | 0.93+ |
two different use cases | QUANTITY | 0.92+ |
Day two | QUANTITY | 0.92+ |
two big sectors | QUANTITY | 0.91+ |
red hat | ORGANIZATION | 0.87+ |
Europe | LOCATION | 0.84+ |
use.net | OTHER | 0.83+ |
under four terabytes | QUANTITY | 0.82+ |
Two different scale | QUANTITY | 0.78+ |
Kubernetes | ORGANIZATION | 0.75+ |
a second | QUANTITY | 0.72+ |
Kubernetes | TITLE | 0.71+ |
cube con cloud native con | ORGANIZATION | 0.7+ |
cloud native con | ORGANIZATION | 0.67+ |
Degan | PERSON | 0.66+ |
Silicon | LOCATION | 0.63+ |
Licia Spain | ORGANIZATION | 0.62+ |
Hazel cath | ORGANIZATION | 0.61+ |
con cloud native con | ORGANIZATION | 0.58+ |
Rico | LOCATION | 0.57+ |
Cuban | OTHER | 0.56+ |
Monash | ORGANIZATION | 0.55+ |
Hazel | TITLE | 0.53+ |
Cuan | LOCATION | 0.53+ |
foundation | ORGANIZATION | 0.52+ |
Q | EVENT | 0.51+ |
last couple | DATE | 0.5+ |
CNI | TITLE | 0.46+ |
C | TITLE | 0.45+ |
Paul | PERSON | 0.44+ |
2022 | EVENT | 0.33+ |
Kelly Herrell, Hazelcast | RSAC USA 2020
>>Fly from San Francisco. It's the cube covering RSA conference, 2020 San Francisco brought to you by Silicon angle media. >>Hey, welcome back everyone. Cubes coverage here in San Francisco, the Moscone South. We're here at the RSA conference. I'm John, your host and the cube. You know, cybersecurity is now a global phenomenon, but cubbies have to move at the speed of business, which now is at the speed of the potential attacks. This is a new paradigm shift. New generation of problems that have to be solved and companies solving them. We have a hot startup here that's growing. Hazel cast, the CEO, Kelly Kelly Harrell is here. Cube alumni. Good to see you. Good to see you, John. Hey, so we know each other you've been on before. Um, you know, networking, you know, compute, you know the industry. You're now the CEO of Hazel cast. So first of all, what does Hazelcast do? And then we can get into some of the cool things. Hazel cast is an in memory compute platform. >>So we're a kind of a neutral platform. You write your applications to us. We sit in front of things like databases and stream, uh, streaming sources, uh, and we execute applications at microsecond speeds, which is really, really important as we move more and more towards digital and AI. Uh, so basically when, when time matters, when time is money people buy Hazelcast. So I've got to ask you your interest in better, you can do a lot of different things. You can run any companies you want. Why Hazelcast what attracted you to this company? What was unique about it that got your attention and what made you join the firm? Well, when I first started looking at it and realized that a hundred of the world's largest companies are their customers and this company really was kind of kind of a run silent run deep company. A lot of people didn't know about it. >>Um, I could not, I had this dissonance, like how can this possibly be the case? Well, it turns out, uh, if you go into the Java developer world, the name is like Kleenex. Everybody knows Hazelcast because of the open source adoption of it, which has gone viral a long, long time ago. So once I started realizing what they had and why people were buying it, and I looked at that, that problem statement and the problem statement is really increasing with digitalization. So the more things are speeding up, the more applications have to perform at really, really low latency. So there was this big big growth market opportunity and Hazel CAS clearly had the had the drop on the market. So I've got to ask you, so we're at RSA and I mentioned on my intro here the speed of business while he's been down the, it kind of cliche moving at the speed of business, but now business has to move as the speed of how to react to some of the large scale things, whether it's compute power, cloud computing, and obviously cyber is attacks and a response. >>How do you view that and how are you guys attacking that problem? Well, you know, it's funny. I think the first time I truly understood security was the day that I was shopping for a home safe. You know, because I realized that all of these safes, they all were competing on one of the common metric, which is the meantime to break in, right? Is that you had one job and all you can tell me is that it's going to happen eventually, you know? So the kind of the scales got peeled off my eyes and I realized that, that when it comes to security, the only common factor is elapsed time, you know, and uh, so the last time is what matters. And then the second thing is that time is relative. It's relative to the speed of the attack. You know, if I'm just trying to protect my goods in a safe, the last amount of time is how long it can take for the bad guy to break into the safe. >>But now we're working at digital speeds, you know, so, you know, you take a second break that down to a thousand, uh, that's uh, you know, milliseconds. It takes 300 milliseconds. The blink. Yeah. Now we're working at microsecond speeds. Uh, and we're finding that there are just a really rapidly growing number of transactions that have to perform at that scale and that, and that speed. Um, you know, it, it may have escaped people completely, but card processing, credit card, debit card processing, ever Dawn on you that that's an IOT application now. Yeah, because my phone is a terminal. Amazon's a giant terminal number of transactions go up. They have three milliseconds to decide whether or not they're going to approve that. And uh, now with using Hazel Cassady not just handle it within that three milliseconds, but they also are running multiple fraud detection algorithms in that same window. >>Okay. So I get it now. That's why the in-memory becomes critical. You can't gotta be in memory. Okay, so I got to ask the next logical question, which is okay, I get that it makes less sense and I want to dig into that in a second. But let's go to the application developer. Okay, I'm doing dev ops, I'm doing cloud. I'm cool. Right? So now you just wake me up and say, wait a minute, I'm not dealing with nanosecond latency. What do I do? Like what's I mean, who's, how to applications respond to that kind of attack velocity? Well, it's not a not a a an evolution. So the application is written to Hazel cast is very, very simple to do. Um, there are, uh, like 60 million Hazel cast cluster starts every month. So people out in the wild are doing this all day long and we're really big in the Java developer community, but not only Java. >>Um, and so it's very, very straightforward with how to write your application and pointed at Hazelcast instead of pointing at the database behind us. Uh, so that part is actually very, very simple. All right, so take me through, I get the market space you're going after. It makes total sense. You run the, I think the right wave in my opinion. Business model product, how you guys organize, how do people sign up for our development and the development side? Who's your buyer? What's the business look like? Share a Hazel. Cast a one-on-one. Yeah. So we're an open core model, meaning the core engine is open source and fully downloadable and you know, free to use, uh, the additional functionality is the commercial aspect of it, which are tend to be features that are used when you're really going into, into sensitive and large scale deployments. Um, so the developers have access to a, they just come to hazelcast.org and uh, and join the community that way. >>Um, the people that we engage with are everyone from the developer all the way up through the architect and then the a C level member who's charged with standing up whatever this new capability is. So we talk up and down that chain, um, where you're a very, very technical company. Uh, but we've got a very, very powerful RLM. What's the developer makeup look like? Is it a software developer? Is it an engineer? So what's the makeup of the, of the developer? They're core application developers. Um, a lot in Java, increasingly in.net, uh, as a MLM AR coming on, we're getting a lot of Python. Uh, so it's, it's developers with that skillset and they're basically, uh, writing an application that they're, um, uh, basically their division is specified. So we need this new application. It could be a new application for a customer engagement and application for fraud detection and an application for stock trading. >>Anything that's super, super time sensitive and, uh, they, they select us and they build on us. So you get the in-memory solution for developers. Take me through the monetization on the open core. Is it services? Is it, uh, it's a subscription. It's a subscription model. So, okay. Uh, w we are paid on a, on a annual basis, uh, for, for use of the software. And um, you know, however large the installation gets is a function that basically determines, uh, you know, what the price is and then it's just renewed annually. Awesome. We'll do good subscriptions. Good economics. It is. What about the secret sauce? What sun in the cut was on the covers? Can you share what the magic is or is it proprietary? Is that now what's, uh, it's, it's hardcore computer science. It really is what it is. And that is actually what is in the core engine. >>Um, but I mean, we've got PhDs on staff. We're tackling some really, really hard problems. You know, I can, I can build anything in memory. I can make a spreadsheet in memory, I can make a word processor and memory. But you know, the question is how good is it? How fast is it, how scalable, how resilient. So, um, you know, those, you're saying speed, resilience and scale are the foundation and it took the company years and years to be able to master this. That's an asymptotic attempt and you're never at the end of that. But we've got, you know, the most resilience, so something, it doesn't go down. It can't go down because our customers lose millions for every second that it's down. So it can't go down. It's got to scale. And it's gotta be low latency number of customers you guys have right now. Can you tell us about the public references and why they using Hazelcast and what did they say about it? >>Yeah, I mean we've got a hundred of the largest, uh, financial services, about 60% of our revenue. E-commerce is a, another 20% large telcos. Another job. There are a lot of IOPS type companies, right? Yeah. Basically it's, um, so you know, in the financial services, uh, it's all the names that you would know, uh, every logo in your wallet. It's probably one of our customers as an example. Um, massive banks, uh, card processors, uh, we don't get to talk about very many of them, but you know, something like national bank of Australia, uh, capital one, um, you know, you can, you can let your, your mind run there. Um, our largest customer has over a trillion dollar market cap. There's only a few that meet that criteria. So I'll let you on that one. One of the three. Um, all right, so what's next for you guys? >>Give the quick plug in. The company would appreciate the insights. I think he'd memory's hot. What do you guys are going to do? What's your growth strategy? Uh, what's, what do you, what's your priorities? The CEO? Yeah. Well, we just raised a $50 million round, which is a very, very significant round. Um, and we're putting that to work aggressively. We just came off the biggest quarter in the company's history. So we're really on fire right now. Uh, we've established a very strong technology partnership with Intel, uh, including specialty because of their AI initiatives. Because we power a lot of AI, uh, uh, applications. IBM has become a strategic partner. They're now reselling Hazelcast. Uh, so we've got a bunch of, uh, a bunch of wind in our sails right now coming into this year, what we're going to be doing is, uh, really delivering a full blown, uh, in memory compute platform that delivers, that can process stored and streaming data simultaneously. >>Nothing else on the planet can do that. We're finding some really innovative applications and, um, you know, we're just really, really working on market penetration right now. You know, when you see all these supply chain hacks out there, you're going to look at more in memory detection, prevention, counter strike, you know, all this provision things you got to take care of. Mean applications have to now respond. It's almost like a whole new SLA for application requirement. Yeah, it is. I mean, the bad guys are moving to digital speed, you know, if you have important apps that, uh, that are affected by that. Right. You know, you'd better get ahead of that. Well, actually you could be doing that, by the way. You can be doing that on your, on premise or you could be doing in the cloud with the managed service that we've also stood up while still we get the Cuban in, in memory Africa and when we were there, I will be happy. Kelly, congratulations on the funding. Looking forward to tracking you. We'll follow up and check in with you guys. All right. Congratulations. Awesome. Thanks John. I appreciate it. Okay. It's keep coverage here in San Francisco, the Moscone. I'm John furrier. Thanks for watching.
SUMMARY :
RSA conference, 2020 San Francisco brought to you by Silicon Um, you know, to ask you your interest in better, you can do a lot of different things. it turns out, uh, if you go into the Java developer world, the name is like Kleenex. the only common factor is elapsed time, you know, and uh, But now we're working at digital speeds, you know, so, you know, you take a second break that down to a thousand, So now you just wake me up and say, wait a minute, Um, so the developers have access to a, they just come to hazelcast.org and uh, Um, the people that we engage with are everyone from the developer all the way up through the architect and then the determines, uh, you know, what the price is and then it's just renewed annually. But we've got, you know, the most resilience, so something, it doesn't go down. so you know, in the financial services, uh, it's all the names that you would know, uh, every logo in your wallet. What do you guys are going to do? I mean, the bad guys are moving to digital speed, you know, if you have important apps that,
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Kelly | PERSON | 0.99+ |
John | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Kelly Herrell | PERSON | 0.99+ |
San Francisco | LOCATION | 0.99+ |
Kelly Kelly Harrell | PERSON | 0.99+ |
Kleenex | ORGANIZATION | 0.99+ |
RSA | ORGANIZATION | 0.99+ |
$50 million | QUANTITY | 0.99+ |
300 milliseconds | QUANTITY | 0.99+ |
Hazelcast | ORGANIZATION | 0.99+ |
millions | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
hazelcast.org | OTHER | 0.99+ |
Python | TITLE | 0.99+ |
Intel | ORGANIZATION | 0.99+ |
RSAC | ORGANIZATION | 0.99+ |
Java | TITLE | 0.99+ |
three | QUANTITY | 0.99+ |
60 million | QUANTITY | 0.99+ |
20% | QUANTITY | 0.99+ |
second thing | QUANTITY | 0.98+ |
RSA | EVENT | 0.98+ |
three milliseconds | QUANTITY | 0.98+ |
about 60% | QUANTITY | 0.97+ |
this year | DATE | 0.97+ |
Africa | LOCATION | 0.97+ |
Moscone South | LOCATION | 0.97+ |
one job | QUANTITY | 0.97+ |
one | QUANTITY | 0.96+ |
first time | QUANTITY | 0.94+ |
first | QUANTITY | 0.94+ |
Silicon angle | ORGANIZATION | 0.93+ |
Hazel cast | TITLE | 0.92+ |
Hazel | TITLE | 0.9+ |
Moscone | LOCATION | 0.9+ |
Cuban | LOCATION | 0.86+ |
Australia | LOCATION | 0.85+ |
John furrier | PERSON | 0.85+ |
a second | QUANTITY | 0.81+ |
2020 | ORGANIZATION | 0.8+ |
every second | QUANTITY | 0.8+ |
hundred | QUANTITY | 0.8+ |
RSA conference | EVENT | 0.78+ |
Hazelcast | TITLE | 0.76+ |
over a trillion dollar | QUANTITY | 0.76+ |
2020 | DATE | 0.71+ |
one of | QUANTITY | 0.7+ |
second break | QUANTITY | 0.69+ |
USA | LOCATION | 0.67+ |
a thousand | QUANTITY | 0.65+ |
a minute | QUANTITY | 0.63+ |
Hazel Cassady | TITLE | 0.61+ |
Hazel cast | ORGANIZATION | 0.61+ |
CAS | ORGANIZATION | 0.54+ |
Hazel | PERSON | 0.53+ |
nanosecond | QUANTITY | 0.51+ |
Analyst Predictions 2023: The Future of Data Management
(upbeat music) >> Hello, this is Dave Valente with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analyst, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speaks) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What's your thoughts on here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2023, in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloud Era, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating end users. It's very cutting edge. I'd say the top, leading edge, 5% of of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these as aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers, glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Bear, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long at the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see they have the Informatica's, and all the other players there in Fivetrans have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leads it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are of most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that we're data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra elastic search or open search stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. So, there's some good moves in this direction. I expect to see more than this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardizing the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for, but in other words, get two products for the price of two services or for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL heat wave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to a Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success is certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability-based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had had to take the... Your debating in the class and you were forced to take one side and defend it. So, I was was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs acid compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse is being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those, data scientists who do know Python, who knows Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills based in SQL. Analytics have been built on SQL. They came with lakehouse and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for acid. And what was the best way to do it? It was through a relational table structure. So, the whole idea of acid in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring up back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structure data that we have, so that we can track what it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? I'm actually in right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's a associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is a very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just no notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, we're promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL heat wave. There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Valente for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)
SUMMARY :
and pleased to tell you (Tony and Dave faintly speaks) that led them to their conclusion. down, the funding in VC IPO market. And I like how the fact And I happened to have tripped across I talked to Walmart in the prediction of graph databases. But I stand by the idea and maybe to the edge. You can apply graphs to great And so, it's going to streaming data permeates the landscape. and to be honest, I like the tough grading the next 20 to 25% of and of course, the degree of difficulty. that sits on the side, Thank you for that. And I have to disagree. So, the catalog becomes Do you have any stats for just the reasons that And a lot of those catalogs about the modern data stack. and more, the data lakehouse. and the application stack, So, the alternative is to have metadata that SQL is the killer app for big data. but in the perception of the marketplace, and I had to take the NoSQL, being up on stage with Curt Monash. (group laughing) is that the core need in the data lake, And your prediction is the and examine derivatives of the data to optimize around a set of KPIs. that folks in the content world (Dave and Carl laughing) going to say this... shifts the conversation to the consumers And essentially, one of the things (group laughing) the term that we'll remember today, to your last year's prediction, is headed to embedding. and going off to separate happening in the business, so that the analytics didn't And the thing that we're waiting for and that deep modeling. that the system can of decision has to be relevant And the fact that we're But in the end We see that's all over the You cache that data, and improvement of the and I love how it shapes the outcome here Thank you for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
Dave Menninger | PERSON | 0.99+ |
Doug | PERSON | 0.99+ |
Carl | PERSON | 0.99+ |
Carl Olofson | PERSON | 0.99+ |
Dave Menninger | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Dave Valente | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Curt Monash | PERSON | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Christian Kleinerman | PERSON | 0.99+ |
Dave Valente | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Sanjeev | PERSON | 0.99+ |
Constellation Research | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Ventana Research | ORGANIZATION | 0.99+ |
2022 | DATE | 0.99+ |
Hazelcast | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Tony Bear | PERSON | 0.99+ |
25% | QUANTITY | 0.99+ |
2021 | DATE | 0.99+ |
last year | DATE | 0.99+ |
65% | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
today | DATE | 0.99+ |
five-year | QUANTITY | 0.99+ |
TigerGraph | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two services | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
David | PERSON | 0.99+ |
RisingWave Labs | ORGANIZATION | 0.99+ |