
Search Results for json:

Ed Walsh and Thomas Hazel, ChaosSearch | JSON


 

>> Hi everybody, this is Dave Vellante. Welcome to this CUBE conversation with Thomas Hazel, the founder and CTO of ChaosSearch. I'm also joined by Ed Walsh, who's the CEO. Thomas, good to see you.
>> Great to be here.
>> Explain JSON. First of all, what is it?
>> JSON is a powerful data representation, a data source. But let's just say that when we try to drive value out of it, it gets complicated. At ChaosSearch, we activate customers' data lakes. Customers stream their JSON data to the cloud stores that we activate. Now, the trick is the complexity of a JSON data structure: you can do all this complexity of representation. And here's the problem: putting that representation into an Elasticsearch database or a relational database is very problematic. So what people choose to do is pick and choose what they want, or they just store it as a blob. And so I said, what if we created a new index technology that could store the full representation, but dynamically, in what we call our data refinery, publish access to all the permutations that you may want? Because if you do a full flattening of that JSON, one row could theoretically turn into a million rows, and the relational data sort of explodes.
>> But then it gets really expensive. But everybody says they have JSON support; every database vendor that I talk to makes a big announcement: we now support JSON. What's the deal?
>> Exactly. So you take your relational database, with all those relational constructs, and you have a proprietary JSON API to pick and choose. So instead of picking and choosing up front, now you're picking and choosing on the backend, where you really want the power of relational analysis of that JSON data. And that's where Chaos comes in: we expand those data streams, and we do it in a relational way. So all that tooling you've been built to know and love, now you have access to it. If you're stuck with proprietary APIs on your JSON data, you're not using Looker, you're not using Tableau; you're doing some type of proprietary work, probably ML, on the backend.
>> Okay. So all the tools that you've trained everybody on, you can't really use them; you've got to build some custom stuff. Okay, so maybe bring that home then in terms of the money. Why do the suits care about this stuff?
>> The reason this is so important is, think about anything cloud native: Kubernetes, your different applications. What you're doing in Mongo is all JSON. It's very powerful but painful if you're not keeping the data. So what data scientists are doing is leveling; they're saying, I'm going to keep only the first four things. But think about it: it's Kubernetes, it's your app logs. They're trying to figure out, for Black Friday, what happened. It's literally saying, hey, every minute they'll cut a new log, and you're able to say, listen, these are the users that were in that system for an hour, and here are the different things they did. The fact of the matter is, if you cut it off, you lose all that fidelity, all that data. So it's really important to have it. If you're trying to figure out what happened, whether for security or for performance, or if you're the VP of product or growth trying to figure out how to cross-sell, you need to know what everyone's doing. If you're not handling JSON natively, like we're doing, it keeps on expanding: on Black Friday, all of a sudden the logs get huge.
And the next day they're not. But it's really powerful data that you need to harness for business value. It's what's going to drive growth; it's what's going to drive the digital transformation. So without the technology, you're kind of blind. And to be honest, you don't know, because the data scientist has kind of deleted the data on you. So this is big for the business and digital transformation. But it was also such a pain that the data scientists and DBAs were forced to just basically make it simple so it didn't blow up their systems. We allow them to keep it simple and still keep everything.
>> It reminds me of when you go on vacation with your video camera and somebody breaks into your house. You go back to look and see who it was, and the data's gone. The video's gone, because you weren't able to save it; it's too expensive.
>> Well, it's funny. This is the first data source that's driving the design of the database, because of all the value. We should be designing the database around the information it stores, not the structure of how it's been organized. And so our viewpoint is: you get to choose your structure, yet contain all that content.
>> So say I'm a customer, and a vendor says, hey, we've got JSON support. What questions should I ask to really peel the onion?
>> Well, in particular: is it relational access to that data? Now, you could say, oh, I ETL the JSON into it. But given the explosion of JSON permutations, one row to a million, they're probably not doing the full representation. So from our viewpoint, either you're doing blob-type access through proprietary JSON APIs, or you're picking and choosing; those have been the market's choices. However, what if you could take the full representation and design your schema based on how you want to consume it, versus how you could store it? And that's the big difference with us.
>> So I should be asking: how do I consume this data? Do you ETL it in? How much data explosion is going to occur once I do this? And you're saying for ChaosSearch, the answer to those questions is?
>> The answer is, again, our philosophy: simply stream your data into your cloud object storage, your data lake, and with our index technology and our data refinery you get to create views dynamically, in an instant, whether it's a terabyte or a petabyte, and describe how you want your data consumed, in a relational way or an Elasticsearch way. Both are consumable through our data refinery.
>> The refinery gives you the view. So what happens if someone wants a different view, to actually unpack different columns or different matrices? You're able to do that in a virtual view, and it's available immediately over petabytes of data. You don't have that episode where you come back, look at the video camera, and there's no data left.
>> We do appreciate the time and the explanation on really understanding JSON. Thank you. All right. And thank you for watching this CUBE conversation. This is Dave Vellante. We'll see you next time.
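An aside on the row-explosion point Thomas makes above: here's a minimal sketch (illustrative only, not ChaosSearch's implementation) of what a full relational flattening of nested JSON does. Every combination of elements across nested arrays becomes its own row, so row counts multiply across arrays.

```python
import itertools

# One JSON document with two nested arrays (3 sessions x 4 tags).
doc = {
    "user": "alice",
    "sessions": [{"id": 1}, {"id": 2}, {"id": 3}],
    "tags": ["a", "b", "c", "d"],
}

def flatten_fully(record):
    """Full relational flattening: every combination of nested-array
    elements becomes its own row, so counts multiply across arrays."""
    for session, tag in itertools.product(record["sessions"], record["tags"]):
        yield {"user": record["user"], "session_id": session["id"], "tag": tag}

rows = list(flatten_fully(doc))
print(len(rows))  # 12 rows from a single document; add a 100-element array
                  # and it's 1,200 -- this is how one row "becomes a million"
```

With a handful of arrays a few hundred elements long, the cross product reaches millions of rows per source document, which is why pick-and-choose schemas and blob storage became the common workarounds.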

Published Date : Nov 2 2021


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Volante | PERSON | 0.99+
Brian | PERSON | 0.99+
Jason | PERSON | 0.99+
Thomas Hazel | PERSON | 0.99+
Lilly | PERSON | 0.99+
Ed Walsh | PERSON | 0.99+
JSON | PERSON | 0.99+
Thomas | PERSON | 0.99+
first day | QUANTITY | 0.99+
black Friday | EVENT | 0.99+
an hour | QUANTITY | 0.98+
both | QUANTITY | 0.97+
Both | QUANTITY | 0.97+
ed Walsh | PERSON | 0.97+
Tableau | TITLE | 0.95+
first four things | QUANTITY | 0.94+
Kubernetes | TITLE | 0.93+
one row | QUANTITY | 0.92+
Mongo | ORGANIZATION | 0.9+
Jason | ORGANIZATION | 0.89+
ChaosSearch | ORGANIZATION | 0.89+
a million | QUANTITY | 0.88+
next day | DATE | 0.86+
Jason | TITLE | 0.81+
First | QUANTITY | 0.74+
million rows | QUANTITY | 0.73+
ETL | ORGANIZATION | 0.7+
petabytes | QUANTITY | 0.69+
Looker | ORGANIZATION | 0.66+
DBS | ORGANIZATION | 0.58+
Jaison | PERSON | 0.52+
Lucas | PERSON | 0.49+

How to Make a Data Fabric "Smart": A Technical Demo With Jess Jowdy


 

(inspirational music) (music ends) >> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities, including data exploration, business intelligence, natural language processing and machine learning, directly within the fabric makes it faster and easier for organizations to gain new insights and power intelligent, predictive and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries, from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare-focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi, yeah, thank you so much for having me. And so for this demo, we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements, and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo, and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see, and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. This could be, for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care, or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software, or adverse reaction warnings from a clinical risk grouping application, and so much more. So I'm really going to be simulating a patient logging in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here, and I'm going to be looking for information where the last name of this patient is Simmons, and their medical record number, or their patient identifier in the system, is 32345. And as you can see, I have this single JSON payload that showed up here of just relevant clinical information for my patient whose last name is Simmons, all within a single response. So fantastic, right? Typically though, when we see responses that look like this, there is an assumption that this service is interacting with a single backend system, and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture, we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background.
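Before peeling that back, here is a sketch of the client side of the call Jess just demonstrated. The endpoint path and parameter names are hypothetical, since the demo doesn't show the real URL; the point is that the consumer sees a single REST call returning one aggregated JSON document.

```python
import requests

# Hypothetical endpoint and parameter names; the demo doesn't show the real URL.
BASE_URL = "https://fabric.example.org/api/patient-data"

response = requests.get(
    BASE_URL,
    params={"lastName": "Simmons", "mrn": "32345"},  # values from the demo
    headers={"Authorization": "Bearer <token>"},     # placeholder credential
    timeout=30,
)
response.raise_for_status()

payload = response.json()  # one aggregated JSON document spanning all sources
print(sorted(payload.keys()))
```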
What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. And in the middle here, we have our data fabric coordinator, which is going to be in charge of the refinement and analysis, those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service, and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end, we do also support full lifecycle API management within this platform. When you're dealing with APIs, I always like to make a little shout-out on this: you really want to make sure you have a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today, we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what context. >> Can I just interrupt you for a second, Jess? >> Yeah, please. >> So you were showing on the left-hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean, you could have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry, and API security is a really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So, there's interaction with the API itself: can I even see this API and leverage it? And then within an API call, you then have to deal with, all right, which endpoints or what kinds of interactions within that API am I allowed to do? What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So the way that we handle that is, like I said, the same thing at different layers. There is access to a particular API, which can happen within the IRIS product, and we also see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of the security. >> And that's been designed in, it's not a bolt-on, as they like to say. >> Absolutely. >> Okay, can we get into collect now? >> Of course. We're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly, each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FHIR (Fast Healthcare Interoperability Resources).
Interactions with a homegrown enterprise data warehouse, for instance, may use SQL. Cloud-based solutions managed by a vendor may only allow you to use web service calls to pull data. So it's really important that the data fabric platform you're using has the flexibility to connect to all of these different systems and applications. And I'm about to log out, so I'm going to (chuckles) keep my session going here. So it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources, in all these different kinds of formats, and over all of these different kinds of protocols. So let's think back on our example here. I had four different applications that I was requesting information from to create that payload we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is that it has an embedded interoperability platform. So there are all of these native adapters that can support the common connections we see for different kinds of applications. Whether you're using REST, or SOAP, or SQL, or FTP, regardless of the protocol, there's an adapter to help you work with it. And we also think of the types of formats that we typically see data coming in as: in healthcare we have HL7, we have FHIR, we have CCDs across the industry; JSON is really hitting the market strong now; and XML payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, when I select a particular connection, on the right side panel I'm going to see the different settings associated with that particular connection that allow me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application, I'm using a SQL-based connection. When I'm connecting to my EMR, I'm leveraging FHIR, which is a REST-based protocol. And then when I'm working with my health record management system, I'm leveraging a standard HTTP adapter. So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly, and to be able to do it in a reliable and quick way. Because if you think about it, you could have hundreds of these different kinds of applications built out, and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance, my patient's last name and their MRN, and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric, to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application.
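To make the FHIR piece concrete: the EMR interaction described above is, at bottom, a REST call following the standard FHIR search API. Here's a minimal sketch of such a pull; the server URL is hypothetical, but `family` and `identifier` are standard FHIR Patient search parameters.

```python
import requests

FHIR_BASE = "https://emr.example.org/fhir"  # hypothetical FHIR server base URL

# Standard FHIR Patient search by family name and identifier (the MRN).
bundle = requests.get(
    f"{FHIR_BASE}/Patient",
    params={"family": "Simmons", "identifier": "32345"},
    headers={"Accept": "application/fhir+json"},
    timeout=30,
).json()

# A FHIR search returns a Bundle resource; each entry wraps one match.
for entry in bundle.get("entry", []):
    patient = entry["resource"]
    print(patient["id"], patient.get("name", [{}])[0].get("family"))
```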
So why don't we go one step further here and talk about doing something custom, or doing something innovative. It's important for users to have the ability to develop and go beyond an out-of-the-box or black-box approach, to be able to build things that are specific to their data fabric, or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you not only get an opportunity to view how we're establishing these connections or how we're building out these processes, but you have the opportunity to inject your own kind of processing, your own kinds of pipelines, into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. Python is a very up-and-coming language, right? We see more and more developers turning towards Python to do their development. So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see, actually calls out to our turnkey adapters. So we see a combination of out-of-the-box code that is provided in this data fabric platform from IRIS, combined with organization-specific or user-specific customizations that are included in this Python method. So it's a nice little combination of how we bring the developer experience in and mix it with out-of-the-box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. (laughs)
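As an illustration of that kind of injected step, here's a hypothetical transform, not InterSystems' actual adapter API: a user-supplied Python method that normalizes a source response before the coordinator aggregates it.

```python
from datetime import datetime, timezone

def normalize_risk_response(raw: dict) -> dict:
    """Hypothetical user-injected step: reshape a clinical-risk-grouping
    response into the fields the aggregated payload expects."""
    return {
        "source": "clinical-risk-grouping",
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "risk_score": float(raw.get("score", 0.0)),
        "risk_factors": [f.strip().lower() for f in raw.get("factors", [])],
    }

# Output of a turnkey adapter fed through the custom step:
print(normalize_risk_response({"score": "7.2", "factors": ["CHF ", "Diabetes"]}))
```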
>> It's a lot here. You know, actually- >> I can pause. >> If I could, I just want to sort of play that back. So we went through the connect and the collect phase. >> Yes, we're going into refine. So it's a good place to stop. >> So before we get there: we heard a lot about fine-grained security, which is crucial. We heard a lot about different data types, multiple formats. You've got the ability to bring in different dev tools. We heard about FHIR, which of course is big in healthcare and is the standard, and then SQL for traditional structured data, and then web services like HTTP, you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely. And I think that's really important when you're dealing with a smart data fabric, because what you're effectively doing is consolidating all of your processing, all of your collection, into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refinement. >> We're going into refinement. Exciting. (chuckles) So how do we actually do refinement? Where does refinement happen? And how does this whole thing end up being performant? Well, the key to all of that is this SDF coordinator; SDF stands for smart data fabric. And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's collecting that information, it's aggregating it, and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like. And as you can see, it follows a flowchart-like structure. So there's a start, there is an end, and then there are these different arrows that point to different activities throughout the business process. And so there are all these different actions being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end, which is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL logic into this, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. So like I said, we have a very simple process here that's just calling out, grabbing all those different data elements from all those different data sources, and consolidating them. We'll look back at this coordinator in a second when we make this data fabric a bit smarter and introduce that analytics piece to it. So this is in charge of the refinement. And at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen, 'cause I always like to go back and take a look at everything: we have our initial API connection, we have our connections to our individual data sources, and we have our coordinators there in the middle, in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There are all these different processes interacting. So it's really important that your smart data fabric platform has really good traceability and really good logging, 'cause you need to be able to know, if there was an issue, where did that issue happen, in which connected process, and how did it affect the other processes related to it? In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to step through the entire history of a request, from when it initially came into the smart data fabric to when data was sent back out from it. So I didn't record the time, but I bet if you did, you'd see it was this time that we sent that request in, and you can see my patient's name and their medical record number here, and you can see that that instigated four different calls to four different systems, represented by these arrows going out. So we sent something to chart script, to our health record management system, to our clinical risk grouping application, and into my EMR through their FHIR server. So every outbound application gets a request, and we pull back all of those individual pieces of information from all of those different systems, and we bundle them together. And for my FHIR lovers, here's our FHIR bundle that we got back from our FHIR server.
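That fan-out-and-sync pattern is the heart of the coordinator. As a rough illustration only, with plain Python asyncio standing in for the IRIS orchestration engine and made-up system names, the shape of it looks like this: four downstream calls issued concurrently, with a final gather acting as the sync step before the payload is bundled.

```python
import asyncio

async def fetch(system: str, mrn: str) -> dict:
    # Stand-in for a real adapter call (SOAP, SQL, FHIR, HTTP...).
    await asyncio.sleep(0.1)  # simulated downstream latency
    return {"system": system, "data": f"<{system} payload for {mrn}>"}

async def coordinator(mrn: str) -> dict:
    systems = ["chart-script", "health-records", "risk-grouping", "emr-fhir"]
    # Fan out: all four downstream requests run concurrently.
    results = await asyncio.gather(*(fetch(s, mrn) for s in systems))
    # Sync step: only after every response is back do we refine one payload.
    return {"mrn": mrn, "sources": {r["system"]: r["data"] for r in results}}

print(asyncio.run(coordinator("32345")))
```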
That trace is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now, we change this into a JSON format before we deliver it, but these are those data elements brought together. And this screen would also be used for seeing things like error trapping, errors that were thrown, alerts, warnings; developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one-stop shop for understanding what's happening behind the scenes with your data fabric. >> Sure: who did what, when, and where; what did the machine do; what went wrong, and where did it go wrong? Right at your fingertips. >> Right. And I'm a visual person, so a bunch of log files to me is not the most helpful, while being able to see that this happened at this time in this location gives me the understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How are people using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric, is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? It's bringing data in, it's analyzing that information, it's transforming data that's in a format your consumer isn't going to understand, it's doing any additional injection of custom logic. So really your coordinator, or that orchestrator that sits in the middle, is the brains behind your smart data fabric. >> And this is available today? It all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well, we can keep going. I have a lot to say, but really, this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this, but essentially, if we go back to our coordinator here, we can see that original pipeline we saw, where we're pulling data from all these different systems, collecting it, and sending it out. But then we see two more activities at the end here, which involve getting a readmission prediction and then returning that prediction. So we can not only deliver data back as part of a smart data fabric, but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data we just looked at and running it through a machine learning model that exists within the smart data fabric pipeline, producing a readmission score to determine whether this particular patient is at risk for readmission within the next 30 days, which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world is that we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. So there's no shuffling of data, there are no external connections to make this happen. And it doesn't really require having a PhD in data science to understand how to do it; it leverages really basic, SQL-like syntax to construct and execute these predictions.
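A hedged sketch of that SQL-like flow appears below. The table and column names are hypothetical, and the statements are modeled on the IntegratedML-style create/train/predict pattern described in the demo rather than copied from it; the point is how little ceremony is involved when the model lives where the data lives.

```python
# Table and column names are hypothetical; statements are modeled on the
# IntegratedML-style CREATE/TRAIN/PREDICT flow described in the demo.
statements = [
    "CREATE MODEL ReadmissionRisk PREDICTING (WillReadmit) FROM PatientEncounters",
    "TRAIN MODEL ReadmissionRisk",
    """SELECT PatientID, PREDICT(ReadmissionRisk) AS ReadmitScore
       FROM CurrentAdmissions WHERE MRN = '32345'""",
]

def score_readmissions(connection):
    """Run the model lifecycle over any DB-API connection to the platform."""
    cursor = connection.cursor()
    for sql in statements:
        cursor.execute(sql)
    return cursor.fetchall()  # readmission scores for the selected patient
```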
So it's going one step further than the traditional data fabric example, introducing this ability to deliver actionable insights to our users based on the data that we've brought together. >> Well, that readmission probability is huge, right? Because it directly affects the cost for the provider and the patient. So if you can anticipate the probability of readmission and either do things at that moment, or, you know, as an outpatient perhaps, to minimize the probability, then that's huge. That drops right to the bottom line. >> Absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day, which is what makes this so exciting. >> Awesome demo. >> Thank you! >> Jess, are you cool if people want to get in touch with you? Can they do that? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy, and we'd love to hear from you. I always love talking about this topic, so we'd be happy to engage on that. >> Great stuff. Thank you, Jessica, appreciate it. >> Thank you so much. >> Okay, don't go away, because in the next segment we're going to dig into the use cases where data fabric is driving business value. Stay right there. (inspirational music) (music fades)

Published Date : Feb 22 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Joe Lichtenberg | PERSON | 0.99+
Jessica Jowdy | PERSON | 0.99+
Jessica | PERSON | 0.99+
Jess Jowdy | PERSON | 0.99+
InterSystems | ORGANIZATION | 0.99+
Scott | PERSON | 0.99+
Python | TITLE | 0.99+
Simmons | PERSON | 0.99+
Jess | PERSON | 0.99+
32345 | OTHER | 0.99+
hundreds | QUANTITY | 0.99+
IRIS | ORGANIZATION | 0.99+
each | QUANTITY | 0.99+
today | DATE | 0.99+
LinkedIn | ORGANIZATION | 0.99+
third segment | QUANTITY | 0.98+
Fire | COMMERCIAL_ITEM | 0.98+
SQL | TITLE | 0.98+
single platform | QUANTITY | 0.97+
each data | QUANTITY | 0.97+
one | QUANTITY | 0.97+
single | QUANTITY | 0.95+
single response | QUANTITY | 0.94+
single backend system | QUANTITY | 0.92+
two more | QUANTITY | 0.92+
four different segments | QUANTITY | 0.89+
APIs | QUANTITY | 0.88+
one step | QUANTITY | 0.88+
four | QUANTITY | 0.85+
Healthcare Field Engineering | ORGANIZATION | 0.82+
JSON | TITLE | 0.8+
single payload | QUANTITY | 0.8+
second | QUANTITY | 0.79+
one payload | QUANTITY | 0.76+
next 30 days | DATE | 0.76+
IRIS | TITLE | 0.75+
Fire | TITLE | 0.72+
Postman | TITLE | 0.71+
every | QUANTITY | 0.68+
four different calls | QUANTITY | 0.66+
Jes | PERSON | 0.66+
a second | QUANTITY | 0.61+
services | QUANTITY | 0.6+
evelopers | PERSON | 0.58+
Postman | ORGANIZATION | 0.54+
HL7 | OTHER | 0.4+

How to Make a Data Fabric "Smart": A Technical Demo With Jess Jowdy


 

>> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities including data exploration, business intelligence natural language processing, and machine learning directly within the fabric, makes it faster and easier for organizations to gain new insights and power intelligence, predictive and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi. Yeah, thank you so much for having me. And so for this demo we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. This could be for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software or adverse reaction warnings from a clinical risk grouping application and so much more. So I'm really going to be assimilating a patient logging on in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here and I'm going to be looking for information where the last name of this patient is Simmons and their medical record number their patient identifier in the system is 32345. And so as you can see I have this single JSON payload that showed up here of just relevant clinical information for my patient whose last name is Simmons all within a single response. So fantastic, right? Typically though when we see responses that look like this there is an assumption that this service is interacting with a single backend system and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. 
What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. And in the middle here we have our data fabric coordinator which is going to be in charge of this refinement and analysis those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end we do also support full lifecycle API management within this platform. When you're dealing with APIs I always like to make a little shout out on this that you really want to make sure you have enough like a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what contact. >> Can I just interrupt you for a second? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean you can have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry and API securities are really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So there's interactions with the API itself. So can I even see this API and leverage it? And then within an API call, you then have to deal with all right, which end points or what kind of interactions within that API am I allowed to do? What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So the way that we handle that is, like I said, same thing at different layers. There is access to a particular API, which can happen within the IRIS product and also we see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of security. >> And that's been designed in, >> Absolutely, yes. it's not a bolt-on as they like to say. Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly each data source requires a different method for establishing connections and collecting this information. 
So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FIRE, interactions with a homegrown enterprise data warehouse for instance may use SQL for a cloud-based solutions managed by a vendor. They may only allow you to use web service calls to pull data. So it's really important that your data fabric platform that you're using has the flexibility to connect to all of these different systems and and applications. And I'm about to log out so I'm going to keep my session going here. So therefore it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources and all these different kinds of formats and over all of these different kinds of protocols. So let's think back on our example here. I had four different applications that I was requesting information for to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is it has an embedded interoperability platform. So there's all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST or SOAP or SQL or FTP regardless of that protocol there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as, in healthcare we have H7, we have FIRE we have CCDs across the industry. JSON is, you know, really hitting a market strong now and XML, payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these when I select a particular connection on the right side panel I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application I'm using a SQL based connection. When I'm connecting to my EMR I'm leveraging a standard healthcare messaging format known as FIRE, which is a rest based protocol. And then when I'm working with my health record management system I'm leveraging a standard HTTP adapter. So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly and be able to do it in a reliable and quick way. Because if you think about it, you could have hundreds of these different kinds of applications built out and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance my patient's last name and their MRN and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. 
So let's, why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond what's an out of the box or black box approach to be able to develop things that are specific to their data fabric or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you cannot, you not only get an opportunity to view how we're establishing these connections or how we're building out these processes but you have the opportunity to inject your own kind of processing your own kinds of pipelines into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up and coming language, right? We see more and more developers turning towards Python to do their development. So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see actually calls out to our turnkey adapters. So we see a combination of out of the box code that is provided in this data fabric platform from IRIS combined with organization specific or user specific customizations that are included in this Python method. So it's a nice little combination of how do we bring the developer experience in and mix it with out of the box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. >> It's a lot here. You know, actually, if I could >> I can pause. >> If I just want to sort of play that back. So we went through the connect and the collect phase. >> And the collect, yes, we're going into refine. So it's a good place to stop. >> Yeah, so before we get there, so we heard a lot about fine grain security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know the ability to bring in different dev tools. We heard about FIRE, which of course big in healthcare. >> Absolutely. >> And that's the standard and then SQL for traditional kind of structured data and then web services like HTTP you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely, and I think that's really important when you're dealing with a smart data fabric because what you're effectively doing is you're consolidating all of your processing, all of your collection into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refine. >> We're going into refinement, exciting. So how do we actually do refinement? Where does refinement happen and how does this whole thing end up being performant? Well the key to all of that is this SDF coordinator or stands for smart data fabric coordinator. And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's aggregating, it's collecting that information it's aggregating it and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. 
And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like and as you can see it follows a flow chart like structure. So there's a start, there is an end and then there are these different arrows that point to different activities throughout the business process. And so there's all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and we're consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL Logic into this or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. So like I said, we have a really very simple process here that's just calling out, grabbing all those different data elements from all those different data sources and consolidating it. We'll look back at this coordinator in a second when we introduce or we make this data fabric a bit smarter and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen 'cause I always like to go back and take a look at everything that we've seen. We have our initial API connection we have our connections to our individual data sources and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There's all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging 'cause you need to be able to know, you know, if there was an issue, where did that issue happen, in which connected process and how did it affect the other processes that are related to it. In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request from when it initially came into the smart data fabric to when data was sent back out from that smart data fabric. So I didn't record the time but I bet if you recorded the time it was this time that we sent that request in. And you can see my patient's name and their medical record number here and you can see that that instigated four different calls to four different systems and they're represented by these arrows going out. So we sent something to chart script to our health record management system, to our clinical risk grouping application into my EMR through their FIRE server. So every request, every outbound application gets a request and we pull back all of those individual pieces of information from all of those different systems and we bundle them together. 
And for my FIRE lovers, here's our FIRE bundle that we got back from our FIRE server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now we change this into a JSON format before we deliver it, but this is those data elements brought together. And this screen would also be used for being able to see things like error trapping or errors that were thrown alerts, warnings, developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one stop shop for understanding what's happening behind the scenes with your data fabric. >> Etcher, who did what, when, where what did the machine do? What went wrong and where did that go wrong? >> Exactly. >> Right in your fingertips. >> Right, and I'm a visual person so a bunch of log files to me is not the most helpful. Well, being able to see this happened at this time in this location gives me that understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How people are using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? It's bringing data in, it's analyzing that information, it's transforming that data, in a format that your consumer's not going to understand it's doing any additional injection of custom logic. So really your coordinator or that orchestrator that sits in the middle is the brains behind your smart data fabric. >> And this is available today? This all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well we can keep going. 'Cause right now, I mean we can, oh, we're at 18 minutes. God help us. You can cut some of this. (laughs) I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this but essentially if we go back to our coordinator here we can see here's that original that pipeline that we saw where we're pulling data from all these different systems and we're collecting it and we're sending it out. But then we see two more at the end here which involves getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at and we're running it through a machine learning model that exists within the smart data fabric pipeline and producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days. Which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world is we're bringing analytics close to the data with integrated ML. 
So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. So there's no shuffling of data, there's no external connections to make this happen. And it doesn't really require having a PhD in data science to understand how to do that. It leverages all really basic SQL like syntax to be able to construct and execute these predictions. So it's going one step further than the traditional data fabric example to introduce this ability to define actionable insights to our users based on the data that we've brought together. >> Well that readmission probability is huge. >> Yes. >> Right, because it directly affects the cost of for the provider and the patient, you know. So if you can anticipate the probability of readmission and either do things at that moment or you know, as an outpatient perhaps to minimize the probability then that's huge. That drops right to the bottom line. >> Absolutely, absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day which is what makes this so exciting. >> Awesome demo. >> Thank you. >> Fantastic people, are you cool? If people want to get in touch with you? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy and we'd love to hear from you. I always love talking about this topic, so would be happy to engage on that. >> Great stuff, thank you Jess, appreciate it. >> Thank you so much. >> Okay, don't go away because in the next segment we're going to dig into the use cases where data fabric is driving business value. Stay right there.

Published Date : Feb 15 2023

SUMMARY :

for organizations to gain new insights And to that end we do also So you were showing hundreds of these APIs in the healthcare industry So the way that we handle that it's not a bolt-on as they like to say. that data fabric to ensure that we're able It's a lot here. So we went through the So it's a good place to stop. the ability to bring And so you have a rich collection So that platform needs to we're going into refine. that are related to it. so a bunch of log files to of being able to create this technology to support Anything else you want to show us? So in this scenario, we're Well that readmission and the patient, you know. to that smart data fabric So you can find me on you Jess, appreciate it. because in the next segment

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Jessica JowdyPERSON

0.99+

Joe LichtenbergPERSON

0.99+

InterSystemsORGANIZATION

0.99+

Jess JowdyPERSON

0.99+

ScottPERSON

0.99+

JessPERSON

0.99+

18 minutesQUANTITY

0.99+

hundredsQUANTITY

0.99+

32345OTHER

0.99+

PythonTITLE

0.99+

SimmonsPERSON

0.99+

eachQUANTITY

0.99+

IRISORGANIZATION

0.99+

third segmentQUANTITY

0.99+

EtcherORGANIZATION

0.99+

todayDATE

0.99+

LinkedInORGANIZATION

0.98+

SQLTITLE

0.98+

single platformQUANTITY

0.98+

oneQUANTITY

0.98+

JSONTITLE

0.96+

each data sourceQUANTITY

0.96+

singleQUANTITY

0.95+

one stepQUANTITY

0.94+

one stepQUANTITY

0.94+

single backendQUANTITY

0.92+

single responseQUANTITY

0.9+

two moreQUANTITY

0.85+

single payloadQUANTITY

0.84+

SQL LogicTITLE

0.84+

a secondQUANTITY

0.83+

IRISTITLE

0.83+

four different segmentsQUANTITY

0.82+

PostmanPERSON

0.78+

FIRETITLE

0.77+

SOAPTITLE

0.76+

four different applicationsQUANTITY

0.74+

one stopQUANTITY

0.74+

PostmanTITLE

0.73+

one payloadQUANTITY

0.72+

each ofQUANTITY

0.71+

RESTTITLE

0.7+

Healthcare Field EngineeringORGANIZATION

0.67+

next 30 daysDATE

0.65+

fourQUANTITY

0.63+

these APIsQUANTITY

0.62+

secondQUANTITY

0.54+

GodPERSON

0.53+

everyQUANTITY

0.53+

servicesQUANTITY

0.51+

H7COMMERCIAL_ITEM

0.5+

applicationQUANTITY

0.48+

FIREORGANIZATION

0.38+

XMLTITLE

0.38+

Itamar Ankorion, Qlik & Peter MacDonald, Snowflake | AWS re:Invent 2022


 

(upbeat music) >> Hello, welcome back to theCUBE's AWS re:Invent 2022 coverage. I'm John Furrier, host of theCUBE. Got a great lineup here: Itamar Ankorion, SVP Technology Alliances at Qlik, and Peter MacDonald, Vice President, Cloud Partnerships and Business Development at Snowflake. We're going to talk about bringing SAP data to life with a joint Snowflake, Qlik and AWS solution. Gentlemen, thanks for coming on theCUBE. Really appreciate it. >> Thank you. >> Thank you, great meeting you John. >> Just to get started, introduce yourselves to the audience, then we're going to jump into what you guys are doing together: a unique relationship here, a really compelling solution in cloud. Big story about applications and scale this year. Let's introduce yourselves. Peter, we'll start with you. >> Great. I'm Peter MacDonald. I am vice president of Cloud Partners and business development here at Snowflake. On the Cloud Partner side, that means I manage the AWS relationship along with Microsoft and Google Cloud: what we do together in terms of complementary products, GTM, co-selling, things like that. Importantly, working with other third parties like Qlik for joint solutions. On business development, it's negotiating custom commercial partnerships, large companies like Salesforce and Dell, smaller companies, some for our venture portfolio. >> Thanks Peter, and hi John. It's great to be back here. So I'm Itamar Ankorion and I'm the senior vice president responsible for technology alliances here at Qlik. With that, I own strategic alliances, including our key partners in the cloud, including Snowflake and AWS. I've been in the data and analytics enterprise software market for 20 plus years, and my main focus is product management, marketing, alliances, and business development. I joined Qlik about three and a half years ago through the acquisition of Attunity, which is now the foundation for Qlik data integration. So again, we focus in my team on creating joint solution alignment with our key partners to provide more value to our customers. >> Great to have both you guys, senior executives in the industry, on theCUBE here, talking about data. Obviously bringing SAP data to life is the theme of this segment, but this re:Invent, it's all about the data, the big data end-to-end story, a lot about data being intrinsic, as the CEO says on stage, in the organizations in all aspects. Take a minute to explain what you guys are doing from a company standpoint: Snowflake and Qlik and the solutions, why here at AWS? Peter, we'll start with you at Snowflake: what you guys do as a company, your mission, your focus. >> That was great, John. Yeah, so here at Snowflake, we focus on the data platform, and until recently, data platforms required expensive on-prem hardware appliances. And despite all that expense, customers had capacity constraints, expensive maintenance, and limited functionality that all impeded these organizations from reaching their goals. Snowflake is a cloud native SaaS platform, and we've become so successful because we've addressed these pain points and have other new special features. For example: securely sharing data across both the organization and the value chain without copying the data, support for new data types such as JSON and semi-structured data, and also advanced in-database data governance. Snowflake integrates with complementary AWS services and other partner products.
So we can enable holistic solutions that include, for example here, both Qlik and AWS SageMaker and Comprehend, and bring those to joint customers. Our customers want to convert data into insights, along with advanced analytics platforms and AI. That is how they make holistic data-driven solutions that will give them competitive advantage. With Snowflake, our approach is to focus on customer solutions that leverage data from existing systems such as SAP, wherever they are, in the cloud or on-premise. And to do this, we leverage partners like Qlik and AWS to help customers transform their businesses. We provide customers with a premier data analytics platform as a result. Itamar, why don't you talk about Qlik a little bit and then we can dive into the specific SAP solution here and some trends. >> Sounds great, Peter. So Qlik provides modern data integration and analytics software used by over 38,000 customers worldwide. Our focus is to help our customers turn data into value and help them close the gap between data all the way through insight and action. We offer Qlik Data Integration and Qlik Data Analytics. Qlik Data Integration helps to automate the data pipelines to deliver data to where they want to use it, in real-time, and make the data ready for analytics. And then Qlik Data Analytics is a robust platform for analytics and business intelligence; it has been a leader in the Gartner Magic Quadrant for over 11 years now in the market. And both of these come together into what we call Qlik Cloud, which is our SaaS based platform, providing a more seamless way to consume all these services and accelerate time to value with customer solutions. In terms of partnerships, both Snowflake and AWS are very strategic to us here at Qlik, so we have very comprehensive investment to ensure a strong joint value proposition that we can bring to our mutual customers: everything from aligning our roadmaps through optimizing and validating integrations, collaborating on best practices, packaging joint solutions like the one we'll talk about today. And with that investment, we are an elite level, top level partner with Snowflake. We validate that our technology is Snowflake-ready across the entire product set, and we have hundreds of joint customers together. And with AWS we've also partnered for a long time. We're here at re:Invent; we've been here since the first, inaugural re:Invent, so it kind of gives you an idea of how long we've been working with AWS. We provide very comprehensive integration with AWS data analytics services, and we have several competencies, ranging from data analytics to migration and modernization. So that's our focus, and again, we're excited about working with Snowflake and AWS to bring solutions together to market. >> Well, I'm looking forward to unpacking the solutions specifically, and congratulations on the continued success of both your companies. We've been following them obviously for a very long time, and seeing the platform evolve beyond just SaaS, and a lot more going on in cloud these days, kind of next generation emerging. You know, we're seeing a lot of macro trends that are going to be powering some of the things we're going to get into real quickly. But before we get into the solution, what are some of those power dynamics in the industry that you're seeing in trends, specifically, that are impacting your customers, that are taking us down this road of getting more out of the data, and specifically the SAP, but in general trends and dynamics?
What are you hearing from your customers? Why do they care? Why are they going down this road? Peter, we'll start with you. >> Yeah, I'll go ahead and start. Thanks. Yeah, I'd say we continue to see customers being very eager to transform their businesses, and they know they need to leverage technology and data to do so. They're also increasingly depending upon the cloud to bring that agility, that elasticity, new functionality necessary to react in real-time to ever evolving customer needs. You look at what's happened over the last three years, and boy, the macro environment, customers, it's all changing so fast. With our partnerships with AWS and Qlik, we've been able to bring to market innovative solutions like the one we're announcing today that spans all three companies. It provides a holistic solution and an integrated solution for our customer. >> Itamar, let's get into it. You've been with theCUBE, you've seen the journey, you have your own journey, many, many years, you've seen the waves. What's going on now? I mean, what's the big wave? What's the dynamic powering this trend? >> Yeah, in a nutshell I'll call it, it's all about time. You know, it's time to value and it's about real-time data. I'll kind of talk about that a bit. So, I mean, you hear a lot about the data being the new oil, but it's definitely, we see more and more customers seeing data as their critical enabler for innovation and digital transformation. They look for ways to monetize data. They look at the data as the way in which they can innovate and bring different value to the customers. So we see customers wanting to use more data, to get more value from data. We definitely see them wanting to do it faster, right, than before. And we definitely see them looking for agility and automation as ways to accelerate time to value, and also reduce overall costs. I did mention real-time data, so we definitely see more and more customers, they want to be able to act and make decisions based on fresh data. So yesterday's data is just not good enough. >> John: Yeah. >> It's got to be down to the hour, down to the minutes and sometimes even lower than that. And then I think we're also seeing customers look to their core business systems where they have a lot of value, like the SAP, like mainframe, and thinking, okay, our core data is there, how can we get more value from this data? So those are key things we see all the time with customers. >> Yeah, we did a big editorial segment this year on what we called data as code. Data as code is kind of a riff on infrastructure as code, and you start to see data proliferating into all aspects: fresh data. It's not just where you store it, it's how you share it, it's how you turn it into an application, intrinsically involved in all aspects. This is the big theme this year and that's driving all the conversations here at re:Invent. And I'm guaranteeing you, it's going to happen for another five and 10 years. It's not stopping. So I got to get into the solution. You guys mentioned SAP and you've announced the solution by Qlik, Snowflake and AWS for your customers using SAP. Can you share more about this solution? What's unique about it? Why is it important and why now? Peter, Itamar, we'll start with you first. >> Let me jump in, this is really, I'll jump because I'm excited. We're very excited about this solution, and it's also a solution, by the way, and again, we've seen proven customer success with it.
So to your point, it's ready to scale. It's starting; I think we're going to see a lot of companies doing this over the next few years. But before we jump to the solution, let me maybe take a few minutes just to clarify the need, why we're seeing customers jump to do this. So customers that use SAP, they use it to manage the core of their business. So think order processing, management, finance, inventory, supply chain, and so much more. So if you're running SAP in your company, that data creates a great opportunity for you to drive innovation and modernization. So what we see customers want to do: they want to do more with their data, and more means they want to take SAP with non-SAP data and use it together to drive new insights. They want to use real-time data to drive real-time analytics, which they couldn't do to date. They want to bring together descriptive with predictive analytics, so adding machine learning and AI to drive more value from the data. And naturally they want to do it faster: find ways to iterate faster on their solutions, have freedom with the data and agility. And I think this is really where cloud data platforms like Snowflake and AWS, you know, bring that value, to be able to drive that. Now to do that you need to unlock the SAP data, which is a lot of also where Qlik comes in, because the typical challenge these customers run into is the complexity inherent in SAP data: tens of thousands of tables, proprietary formats, complex data models, licensing restrictions, and more. Then you have performance issues; they usually run into how do we handle the throughput, the volumes, while maintaining lower latency and impact. Where do we find the knowledge to really understand how to get all this done? So these are the things we've looked at when we came together to create a solution and make it unique. So when you think about its uniqueness, we put together a lot, and I'll go through three, four key things that come together to make this unique. First is about data delivery. How do you have the SAP data delivery? So how do you get it from ECC, from HANA, from S/4HANA? How do you deliver the data and the metadata, and how does that integrate well into Snowflake? And what we've done is we've focused a lot on optimizing that process and the continuous ingestion, the real-time ingestion of the data, in a way that works really well with the Snowflake data cloud. Second thing is we looked at SAP data transformation. So once the data arrives at Snowflake, how do we turn it into being analytics ready? So that's where data transformation and data warehouse automation come in, and these are all elements of this solution. So creating derivative datasets, creating data marts, and all of that is done by, again, creating an optimized integration that pushes down SQL based transformations so they can be processed inside Snowflake, leveraging its powerful engine. And then the third element is bringing together data visualization analytics that can also take all the data, now that it's organized inside Snowflake, bring other data in, bring machine learning from SageMaker, and then you get to create a seamless integration to bring analytic applications to life. So these are all things we put together in the solution. And maybe the last point is we actually took the next step with this, and we created something we refer to as solution accelerators, which we're really, really keen about.
Think about this as prepackaged templates for common business analytic needs like order to cash, finance, inventory. And we can dig into that a little more later, but this gets the next level of value to the customers, all built into this joint solution. >> Yeah, I want to get to the accelerators, but real quick, Peter, your reaction to the solution: what's unique about it? And obviously Snowflake, we've been seeing the progression, data applications, more developers developing on top of Snowflake. Data as code kind of implies developer ecosystem. This is kind of interesting. I mean, you got partnering with Qlik and AWS; it's kind of a developer-like thinking, real solution. What's unique about this SAP solution that's different than what customers can get anywhere else, or not? >> Yeah, well listen, I think first of all, you have to start with the idea of the solution. These are three companies coming together to build a holistic solution that is all about, you know, creating a great opportunity to turn SAP data into value, as Itamar was talking about. That's really what we're talking about here, and there's a lot of technology underneath it. I'll talk more about the Snowflake technology, what's involved here, and then cover some of the AWS pieces as well. But you know, we're focusing on getting that value out and accelerating time to value for our joint customers. As Itamar was saying, you know, there's a lot of complexity with the SAP data and a lot of value there. How can we manage that in a prepackaged way, bringing together best of breed solutions with proven capabilities, and bringing this to market quickly for our joint customers? You know, Snowflake and AWS have been strong partners for a number of years now, and that's not only on how Snowflake runs on top of AWS, but also how we integrate with their complementary analytics and other products. And so, you know, we want to be able to leverage those in addition to what Qlik is bringing in terms of the data transformations, bringing data out of SAP, and the visualization as well. All very critical. And then we want to bring in the predictive analytics AWS brings and what SageMaker brings. We'll talk about that a little bit later on. Some of the technologies that we're leveraging are some of our latest cutting edge technologies that really make things easier for both our partners and our customers. For example, Qlik leverages Snowflake's recently released Snowpark for Python functionality to push down those data transformations from Qlik into Snowflake that Itamar was mentioning. And we also leverage Snowpark for integrations with Amazon SageMaker. But there's a lot of great new technology that just makes this easy and compelling for customers. >> I think that's the big word, easy button here, for what may look like a complex kind of integration: kind of turnkey, really, really compelling example of the modern era we're living in, as we always say in theCUBE. You mentioned accelerators, SAP accelerators. Can you give an example of how that works with the technology from the third party providers to deliver this business value, Itamar? 'Cause that was an interesting comment. What's the example? Give an example of this acceleration. >> Yes, certainly. I think this is something that really makes this truly, truly unique in the industry and again, a great opportunity for customers. So we kind of talked earlier about, there's a lot of things that need to be done with SAP data to turn it to value.
And these accelerators, as the name suggests, are designed to do just that: to kind of jumpstart the process and reduce the time and the risk involved in such projects. So again, these are pre-packaged templates. We basically took a lot of knowledge and a lot of configurations, best practices about how to get things done, and we put 'em together. So think about all the steps. It includes things like data extraction, so already knowing which tables, all the relevant tables, that you need to get data from in the context of the solution you're looking for, say like order to cash; we'll get back to that one. How do you continuously deliver that data into Snowflake in an efficient manner, handling things like data type mappings, metadata naming conventions and transformations? The data models you build, all the way to data mart definitions and all the transformations that the data needs to go through, moving through steps until it's fully analytics ready. And then on top of that, even adding a library of comprehensive analytic dashboards and integrations through machine learning and AI, and putting all of that in a way that's pre-integrated and tested to work with Snowflake and AWS. So this is where, again, you get this entire recipe that's ready. So take for example, I think I mentioned order to cash. So again, all these things I just talked about... I mean, for those who are not familiar, order to cash is a critical business process for every organization, especially if you're in retail, manufacturing, enterprise. This is where, you know, it starts with booking a sales order, followed by fulfilling the order, billing the customer, then managing the accounts receivable when the customer actually pays, right? So in this whole process, the sales order fulfillment and the billing impact customer satisfaction, and the receivables and payments impact working capital, cash liquidity. So again, as a result, this order to cash process is the lifeblood for many businesses, and it's critical to optimize and understand. So the solution accelerator we created specifically for order to cash takes care of understanding all these aspects and the data that needs to come with it. So everything we outlined before, to make the data available in Snowflake in a way that's really useful for downstream analytics, along with dashboards that are already common for that use case. So again, this enables customers to gain real-time visibility into their sales orders, fulfillment, accounts receivable performance. That's what the accelerators are all about. And very similarly, we have another one, for example, for finance analytics, right? So this will optimize financial data reporting, helps customers get insights into P&L, financial risk or stability; or inventory analytics that helps with, you know, improved planning and inventory management, utilization, increased efficiencies in supply chain. So again, these accelerators really help customers get a jumpstart and move faster with their solutions. >> Peter, this is the easy button we just talked about: getting things going, you know, get the ball rolling, get some acceleration. A big part of this is the three companies coming together doing this. >> Yeah, and to build on what Itamar just said, the SAP data obviously has tremendous value.
Those sales orders, distribution data, financial data: bringing that into Snowflake makes it easily accessible, but it also enables it to be combined with other data, which is one of the things that Snowflake does so well. So you can get a full view of the end-to-end process and the business overall. You know, for example, I'll just take one example that may not come to mind right away, but looking at the impact of weather conditions on supply chain logistics is relevant and material and of interest to our customers. How do you bring those different data sets together in an easy way? Bringing the data out of SAP, bringing maybe other data out of other systems through Qlik or through Snowflake, directly bringing data in from our data marketplace, and bringing that all together to make it work. You know, fundamentally, organizational silos and data fragmentation otherwise make it really difficult to drive modern analytics projects, and that in turn limits the value that our customers are getting from SAP data and these other data sets. We want to enable that and unleash it. >> Yeah, time to value. This is great stuff. Itamar, final question: you know, how are customers using this? What do you have? I'm sure you have customer examples already using the solution. Can you share kind of what these examples look like in the use cases and the value? >> Oh yeah, absolutely. Thank you. Happy to. We have customers across different sectors. You see manufacturing, retail, energy, oil and gas, CPG. So again, customers in those sectors typically have SAP, so we have customers in all of them. A great example is Siemens Energy. Siemens Energy is a global provider of gas and power services. You know, over what, 28 billion, 30 billion in revenue, 90,000 employees. They operate globally in over 90 countries. So they've used SAP HANA as a core system, running on premises in multiple locations around the world. And what they were looking for is a way to bring all this data together so they can innovate with it. And the thing is, as Peter mentioned earlier, not just the SAP data, but also bringing other data from other systems together for more value. That includes finance data, logistics data, customer CRM data. So they bring data from over 20 different SAP systems, okay, with Qlik data integration, feeding that into Snowflake in under 20 minutes, 24/7, 365 days a year. Okay, they get data from over 20,000 tables, you know, hundreds of millions of records daily going in. So it is a great example of the type of scale, scalability, agility and speed that they can get to drive this kind of innovation. So that's a great example with Siemens. You know, another one that comes to mind is a global manufacturer. Very similar scenario, but you know, they're using it for real-time executive reporting. So it's more like visibility into the production data as well as financial analytics. So think about everything from audit to taxes to financial intelligence, because all the data's coming from SAP. >> It's a great time to be in the data business again. It keeps getting better and better. There's more data coming. It's not stopping, you know, it's growing so fast, it keeps coming. Every year, it's the same story, Peter. It's like, it doesn't stop coming. As we wrap up here, let's just get customers some information on how to get started.
I mean, obviously you're starting to see the accelerators; it's a great program there. What a great partnership between the two companies and AWS. How can customers get started to learn about the solution and take advantage of it, getting more out of their SAP data, Peter? >> Yeah, I think the first place to go to is talk to Snowflake, talk to AWS, talk to our account executives that are assigned to your account. Reach out to them and they will be able to educate you on the solution. We have it packaged up very nicely and it can be deployed very, very quickly. >> Well gentlemen, thank you so much for coming on. Appreciate the conversation. Great overview of the partnership between, you know, Snowflake and Qlik and AWS on a joint solution. You know, getting more out of the SAP data. It's really kind of a key, key solution, bringing SAP data to life. Thanks for coming on theCUBE. Appreciate it. >> Thank you. >> Thank you John. >> Okay, this is theCUBE coverage here at re:Invent 2022. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)
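As a companion to the pushdown pattern MacDonald describes, here is a minimal Snowpark for Python sketch. The connection parameters and the SAP-derived table and column names are placeholders invented for illustration; the key point is that the transformation compiles to SQL and executes inside Snowflake's engine rather than pulling data out to the client.

```python
# Sketch of a Snowpark-for-Python pushdown transformation. Only the
# pattern is shown; table/column names are hypothetical, not the actual
# joint solution's schema.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# Raw SAP sales-order rows landed by the continuous ingestion pipeline.
orders = session.table("SAP_SALES_ORDERS_RAW")

# Build an analytics-ready order-to-cash mart; this is lazily compiled to
# SQL and runs on Snowflake, so the data never leaves the platform.
order_to_cash = (
    orders
    .filter(col("ORDER_STATUS") == "OPEN")
    .group_by("SOLD_TO_PARTY")
    .agg(sum_(col("NET_VALUE")).alias("OPEN_ORDER_VALUE"))
)
order_to_cash.write.save_as_table("ORDER_TO_CASH_MART", mode="overwrite")
```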

Published Date : Dec 1 2022

SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Peter | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Siemens | ORGANIZATION | 0.99+
Peter MacDonald | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Peter McDonald | PERSON | 0.99+
Qlik | ORGANIZATION | 0.99+
28 billion | QUANTITY | 0.99+
two companies | QUANTITY | 0.99+
Tens | QUANTITY | 0.99+
three companies | QUANTITY | 0.99+
Siemens Energy | ORGANIZATION | 0.99+
20 plus years | QUANTITY | 0.99+
yesterday | DATE | 0.99+
Snowflake | ORGANIZATION | 0.99+
Itamar Ankorion | PERSON | 0.99+
third element | QUANTITY | 0.99+
First | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Itamar | PERSON | 0.99+
over 20,000 tables | QUANTITY | 0.99+
both | QUANTITY | 0.99+
90,000 employees | QUANTITY | 0.99+
first | QUANTITY | 0.99+
Salesforce | ORGANIZATION | 0.99+
Cloud Partners | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
over 38,000 customers | QUANTITY | 0.99+
under 20 minutes | QUANTITY | 0.99+
10 years | QUANTITY | 0.99+
five | QUANTITY | 0.99+
Excel | TITLE | 0.99+
one | QUANTITY | 0.99+
over 11 years | QUANTITY | 0.98+
Snowpark | TITLE | 0.98+
Second thing | QUANTITY | 0.98+

John Kreisa, Couchbase | AWS re:Invent 2022


 

(upbeat music) >> Good morning and welcome back to fabulous Las Vegas, Nevada. We're here at AWS re:Invent with wall-to-wall coverage all day long on theCUBE. My name is Savannah Peterson and I am joined this morning by the beautiful Lisa Martin. Lisa, good morning. >> Good morning. Good. >> How you feeling, day three? >> Day three is, we are going to be shot out of a cannon today. The amount of content coming at you from theCUBE today- >> Get ready, you all. >> Us two gals, is a lot. We're going to have some great conversations. >> And we're starting with a really great one with a CUBE alumni to the max. You've been on the show multiple times. >> John: Yeah. >> Very excited to welcome John, the CMO of Couchbase. Welcome. How you doing this morning? >> Thanks. I'm doing great. Great to be here with you. >> How do you feel about the show so far? What's your pulse? >> The show has been great. I'd say the energy is great: the traffic at our booth, the conversations that we're having, both with prospective customers and even just partners, right? They're all here. The ecosystem is here. >> And everyone's finally back in person and it feels so good. >> John: It does. >> So, we're going to dig in a little bit, but just in case the audience isn't familiar, tell us about Couchbase. >> Sure. Couchbase is a publicly traded database company. We have a cloud database platform called Capella, which is hosted on AWS and GCP. It is used for building mission-critical applications. So, we have great customers building apps that really matter, that they're using to drive their business. So, we're very excited about that. 30% of the Fortune 100 are Couchbase customers. >> Nice. Talk a little bit about the AWS relationship. >> Mm-hm. Yeah, so we have a great AWS relationship. In fact, yesterday we announced a deepening of that relationship, a strategic collaboration agreement. We're very excited. It's a multi-year agreement. It's focused on go-to market, from a sales and marketing standpoint. We're going to target, you know, various verticals and, you know, really generate joint business between the two of us. So, it's a deepening of an already strong relationship and we're really excited about that. >> Savannah: Yeah. Go ahead. >> What are some of the industry verticals that you're going to be tackling together? >> Well, gaming for one, right? Manufacturing. The workloads that Couchbase is good for, these mission-critical workloads, are ones that are really suited to be used with AWS. So, we've done some work with them already in those areas and I'm sure we'll be digging in even deeper. >> That's exciting. Speaking of digging in deeper, tell us a little bit more about Capella. >> Capella. It's the cloud database service I mentioned. We launched it last October and we are super excited by the uptake, the interest that we're seeing. We have a free 30 day trial, so, you know, people can come and try it and get their hands dirty just getting experience with the product and then, you know, become a customer after that. And we're seeing very strong interest from our existing customer base as well. So, we're really excited about how things are going. >> Talk about Capella and the latest release and how it's really enabling Couchbase to invest deeper into the developer experience. >> Yeah, so, at the end of October, we announced a revamp of our user interface, our user experience for Capella, really focused on developers. And what we've done is make it so that it's familiar to developers, right?
It's a GitHub-like experience. So, a developer comes in, they're very familiar, of course, with GitHub, so they are familiar with how the Couchbase Capella interface will work. And so that's something that, you know, we've really invested in; in fact, we've invested in developers quite a bit. We announced a Couchbase community hub and a Couchbase ambassadors program, both focused on developers and getting out there and building our community. >> A community is a big topic that we've been talking about at all the conferences this year. We're all back in person, in community. How often are you communicating with your community to get feedback on what that experience should be like? >> Yeah, I mean, we actually have a Discord server, so we're in constant communication. (Savannah laughing) >> Savannah: Yes. (John laughing) 24/7. (laughing) >> Basically, you know, we have staff who's dedicated to making sure that the users on there are getting their answers and giving us feedback on the experience. The ambassadors are somebody who have a really strong relationship, who get early insight and give us feedback before we even release a product. So, it gives us a chance to really test-drive it with core developers and get the insight we need before we get it in the market. >> Yeah. It matters so much. You can build it, but they won't come if it's not fantastic. >> John: Exactly. >> Lisa: Right. >> Let's shift a little bit and talk about customers. How, and price, how do you guys compare? >> Customers and? >> And price, your price performance? >> Price, oh. So, customers: we also announced this week a joint customer, Arthrex, with AWS. Arthrex is an orthopedics medical devices company, and they use our Edge capabilities along with running Couchbase on AWS. So, you think of the kinds of surgeries that orthopedic surgeons do: it's scopes, and they are often inside. So, what it does is it collects the data, the video data and all of that, on the medical devices, and then brings it back to a centralized app for the doctors to use sort of in post when they're actually doing further medical recommendations. >> Savannah: It's so cool. >> So, it's cool. The thing about it is it can work whether it's online or offline, which is one of the reasons that Arthrex selected us, because, you know, often there's not connectivity in the operating room, I'd say deep inside of a hospital. So, these devices work regardless, and then when they get connectivity, it syncs back to that centralized service. So, it's one of the main reasons that they selected us. >> That's outstanding. You know, one of the things that John Furrier, you know, John, well, you guys go way back. >> John: Way back. >> He had a sit down with Adam Selipsky, oh, about 10 days or so ago. He gets an exclusive with the CEO of AWS every year pre-re:Invent. And one of the things that Adam said is that the role, or the title, data analyst, is going to go away, in that every role will have responsibilities of analyzing data. And I always think of that in terms of operations, marketing, finance, sales, but you just brought up physicians as data analysts in their jobs, right? We're probably not thinking about it in that way. >> Yeah. >> But it's so interesting how data is really being democratized. >> John: Yeah. >> And how Couchbase is an enabler of that in an operating room. >> John: Yeah, yeah, yeah. >> That's amazing. >> It's a great story.
There are many others, and I think, you know, we have embedded operational analytics in Couchbase Capella and, you know, in our offerings in general. So, what that does is allow us to do real-time, highly personalized applications based on the analytics that are coming in real-time from the data from the applications. And so that's something that's actually driving a highly interactive user experience, one that's very personalized and customized. And that's one of the things that our customers really like about what we do. >> It's fascinating. I never thought about it from a medical device perspective. >> Lisa: No, no. >> John: No. >> My gosh, as if doctors don't have enough cognitive load. >> John: I know. >> You know, right? Like, they don't need to be a data analyst. I would much rather they were just good at the surgery part. That's the piece of the puzzle I need them to do. Yeah, for sure. That's a fascinating customer example. Can you share any other joint AWS examples with us? >> Joint AW- I mean, there's many in the gaming area where, because Couchbase is a memory-first architecture, we deliver very, very interactive user experiences, and we're used a lot for session management, user ID management in the gaming space, specifically with AWS. It's an area we've done some joint work already and had a lot of success, you know, with small and large gaming companies. >> Yeah. It looks like you also, according to my notes here, we've got things in travel and hospitality as well. >> Yes. Also, Carnival Cruises is a great example. We enable their on-ship, on-board experience, highly customized. Everybody wears a device called a medallion, and as they move around the ship, it knows where they are and it's able to provide customized services. You walk up to a bar, you have your favorite drink, it can be at the bar when you land there. >> I'll take that. >> How about that? (laugh) >> That's outstanding. >> Isn't that great? >> Can we carry that onto the AWS show floor? >> Exactly. >> Or Starbucks order? >> Yeah, yeah. Yes, please. Yes, please. Well, another thing that's so interesting these days is that every company has to be a data company. They have to be a software company; they have to be a data company. You just gave some great examples: hospitality, gaming, healthcare, where that data democratization has to happen. >> John: Yeah. >> Businesses have to transform. But one of the things that Adam also told John is that CIOs, CEOs are coming to him not wanting to talk about technology but about transformation. >> Yeah. >> Huge topic. >> And that's a journey where every customer is at different levels. >> Yeah. >> How is Couchbase helping businesses transform and where are your customer conversations these days? >> Yeah, yeah, yeah. So, I mean, the transformation of the business is a major topic of conversation, so we completely agree with that. How Couchbase helps is, you know, in our database, one of the things we have is the SQL engine. And so as people are looking to move and modernize their infrastructure, if they're moving off of, or from, a technology that's principally based on SQL but doesn't give all the flexibility of a JSON database or document database like we do, we actually enable them to get more easily onto our platform so that they can start that transformation. And then it's, you know, a journey of how they want to transform their business, and it's really focused on how do they better serve their customers and clients, whether it's internal or external?
>> It really matters. I mean, and that ease of use as well as the transformation journey. It takes a long time for people to adapt. So, every piece of that puzzle, every Lego being quicker or easier, more intuitive, like you said, with the user experience, we can tell you're very thoughtful. How does this improve the total cost of ownership for your customers? >> That's one of the things that we announced along with that developer changes, was a new storage engine underneath Couchbase Capella. And it's 10 X more dense storage. And what that means is fewer servers. So, fewer servers is a much better cost of ownership story. That plus just the performance of the platform itself, we find, you know, against competition, we can do things on say six nodes that take 18 nodes for others. >> Lisa: Oh wow. >> And we have a great consolidation story as well because we have, it's a multi-modal database, meaning that it has SQL engine, document database, full tech search, eventing and analytics, all these pieces on one common data layer. So, you can actually consolidate off of other technologies onto one, onto Couchbase, and that actually saves you money. So, that's a great story for us. >> There's got to be a sustainability element to that as well? >> Yeah, I mean it's, obviously, if you're using less, using fewer servers, there's a kind of power consumption aspect of it as well. Absolutely. >> Are you finding that a lot of customers and companies we talk to these days have in their RFPs, they must only work with vendors who have an actual ESG program? Are you finding more customers coming to you saying, how can you help us dial down our carbon emissions? >> John: Yeah. >> Savannah: Great question. >> We've got a sustainability program that we've got to meet, we've got commitments to our customers. >> John: Yeah. >> Is that something that's really now kind of a hard and fast requirement? >> We're hearing it, we're definitely hearing it. I wouldn't say it's, you know, massively pervasive but I would say it's a growing component of, as you said, RFPs. And it's something that we feel like we have a great story for. And so, you know, it's something that helps when we get into those conversations, we can clearly articulate how we can provide that value and how we meet some of those needs that they have. >> Yeah, that's awesome. So, we have a bit of a challenge, new to the show at re:Invent. >> John: Mm-hm. >> Where we are prompting you to give us your 30 second Instagram Reel sizzle highlight. Don't worry, I'm not actually timing you, but your thought leadership hot-take on the most important theme or takeaway from this year's show. >> From the conference here. I would say that, and I think this was talked about a little bit by AWS as well, but the convergence of analytics and operational data, you know, through the applications is one that we're certainly seeing as well. It's the reason we have analytics in our database. But as I walk around and look at it, I see that very much as a common theme as well, in terms of what other vendors are saying and just the conversations we're having. So for me, that's one of the things I think would be a takeaway from this show. >> Yeah. Embedded analytics, real-time, everybody wants to know what's going on, in context. >> Yeah. That's right. >> Right now, not last week, not what we're processing from last month. >> Exactly. >> I mean, right? (cross-talking) >> So, I can react and take advantage or take an action if I have to. >> Exactly. 
And then deliver that personalized experience that we all expect these days. >> Oh, yes. >> I'll take that medallion- >> It's about the medallion. I was like, okay. >> You up with that, John? >> We'll get right on it. >> Lisa: All right. (laughs) >> About this. So, what's next for Couchbase? >> John: Well- >> I know you got the partnership, you've got all this exciting momentum. >> So, we're excited heading into next year. We're going to continue to innovate on Capella, right? Continue to deliver more value, lean into our developer community that we have. We're investing heavily, not just from a product standpoint but from a company standpoint in terms of, you know, our community meetups and some of those things. We have a big community-focused event coming up in March called Connect, Couchbase Connect. So, that's something that we'll, you know, continue to drive. That'll be a major theme for us next year. Cloud and developers and, you know, continuing to enable that ecosystem. >> Lisa: Excellent. >> I just had a Microsoft moment where I saw you saying, "Cloud developers," on stage. (Lisa and Savannah laughing) >> I'm not going Steve Ballmer on you. (all laughing) >> Pardon. I was trying to get someone to sing yesterday. I was hoping you were my Ballmer dance. Oh, man. Well, this has been a really great way to start the day. John, thank you so much for being on the show with us, seriously. And it's great that you keep coming back. I'm glad we haven't scared you off. (John laughing) >> Never. >> Savannah: We will have you anytime. >> Thank you. >> And thank you all for tuning in for yet another fantastic day of all day live coverage here from AWS re:Invent. We are in Sin City, having a fabulous time with Lisa Martin. I'm Savannah Peterson. This is theCUBE and we are the leader in high-tech technology coverage. (upbeat music) (upbeat music fades)
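To make the "familiar relational tooling over JSON" point from this conversation concrete, here is a minimal sketch using the Couchbase Python SDK and a SQL++ query. The cluster address, credentials, bucket name, and document shape are placeholders invented for illustration; the takeaway is that JSON documents are queried with ordinary SQL-style syntax, with no up-front flattening required.

```python
# Sketch of querying JSON documents with SQL++ through the Couchbase
# Python SDK (4.x style). Host, credentials, and the gaming data model
# are placeholders, not details from the interview.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbase://localhost",
    ClusterOptions(PasswordAuthenticator("user", "password")),
)

# Session-management style query: top users by event count, run
# directly over the JSON documents.
result = cluster.query(
    "SELECT s.userId, COUNT(*) AS events "
    "FROM `game-sessions` AS s "
    "WHERE s.type = 'session' "
    "GROUP BY s.userId "
    "ORDER BY events DESC LIMIT 10"
)
for row in result:
    print(row["userId"], row["events"])
```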

Published Date : Nov 30 2022

SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
Adam | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Lisa | PERSON | 0.99+
Adam Selensky | PERSON | 0.99+
Savannah | PERSON | 0.99+
John Kreisa | PERSON | 0.99+
Savannah Peterson | PERSON | 0.99+
Sin City | LOCATION | 0.99+
John Furrier | PERSON | 0.99+
March | DATE | 0.99+
30 day | QUANTITY | 0.99+
Arthrex | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
last month | DATE | 0.99+
Steve Ballmer | PERSON | 0.99+
one | QUANTITY | 0.99+
30% | QUANTITY | 0.99+
next year | DATE | 0.99+
yesterday | DATE | 0.99+
last week | DATE | 0.99+
Couchbase | ORGANIZATION | 0.99+
30 second | QUANTITY | 0.99+
Ballmer | PERSON | 0.99+
Las Vegas, Nevada | LOCATION | 0.98+
Capella | ORGANIZATION | 0.98+
last October | DATE | 0.98+
today | DATE | 0.98+
18 nodes | QUANTITY | 0.98+
this week | DATE | 0.98+
both | QUANTITY | 0.98+
Microsoft | ORGANIZATION | 0.97+
this year | DATE | 0.97+
GitHub | ORGANIZATION | 0.97+
SQL | TITLE | 0.97+
Lego | ORGANIZATION | 0.96+
six nodes | QUANTITY | 0.96+

Shireesh Thota, SingleStore & Hemanth Manda, IBM | AWS re:Invent 2022


 

>>Good evening everyone and welcome back to Sparkly Sin City, Las Vegas, Nevada, where we are here with theCUBE covering AWS re:Invent for the 10th year in a row. John Furrier has been here for all 10. John, we are in our last session of day one. How does it compare? >>I just graduated high school 10 years ago. It's exciting to be here. It's been a long time. We've gotten a lot older. >>Your brain is complex. You've got a lot in there. So fast. >>Graduated eighth in high school. You know how it is. No? All good. This is what's going on. This next segment, wrapping up day one, which is like the kickoff. Monday's great. I mean, Tuesday's coming tomorrow, big day. The announcements are all around the kind of next gen, and you're starting to see partnering and integration is a huge part of this next wave, cuz APIs at the cloud, next gen cloud's gonna be deep engineering integration, and you're gonna start to see business relationships and business transformation scale horizontally, not only across applications but companies. This has been going on for a while; we've been covering it. This next segment is gonna be one of those things that we're gonna look at as something that's gonna happen more and more. >>Yeah, I think so. It's what we've been talking about all day. Without further ado, I would like to welcome our very exciting guests for this final segment: Shireesh from SingleStore, thank you for being here, and we also have Hemanth on from IBM Data and AI. Y'all are partners, been partners for about a year. I'm gonna go out on a limb, only because of their legacy, and suspect that a few more people might know what IBM does versus what SingleStore does. So why don't you just give us a little bit of background so everybody knows what's going on. >>Yeah, so SingleStore is a relational database. It's a foundational relational system, but the thing that we do the best is what we call real-time analytics. So you have these systems that are legacy, which do operations or analytics. And if you wanted to bring them together, like most of the applications want to, it's really a big hassle. You have to build an ETL pipeline, you'd have to duplicate the data. It's really faulty systems all over the place and you won't get the insights really quickly. SingleStore is trying to solve that problem elegantly by having an architecture that brings both operational and analytics in one place. >>Brilliant. >>You guys had big funding, now expanding. MemSQL, SingleStore, databases, a $46 billion market again, databases. We've been saying this on theCUBE for 12 years: databases have been great, and recently, not one database will rule the world. We know that; everyone knows that. Databases, data as code, cloud scale: this is the convergence now of all that coming together where data, this re:Invent, is the theme. Everyone will be talking about end to end data, new kinds of specialized services, faster performance, new kinds of application development. This is the big part of why you guys are working together. Explain the relationship, how you guys are partnering and engineering together. >>Yeah, absolutely. I think so. IBM, right? I think we are mainly into hybrid cloud and AI, and one of the things we are looking at is expanding our ecosystem, right? Because we have gaps, and as opposed to building everything organically, we want to partner with the likes of SingleStore, which have unique capabilities that complement what we have.
Because at the end of the day, customers are looking for an end to end solution that solves business problems. And they are very good at real-time data analytics and HTAP, right? Because we have transactional databases, analytical databases, data lakes, but HTAP is a gap that we currently have. And by partnering with them we can essentially address the needs of our customers, and also what we plan to do is try to integrate our products and solutions with that, so that we can deliver a solution to our customers. >>This is why I was saying earlier, I think this is a telltale sign of what's coming, from a lot of use cases where people are partnering right now. You got the clouds, a bunch of building blocks. If you put it together yourself, you can build a durable system, very stable. If you want an out of the box solution, you can get that pre-built, but you really can't optimize. It breaks, you gotta replace it. High level engineering systems together is a little bit different, not just buying something out of the box. You guys are working together. This is kind of an end to end dynamic that we're gonna hear a lot more about at re:Invent from the CEOs. But you guys are doing it across companies, not just with AWS. Can you guys share this new engineering business model use case? Do you agree with what I'm saying? Do you think John's crazy, crazy? I mean, in all this discourse: you've got out of the box, engineer it yourself, but then now, when people do joint engineering projects, right? They're different. Yeah, >>Yeah. No, I mean, you know, I think our partnership is a testament to what you just said, right? When you think about how to achieve realtime insights, the data comes into the system, and the customers and new applications want insights as soon as the data comes into the system. So what we have done is basically build an architecture that enables that: we have our own storage and query engine, indexing, et cetera. And so we've innovated in our indexing, in our database engine, but we wanna go further than that. We wanna be able to exploit the innovation that's happening at IBM. A very good example is, for instance, we have a native connector with Cognos, their BI dashboards, right, to read the data very natively. So we build a hyper efficient system that moves the data very efficiently. Another very good example is embedded AI. >>So IBM of course has built an AI chip, and they have basically advanced quite a bit into embedded AI, custom AI. So what we have done is a true marriage between the engineering teams here: we make sure that the data in SingleStore can natively exploit that kind of goodness. So we have taken their libraries. So if you have data in SingleStore, like let's imagine if you have Twitter data, if you wanna do sentiment analysis, you don't have to move the data out, train the model outside, et cetera. We just have the pre-built embedded AI libraries already. So it's a pure engineering marriage there that kind of opens up a lot more insights than just simple analytics. >>And cost, by the way too. Moving data around. >>Another big theme. Yeah. >>And latency and speed is everything about SingleStore, and you know, it couldn't have happened without this kind of a partnership. >>So you've been at IBM for almost two decades, don't look it, but at nearly 17 years in, and maybe it hasn't, so feel free to educate us:
how has IBM's approach to AI and ML evolved, as well as looking to involve partnerships in the ecosystem as a collaborative, raise-the-water-level-together force? >>Yeah, absolutely. So I think when we initially started AI, right, if you recollect, Watson was the forefront of AI. We started the whole journey. I think our focus was more on end solutions, both horizontal and vertical: Watson Health, which is more vertically focused, and we were also looking at Watson Assistant and Watson Discovery, which were more horizontally focused. That whole strategy evolved over a period of time. Now we are trying to be more open. For example, this whole embeddable AI that Shireesh was talking about: it's essentially making the guts of our AI libraries available for partners and ISVs to build their own applications and solutions. We've been using them historically within our own products the past few years, but now we are making them available. >>So, how big of a shift is that? Do, do you think we're seeing a more open and collaborative ecosystem in the space in general? >>Absolutely. Because I mean, if you think about it, in my opinion, everybody is moving towards AI and that's the future. And you have two options. Either you build it on your own, which is gonna require a significant amount of time, effort, investment, research, or you partner with the likes of IBM, which has been doing it for a while, right? And it has the ability to scale to the requirements of all the enterprises and partners. So you have that option, and some companies are picking to do it on their own, but I believe that there's a huge amount of opportunity where people are looking to partner and source what's already available as opposed to investing from scratch. >>Classic buy versus build analysis for them to figure out, yeah, to get into the game. >>And, and, and why reinvent the wheel when we're all trying to do things at, at not just scale but orders of magnitude faster and more efficiently than we were before? It, it makes sense to share, but it does feel like a bit of a shift, almost a paradigm shift, in the culture of competition versus how we're gonna creatively solve these problems. There's room for a lot of players here, I think. And yeah, it's, I don't >>Know, it's really, I wanted to ask if you don't mind me jumping in on that. So, okay, I get that people buy or build: I'm gonna use existing or build my own. The decision point on that is, to your point about the path of getting the path of AI, is do I have the core competency? Skills gap's a big issue. So, okay, theCUBE: if we had AI, we'd take it, cuz we don't have any AI engineers around yet to build out on all the linguistic data we have. So we might use your AI, but then we might say we want to have a core competency. How do companies get that core competency going while using and partnering with AI? What do you guys see as a way for them to get going? Because I think some people probably want to have core competency of >>AI. Yeah, so I think, again, I wanna distinguish between a solution which requires core competency — you need expertise on the use case and you need expertise on your industry vertical and your customers — versus the foundational components of AI, which are agnostic to the core competency, right? Because you take the foundational piece and then you further train it and refine it for your specific use case.
So we are not saying that we are experts in all the industry verticals. What we are good at is the foundational components, which is what we wanna provide. Got it. >>Yeah, that's the hard, deep, heavy lift. >>Yeah. And I can give a little color to that question from our perspective, right? When we think about what our core competency is, it's about databases, right? But there's a symbiotic relationship between data and AI, you know, they sort of really move each other, right? >>You kind of can't have one without the other. >>Right? And so the question is how do we make sure that we expand that relationship, where our customers can operationalize their AI applications closer to the data, not move the data somewhere else, do the modeling and training somewhere else, and deal with multiple systems, et cetera. And this is where this kind of cross-engineering relationship helps. >>Awesome. Great. And then I think companies are gonna want to have that baseline foundation and then start hiring and learning. It's like driving the car. You get the keys when you're ready to go. >>Yeah, >>Yeah. I'll give you a simple example, right? >>I want that turnkey lifestyle. We all do. Yeah, >>Yeah. Let me just give you a quick analogy, right? For example, you can basically make the engines and the car on your own, or you can source the engine and make the car. So it's basically an option that you can decide. The same thing with airplanes as well, right? Whether you wanna make the whole thing or whether you wanna source from someone who is already good at doing that piece, right? >>Or even create a new alloy for that matter. I mean, you can take it all the way down in that analogy, >>Right? Is there a structural change in how companies are laying out their architecture in this modern era as we start to see this next-gen cloud emerge — teams, security teams becoming much more focused, data teams building into DevOps, into the developer pipeline, seeing that trend? What do you guys see in the modern data stack kind of evolution? Is there a data solutions architect coming? Do they exist yet? Is that what we're gonna see? Is it data as code, automation? How do you guys see this landscape of the evolving persona? >>I mean, if you look at the modern data stack as it is defined today, it is too disjointed and there are way too many layers, right? There are at least five different layers. You gotta have, like, a storage layer, you replicate to do real-time insights, and then there's a query layer, visualization and then AI, right? So you have too many ETL pipelines in between, too many services, too many choke points, too many failures, >>Right? ETL, that's the dirty three-letter word. >>Say no to ETL. >>Adam Selipsky — that's his quote, not mine. We hear that. >>Yeah. I mean, there are different names for it. They don't call it ETL, we call it replication, whatnot. But the point is hassle. >>Data is getting more hassle. More >>Hassle. Yeah. The data is ultimately getting replicated in the modern data stack, right? And that's kind of one of our theses at SingleStore, which is that you have to converge, not hyper-specialize, and convergence is possible in certain areas, right? When you think about operational and analytics as two different aspects of the data pipeline, it is possible to bring them together.
And we have done it — we have a lot of proof points, our customer stories speak to it — and that is one area of convergence. We need to see more of it. The relationship with IBM is sort of another step of convergence wherein, in the final phases, the operational analytics is coming together, and can we take analytics, visualization with reports and dashboards, and AI together? This is where Cognos and embedded AI come in together, right? So we believe in SingleStore, which is really convergence, >>One single path. >>A shocking, a shocking tie >>Back there. So obviously, you know, one of the things we love to joke about in theCUBE, 'cause we like to goof on the old enterprise, is they solve complexity by adding more complexity. That's old thinking. The new thinking is put it under the covers, abstract away the complexities and make it easier. That's right. So how do you guys see that? Because this end-to-end story is not getting less complicated; it's actually, I believe, increasing in complication and complexity. However, there are opportunities to move faster by putting it under the covers, or under the hood. What do you guys think about how this new complexity gets managed in this new data world we're coming into? >>Yeah, so I think you're absolutely right. The world is becoming more complex, technology is becoming more complex, and I think there is a real need — and it's not just coming from us, it's also coming from the customers — to simplify things. So our approach around AI is exactly that, because we are essentially providing libraries. Just like you have Python libraries, there are now AI libraries that you can go infuse and embed deeply within applications and solutions. So it becomes integrated and simple from the customer point of view. From a user point of view, it's very simple to consume, right? So that's what we are doing, and I think SingleStore is doing that with data, simplifying data, and we are trying to do that with the rest of the portfolio, specifically AI. >>It's no wonder there's a lot of synergy between the two companies. John, do you think they're ready for the Instagram >>Challenge? Yes, they're ready. Uh oh, >>Think they're ready. So we're doing a bit of a challenge. A little 30-second off the cuff. What's the most important takeaway? Think of it as your thought leadership sound bite from AWS. >>2023 on an Instagram reel. I'm scrolling. That's the Instagram, it's >>Your moment to stand out. Yeah, exactly. Shireesh, you look like you're ready to rock. Let's go for it. You've got that smile, I'm gonna let you go. Oh >>Goodness. You know, there's this quote from astrophysics: space tells matter how to move, and matter tells space how to curve. They have that kind of a relationship. I see the same between AI and data, right? They need to move together. And so AI is possible only with the right data, and data is meaningless without good insights through AI. They really have that kind of relationship, and you will see a lot more of that happening in the future. The future of data and AI are combined, and that's gonna happen and accelerate a lot faster. >>Shireesh, well done. Wow. Thank you. I am very impressed. It's a tough act to follow. You ready for it though? Let's go. Absolutely. >>Yeah. So just to add to what was said, right, I think there's a quote from Rob Thomas, one of our leaders at IBM: there's no AI without IA.
Essentially, there's no AI without information architecture, which is essentially data. But I wanna add one more thing. There's a lot of buzz around AI. I mean, we are talking about simplicity here. AI, in my opinion, is three things and three things only. Either you use AI to predict the future, for forecasting; you use AI to automate things — it could be simple, mundane tasks, it could be complex tasks, depending on how exactly you want to use it; and third is to optimize. So predict, automate, optimize. Anything else is buzz. >>Okay. >>Brilliantly said. Honestly, I think you both probably hit the 30-second time mark that we gave you there, and the enthusiasm — loved your hunger on that. You were born ready for that kind of pitch. I think they both nailed it. >>They nailed it. Nailed it. Well done. >>I think that about sums it up for us. One last closing note and opportunity for you: you have a v8.0 product coming out soon, December 13th if I'm not mistaken. You wanna give us a quick 15-second preview of that? >>Super excited about this. This is one of our major releases. So we are evolving the system on multiple dimensions — on enterprise and governance and programmability. So there are certain features that some of our customers are aware of. We have made huge performance gains in our JSON access. We made it easy for people to consume Wasm on on-prem and hybrid architectures. There are multiple other things that we're gonna put out on our site. So it's coming out on December 13th. It's a major next phase of our >>System. And real quick, Wasm is the WebAssembly moment. Correct. And the new >>About that — we are pioneers in that we embed Wasm inside the engine. So you could run complex modules that are written in — could be C, could be Rust, could be Python. Instead of writing the logic in SQL as a stored procedure, you could now run those modules inside. >>I wanted to get that out there because at KubeCon we covered that, >>Savannah — a hot topic. Like, >>Like a blanket. We covered it like a blanket. >>Wow. >>On that glowing note, Dre, thank you so much for being here with us on the show. We hope to have both SingleStore and IBM back on plenty more times in the future. Thank all of you for tuning in to our coverage here from Las Vegas, Nevada at AWS re:Invent 2022 with John Furrier. My name is Savannah Peterson. You're watching theCUBE, the leader in high-tech coverage. We'll see you tomorrow.
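For context on the Wasm point above, here is a hedged sketch of registering a WebAssembly module inside the engine instead of writing the logic as a SQL stored procedure. The CREATE FUNCTION ... AS WASM form only approximates SingleStore's DDL (check the 8.0 documentation for the exact grammar), and the module paths and the normalize_text() function are hypothetical.

import singlestoredb as s2  # assumed client library

conn = s2.connect("user:password@svc-example.singlestore.com:3306/app")
cur = conn.cursor()

# Register a module compiled from C or Rust to WebAssembly. The DDL below
# approximates SingleStore's Wasm UDF syntax; paths and names are made up.
cur.execute(
    """
    CREATE OR REPLACE FUNCTION normalize_text
    AS WASM FROM '/modules/text_utils.wasm'   -- approximate syntax, see docs
    WITH WIT FROM '/modules/text_utils.wit'
    """
)

# Call it like any built-in: the module executes inside the engine,
# replacing what would otherwise be a SQL stored procedure.
cur.execute("SELECT normalize_text(comment) FROM feedback LIMIT 5")
print(cur.fetchall())

cur.close()
conn.close()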

Published Date : Nov 29 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity                   Category       Confidence
John                     PERSON         0.99+
IBM                      ORGANIZATION   0.99+
Savannah Peterson        PERSON         0.99+
December 13th            DATE           0.99+
Shireesh Thota           PERSON         0.99+
Las Vegas                LOCATION       0.99+
Adam Celeste             PERSON         0.99+
Rob Thomas               PERSON         0.99+
46 billion               QUANTITY       0.99+
12 years                 QUANTITY       0.99+
John Furrier             PERSON         0.99+
three things             QUANTITY       0.99+
15 second                QUANTITY       0.99+
Twitter                  ORGANIZATION   0.99+
Python                   TITLE          0.99+
10th year                QUANTITY       0.99+
two companies            QUANTITY       0.99+
third                    QUANTITY       0.99+
32nd time                QUANTITY       0.99+
both                     QUANTITY       0.99+
tomorrow                 DATE           0.99+
32nd                     QUANTITY       0.99+
single store             QUANTITY       0.99+
Tuesdays                 DATE           0.99+
AWS                      ORGANIZATION   0.99+
one                      QUANTITY       0.98+
10 years ago             DATE           0.98+
SingleStore              ORGANIZATION   0.98+
Single store             QUANTITY       0.98+
Hemanth Manda            PERSON         0.98+
Dre                      PERSON         0.97+
eight                    QUANTITY       0.96+
two option               QUANTITY       0.96+
day one                  QUANTITY       0.96+
one more thing           QUANTITY       0.96+
one database             QUANTITY       0.95+
two different aspects    QUANTITY       0.95+
Mondays                  DATE           0.95+
Instagram                ORGANIZATION   0.95+
IBM Data                 ORGANIZATION   0.94+
10                       QUANTITY       0.94+
about a year             QUANTITY       0.94+
CICE                     ORGANIZATION   0.93+
three letter             QUANTITY       0.93+
today                    DATE           0.93+
one place                QUANTITY       0.93+
Watson                   TITLE          0.93+
One last                 QUANTITY       0.92+
Cognos                   ORGANIZATION   0.91+
Watson Assistant         TITLE          0.91+
nearly 17 years          QUANTITY       0.9+
Watson Health            TITLE          0.89+
Las Vegas, Nevada        LOCATION       0.89+
aws                      ORGANIZATION   0.86+
one area                 QUANTITY       0.86+
SQL                      TITLE          0.86+
One single path          QUANTITY       0.85+
two decades              QUANTITY       0.8+
five different layers    QUANTITY       0.8+
Invent 2022              EVENT          0.77+
JSON                     TITLE          0.77+

Peter MacDonald & Itamar Ankorion | AWS re:Invent 2022


 

(upbeat music) >> Hello, welcome back to theCUBE's AWS re:Invent 2022 coverage. I'm John Furrier, host of theCUBE. Got a great lineup here: Itamar Ankorion, SVP of technology alliances at Qlik, and Peter MacDonald, vice president of cloud partnerships and business development at Snowflake. We're going to talk about bringing SAP data to life with a joint Snowflake, Qlik and AWS solution. Gentlemen, thanks for coming on theCUBE. Really appreciate it. >> Thank you. >> Thank you, great meeting you John. >> Just to get started, introduce yourselves to the audience, then we're going to jump into what you guys are doing together — unique relationship here, really compelling solution in cloud. Big story about applications and scale this year. Let's introduce yourselves. Peter, we'll start with you. >> Great. I'm Peter MacDonald. I am vice president of cloud partners and business development here at Snowflake. On the cloud partner side, that means I manage the AWS relationship along with Microsoft and Google Cloud — what we do together in terms of complementary products, GTM, co-selling, things like that. Importantly, that includes working with other third parties like Qlik for joint solutions. On business development, it's negotiating custom commercial partnerships: large companies like Salesforce and Dell, smaller companies, some for our venture portfolio. >> Thanks Peter, and hi John. It's great to be back here. So I'm Itamar Ankorion, and I'm the senior vice president responsible for technology alliances here at Qlik. With that, I own strategic alliances, including our key partners in the cloud, including Snowflake and AWS. I've been in the data and analytics enterprise software market for 20-plus years, and my main focus is product management, marketing, alliances, and business development. I joined Qlik about three and a half years ago through the acquisition of Attunity, which is now the foundation for Qlik data integration. So again, my team focuses on creating joint solution alignment with our key partners to provide more value to our customers. >> Great to have both you guys, senior executives in the industry, on theCUBE here talking about data. Obviously bringing SAP data to life is the theme of this segment, but this re:Invent, it's all about the data, the big data end-to-end story, a lot about data being intrinsic, as the CEO says on stage, in organizations in all aspects. Take a minute to explain what you guys are doing from a company standpoint — Snowflake and Qlik and the solutions — and why here at AWS? Peter, we'll start with you at Snowflake: what you guys do as a company, your mission, your focus. >> That was great, John. Yeah, so here at Snowflake, we focus on the data platform. Until recently, data platforms required expensive on-prem hardware appliances, and despite all that expense, customers had capacity constraints, expensive maintenance, and limited functionality that all impeded these organizations from reaching their goals. Snowflake is a cloud-native SaaS platform, and we've become so successful because we've addressed these pain points and have other new special features. For example, securely sharing data across both the organization and the value chain without copying the data, support for new data types such as JSON and other semi-structured data, and also advanced in-database data governance.
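For readers who haven't seen Snowflake's semi-structured support in practice, here is a minimal sketch of what querying raw JSON natively looks like. The VARIANT path and LATERAL FLATTEN syntax is standard Snowflake SQL; the connection parameters and the EVENTS table with its PAYLOAD column are placeholders invented for illustration.

import snowflake.connector  # pip install snowflake-connector-python

# Connection parameters are placeholders.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="ANALYTICS_WH",
    database="DEMO",
    schema="PUBLIC",
)
cur = conn.cursor()

# PAYLOAD is a VARIANT column holding raw JSON events; FLATTEN explodes
# the nested items array into one row per element.
cur.execute(
    """
    SELECT payload:order_id::string AS order_id,
           item.value:sku::string   AS sku,
           item.value:qty::number   AS qty
    FROM events,
         LATERAL FLATTEN(input => payload:items) item
    WHERE payload:type::string = 'order_created'
    LIMIT 10
    """
)
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()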
So we can enable holistic solutions that include, for example here, both Qlik and AWS SageMaker and Comprehend, and bring those to joint customers. Our customers want to convert data into insights with advanced analytics platforms and AI. That is how they make holistic data-driven solutions that will give them competitive advantage. With Snowflake, our approach is to focus on customer solutions that leverage data from existing systems such as SAP, wherever they are, in the cloud or on-premise. And to do this, we leverage partners like Qlik to help customers transform their businesses. We provide customers with a premier data analytics platform as a result. Itamar, why don't you talk about Qlik a little bit, and then we can dive into the specific SAP solution here and some trends. >> Sounds great, Peter. So Qlik provides modern data integration and analytics software used by over 38,000 customers worldwide. Our focus is to help our customers turn data into value and help them close the gap from data all the way through insight and action. We offer Qlik Data Integration and Qlik Data Analytics. Qlik Data Integration helps automate the data pipelines to deliver data to where they want to use it, in real time, and make the data ready for analytics. And Qlik Data Analytics is a robust platform for analytics and business intelligence that has been a leader in the Gartner Magic Quadrant for over 11 years now in the market. And both of these come together into what we call Qlik Cloud, which is our SaaS-based platform, providing a more seamless way to consume all these services and accelerate time to value with customer solutions. In terms of partnerships, both Snowflake and AWS are very strategic to us here at Qlik, so we have made a very comprehensive investment to ensure a strong joint value proposition that we can bring to our mutual customers — everything from aligning our roadmaps through optimizing and validating integrations, collaborating on best practices, and packaging joint solutions like the one we'll talk about today. And with that investment, we are an elite-level, top-level partner with Snowflake. Our technology is validated as Snowflake-ready across the entire product set, and we have hundreds of joint customers together. And with AWS we've also partnered for a long time. We're here at re:Invent — we've been here since the first, inaugural re:Invent, so that kind of gives you an idea of how long we've been working with AWS. We provide very comprehensive integration with AWS data analytics services, and we have several competencies ranging from data analytics to migration and modernization. So that's our focus, and again, we're excited about working with Snowflake and AWS to bring solutions together to market. >> Well, I'm looking forward to unpacking the solutions specifically, and congratulations on the continued success of both your companies. We've been following them obviously for a very long time, seeing the platforms evolve beyond just SaaS, and a lot more going on in cloud these days, kind of next generation emerging. You know, we're seeing a lot of macro trends that are going to be powering some of the things we're going to get into real quickly. But before we get into the solution, what are some of those power dynamics and trends in the industry that you're seeing, specifically ones that are impacting your customers and taking us down this road of getting more out of the data — specifically SAP, but in general trends and dynamics?
What are you hearing from your customers? Why do they care? Why are they going down this road? Peter, we'll start with you. >>Yeah, I'll go ahead and start. Thanks. I'd say we continue to see customers being very eager to transform their businesses, and they know they need to leverage technology and data to do so. They're also increasingly depending upon the cloud to bring the agility, the elasticity, the new functionality necessary to react in real time to ever-evolving customer needs. You look at what's happened over the last three years — and boy, the macro environment, customers, it's all changing so fast. With our partnerships with AWS and Qlik, we've been able to bring to market innovative solutions like the one we're announcing today that spans all three companies. It provides a holistic, integrated solution for our customers. >>Itamar, let's get into it. You've been with theCUBE, you've seen the journey, you have your own journey — many, many years, you've seen the waves. What's going on now? I mean, what's the big wave? What's the dynamic powering this trend? >>Yeah, in a nutshell I'll call it: it's all about time. You know, it's time to value, and it's about real-time data. I'll kind of talk about that a bit. I mean, you hear a lot about data being the new oil, and we definitely see more and more customers seeing data as their critical enabler for innovation and digital transformation. They look for ways to monetize data. They look at data as the way in which they can innovate and bring different value to their customers. So we see customers wanting to use more data to get more value from data. We definitely see them wanting to do it faster than before. And we definitely see them looking for agility and automation as ways to accelerate time to value, and also reduce overall costs. I did mention real-time data: we definitely see more and more customers who want to be able to act and make decisions based on fresh data. Yesterday's data is just not good enough. >>John: Yeah. >>It's got to be down to the hour, down to the minutes, and sometimes even lower than that. And then I think we're also seeing customers look to their core business systems where they have a lot of value — like SAP, like mainframe — and thinking, okay, our core data is there; how can we get more value from this data? So those are key things we see all the time with customers.
So to your point, it's ready to scale. It's starting — I think we're going to see a lot of companies doing this over the next few years. But before we jump to the solution, let me maybe take a few minutes just to clarify the need, why we're seeing customers jump to do this. So customers that use SAP use it to manage the core of their business. Think order processing and management, finance, inventory, supply chain, and so much more. So if you're running SAP in your company, that data creates a great opportunity for you to drive innovation and modernization. What we see customers want to do is more with their data, and more means they want to take SAP with non-SAP data and use them together to drive new insights. They want to use real-time data to drive real-time analytics, which they couldn't do to date. They want to bring together descriptive with predictive analytics, so adding machine learning and AI to drive more value from the data. And naturally they want to do it faster — find ways to iterate faster on their solutions, have freedom with the data, and agility. And I think this is really where cloud data platforms like Snowflake and AWS, you know, bring the value to be able to drive that. Now, to do that you need to unlock the SAP data, which is a lot of where Qlik comes in, because the typical challenge these customers run into is the complexity inherent in SAP data: tens of thousands of tables, proprietary formats, complex data models, licensing restrictions. And more than that, you have performance issues — they usually run into how do we handle the throughput and the volumes while maintaining low latency and impact — and where do we find the knowledge to really understand how to get all this done? So these are the things we looked at when we came together to create a solution and make it unique. So when you think about its uniqueness, we put together a lot, and I'll go through the three, four key things that come together to make this unique. First is data delivery. How do you handle SAP data delivery — how do you get it from ECC, from HANA, from S/4HANA, how do you deliver the data and the metadata, and how does that integrate well into Snowflake? And what we've done is focus a lot on optimizing that process and the continuous ingestion — the real-time ingestion of the data — in a way that works really well with the Snowflake Data Cloud. Second thing is we looked at SAP data transformation: once the data arrives at Snowflake, how do we turn it into being analytics-ready? So that's where data transformation and data warehouse automation come in, and these are all elements of this solution. So creating derivative datasets, creating data marts — all of that is done, again, by creating an optimized integration that pushes down SQL-based transformations so they can be processed inside Snowflake, leveraging its powerful engine. And then the third element is bringing together data visualization and analytics that can take all the data you're now organizing inside Snowflake, bring other data in, bring machine learning from SageMaker, and then create a seamless integration to bring analytic applications to life. So these are all things we put together in the solution. And maybe the last point is we actually took the next step with this and created something we refer to as solution accelerators, which we're really, really keen about.
Think about this as prepackaged templates for common business analytic needs like order to cash, finance, inventory. We can dig into that a little more later, but this gets the next level of value to the customers, all built into this joint solution. >>Yeah, I want to get to the accelerators, but real quick, Peter, your reaction to the solution — what's unique about it? And obviously with Snowflake we've been seeing the progression of data applications, more developers developing on top of Snowflake; data as code kind of implies a developer ecosystem. This is kind of interesting. I mean, you've got partnering with Qlik and AWS — it's kind of a developer-like-thinking real solution. What's unique about this SAP solution that's different than what customers can get anywhere else, or not? >>Yeah, well listen, I think first of all you have to start with the idea of the solution. This is three companies coming together to build a holistic solution that is all about, you know, creating a great opportunity to turn SAP data into value, as Itamar was talking about. That's really what we're talking about here, and there's a lot of technology underneath it. I'll talk more about the Snowflake technology, what's involved here, and then cover some of the AWS pieces as well. But you know, we're focusing on getting that value out and accelerating time to value for our joint customers. As Itamar was saying, there's a lot of complexity with the SAP data and a lot of value there. How can we manage that in a prepackaged way, bringing together best-of-breed solutions with proven capabilities, and bringing this to market quickly for our joint customers? You know, Snowflake and AWS have been strong partners for a number of years now, and that's not only in how Snowflake runs on top of AWS, but also how we integrate with their complementary analytics and ML products. And so, you know, we want to be able to leverage those in addition to what Qlik is bringing in terms of the data transformations, bringing data out of SAP, and the visualization as well. All very critical. And then we want to bring in the predictive analytics that AWS brings and what SageMaker brings — we'll talk about that a little bit later on. Some of the technologies that we're leveraging are some of our latest cutting-edge technologies that really make things easier for both our partners and our customers. For example, Qlik leverages Snowflake's recently released Snowpark for Python functionality to push down those data transformations from Qlik into Snowflake that Itamar's mentioning. And we also leverage Snowpark for integrations with Amazon SageMaker. There's a lot of great new technology that just makes this easy and compelling for customers. >>I think that's the big word — easy button — here for what may look like a complex kind of integration: kind of turnkey, really compelling example of the modern era we're living in, as we always say in theCUBE. You mentioned accelerators, SAP accelerators. Can you give an example of how that works with the technology from the third-party providers to deliver this business value, Itamar? 'Cause that was an interesting comment. What's the example? Give an example of this acceleration. >>Yes, certainly. I think this is something that really makes this truly unique in the industry and, again, a great opportunity for customers. So we kind of talked earlier about how there are a lot of things that need to be done with SAP data to turn it into value.
And these accelerators, as the name suggests, are designed to do just that: to jumpstart the process and reduce the time and the risk involved in such a project. So again, these are pre-packaged templates. We basically took a lot of knowledge, a lot of configurations and best practices about how to get things done, and we put 'em together. So think about all the steps it includes — things like data extraction, so already knowing all the relevant tables that you need to get data from in the context of the solution you're looking for, say order to cash (we'll get back to that one); how you continuously deliver that data into Snowflake in an efficient manner, handling things like data type mappings, metadata naming conventions and transformations; the data models you build, all the way to data mart definitions and all the transformations the data needs to go through, moving through steps until it's fully analytics-ready. And then on top of that, even adding a library of comprehensive analytic dashboards and integrations with machine learning and AI, and putting all of that together in a way that's pre-integrated and tested to work with Snowflake and AWS. So this is where, again, you get this entire recipe that's ready. So take, for example — I think I mentioned order to cash. For those who are not familiar, order to cash is a critical business process for every organization, especially if you're in retail, manufacturing, enterprise. This is where, you know, it starts with booking a sales order, followed by fulfilling the order, billing the customer, then managing the accounts receivable when the customer actually pays, right? In this whole process, you've got sales order fulfillment and billing impacting customer satisfaction, and you've got receivables and payments impacting working capital and cash liquidity. So as a result, this order-to-cash process is the lifeblood for many businesses, and it's critical to optimize and understand. So the solution accelerator we created specifically for order to cash takes care of understanding all these aspects and the data that needs to come with it — everything we outlined before to make the data available in Snowflake in a way that's really useful for downstream analytics, along with dashboards that are already common for that use case. So again, this enables customers to gain real-time visibility into their sales orders, fulfillment, and accounts receivable performance. That's what the accelerators are all about. And very similarly, we have another one, for example, for finance analytics, which optimizes financial data reporting and helps customers get insights into P&L and financial risk and stability. Or inventory analytics, which helps improve planning, inventory management, utilization, and efficiencies in the supply chain. So again, these accelerators really help customers get a jumpstart and move faster with their solutions. >>Peter, this is the easy button we just talked about — getting things going, you know, get the ball rolling, get some acceleration. A big part of this is the three companies coming together doing this. >>Yeah, and to build on what Itamar just said, the SAP data obviously has tremendous value.
Those sales orders, distribution data, financial data — bringing that into Snowflake makes it easily accessible, but it also enables it to be combined with other data too, which is one of the things that Snowflake does so well. So you can get a full view of the end-to-end process and the business overall. For example, I'll just take one example that may not come to mind right away: looking at the impact of weather conditions on supply chain logistics is relevant and material and of interest to our customers. How do you bring those different data sets together in an easy way — bringing the data out of SAP, bringing maybe other data out of other systems through Qlik or through Snowflake, directly bringing data in from our data marketplace — and bring that all together to make it work? You know, fundamentally, organizational silos and data fragmentation otherwise make it really difficult to drive modern analytics projects, and that in turn limits the value that our customers are getting from SAP data and these other data sets. We want to enable that and unleash it. >>Yeah, time to value. This is great stuff. Itamar, final question: what are customers doing with this? I'm sure you have customer examples already using the solution. Can you share what these examples look like — the use cases and the value? >>Oh yeah, absolutely. Happy to. We have customers across different sectors — manufacturing, retail, energy, oil and gas, CPG. Those sectors typically have SAP, so we have customers in all of them. A great example is Siemens Energy. Siemens Energy is a global provider of gas and power services — you know, over what, 28, 30 billion in revenue, 90,000 employees, operating globally in over 90 countries. They've used SAP HANA as a core system, running on premises in multiple locations around the world. And what they were looking for is a way to bring all this data together so they can innovate with it. And the thing is — Peter mentioned this earlier — not just the SAP data, but also other data from other systems, to bring it together for more value. That includes finance data, logistics data, customer CRM data. So they bring data from over 20 different SAP systems, okay, with Qlik data integration, feeding that into Snowflake in under 20 minutes, 24/7, 365 days a year. They get data from over 20,000 tables — hundreds of millions of records going in daily. So it is a great example of the type of scale, scalability, agility and speed that they can get to drive this kind of innovation. So that's a great example with Siemens. Another one that comes to mind is a global manufacturer — very similar scenario, but they're using it for real-time executive reporting, so it's more like visibility into the production data, as well as for financial analytics. So think about everything from audit to taxes to innovative financial intelligence, because all the data's coming from SAP. >>It's a great time to be in the data business again. It keeps getting better and better. There's more data coming — it's not stopping, you know, it's growing so fast, it keeps coming. Every year it's the same story, Peter. It just doesn't stop coming. As we wrap up here, let's get customers some information on how to get started.
I mean, obviously you're starting to see the accelerators — it's a great program there. What a great partnership between the two companies and AWS. How can customers get started, learn about the solution, and take advantage of it to get more out of their SAP data, Peter? >> Yeah, I think the first place to go is to talk to Snowflake, talk to AWS, talk to the account executives that are assigned to your account. Reach out to them and they will be able to educate you on the solution. We have it packaged up very nicely, and it can be deployed very, very quickly. >> Well gentlemen, thank you so much for coming on. Appreciate the conversation. Great overview of the partnership between Snowflake and Qlik and AWS on a joint solution for getting more out of SAP data. It's really kind of a key solution, bringing SAP data to life. Thanks for coming on theCUBE. Appreciate it. >> Thank you. >> Thank you John. >> Okay, this is theCUBE coverage here at re:Invent 2022. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)
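As an illustration of the pushed-down transformations described in this segment, here is a hedged sketch using Snowpark for Python — the capability Peter mentions — to build a small order-to-cash rollup inside Snowflake. This is not Qlik's generated code, just a sketch of the pattern. VBAK and VBAP are SAP's sales-order header and item tables, assumed here to have been replicated into a SAP_RAW schema; the connection settings and the target table are placeholders.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import sum as sum_

# Placeholder connection settings; the session executes everything in Snowflake.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "...",
    "warehouse": "ANALYTICS_WH",
    "database": "DEMO",
    "schema": "SAP_RAW",
}).create()

# VBAK / VBAP: SAP sales-order header and item tables, assumed replicated here.
headers = session.table("VBAK").select("VBELN", "ERDAT")  # order number, created date
items = session.table("VBAP").select("VBELN", "NETWR")    # order number, net value

# Roll up net order value per order and day; this dataframe plan is pushed
# down and runs as SQL inside Snowflake -- the data never leaves the platform.
order_values = (
    headers.join(items, "VBELN")
           .group_by("VBELN", "ERDAT")
           .agg(sum_("NETWR").alias("ORDER_NET_VALUE"))
)

# Materialize an analytics-ready table for BI tools downstream.
order_values.write.mode("overwrite").save_as_table("MART.ORDER_TO_CASH_DAILY")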

Published Date : Nov 23 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity                   Category       Confidence
John                     PERSON         0.99+
AWS                      ORGANIZATION   0.99+
Peter                    PERSON         0.99+
Dell                     ORGANIZATION   0.99+
Siemens                  ORGANIZATION   0.99+
Peter MacDonald          PERSON         0.99+
John Furrier             PERSON         0.99+
Microsoft                ORGANIZATION   0.99+
Peter McDonald           PERSON         0.99+
Itamar Ankorion          PERSON         0.99+
Qlik                     ORGANIZATION   0.99+
28 billion               QUANTITY       0.99+
two companies            QUANTITY       0.99+
Tens                     QUANTITY       0.99+
three companies          QUANTITY       0.99+
Siemens Energy           ORGANIZATION   0.99+
20 plus years            QUANTITY       0.99+
yesterday                DATE           0.99+
Snowflake                ORGANIZATION   0.99+
third element            QUANTITY       0.99+
First                    QUANTITY       0.99+
three                    QUANTITY       0.99+
Itamar                   PERSON         0.99+
over 20,000 tables       QUANTITY       0.99+
both                     QUANTITY       0.99+
90,000 employees         QUANTITY       0.99+
first                    QUANTITY       0.99+
Salesforce               ORGANIZATION   0.99+
Cloud Partners           ORGANIZATION   0.99+
Amazon                   ORGANIZATION   0.99+
over 38,000 customers    QUANTITY       0.99+
under 20 minutes         QUANTITY       0.99+
10 years                 QUANTITY       0.99+
five                     QUANTITY       0.99+
Excel                    TITLE          0.99+
one                      QUANTITY       0.99+
over 11 years            QUANTITY       0.98+
Snowpark                 TITLE          0.98+
Second thing             QUANTITY       0.98+

The Truth About MySQL HeatWave


 

>>When Oracle acquired MySQL via the Sun acquisition, nobody really thought the company would put much effort into the platform, preferring to put all the wood behind its flagship Oracle database arrow — pun intended. But two years ago, Oracle surprised many folks by announcing MySQL HeatWave, a new database as a service with a massively parallel, hybrid columnar, in-memory architecture that brings together transactional and analytic data in a single platform. Welcome to our latest database power panel on theCUBE. My name is Dave Vellante, and today we're gonna discuss Oracle's MySQL HeatWave with a who's who of cloud database industry analysts. Holger Mueller is with Constellation Research, Marc Staimer is the Dragon Slayer and a Wikibon contributor, and Ron Westfall is with Futurum Research. Gentlemen, welcome back to theCUBE. Always a pleasure to have you on. Thanks for having us. Great to be here. >>So we've had a number of deep-dive interviews on theCUBE with Nipun Agarwal. You guys know him? He's a senior vice president of MySQL HeatWave development at Oracle. I think you just saw him at Oracle CloudWorld, and he's come on to describe what I'll call shock-and-awe feature additions to HeatWave. You know, the company's clearly putting R&D into the platform, and I think at CloudWorld we saw the fifth major release since 2020, when they first announced MySQL HeatWave. So just listing a few: they've brought in analytics, machine learning, they got Autopilot for machine learning, which is automation, on top of the basic OLTP functionality of the database. And it's been interesting to watch Oracle's converged database strategy. We've contrasted that amongst ourselves. Love to get your thoughts on Amazon's right-tool-for-the-right-job approach. >>Are they gonna have to change that? You know, Amazon's got the specialized databases; it's just, you know, both companies are doing well. It just shows there are a lot of ways to skin a cat, 'cause you see some traction in the market in both approaches. So today we're gonna focus on the latest HeatWave announcements, and we're gonna talk about multi-cloud with a native MySQL HeatWave implementation, which is available on AWS, and MySQL HeatWave for Azure via the Oracle-Microsoft interconnect — this kind of cool hybrid action that they've got going. Sometimes we call it supercloud. And then we're gonna dive into MySQL HeatWave Lakehouse, which allows users to process and query data across MySQL databases and HeatWave databases, as well as object stores. And then we've got — HeatWave has been announced on AWS and Azure, they're available now, and Lakehouse I believe is in beta, and I think it's coming out the second half of next year. So again, all of our guests are fresh off of Oracle CloudWorld in Las Vegas, so they've got the latest scoop. Guys, I'm done talking. Let's get into it. Marc, maybe you could start us off: what's your opinion of MySQL HeatWave's competitive position? When you think about what AWS is doing, you know, Google — we heard Google Cloud Next recently, we heard about all their data innovations. You got, obviously, Azure's got a big portfolio, Snowflake's doing well in the market. What's your take? >>Well, first let's look at it from the point of view that AWS is the market leader in cloud and cloud services. They own somewhere between 30 to 50% of the market, depending on who you read. And then you have Azure as number two, and after that it falls off.
There's GCP, Google Cloud Platform, which is further down the list, and then Oracle and IBM and Alibaba. So when you look at AWS and Azure and say, hey, these are the market leaders in the cloud, then you start looking at it and saying, if I am going to provide a service that competes with the services they have, and I can make it available in their cloud, it means I can be more competitive. And if I'm compelling — and compelling means at least twice the performance or functionality, or both, at half the price — I should be able to gain market share. >>And that's what Oracle's done. They've taken a superior product in MySQL HeatWave, which is faster, lower cost, and does more for a lot less at the end of the day, and they make it available to the users of those clouds. You avoid this little thing called egress fees, you avoid the issue of having to migrate from one cloud to another, and suddenly you have a very compelling offer. So I look at what Oracle's doing with MySQL and it feels like — I'm gonna use a war term — a flanking maneuver on their competition. They're offering a better service on the competitors' own platforms. >>All right, so thank you for that. Holger, we've seen this sort of cadence — I sort of referenced it up front a little bit — they sat on MySQL for a decade, then all of a sudden we see this rush of announcements. Why did it take so long? And more importantly, is Oracle developing the right features that cloud database customers are looking for, in your view?
>>So I mean it's certainly, you know, when, when Oracle talks about the competitors, you know, the competitors are in the, I always say they're, if the Oracle talks about you and knows you're doing well, so they talk a lot about aws, talk a little bit about Snowflake, you know, sort of Google, they have partnerships with Azure, but, but in, so I'm presuming that the response in MySQL heatwave was really in, in response to what they were seeing from those big competitors. But then you had Maria DB coming out, you know, the day that that Oracle acquired Sun and, and launching and going after the MySQL base. So it's, I'm, I'm interested and we'll talk about this later and what you guys think AWS and Google and Azure and Snowflake and how they're gonna respond. But, but before I do that, Ron, I want to ask you, you, you, you can get, you know, pretty technical and you've probably seen the benchmarks. >>I know you have Oracle makes a big deal out of it, publishes its benchmarks, makes some transparent on on GI GitHub. Larry Ellison talked about this in his keynote at Cloud World. What are the benchmarks show in general? I mean, when you, when you're new to the market, you gotta have a story like Mark was saying, you gotta be two x you know, the performance at half the cost or you better be or you're not gonna get any market share. So, and, and you know, oftentimes companies don't publish market benchmarks when they're leading. They do it when they, they need to gain share. So what do you make of the benchmarks? Have their, any results that were surprising to you? Have, you know, they been challenged by the competitors. Is it just a bunch of kind of desperate bench marketing to make some noise in the market or you know, are they real? What's your view? >>Well, from my perspective, I think they have the validity. And to your point, I believe that when it comes to competitor responses, that has not really happened. Nobody has like pulled down the information that's on GitHub and said, Oh, here are our price performance results. And they counter oracles. In fact, I think part of the reason why that hasn't happened is that there's the risk if Oracle's coming out and saying, Hey, we can deliver 17 times better query performance using our capabilities versus say, Snowflake when it comes to, you know, the Lakehouse platform and Snowflake turns around and says it's actually only 15 times better during performance, that's not exactly an effective maneuver. And so I think this is really to oracle's credit and I think it's refreshing because these differentiators are significant. We're not talking, you know, like 1.2% differences. We're talking 17 fold differences, we're talking six fold differences depending on, you know, where the spotlight is being shined and so forth. >>And so I think this is actually something that is actually too good to believe initially at first blush. If I'm a cloud database decision maker, I really have to prioritize this. I really would know, pay a lot more attention to this. And that's why I posed the question to Oracle and others like, okay, if these differentiators are so significant, why isn't the needle moving a bit more? And it's for, you know, some of the usual reasons. One is really deep discounting coming from, you know, the other players that's really kind of, you know, marketing 1 0 1, this is something you need to do when there's a real competitive threat to keep, you know, a customer in your own customer base. 
Plus there is the usual fear and uncertainty about moving from one platform to another. But I think, you know, the traction, the momentum is, is shifting an Oracle's favor. I think we saw that in the Q1 efforts, for example, where Oracle cloud grew 44% and that it generated, you know, 4.8 billion and revenue if I recall correctly. And so, so all these are demonstrating that's Oracle is making, I think many of the right moves, publishing these figures for anybody to look at from their own perspective is something that is, I think, good for the market and I think it's just gonna continue to pay dividends for Oracle down the horizon as you know, competition intens plots. So if I were in, >>Dave, can I, Dave, can I interject something and, and what Ron just said there? Yeah, please go ahead. A couple things here, one discounting, which is a common practice when you have a real threat, as Ron pointed out, isn't going to help much in this situation simply because you can't discount to the point where you improve your performance and the performance is a huge differentiator. You may be able to get your price down, but the problem that most of them have is they don't have an integrated product service. They don't have an integrated O L T P O L A P M L N data lake. Even if you cut out two of them, they don't have any of them integrated. They have multiple services that are required separate integration and that can't be overcome with discounting. And the, they, you have to pay for each one of these. And oh, by the way, as you grow, the discounts go away. So that's a, it's a minor important detail. >>So, so that's a TCO question mark, right? And I know you look at this a lot, if I had that kind of price performance advantage, I would be pounding tco, especially if I need two separate databases to do the job. That one can do, that's gonna be, the TCO numbers are gonna be off the chart or maybe down the chart, which you want. Have you looked at this and how does it compare with, you know, the big cloud guys, for example, >>I've looked at it in depth, in fact, I'm working on another TCO on this arena, but you can find it on Wiki bod in which I compared TCO for MySEQ Heat wave versus Aurora plus Redshift plus ML plus Blue. I've compared it against gcps services, Azure services, Snowflake with other services. And there's just no comparison. The, the TCO differences are huge. More importantly, thefor, the, the TCO per performance is huge. We're talking in some cases multiple orders of magnitude, but at least an order of magnitude difference. So discounting isn't gonna help you much at the end of the day, it's only going to lower your cost a little, but it doesn't improve the automation, it doesn't improve the performance, it doesn't improve the time to insight, it doesn't improve all those things that you want out of a database or multiple databases because you >>Can't discount yourself to a higher value proposition. >>So what about, I wonder ho if you could chime in on the developer angle. You, you followed that, that market. How do these innovations from heatwave, I think you used the term developer velocity. I've heard you used that before. Yeah, I mean, look, Oracle owns Java, okay, so it, it's, you know, most popular, you know, programming language in the world, blah, blah blah. But it does it have the, the minds and hearts of, of developers and does, where does heatwave fit into that equation? >>I think heatwave is gaining quickly mindshare on the developer side, right? 
It's not the traditional MySQL database anymore that it grew up as, and there's a traditional mistrust of Oracle among developers over what happens to open source when it gets acquired — like in the case of Oracle versus Java, and with MySQL, right? But we know it's not a good competitive strategy to bank on Oracle screwing up, because it hasn't worked — not on Java, not on MySQL, right? And for developers, once you get to know a technology product and you can do more with it, it becomes kind of like a Swiss army knife: you can build more use cases, you can build more powerful applications. That's super, super important, because you don't have to get certified in multiple databases. You are fast at getting things done, you achieve higher developer velocity, and the managers are happy because they don't have to license more things, send you to more trainings, or carry more risk of something not being delivered, right? >>So we really see the suite versus best-of-breed play happening here, which in general was happening before already with Oracle's flagship database, versus those of Amazon, as an example, right? And now the interesting thing is, Oracle was always a one-database company — there can be only one — and they're now suddenly talking about HeatWave and being a two-database company, with different market spaces but the same value proposition of integrating more things very, very quickly to have a universal database — what they call the converged database — for all the needs of an enterprise to run certain application use cases. And that's what's attractive to developers. >>It's ironic, isn't it? I mean, you know, the rumor was that TK, Thomas Kurian, left Oracle 'cause he wanted to put the Oracle database on other clouds and other places. And maybe that was the rift — I'm sure there were other things — but Oracle clearly is now trying to expand its TAM, Ron, with HeatWave into AWS, into Azure. How do you think Oracle's gonna do? You were at CloudWorld; what was the sentiment from customers and the independent analysts? Is this just Oracle trying to screw with the competition, create a little diversion? Or is this, you know, serious business for Oracle? What do you think? >>No, I think it has legs. I think it's definitely, again, a testament to Oracle's overall ability to differentiate not only MySQL HeatWave but its overall portfolio. And I think the fact that they do have the alliance with Azure in place is definitely demonstrating their commitment to meeting the multi-cloud needs of their customers, as well as what we pointed to in terms of the fact that they're now offering, you know, MySQL capabilities within AWS natively, and that it can now outperform AWS's own offering. And I think this is all demonstrating that Oracle is, you know, not letting up; they're not resting on their laurels. Clearly we are living in a multi-cloud world, so why not just make it easier for customers to be able to use cloud databases according to their own specific needs? And I think, you know, to Holger's point, that definitely aligns with being able to bring on more application developers to leverage these capabilities.
So this is, I think, an example of the innovation that's enhancing the overall Oracle portfolio, and certainly all the work with machine learning is paying dividends as well. As a result, I see Oracle continuing to make the inroads we've pointed to. But I agree with Mark: short-term discounting is just a stall tactic. It doesn't change the fact that Oracle is able not only to deliver dramatic price-performance differentiators, but also to meet a wide range of customer needs that aren't limited to price-performance considerations: being able to support multicloud according to customer needs, and being able to reach out to the application developer community and address a very specific challenge that has plagued them for many years. So, bringing it all together, I see this as enabling Oracle's message to ring true with customers. The customers who were there—basically all of them, even if they weren't all saying the same things—were giving positive feedback. And likewise, I think the analyst community is seeing this. It's always refreshing to be able to talk to customers directly, and at Oracle CloudWorld there was a litany of them. So this is a difference maker, as is being able to talk to strategic partners. The NVIDIA partnership, I think, is also a testament to Oracle's ongoing ability to make the ecosystem more user-friendly for customers. >>Yeah, it's interesting: when you get these all-in-one tools, the Swiss Army knives, you expect that they're not able to be best of breed. That's the kind of surprising thing I'm hearing about HeatWave. I want to talk about lakehouse, because when I think of lakehouse, I think Databricks, and to my knowledge Databricks hasn't been in Oracle's sights yet—maybe they're next. But Oracle claims that MySQL HeatWave Lakehouse is a breakthrough in terms of capacity and performance. Mark, what are your thoughts? Can you double-click on Oracle's lakehouse claims for things like query performance and data loading? What does it mean for the market? Is Oracle really leading in the lakehouse competitive landscape? >>Well, the name of the game is: what problems are you solving for the customer? More importantly, are those problems urgent or important? If they're urgent, customers want to solve them now; if they're important, they might get around to them. So look at what they're doing with lakehouse, or previous to that machine learning, or previous to that automation, or previous to that OLAP with OLTP—they're merging all this capability together. If you look at Snowflake or Databricks, they're each tackling one problem. You look at MySQL HeatWave, and they're tackling multiple problems. So when you say their queries are much better against the lakehouse, it's in combination with other analytics, in combination with OLTP, and with the fact that there are no ETLs. You're getting all of this done in real time—querying across everything in real time.
>>You're solving multiple user and developer problems, you're increasing their ability to get insight faster, you're delivering shorter response times. So yeah, they really are solving urgent problems for customers. And by putting it where the customer lives—this is the brilliance of actually being multicloud. I know I'm backing up here a second, but by making it work in AWS and Azure, where people already live, where they already have applications, what they're saying is: we're bringing it to you. You don't have to come to us to get these benefits, this value. Overall, I think it's a brilliant strategy. I give Nipun Agarwal huge kudos for what he's doing there. So yes, what they're doing with the lakehouse is going to put Databricks and Snowflake—and everyone else, for that matter—on notice. >>Those are the guys, Holger—you and I have talked about this—those are the guys doing sort of the best of breed. They're really focused, and they tend to do well, at least out of the gate. Now you've got Oracle's converged philosophy; we've seen it with Oracle Database, and now it's kicking into gear with HeatWave. This whole thing of suites versus best of breed: long term, customers tend to migrate toward suites, but the new shiny toy tends to get the growth. How do you think this is going to play out in cloud database? >>Well, it's the forever, never-ending story, right? In software, it's suites versus best of breed, and so far, in the long run, suites have always won. But sometimes they struggle, because the inherent problem of suites is that you build something larger, it has more complexity, and that means your cycles to get everything working together—to integrate, to test, to roll it out, to certify, whatever it is—take you longer, right? And that's not the case here. The fascinating part of the effort around MySQL HeatWave is that the team is out-executing the previous best-of-breed players while bringing something together. Now, whether they can maintain that pace remains to be seen. But the strategy—like what Mark was saying, bringing the software to the data—is of course interesting and unique, and was totally an Oracle issue in the past, right? >>Yeah, but it had to be in your database on OCI. And that's an interesting part. The interesting thing on the lakehouse side is that there are three key benefits of a lakehouse. The first one is better reporting and analytics—bringing richer information together. Take the case of SiliconANGLE, right? We want to see engagement for this video, we want to know what's happening. That's a mixed transactional-and-video-media use case—a typical lakehouse use case. The next one is building richer applications: transactional applications that have video and these elements in there, which are the engaging ones. And the third one—and this is where I'm a little critical and concerned—is that the lakehouse is really the base platform for artificial intelligence, right? To run deep learning, to run things automatically, because you have all the data in one place, created in one way. >>And that's where—I know Ron talked about NVIDIA for a moment—that's where Oracle doesn't have the strongest story. Nonetheless, the two other main use cases of the lakehouse are very strong. My only concern is the 400-terabyte limit—it sounds low. It's an arbitrary limitation. >>Yeah, sounds big to me.
So, for a start, they can make that bigger. You don't want your lakehouse to be limited in terabyte sizes, or even petabyte sizes, because you want the certainty that you can put everything in there that might be relevant, without knowing in advance what questions you'll ask, and then query against it. >>Yeah. And in the early days of schema-on-read, it just became a mess. But technology has evolved to allow us to actually get more value out of that data lake—"data swamp" is no longer quite so apt. I want to come back in a moment to how you think the competitors are going to respond—are they going to have to take more of a converged approach, AWS in particular? But before I do, Ron, I want to ask you about Autopilot, because I heard Larry Ellison's keynote, and he was talking about how most security issues are human errors, and with autonomy—the Autonomous Database and things like Autopilot—"we take care of that; it's like autonomous vehicles, they're going to be safer." And I went: well, maybe, maybe someday. Oracle really emphasizes this; every time you see an announcement from Oracle, they talk about new autonomous capabilities. How legit is it? Do people care? What's new for HeatWave Lakehouse? How much of a differentiator, Ron, do you really think Autopilot is in this cloud database space? >>Yeah, I think it will definitely enhance the overall proposition. I don't think people are going to buy the lakehouse exclusively because of Autopilot capabilities, but when they look at the overall picture, it will be an added capability to Oracle's benefit. And it's one of these age-old questions: how much do you automate, and what is the balance to strike? We all understand from the autonomous-car analogy that there are limitations. However, I think it's a tool that basically every organization out there needs to at least have, or at least evaluate, because it helps with ease of use, and it helps make automation more measured: you can test—all right, let's automate this process and see if it works well—and then go on and switch on Autopilot for other processes. >>That allows, for example, the specialists to spend more time on business use cases versus manual maintenance of the cloud database and so forth. So I think it's a legitimate value proposition. It's just going to be a case-by-case basis. Some organizations will be more aggressive in putting automation throughout their processes and their organization; others will be more cautious. But again, it's something that will help the overall Oracle proposition: used with caution by many organizations, while others will say, hey, great, this is really answering a real problem—easing the use of these databases while better handling the automation capabilities and benefits that come with them, without a major screwup happening in the process of transitioning to more automated capabilities. >>Now, I didn't attend CloudWorld—just too many red-eyes recently, so I passed.
But one of the things I like to do at those events is talk to customers in the hallway track, where they'll give you the good, the bad, and the ugly. So did you guys talk to any MySQL HeatWave customers at CloudWorld, and what did you learn? Mark, did you have any luck having some private conversations? >>Yeah, I had quite a few private conversations. One thing before I get to that: I want to disagree with one point Ron made. I do believe there are customers out there buying the MySQL HeatWave service because of Autopilot. Autopilot is really revolutionary in many ways for the MySQL developer, in that it auto-provisions, it auto-parallel-loads, it auto-places data, it does auto shape prediction. It can tell you which machine learning models are going to give you your best results. And candidly, I've yet to meet a DBA who didn't want to give up the pedantic tasks that are a pain in the you-know-what, which they'd rather not do, as long as they're done right for them. So yes, I do think people are buying it because of Autopilot, and that's based on some of the conversations I had with customers at Oracle CloudWorld. >>In fact, it was like: yeah, that's great, we get fantastic performance, but this really makes my life easier—and I've yet to meet a DBA who didn't want to make their life easier. And it does. So yeah, I talked to a few of them. They were excited. I asked if they ran into any bugs, whether there were any difficulties in moving to it, and the answer was no in both cases. It's interesting to note that MySQL is the most popular database on the planet—well, some will argue it's neck and neck with SQL Server, but if you add in MariaDB and Percona, which are forks of MySQL, then by far and away it's the most popular. As a result, just about everybody has a MySQL database somewhere in their organization. So this is a brilliant situation for anybody going after MySQL, but especially for HeatWave. And the customers I talked to love it; I didn't find anybody complaining about it. >>What about the migration? We talked about TCO earlier. Does your TCO analysis include the migration cost, or do you kind of conveniently leave that out? >>Well, when you look at migration costs, there are different kinds. By the way, the worst job in the data center is data migration manager—forget it, no other job is as bad as that one. You get no attaboys for doing it right, and when you screw up, oh boy. So in real terms, anything that can limit data migration is a good thing, and HeatWave limits data migration. If you're already a MySQL user, this is pure MySQL as far as you're concerned. It's a simple transition from one to the other. You may want to make sure nothing broke, that all your tables are correct and your schemas are okay, but it's all the same. So it's a simple migration—pretty much a non-event, right? When you migrate data from an OLTP to an OLAP system, that's an ETL, and that's going to take time. >>But you don't have to do that with MySQL HeatWave.
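A rough sketch of why that migration is a non-event in practice: with HeatWave, offloading an existing InnoDB table to the in-memory analytics engine is a pair of ALTER TABLE statements (the two statements below follow the MySQL HeatWave documentation) rather than an ETL pipeline. This assumes a HeatWave-enabled MySQL DB system; the connection details and the orders table are hypothetical:

```python
# Sketch: marking an existing table for HeatWave's RAPID secondary engine
# and loading it -- no copy pipeline, and the SQL the app runs is unchanged.
import mysql.connector

conn = mysql.connector.connect(host="...", user="admin",
                               password="...", database="sales")
cur = conn.cursor()

# Same InnoDB table the OLTP application already uses
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# Eligible analytic queries are transparently offloaded to HeatWave;
# the statement itself does not change.
cur.execute("SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id")
print(cur.fetchall())
```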
So that's gone. When you start talking about machine learning, again, you may have an ETL, you may not, depending on the circumstances—but with MySQL HeatWave, you don't, and you don't have duplicate storage. You don't have to copy data from one storage container to another to be used in a different database, which, by the way, ultimately adds much more cost than just the other service. So yeah, I looked at the migration, and again, the users I talked to said it was a non-event. It was literally moving from one physical machine to another. If they had a new version of MySQL running on something else and just wanted to migrate it over, or just hook it up, or just connect it to the data, it worked just fine. >>Okay. So it sounds like you guys feel—and we've certainly heard this; my colleague David Floyer, the semi-retired David Floyer, was always very high on HeatWave—so I think it's got some real legitimacy here, coming from a standing start. But I want to talk about the competition and how they're likely to respond. If you're AWS and HeatWave is now in your cloud, there are some good aspects of that: the database guys might not like it, but the infrastructure guys probably love it—hey, more ways to sell EC2 and Graviton. But the database guys at AWS are going to respond. They're going to say: hey, we've got Redshift, we've got AQUA. What are your thoughts on how that's going to resonate with customers? And I never say never about AWS—are they going to try to build, in your view, a converged OLAP and OLTP database? Snowflake is taking an ecosystem approach; they've added transactional capabilities to the portfolio, so they're not standing still. What do you see in the competitive landscape going forward? Maybe Holger, you could start us off, and anybody else who wants to can chime in. >>Happy to. You mentioned Snowflake last; we'll start there. I think Snowflake is imitating that strategy, right? They built out the original cloud data warehouse, and now they're trying to broaden the proposition to have other data available there, because AI is relevant for everybody—ultimately, people keep data in the cloud to run AI on it. So you see the same suite-level strategy. It's going to be a little harder because of their original positioning: how many people know that they're doing other stuff? And, as a former developer and manager of developers, I just don't see the speed happening at Snowflake right now to become really competitive with Oracle. On the flip side, putting my Oracle hat on for a moment—back to you, Mark and Ron—what could Oracle still add? Because the traditional gaps in the database world—they've built everything, right? >>I really scratched my head and gave Nipun a hard time at CloudWorld, asking: what could you be building next? The answer was very conservative: let's get the lakehouse thing done; it's going to ship next year, right? And AWS is a really hard case, because AWS's value proposition is these small innovation teams, right? They build two-pizza teams—teams that can be fed by two pizzas—not large teams. And you need large teams to build these suites, with lots of functionality, to make sure everything works together.
Suites are consistent: they have the same UX on the administration side, they can be consumed the same way, they have the same API registry—I could keep going on where the synergy comes into play with a suite. So it's going to be really, really hard for AWS to change that. But AWS is super pragmatic; they always say they listen to customers. If they learn from customers that the suite is the proposition, I would not be surprised to see AWS trying to bring things closer together. >>Yeah. Well, how about multicloud? Again, Oracle is very much Oracle-on-Oracle, as you said before, but let's look forward half a year or a year. What do you think about Oracle's moves in multicloud, in terms of what kind of penetration they're going to have in the marketplace? You saw a lot of presentations at CloudWorld; we've looked pretty closely at the Microsoft Azure deal—I think that's really interesting; I've called it a little bit of the early days of a Supercloud. What impact do you think this is going to have on the marketplace? And think about it both within Oracle's customer base—I have no doubt they'll do great there—and beyond its existing install base. What do you guys think? >>Ron, do you want to jump on that? Go ahead, Ron. >>That's an excellent point. I think it aligns with what we've been talking about in terms of lakehouse. I think lakehouse will enable Oracle to pull more customers—more MySQL customers—onto the Oracle platforms. And I think we're seeing all the signs pointing toward Oracle being able to make more inroads into the overall market. That includes garnering customers from the leaders—in other words, because they're coming in as an innovator, an alternative to the AWS proposition, the Google Cloud proposition—they have less to lose, and as a result they can really drive the multicloud messaging to resonate not only with their existing customers but also, to the question Dave's posing, actually draw customers onto their platform. That naturally includes MySQL, but also OCI and so forth. That's how I see this playing out. Oracle's reporting is indicating it, and what we saw at Oracle CloudWorld definitely validates the idea that Oracle can make more waves in the overall market in this regard. >>You know, I've floated this idea of Supercloud—it's kind of tongue in cheek, but I think there's some merit to it in terms of building on top of hyperscale infrastructure and abstracting away some of that complexity. And one of the things I'm most interested in is industry clouds, and Oracle's acquisition of Cerner. I was struck by Larry Ellison's keynote: it ran maybe an hour and a half, and an hour and fifteen minutes of it was focused on healthcare transformation. >>So, vertical. >>Right. And so you've got Oracle with some industry chops, and then you think about what they're building: not only OCI, but MySQL that you can now run in Dedicated Regions, ADB on Exadata Cloud@Customer that you can put on-prem in your data center—and you look at what the other hyperscalers are doing. I say "other" hyperscalers; I've always said Oracle's not really a hyperscaler, but they've got a cloud, so they're in the game.
But you can't get BigQuery on-prem, and you look at Outposts—it's very limited in terms of database support, and again, that will evolve. And now Oracle has announced Alloy, where partners can white-label their cloud. So I'm interested in what you guys think about these moves, especially the industry clouds. We see Walmart doing sort of its own cloud, you've got Goldman Sachs doing a cloud. What do you think about that, and what role does Oracle play? Any thoughts? >>Yeah, let me jump on that for a moment. Especially with MySQL: by making it available in multiple clouds, they're following the philosophy they've had in the past with Cloud@Customer—taking the application and the data and putting it where the customer lives. If it's on premises, it's on premises. If it's in the cloud, it's in the cloud. By making MySQL HeatWave essentially plug-compatible with any other MySQL as far as your database is concerned, and then giving you that integration with OLAP and ML and the lakehouse and everything else, what you've got is a compelling offering. You're making it easier for the customer to use. So when I look at the difference between MySQL and the Oracle Database: MySQL is going to capture more market share for them. >>You're not going to find a lot of new users for the Oracle Database. Yeah, there will always be new users, don't get me wrong, but it's not going to be huge growth. Whereas MySQL HeatWave is probably going to be a major growth engine for Oracle going forward—not just in their own cloud, but in AWS, in Azure, and, over time, on premises; it'll eventually get there. It's not there now, but it will, and they're doing the right thing on that basis. They're taking the services—and this is what multicloud means—and making them available where the customer wants them, not forcing customers to go where you want them, if that makes sense. As far as where they're going in the future, I think they'll take a page out of what they've done with the Oracle Database: they'll add things like JSON and XML and time series and spatial, and over time they'll make it a complete converged database, like they did with the Oracle Database. The difference being that the Oracle Database will scale bigger, handle more transactions, and be somewhat faster, and MySQL will be for anyone who's not on the Oracle Database. They're not stupid, that's for sure. >>They've done JSON already, right? But I'll give you that they could add graph and time series inside HeatWave, right. Yeah, absolutely. >>A sort of logical move, right? >>Right. But let's not kid ourselves, right? Time has worked in Oracle's favor. Ten, twenty times the amount of R&D in the MySQL space has been poured into trying to snatch workloads away from Oracle—starting with IBM thirty years ago, then Microsoft twenty years ago—and it didn't work, right? Database applications are extremely sticky: when they run, you don't want to touch them, let alone move them. So that doesn't mean HeatWave isn't an attractive offering, but it will be for net-new things, right?
And what works a little bit in MySQL HeatWave's favor is that these aren't the massive enterprise applications with tentacles everywhere—where you might be running only 30% on Oracle, but the connections and interfaces into it are like 70 or 80% of your enterprise. >>Try to take that out and it's like the spaghetti ball, where you say: ah, no, I really don't want to do all that, right? >>You don't have that with the MySQL HeatWave kind of databases, which are smaller and more tactical in comparison. Still, I don't see them taking that much share. They will grow quickly because of an attractive value proposition. On multicloud, right—I think it's not really multicloud if you just give people the chance to run your offering on different clouds. Sure, you can run it there. The multicloud advantage comes when the über offering arrives, the one that allows you to do things across those installations, right? I can migrate data, I can replicate data across deployments—like Google has done with BigQuery Omni—I can run predictive models, or even train models, in different places and distribute them, right? Oracle is paving the road for that by being available on these clouds. But the multicloud capability of a database that knows it's running on different clouds—that is still yet to be built. >>That's the Supercloud concept that I floated, and I've always said Snowflake, with a single global instance, is sort of headed in that direction and maybe has a lead. What's the issue with that, Mark? >>Yeah, the problem with that version of multicloud is that clouds charge egress fees. As long as they charge egress fees to move data between clouds, it's going to be very difficult to do a real multicloud implementation. Even Snowflake, which runs multicloud, has to pass the egress fees on to their customer when data moves between clouds, and that's really expensive. There's one customer I talked to who was beta testing MySQL HeatWave on AWS; the only reason they didn't want to do it before it ran natively on AWS is that the egress fees to move the data to OCI were so great they couldn't afford it. Egress fees are the big issue. >>But Mark, the point might be that you want to run the query remotely and only get the result set back, which is much tinier—that's been the answer before for the latency-between-clouds problem, which we sometimes still have but mostly don't, right? And I think in general, with egress fees coming down—based on Oracle's move on egress fees, it's very hard to justify them, right? But it's not moving data that is the multicloud high-value use case; it's doing intelligent things with that data, right? Putting it in other places, replicating it—I'm saying the same thing you said before—running remote queries on it, analyzing it, running AI on it, running AI models on it. That's the interesting thing. Administering it all in the same way. Taking things out, making sure compliance happens. Making sure that when Ron says, "I don't want to be in the American cloud anymore, I want to be in the European cloud," the data gets migrated, right? Those are the interesting, high-value use cases, which are really, really hard for an enterprise to program hand-by-hand with developers and which they would love to have out of the box—and that's the innovation yet to come. But the first step to get there is that your software runs in multiple clouds, and that's what Oracle's doing so well with MySQL.
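A back-of-the-envelope on the egress exchange above—Mark's move-the-whole-dataset scenario versus Holger's run-the-query-remotely pattern. The per-gigabyte rate below is an illustrative assumption, not a quoted price from any cloud:

```python
# Rough arithmetic: shipping a full dataset between clouds vs. shipping
# only an aggregated result set back to the caller.
EGRESS_PER_GB = 0.09          # assumed internet egress rate, $/GB

dataset_gb = 100_000          # ~100 TB analytic dataset
result_set_gb = 0.5           # aggregated result returned to the caller

full_copy_cost = dataset_gb * EGRESS_PER_GB
remote_query_cost = result_set_gb * EGRESS_PER_GB

print(f"replicate whole dataset across clouds: ${full_copy_cost:,.0f}")
print(f"run query remotely, ship results back: ${remote_query_cost:,.2f}")
```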
>>Guys, an amazing amount of data knowledge and brain power in this market. I really want to thank you for coming on theCUBE. Ron, Holger, Mark, always a pleasure to have you on. Really appreciate your time. >>Thanks, Dave, for moderating us. >>All right, we'll see you guys around. Safe travels to all, and thank you for watching this power panel, The Truth About MySQL HeatWave, on theCUBE, your leader in enterprise and emerging tech coverage.

Published Date : Nov 1 2022



Ed Walsh, ChaosSearch | AWS re:Inforce 2022


 

(upbeat music) >> Welcome back to Boston, everybody. This is the birthplace of theCUBE. In 2010, May of 2010, at EMC World, right in this very venue, John Furrier called it the chowder and lobster post. I'm Dave Vellante. We're here at RE:INFORCE 2022. Ed Walsh, CEO of ChaosSearch, doing a drive-by. Ed, thanks so much for stopping in. You're going to help me wrap up in our final editorial segment. >> Looking forward to it. >> I really appreciate it. >> Thank you for including me. >> How about that? 2010. >> That's amazing. It was really in this-- >> Really in this building. Yeah, we had to sort of bury our way in, tunnel our way into the Blogger Lounge. We did four days. >> Weekends, yeah. >> It was epic. It was really epic. But I'm glad they're back in Boston. AWS was going to do June in Houston. >> Okay. >> Which would've been awful. >> Yeah, yeah. No, this is perfect. >> Yeah. Thank God they came back. You saw Boston in summer is great. I know it's been hot, and of course you and I are from this area. >> Yeah. >> So how you been? What's going on? I mean, it's a little crazy out there. The stock market's going crazy. >> Sure. >> You've got the techlash. What are you seeing? >> So it's an interesting time. I ran a company in 2008, so we've been through this before. By the way, the world's not ending; we'll get through this. But it is an interesting conversation as an investor, and also even with the customers. There's some hesitation, but you have to basically have the right value prop, otherwise things aren't going to get sold. So we are seeing longer sales cycles, but it's nothing that you can't overcome. It has to be not a nice-to-have but a need-to-have. But I think we'll all get through it. And then on the VC side, it's now buckle down, let's figure out what to do, which is always a challenge for startup plans. >> Pre-2000, maybe you weren't a CEO, but you were definitely an executive. And now it's different, and a lot of younger people haven't seen this. You've got interest rates now rising—okay, we've seen that before—but it looks like you've got inflation and interest rates rising. >> Yep. >> Consumer spending patterns are changing. You had $6, $7 gas at one point. So you have these weird crosscurrents, >> Yup. >> and people are thinking, "Okay, post-September now, maybe because of the recession, the Fed won't have to keep raising interest rates and tightening." But I don't know what to root for. It's like half full, half empty. (Ed laughing) >> But we haven't been in an environment with high inflation. At least not in my career. >> Right. Right. >> I mean, I got in in '92; that was long gone, right? >> Yeah. >> So it is an interesting regime change that we're going to have to deal with, but there are a lot of analogies between 2008 and now that you still have to work through too, right? So anyway, I don't think the world's ending. I do think you have to run a tight shop. So I think grow-at-all-costs is gone. I do think discipline's back in—which, for most of us, discipline never left, right? So to me, that's the name of the game. >> What do you tell people generally? I mean, you've been the CEO of a lot of private companies, and of course one of the things you do to retain and attract people is you give them stock, and it's great and everybody's excited. >> Yeah. >> I'm sure they're excited 'cause you guys are a rocket ship.
But so what's the message now that, okay, the market's down, valuations are down, the trees don't grow to the moon—we all know that. But what are you telling your people? What's their reaction? How do you keep 'em motivated? >> So, like anything, you want to over-communicate during these times. So I actually over-communicate—you get all these, you know, the Sequoia decks, from 2008 and the recent one... >> (chuckles) "R.I.P. Good Times," that one, right? >> I literally share it. Why? It's like: hey, this is what's going on in the real world. It's going to affect us. It has almost nothing to do with us specifically, but it will affect us, and we can't not pay attention to it. It does change how you're going to raise money, so you've got to make sure you have the right runway to be there. So it does change what you do, but I think you over-communicate. That's what I've been doing, and I'm a student of the game, so I try to share it. Some appreciate it, others... I'm just saying: this is normal, we'll get through this, and this is what happened in 2008. And trust me, once the market hits bottom, give it another month afterwards, then everyone says, oh, the bottom's in, and we're back to business. Valuations don't go immediately back up, but right now no one knows where the bottom is, and that's where the world's-ending type of talk comes from. >> Well, it's interesting, because you talked about—I said "rest in peace, good times" >> Yeah >> —that was the Sequoia deck, and the message was: tighten up. Okay, and I'm not saying you shouldn't tighten up now, but the difference is, there was this period of two years of easy money, and even before that it was pretty easy money. >> Yeah. >> And so companies are well capitalized, they have runway. So it's like—okay, I was talking to Frank Slootman about this; now of course they're a public company—we're not taking the foot off the gas. We're inherently profitable, >> Yeah. >> we're growing like crazy, we're going for it. You know? So that's a little bit of a different dynamic. There's a lot of good runway out there, isn't there? >> But also, look at the companies that were either born in or powered through those environments—they come out stronger, in a more dominant position. Listen, if you see what Frank's done, it's been unbelievable to watch his career, right? In fact, he was at Data Domain, I was at Avamar. But look at what he's done since; he's crushed it. Right? >> Yeah. >> So for him to say, hey, I'm going to literally hit the gas and keep going—I think that's the right thing for Snowflake, and the right thing for a lot of people. But for people in different roles, I literally say that you have to take it seriously. What you can't say is, "well, Frank's in a different situation." What is it...? How many billion does he have in the bank? So it's... >> He's over a billion, you know, over a billion. Well, you're on your way, Ed. >> No, no, no, it's good. (Dave chuckles) >> Okay, I want to ask you about this concept, this term we coined called Supercloud. >> Sure. >> You could think of it as the next generation of multi-cloud. The basic premise is that multi-cloud was largely a symptom of multi-vendor—okay, I've done some M&A, I've got some shadow IT spinning up, you know, shadow clouds, projects—but it really wasn't a strategy to have a continuum across clouds.
And now we're starting to see ecosystems really build—you know, you've used the term before, standing on the shoulders of giants; you've used that a lot. >> Yep. >> And so we're seeing that. Jerry Chen wrote a seminal piece, Castles in the Cloud, so we coined this term Supercloud to connote an abstraction layer that hides the underlying complexities and primitives of the individual clouds, and then adds value on top and can adjudicate and manage, irrespective of physical location—Supercloud. >> Yeah. >> Okay. What do you think about that concept? How does it relate to some of the things you're seeing in the industry? >> So: standing on the shoulders of giants, right? I always like to do hard tech, whether at big companies or small companies. We're probably your definition of a Supercloud. We had a big vision: how to literally solve the core challenge of analytics at scale. How are you going to do that? You're not going to build it all on your own. So literally we're leveraging the primitives—everything you can get out of the Amazon cloud, everything you can get out of the Google cloud; in fact, we're even looking at what we can get out of this Snowflake cloud—and how do we abstract that out and add value to it? That's where all our patents are. But it becomes a simplified approach. The customers don't care—well, they care where their data is, but they don't care how you got there; they just want the end result. So you simplify, but you gain the advantages. One interesting thing: with this particular company, ChaosSearch, at some point in the sales cycle people always say, "no way, hold on, no way that can be that fast," or whatever the issue is. And initially, when we tried to explain our technology, 60% of it was explaining the public cloud capabilities, and then how we harvest those—make them better, I guess—and add value on top, and how what you're able to get is something you couldn't get from the public clouds themselves, and then how we did that across public clouds and abstracted it out. So if you think about it, it's the shoulders of giants. What we now do, literally to avoid that conversation—because it became a lengthy conversation—is ask: how do you have a platform for analytics that you can't possibly overwhelm on ingest, with all your messy data and no pipelines? Well, you leverage things like S3 and EC2, and you do the different security things. You can go to environments and say, "you can't possibly overrun me"—and I could not say that if I hadn't literally built on the shoulders of giants, all these public clouds. But the value... So if you're going to do hard tech as a startup, you're going to build on the principles of Supercloud—maybe not the same size of Supercloud as, say, Snowflake—but basically you're going to leverage all that, abstract it out, and that's where you're able to add a lot of value. >> So let me ask you—I don't know if there's a strict definition of Supercloud; we sort of put it out to the community and said, help us define it. So: you've got to span multiple clouds. It's not just running in each cloud. There's a metadata layer that understands where you're pulling data from. Like you said, you can pull data from Snowflake—it sounds like you're not running on Snowflake, correct? >> No—complementary to them, and to their different customers. >> Yeah. Okay. >> They want to build data apps on top of a data platform. >> Right. And of course they're going cross-cloud. >> Right.
>>Is there a PaaS layer in there? We've said there's probably a Super PaaS layer. You're probably not doing that, but you're allowing people to bring their own—a bring-your-own-PaaS sort of thing, maybe. >> So we're a little bit different, but basically we publish open APIs. We don't have a user interface; we say, keep the user interface you have. Again, we're solving the challenge of analytics at scale; we're not trying to retrain your analysts, or your DevOps, or your SOC, or your SecOps team. They use the tools they already use—Elasticsearch APIs, SQL APIs. So really, they program, they build applications, on top of us. Equifax is a good example; a case study is coming out later this week, after 18 months in production. Basically they're building on us—we provide the abstraction layer. The quote—I'm going to mangle it—from Jeff Tincher, who owns all of SRE worldwide there, was to the effect of: hey, I'm able to rethink what I do for my data pipelines. But he also talked about how he really doesn't have to worry about the data he puts into it. We deal with that, and he just queries on the other side. That simplicity—we couldn't have done that without building on those giants. So anyway, what I like about the definition is: if you're going to do something hard in this world, why would you try to rebuild what Amazon, Google, Azure, or Snowflake already did? You're going to add things on top. We can still create intellectual property—we're still doing patents; five granted patents, all on this. But literally, the abstraction layer is the simplification. The end users do not want to know that complexity, even though they ask the questions. >> And I think, too, the other attribute is ecosystem enablement. Whereas I think, >> Absolutely >> in general, in the Multicloud 1.0 era, the ecosystem wasn't thinking about, okay, how do I build on top and abstract that? So maybe it is Multicloud 2.0; we chose to use Supercloud. So I'm wondering—we're at the security conference, RE:INFORCE—is there a security Supercloud? Maybe Snyk has the developer Supercloud, or maybe Okta has the identity Supercloud. CrowdStrike, I think maybe not, because CrowdStrike competes with Microsoft. And what's interesting about Microsoft: Merritt Baer was just saying, look, we don't show up in the spending data for security because we're not charging for most of our security; we're not trying to make a big business of it. So that's kind of interesting. But is there a potential for the security Supercloud? >> I think so. And I'll give you one thing: just today I had at least three different conversations where everyone wants to log data. It's a little bit specific to us, but basically they want to do the security data lake—and Snowflake talks about this too—the idea of putting all the data in one repository, and then how do you abstract it out and get value from it? It becomes simple to do but hard to get value out of. So the different players are going to do that, and that's what we do: once you land it in your S3—or the simple storage of your cloud of choice, it doesn't matter—we allow you to get after that data, but we take the primitives and hide them from you. All you do is query the data, and we're spinning up stateless compute to go after it. So if I look around the floor, there are going to be a bunch of these players. Why would anyone on this floor try to recreate what Amazon or Google or Azure have? They're going to build on top of it. And the key thing is: do you leave it standard and open—people building on top of my open APIs—or do you try to put them in a walled garden, so they're locked into your Supercloud? Our belief is that part of it is it needs to be open access, letting you go after it.
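A sketch of the open-API point Ed is making here: the client keeps speaking the Elasticsearch query DSL it already knows, while the service resolves the query against data sitting in object storage. The endpoint, index pattern, and token below are hypothetical placeholders, not ChaosSearch's actual API surface:

```python
# Sketch: existing Elasticsearch-style tooling pointed at an
# ES-compatible analytics endpoint backed by object storage.
import requests

resp = requests.post(
    "https://api.example-search.cloud/v1/app-logs*/_search",  # hypothetical URL
    json={
        "query": {"match": {"status": "500"}},  # standard ES query DSL
        "size": 10,
    },
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()

# Standard Elasticsearch response shape: hits.hits[]._source
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])
```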
>> Well, and build your applications on top of it openly. >> That comes back to Snowflake. That's what Snowflake's doing, and they're basically saying: hey, come into our proprietary environment. And the benefit is—and I think both can win; there's a big market. >> I agree. But I think the benefit of Snowflake's approach is: okay, we're going to have federated governance, we're going to have data sharing, you're going to have access to all the ecosystem players. >> Yep. >> And everything's going to be controlled, and you know what you're getting. The flip side of that is Databricks at the other end >> Yeah. >> of that spectrum, which is: no, no, you've got to be open. >> Yeah. >> So what's going to happen? Well, what's happening, clearly, is Snowflake saying, okay, we've got Snowpark: we're going to allow Python, we're going to have Apache Iceberg, we're going to have open-source tooling you can access. By the way, it's not going to be as good as our walled garden. And the flip side of that is you get Databricks coming at it from a data science and data engineering perspective. And there are a lot of gaps in between, aren't there? >> And I think they both win. For instance, we didn't do a Snowpark integration, but we work with people building data apps on top of Snowflake or Databricks. What we do is add value to that—again, using all the Supercloud stuff we've done—because we deal with the unstructured data, the four V's coming at you. You can't pipeline that to save your life. So we can actually be additive as they're trying to do, say, a security data cloud inside of Snowflake, or the same thing in Databricks. That's where we can play. Now, we play with them at the application level, where the apps get some data from them and some data from us, but I believe there's a partnership there that will work inside their environments. To us, they're just another large-scale environment where my customers want to get after data, and they want me to abstract it out and deliver value. >> So it's another repository to you. >> Yeah. >> Okay. So I think Snowflake recently added support for unstructured data. You chose not to do Snowpark—why? >> Well, the way they're doing the unstructured data is not bad—it's JSON data, basically. This is the dilemma: everyone wants their application developers to be flexible, to move fast, securely, with productivity—so you give them flexibility. The problem is that analytics on the other end wants structure to be performant. And this is where Snowflake has to somehow get at that raw data, which is changing every day, because you let the developers do what they want now—in some structured base, but doing what the business needs, fast and securely. It completely breaks down: they have large customers trying to do big integrations of this messy data, and it doesn't quite work, because you literally just can't make the pipelines work. So that's where we're complementary. For that particular case, we'd need a somewhat deeper integration, so we're integrating, actually, at the data app layer. But you could see us do more, and listen—I think Snowflake's a good actor.
They're trying to figure out what's best for the customers, and I think we just participate in that. >> Yeah. And I think they're trying to figure out >> Yeah. >> how to grow their ecosystem. Because they know they can't do it all. In fact, >> And we solve the key thing: they just can't do certain things, and we do those well. They have SQL, but that's where it ends. >> Yeah. >> I do the messy data, and that's how we play with them. >> And when you talk to one of their founders—Benoit—he comes on theCUBE and he's like: we start with simple. >> Yeah. >> It reminds me of the guys at Pure Storage—that guy Coz—he's always like: no, not if it starts to get too complicated. So that's why they said, all right, we're not going to start out trying to figure out how to do complex joins and workload management, and they turned that into a feature. So, like you say, I think both can win. It's a big market. >> I think it's a good model. And I love to see Frank, you know, move. >> Yeah. I forgot—you were Avamar... >> In the day. >> You guys used to hate each other, right? >> No, no, no. >> No. I mean, it's all good. >> But the thing is, look what he's done. I wouldn't bet against Frank. I think it's a good message. You can see clients trying to do it. Same thing with Databricks, same thing with BigQuery—we see a lot of the same dynamic in BigQuery. It's good for a lot of things, but it's not everything you need. And there are ways for the ecosystem to play together. >> Well, what's interesting about BigQuery is that it is truly cloud native, as is Snowflake, whereas Amazon Redshift was sort of ParAccel—it's cobbled together. It's great engineering, but BigQuery gets a lot of high marks. But again, there are limitations to everything. That's why companies like yours can exist. >> And that's why—back to the Supercloud—it allows me as a company to participate in that, because I'm leveraging all the underlying pieces. We couldn't be doing what we're doing now without leveraging the Supercloud concepts, right? So... >> Ed, I really appreciate you coming by and helping me wrap up today at RE:INFORCE. Always a pleasure seeing you, my friend. >> Thank you. >> All right. Okay, this is a wrap on day one. We'll be back tomorrow. I'll be solo—John Furrier had to fly out, but we'll be following what he's doing. This is RE:INFORCE 2022. You're watching theCUBE. I'll see you tomorrow.

Published Date : Jul 26 2022



Priya Rajagopal | Supercloud22


 

(upbeat music) >> Okay, we're now going to try and stretch our minds a little bit, and stretch Supercloud to the edge. Supercloud, as we've been discussing today and reporting through various Breaking Analyses, is a term we use to describe a continuous experience across clouds, or even on-prem, that adds new value on top of hyperscale infrastructure. Priya Rajagopal is the director of product management at Couchbase. She's a developer, a software architect, a co-creator on a number of patents, as well as an expert on edge, IoT, and mobile computing technologies. And we're going to talk about edge requirements. Priya, you've been around software engineering and mobile and edge technologies your entire career, and now you're responsible for bringing enterprise-class database technology to edge and IoT environments, with synchronization. So when you think about the edge—the near edge, the far edge—what are the fundamental assumptions you have to make with regard to things like connectivity, bandwidth, security, and any other technical considerations for software architecture in these environments? >> Sure, sure. First off, Dave, thanks for having me here. It's really exciting to be here again—my second time. And thank you for that kind introduction. So, quickly, to get back to your question: when it comes to architecting for the edge, our principle is prepare for the worst and hope for the best. Because really, when it comes to edge computing, it's the edge cases that come back to bite you. You mentioned connectivity, bandwidth, security; I have a few more. Starting with connectivity: assume low or no network connectivity. Think offshore oil rigs, cruise ships, or even retail settings, where you want business continuity. Most of the time you've got an internet connection, but when there's a disruption, you lose business continuity. When it comes to bandwidth, the approach we take is that bandwidth is always limited, or at a premium. Data plans can go through the roof depending on the volume of data—think medical clinics in rural areas. When it comes to security, the edge poses unique challenges, because you're moving away from the walled-garden, central, cloud-based environment, and now everything is accessible over the internet. And the internet is inherently untrustworthy. Every bit of data written or read by an application needs to be authenticated and authorized. The entire path needs to be secured end to end—it needs to be encrypted. That's confidentiality. Also the persistence of the data itself: it needs to be encrypted on disk. Now, one of the advantages of edge computing, of distributing data, is that an impacted edge environment can be isolated without affecting the other edge locations. Looking at the classic retail architecture: if you've got a retail store where there's a security breach, you need a provision for isolating that store so you don't bring down services for the other stores. When it comes to edge computing, you have to think about those aspects of security—any of these locations could be breached, and if one of them is breached, how do you contain it? So that answers the three key topics you brought up.
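A minimal sketch of the "prepare for the worst" connectivity pattern described above: writes always succeed against a local store, and a background step drains them upstream when the link returns. The central ingest endpoint and the event schema are hypothetical placeholders, not Couchbase APIs:

```python
# Sketch: an offline-first "outbox" for a flaky edge link. Writes land in
# a local SQLite store immediately; drain() pushes pending events to the
# central service whenever connectivity is available.
import json, sqlite3, requests

db = sqlite3.connect("edge-outbox.db")
db.execute("CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, doc TEXT)")

def record(event: dict) -> None:
    """Always succeeds locally, even with zero connectivity."""
    db.execute("INSERT INTO outbox (doc) VALUES (?)", (json.dumps(event),))
    db.commit()

def drain() -> None:
    """Push pending events upstream; keep anything that fails for retry."""
    for row_id, doc in db.execute("SELECT id, doc FROM outbox").fetchall():
        try:
            requests.post("https://central.example.com/ingest",  # hypothetical
                          json=json.loads(doc), timeout=5).raise_for_status()
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        except requests.RequestException:
            break  # still offline; try again on the next drain cycle
    db.commit()

record({"store": "042", "event": "sale", "sku": "A1"})
drain()
```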
All of that is paramount to our customers. And it's not just about enforcing policies anymore: you're not enforcing policies in one central location, you have to do it in a distributed fashion. One of the benefits of edge computing, as you probably very well know, is what it brings when it comes to data privacy and governance policies. You can enforce them at a granular scale, because the data never has to leave the edge. But again, and I talked about this in the context of security, there needs to be a way to control this data at the edge: you have to govern the data remotely, while it is at the edge. Some of the other challenges when thinking about the edge are, of course, volume and scale. Think IoT and mobile devices, classic far-edge scenarios. And I think the other criterion to keep in mind when we are architecting a platform for this kind of computing paradigm is the heterogeneity of the edge itself. It's no longer a uniform set of compute and storage resources at your disposal. You've got a variety of IoT devices. You've got mobile devices with different processing capabilities and different storage capabilities. When it comes to edge data centers, it's not uniform in terms of what services are available. Do they have a load balancer? Do they have a firewall? Can I deploy a firewall? These are all key architectural considerations when it comes to actually architecting a solution for the edge. >> Great. Thank you for that awesome setup. Talking about stretching to the edge: this idea of Supercloud connotes a single logical layer that spans across multiple clouds. It can include on-prem, but a critical criterion is that the developer experience, and, of course, the user experience, is identical, or substantially similar. Let's say identical, irrespective of physical location. Priya, is that vision technically achievable today in the world of databases? And if so, can you describe the architectural elements that make it possible to perform well, with low latency, and with the security and other criteria that you just mentioned? What are the technical enablers? Is it just good software? Is it architecture? Help us understand that. >> Sure. You brought up two aspects. You mentioned user experience, and then you mentioned what it takes from a developer standpoint. And I'd like to address the two separately. They are very tightly related, but I'd like to address them separately. Just focusing on the easier of the two, when it comes to user experience, what are the factors that impact it? You're talking about reliability of service: always-on, always-available applications. It doesn't matter where the data is coming from. Whether the data is coming from my device, sourced from an on-prem data center, from the edge of the cloud, or from a central cloud data center, from an end-user perspective, all they care about is that their application is available. The next is, of course, responsiveness. Users are getting increasingly impatient. You want to reduce wait times for service; you want something which is extremely fast. They're looking for immersive applications, immersive experiences: AR, VR, mixed-reality use cases. Then something which is very critical, and what you just touched upon, is this sort of seamless experience. Like the omnichannel experience we talk about in the context of retail. Or what I like to refer to as 'park and pick up.'
You're running your application, you start a transaction on one device, you park it, and you pick it up on another device. Or in the case of retail, you walk into a store and you pick it up from there. So, there's a park and pick up; seamless mobility of data is extremely critical. In the context of a database, when we talk about responsiveness, the two key KPIs are latency and bandwidth. Latency is really the round-trip time from the moment you make a request for data until the response comes back. The factors that impact latency are, of course, the type of the network itself, but also the proximity of the data source to the point of consumption. The more hops the data packets have to take to get from the source to the destination, the more latency you're going to incur. And when it comes to bandwidth, we're talking about the capacity of the network: how much data can be pushed through the pipe? And, of course, with edge computing you have a large number of clients. I talked about scale, the volume of devices. When all of them are concurrently connected, you're going to have network congestion, which impacts bandwidth, which, in turn, impacts performance. So when it comes to how you architect a solution for that, if you remove the reliance on the network to the extent possible, you get the highest guarantees when it comes to responsiveness, availability, and reliability, because your application is always going to be on. To do that, if you have the database and the data-processing components co-located with the application that needs them, that would give you the best experience. But, of course, a lot of times it's not possible to embed that data within the application itself. And that's where you have the options of an on-prem data center, the edge of the cloud, and so on. The closer you bring the data, the better the experience. Now, that's all great, but then when it comes to achieving the vision of Supercloud, where we say, 'Hey, from a developer standpoint I have one API to set up this connection to a server, but behind the scenes my data could be resident anywhere,' how do you achieve something like that? A critical aspect of the solution is data synchronization. Data storage is one critical aspect of a database: it's where the data is persisted, along with the data processing and the APIs to access and query the data. But another really critical aspect of distributing a database is the data synchronization technology. Once all the islands of data, whether on the device, in an on-prem data center, at the edge of the cloud, or in a regional data center, are kept in sync, then it's a question of: when connectivity to one of those data centers goes down, there needs to be a seamless switch to another data center. And today, at least when it comes to Couchbase, a lot of our customers employ global load balancers which can detect that automatically. So, from the perspective of an application, it's just one URL endpoint, but when one of those services or data centers goes down, we have active failover and standby, and the load balancer automatically redirects all the traffic to the backup data center. And of course, for that to happen, those two data centers need to be in sync. And that's critical. Did that answer your question?
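A minimal sketch of the client-side view of that active/standby pattern, in Python. The endpoint URLs are hypothetical, and in a real deployment the health checking and redirection would normally live in the global load balancer itself rather than in application code.

```python
import urllib.request
import urllib.error

# Hypothetical data center endpoints; in practice a global load balancer
# would front these and do the health checking itself.
DATA_CENTERS = [
    "https://dc-east.example.com",
    "https://dc-west.example.com",
]

def fetch_with_failover(path: str, timeout: float = 2.0) -> bytes:
    """Try the primary data center first; fail over to the standby."""
    last_error = None
    for base in DATA_CENTERS:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as err:
            last_error = err  # this data center is unreachable; try the next
    raise ConnectionError(f"all data centers unreachable: {last_error}")
```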
>> Yeah, let me jump in here. Thank you again for that. I want to unpack some of those, and I want to use the example of Couchbase Lite, which, as the name implies, is a lightweight, mobile version of Couchbase. I'm interested in a number of things that you said. You talked about how, in some cases, you want to get data from the most proximate location. Is there some kind of metadata intelligence that you have access to? I'm interested in how you do the synchronization. How do you deal with conflict resolution, and recovery if something goes wrong? You're talking about distributed-database challenges. How do you approach all that? >> Wow, great question, and probably one I could occupy the entire session with, but I'll try to keep it brief and answer most of the points that you touched upon. So, we talked about distributed databases and data sync. But here's the other challenge: a lot of these distributed locations can actually be disconnected, so we've just exacerbated this whole notion of data sync. And that's what we call offline-first: not just what is typically referred to as offline-first sync, but the ability for an application to run in a completely disconnected mode, and then, when there is network connectivity, the data is synced back to the backend data servers. In order for this to happen, you need a sync protocol (indistinct). Since you asked in the context of Couchbase: our sync protocol is a WebSockets-based, extremely lightweight data synchronization protocol that's resilient to network disruption. What this means is I could have hundreds of thousands of clients connected to a data center, and they could be at various stages of disconnect. Say you have a field application, and you're veering in and out of pockets of network connectivity; the network is disrupted, and then connectivity is restored. Our sync protocol has a built-in checkpoint mechanism that allows the two replicating points to do a handshake on what the previous sync point was, and only data from that previous sync point onward is sent to that specific client. And to achieve that, you mentioned Couchbase Lite, which is, of course, our embedded database for mobile, desktop, and any embedded platform; but the component that handles the data synchronization is our Sync Gateway. So, we have a component, Sync Gateway, that sits with our Couchbase Server, and that's responsible for securely syncing the data and implementing this protocol with Couchbase Lite.
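A toy model of that checkpoint handshake, in Python. This illustrates the resumable-sync idea Priya describes; it is not the actual Couchbase replication protocol or any Couchbase SDK API, and all names here are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Change = Tuple[int, str, dict]  # (sequence number, doc id, body)

@dataclass
class SyncServer:
    changes: List[Change] = field(default_factory=list)        # append-only change feed
    checkpoints: Dict[str, int] = field(default_factory=dict)  # last seq confirmed per client

    def pull(self, client_id: str) -> List[Change]:
        """Handshake: resume from the client's checkpoint instead of resending everything."""
        since = self.checkpoints.get(client_id, 0)
        return [c for c in self.changes if c[0] > since]

    def confirm(self, client_id: str, seq: int) -> None:
        self.checkpoints[client_id] = seq

server = SyncServer(changes=[
    (1, "doc-1", {"v": 1}),
    (2, "doc-2", {"v": 1}),
    (3, "doc-1", {"v": 2}),
])
server.confirm("tablet-42", 1)    # client synced through seq 1, then went offline
delta = server.pull("tablet-42")  # on reconnect, only seqs 2 and 3 cross the wire
assert [c[0] for c in delta] == [2, 3]
```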
You talked about conflict resolution, and it's great that you mentioned that, because when it comes to data sync, a lot of times folks think, 'Oh well, how hard can that be? You request some data, you pull down the data, and that's great.' And that's the happy path. When all of the clients are connected and there is reliable network connectivity, that's great. But we are, of course, talking about unreliable network connectivity and resiliency to network disruptions, and also the fact that you have lots of concurrently connected clients, all of them potentially updating the same piece of data. That's when you have a conflict: when two or more writers are updating the same document. You could have the writes coming in from the clients, or you could have the writes coming in from the backend systems. Either way, multiple writers are updating the same piece of data, and that's when you have conflicts. Now, to explain a little bit how conflict resolution is handled within our data sync protocol in Couchbase, it helps to understand what kind of database we are and how data itself is stored within our database. Couchbase Lite is a NoSQL JSON document store, which means everything is stored as JSON documents. Every time there is a write, an update to a document, say you start with the initial version when the document is created, every mutation creates a new revision of that document. So, as you bring in more writes, more mutations to that document, you build out what's called a revision tree. And when does a conflict happen? A conflict happens when there is a branch in the tree. You've got two writers writing to the same revision; you get a branch, and that branch is a conflict. We have a way of detecting those conflicts automatically; that's conflict detection. So, now we know there's a conflict, but we have to resolve it. And within Couchbase, you have two options. You don't have to do anything about it: the system has automatic conflict-resolution heuristics built in. It's going to pick a winning revision; we use a bunch of criteria and pick a winner. So, if two writers are updating the same revision of the document, we pick a winner. From our experience, that works for about 80% of the use cases. But for the remaining 20%, applications would like to have more control over how the winner of the conflict is picked, and for that, applications can implement a custom conflict resolver. We'll automatically detect the conflicting revisions and send them over to the application via a callback, and the application has access to the entire document body of the two revisions and can use whatever criteria it needs to merge them. >> So, that's policy-based, in that example? >> Yes. >> Yeah, yeah, okay. >> So you can have user-policy-based resolution, or you can have the automatic heuristics.
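A compact sketch of that revision-tree model in Python. The branch detection mirrors what Priya describes; the 'highest revision id wins' default is only a stand-in for Couchbase's actual built-in heuristics, and the resolver callback signature is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Revision:
    rev_id: str
    parent: Optional[str]   # None for the initial revision
    body: dict

class Document:
    def __init__(self) -> None:
        self.revs: Dict[str, Revision] = {}

    def put(self, rev_id: str, parent: Optional[str], body: dict) -> None:
        self.revs[rev_id] = Revision(rev_id, parent, body)

    def conflicts(self) -> List[List[Revision]]:
        """A conflict is a branch: two or more revisions sharing one parent."""
        children: Dict[Optional[str], List[Revision]] = {}
        for rev in self.revs.values():
            children.setdefault(rev.parent, []).append(rev)
        return [sibs for sibs in children.values() if len(sibs) > 1]

def resolve(doc: Document, resolver: Optional[Callable] = None) -> None:
    for siblings in doc.conflicts():
        if resolver:  # app callback sees the full bodies of the conflicting revisions
            winner = resolver(siblings)
        else:         # stand-in for the built-in "pick a winner" heuristics
            winner = max(siblings, key=lambda r: r.rev_id)
        for rev in siblings:
            if rev is not winner:
                del doc.revs[rev.rev_id]  # prune the losing branch

doc = Document()
doc.put("1-a", None, {"qty": 1})
doc.put("2-a", "1-a", {"qty": 2})  # writer A
doc.put("2-b", "1-a", {"qty": 3})  # writer B: same parent, so the tree branches
resolve(doc, resolver=lambda sibs: max(sibs, key=lambda r: r.body["qty"]))
assert "2-b" in doc.revs and "2-a" not in doc.revs
```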
>> Okay, I've got to wrap because we're out of time, but I want to run this scenario by you. One of the risks to the Supercloud nirvana that we always talk about is this notion of a new architecture emerging at the edge, the far edge really, because they're highly distributed environments: low power, tons of data. And there's this idea of AI inferencing at the edge; a lot of the AI modeling today is done in the cloud. You think about ARM processors in these new low-cost devices, and massive processing power eventually overwhelming the economics, and then that seeping back into the enterprise and disrupting it. Now, you still get the problem of federated governance and security, and that's probably going to be more centralized slash federated. But, in one minute, do you see real-time AI inferencing taking off at the edge? Where is that on the S-curve? >> Oh, absolutely. When it comes to IoT applications, it's all about massive volumes of data generated at the edge. You talked about how the economics doesn't add up: the data needs to be actioned at some point, and if you have to transfer all of it over the internet for analysis, you're going to lose that real-time responsiveness and availability. The edge is the perfect location for it. And a lot of this data is temporal in nature, so you don't want it sent back to the cloud for long-term persistence; instead you want it actioned as close as possible to the source itself. And there are, of course, the really small microcontrollers and so on; even there you can have some local processing done, like TinyML models. But mobile devices, as you're very well aware, are extremely capable; they have neural network processors, so they can do a lot of processing locally. And then, when you want an aggregated view within the edge, you can process that data in an IoT gateway and only send the aggregated data back to the cloud for long-term analytics and persistence. >> Yeah, this is something we're watching, and I think it could be highly disruptive, and it's hard to predict. Priya, I've got to go. Thanks so much for coming on theCUBE. Really appreciate your time. >> Yeah, thank you. >> All right, you're watching Supercloud 22. We'll be right back right after this short break. (upbeat music)

Published Date : Jul 25 2022


Kam Amir, Cribl | HPE Discover 2022


 

>> TheCUBE presents HPE Discover 2022, brought to you by HPE. >> Welcome back to theCUBE's coverage of HPE Discover 2022. We're here at the Venetian convention center in Las Vegas, Dave Vellante for John Furrier. Kam Amir is here, the director of technical alliances at Cribl. Cam, good to see you. >> Good to see you too. >> Cribl. Cool name. Tell us about it. >> So let's see. Cribl has been around now for about five years, selling products for the last two years. Fantastic company, lots of growth; I started there in 2020, and we're roughly 400 employees now. >> And what do you do? Tell us more. >> Yeah, sure. So I run the technical alliances team, and what we do is we basically look to build integrations into platforms such as HPE GreenLake and Ezmeral. And we also work with a lot of other companies to help get data from various sources into their destinations, or do other enrichments of data in that data pipeline. >> You know, you guys have been on theCUBE. Clint's been on many times; Ed Bailey was on our startup showcase. You guys are successful in this overfunded observability space, and you have a unique approach. Tell us why you guys are successful in the product and some of the things you've been doing there. >> Yeah, absolutely. So our product is very complementary to a lot of the technologies that already exist. And I used to joke that everyone has these pretty dashboards and reports, but they completely glaze over the fact that it's not easy to get the data from those sources to their destinations. So for us, it's this capability with Cribl Stream to get that data easily and repeatably into these destinations. >> Yeah. You know, Cam, you and I were both at the Snowflake Summit, to John's point. There were like a dozen observability companies there. >> Oh yeah. >> And it's really beginning to be a crowded space. So explain what value you bring to that ecosystem. >> Yeah, sure. So in the ecosystem we see there, there are a lot of people that are kind of sticking to effectively getting data and showing you dashboards and reports about monitoring and things of that sort. For us, the value is: how can we help customers accelerate their adoption of these platforms, how to go from your legacy SIEM or your legacy monitoring solution to the next-gen observability platform or next-gen security platform? >> And what you do really well is the integration, bringing those other toolings in to do that? >> Correct, correct. And we make it repeatable. >> How'd you end up here, at HP? >> So we actually had a customer that deployed our software on the HPE platform. And it was kind of a light-bulb moment that, okay, this is actually a different approach than going to your traditional, you know, AWS, Google, et cetera. So we decided to hunt this down and figure out how we could be a bigger player in this space. >> You saw the data fabric announcement? I'm not crazy about the term; data fabric is an old NetApp term, and then Gartner kind of twisted it. I like data mesh, but anyway, it doesn't matter. We kind of know what it is. But when you see an announcement like that, how do you look at it? What does it mean to Cribl and your customers? >> Yeah.
So what we've seen is, we work with the data fabric team and we're able to route our data to their data lake. We can actually route the data from all these various sources into this data lake and have it available for whatever customers want to do with it. One of the big things that I know Clint talks about is that we give customers choice. We give them the ability to choose where they want to send their data, whether that's HP's data lake and data fabric or some other object store or some other destination. They have that choice. >> So you're saying that you can stream to any destination the customer wants? What are some examples? What are the popular destinations? >> Yeah, so a lot of the popular destinations are your typical object stores. So any of your cloud object stores, whether it be AWS S3, Google Cloud Storage, or Azure Blob Storage. >> Okay. And you can pull data from any source? >> (Laughter) I'd be very careful, but absolutely. What we've seen is that a lot of people like to look at traditional data sources like Syslog, and they want to get that into, say, a next-gen SIEM, but to do so it needs to be converted to a webhook or some sort of API call. Or vice versa: they have this brand-new Zscaler, for example, and they want to get that data into their SIEM, but there's no way to do it because the SIEM only accepts it as a Syslog event. So what we can do is actually transform the data so that it lands in that SIEM in the format it needs to be in, and easily make that a repeatable process. >> So, okay. So wait, not as a Syslog event, but in whatever format the destination requires? >> Correct, correct. >> Okay. What are the limits on that? I mean, is this... >> Yeah. So what we've seen is that customers will take, for example, this Syslog event, which is unstructured data, but they need to put it into, say, the Common Information Model for Splunk, or the Elastic Common Schema for Elasticsearch, or just JSON format for Elastic. And so what we can do is convert those events so that they land in that transformed state, but we can also route a copy of that event, untouched, to something like an S3 bucket or object store for that long-term compliance use. >> You can route it to any, basically any object store. Is that right? Is that always the sort of target? >> Correct, correct. >> So on the message here at HPE, first of all, I'll get to the marketplace point in a second, but cloud to edge is kind of their theme. Data streaming sounds expensive. How do you guys deal with the streaming egress issue? What does that mean to customers? You guys claim that you can save money on that piece. It's a hotly contested discussion point. >> (Laughter) So one of the things that we actually just announced in our 3.5 release yesterday is the capability of getting data from Windows events, or rather from Windows hosts. Another product that we have is called Cribl Edge: our capability of collecting data at the edge and then transiting it out to an on-prem or self-hosted deployment of Cribl, or maybe some other destination object store. What we do is actually take the data in transit and reduce the volume of events. So we can do things like remove whitespace, or remove events that are not really needed, and compress or optimize that data so that the egress costs, to your point, are actually lowered.
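To make that concrete, here is an illustrative Python stand-in for the kind of in-transit reshaping Kam describes: parse a syslog-style line into a JSON event shaped for a downstream SIEM, drop a field the destination doesn't need, and keep an untouched copy for the archive. Real Cribl pipelines are built in Cribl's own configuration, not Python, and the log format and field names here are hypothetical.

```python
import json
import re

# A hypothetical syslog-style line from a firewall.
RAW = "<34>Jun 29 10:15:03 fw01 denied tcp 10.0.0.5:443 -> 10.0.0.9:51213"

PATTERN = re.compile(
    r"<(?P<pri>\d+)>(?P<timestamp>\w+ \d+ [\d:]+) (?P<host>\S+) (?P<message>.*)"
)

def to_siem_event(raw: str) -> dict:
    """Shape an unstructured syslog line into the JSON a SIEM expects."""
    match = PATTERN.match(raw)
    if not match:
        return {"_raw": raw}     # pass anything unparseable through untouched
    event = match.groupdict()
    event.pop("pri")             # drop a field the destination doesn't need
    return event

archive_copy = RAW                           # raw copy routed to object storage
siem_event = json.dumps(to_siem_event(RAW))  # transformed copy routed to the SIEM
```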
>> And your data reduction approach is compression? It's a compression algorithm? >> So it's a combination, yeah. For some people, what they'll do is aggregate the events. For example, VPC flow logs are very chatty, and you don't need all of those events, so instead you convert them to metrics. Suddenly you've reduced those high-volume events down to metrics that are tiny, and you still get the same value, because you still see the trends and everything. And if, later on down the road, you need to reinvestigate those events, you can rehydrate that data with Cribl Replay.
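A toy illustration of that flow-logs-to-metrics idea in Python, using a few hypothetical, heavily simplified records: the per-flow events collapse into a handful of counters, so the trend survives while far fewer bytes cross the wire.

```python
from collections import Counter

# Hypothetical, heavily simplified flow-log-style records.
flows = [
    {"src": "10.0.0.5", "action": "ACCEPT", "bytes": 840},
    {"src": "10.0.0.5", "action": "ACCEPT", "bytes": 120},
    {"src": "10.0.0.9", "action": "REJECT", "bytes": 60},
]

bytes_by_action = Counter()
flows_by_src = Counter()
for flow in flows:
    bytes_by_action[flow["action"]] += flow["bytes"]
    flows_by_src[flow["src"]] += 1

# Three chatty events become three small metrics; the raw events could
# still be archived elsewhere and rehydrated if an investigation needs them.
metrics = {
    "bytes.accept": bytes_by_action["ACCEPT"],  # 960
    "bytes.reject": bytes_by_action["REJECT"],  # 60
    "flows.total": sum(flows_by_src.values()),  # 3
}
```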
>> And you'll do the streaming in real time, is that right? >> Yeah. >> So Kafka, is that what you would use? Or other tooling? >> (Laughter) So we are complementary to a Kafka deployment. If a customer's already deployed and invested in Kafka, we can read off of Kafka and feed back into Kafka. >> If not, you can use your tooling? >> If not, we can be replacing that. >> Okay, talk about your observations in the multi-cloud hybrid world, because hybrid, obviously, everyone knows is a steady state now. Public cloud, on-premises, edge, all one thing; cloud operations, DevOps, data as code, all the things we talk about. What's the customer view? You guys have a unique position. What's going on in the customer base? How are they looking at hybrid, and specifically multi-cloud? Is it stitching together multiple hybrids? How do you guys work across those landscapes? >> So what we've seen is a lot of customers are in multiple clouds. That's going to happen. But what we've seen is that if they want to egress data from, say, one cloud to another, the way we've architected our solution is that we have these worker nodes that reside within these other clouds, so that, first, egress costs are lowered, and second, there's an easy way to collect the data and stitch it back together, join it back together, into a single place or single location. That's one option we offer customers. Another solution that we've announced recently is Search: not having to move the data from all these disparate data sources and data lakes at all, and instead just searching the data in place. That's another capability that we think is popular in this hybrid approach. >> And talk about your relationship with HPE. You obviously had customers that drove you to GreenLake. What's your experience with them? And also talk about the marketplace presence. Is that new? How long has that been going on? Have you seen any results? >> Yeah, so we've actually just started our journey into this HPE world. The first thing was obviously the customers bringing us into this ecosystem, and now we're getting ready to be on the marketplace. Having a presence on the marketplace has been huge, giving us access to people that don't even know who we are, being that we're a five-year-old company. So it's really good to have that exposure. >> So you're going to get customers out of this? >> That's the idea. (Laughter) >> Bringing in new markets, that's the idea of their GreenLake, is that partners fill in. What's your impression so far of GreenLake? Because there seems to be great momentum around HP and opening up their channel, their sales force, their customer base. >> Yeah, it's been very beneficial for us, again, being a smaller company. And we are a channel-first company, so that obviously helps bring out the word with other channel partners. But HP has been very open-armed about getting us into the ecosystem, and about giving the good word about Cribl to their customers. >> So you'll be monetizing on GreenLake, right? That's the goal. >> That's the goal. >> What do you have to do to get into that position? Obviously you've got a relationship and you're in the marketplace. Do you have to write to their APIs, or is it just a checkbox? Describe what you have to do to monetize. >> Sure. So we have to first get validated on the platform. The validation process validates that we can work on the Ezmeral GreenLake platform. Once that's been completed, the idea is to have our logo show up on the marketplace, so customers can say, 'Hey, look, I need a way to get data in transit, or do stuff with data, specifically around logs, metrics, and traces, into my logging solution or my SIEM.' And then what happens on the back end is we'll see this transaction occur through their API, which basically tells us who the customer is. Because, again, the idea is to have an almost zero-touch kind of involvement, but we will actually have that information given to us, and then we can monetize on top of it. >> And the visualization component will come from the observability vendor, is that right? Or do you guys do some of that? >> So for visualization, right now we're basically just the glue that gets the data to the visualization engine. As we grow and progress our Search product, that's where we'll probably have more of a visualization component. >> Do you think your customers are going to predominantly use an observability platform for that visualization? I mean, obviously you're going to get there. Are they going to use Grafana or some other tool? >> Yeah, I think a lot of customers, depending on what data they have and what they're trying to accomplish, will have that choice now: to choose, say, Grafana for their metrics and logs, et cetera, or some sort of security product for their security events. Same data, two different kinds of use cases. And we can help enable that. >> Cam, I want to ask you a question. You mentioned you were at Splunk, and Clint, the CEO and co-founder, was at Splunk too. That brings up a question I want to get your perspective on. We're seeing a modern network here with HPE, with Aruba; obviously clouds are kind of going next level; you've got on-premises, edge, all one thing, distributed computing basically; cybersecurity; a data problem that's solved a lot by you guys and people in this business making sure data is available; machine learning is growing and powering AI like you read about. What's changed in this business? Because, you know, Splunking logs is kind of old hat, and now you've got observability. Unification is a big topic. What's different about the market today around data and these platforms and tools? What's your perspective on that? >> I think one of the biggest things is people have seen the amount of volume of data that's coming in. When I was at Splunk, when we hit a one-terabyte deal, that was a big deal. Now it's kind of standard: you're going to do a terabyte of data per day.
So one of the big things I've seen is just the explosion of data growth. But getting value out of that data is very difficult, and that's kind of why we exist, because getting all that volume of data is one thing, but being able to actually extract value from it, that's... >> And that's the core streaming product? That's the whole... >> Correct. >> Get data to where it needs to be, for whatever application needs it, whether it's cyber or something else. >> Correct, correct. >> What's the customer uptake? What's the customer base like for you guys now? How many customers do you have? What are they doing with the data? What are some of the common things you're seeing? >> Yeah, I mean, it's the basic blocking and tackling. We've significantly grown our customer base, and they all have the same problem. They come to us and say, 'Look, I just need to get data from here to there.' And literally the routing use case is our biggest use case, because it's simple. You take someone that's an expensive operations engineer, and instead of having them go and do the plumbing of just getting logs from one source to another, we come in and actually make that a repeatable, easy process. That's our very basic value-add right from the get-go. >> You can automate that, make it repeatable. So, what's in the name? Where'd the name come from? >> So Cribl, if you look it up, is actually an old sieve used to sift dirt from gold, right? And that's kind of what we do: we filter out all the dirt and leave you the gold bits so you can get value. >> It's kind of what we do on theCUBE. >> It's kind of the gold nuggets. Get all these highlights hitting Twitter, the gold nuggets. Great to have you on. >> Cam, thanks for coming on and explaining how you guys are filling that gap between, hey, all the observability claims, which are all wonderful, but then you've got to get there; they've got to have a route to get there. That's what Cribl does. Cribl rhymes with tribble. Dave Vellante for John Furrier, covering HPE Discover 2022. You're watching theCUBE. We'll be right back.

Published Date : Jun 29 2022


Tony Baer, dbInsight | MongoDB World 2022


 

>> Welcome back to the Big Apple, everybody. theCUBE's continuous coverage here of MongoDB World 2022. We're at the new Javits Center; it's quite nice. It was built during the pandemic, on top of a former bus terminal, I'm told by our next guest, Tony Baer, who's the principal at dbInsight, a data and database expert and longtime analyst. Tony, good to see you. Thanks for coming on. >> Thanks for having us. >> Face to face! >> And welcome to New York. >> Yeah, right. New York is open for business. >> So, yeah. And actually, you know, it's interesting. We've been doing a lot of these events lately, especially the ones in Vegas, and it's the first time everybody's been out, face to face. Not so much here; people have been out and about, with a lot of masks in New York City, but it's good. And this new venue is fantastic. >> Much nicer than the old Javits. >> Yeah. And I would say maybe 3,000 people here. >> Yeah, probably. I think most conferences right now are going through a slow ramp-up. Like, for instance, Sapphire had maybe about one third their normal turnout. So I think, as you're saying, one third to one half seems to be the norm right now, as people are still figuring out how and where we're gonna get back together. >> I think that's about right. But I do think that in most of the cases we've seen, it's exceeded people's expectations in attendance. But anyway, let's talk about Mongo, a very interesting company. We've been watching their progression from just sort of a document database, and all the features and functions they're adding. You just published a piece this morning in VentureBeat: it's time for Mongo to get into analytics. >> Yes. >> One of your favorite topics. Well, can they expand analytics? They seem to be doing that. Let's dig into it. >> They've been taking it slow. They've been taking baby steps, and there's good reason for that, because, first thing, this is an operational database, and the last thing you wanna do is slow it down with very complex analytics. On the other hand, there's huge value to be had if you could turn, let's say, an operational database, a transaction database, into a smart transaction database. In other words, for instance, let's say you're running an e-commerce site, and a customer has made an order that's basically out of the norm, whether it be good or bad. It would be nice if at that point you could have a next best action, which is where analytics comes in. But it's a very lightweight form of analytics. I think probably the best metaphor for this is real-time credit scoring. It's not that they're doing the scoring of you in real time; it's that the model has been computed offline, so that when you come on in real time, it can make a smart decision. >> Got it. Okay. So, and I think it was your article where I wrote down some examples: operational use cases, patient data, certainly retail; we had Forbes on earlier, right? So, a very wide range of operational use cases. Will Mongo, essentially, in your view, is it positioned to replace the traditional RDBMS? >> Well, okay. That's... >> Sort of a loaded question, but... >> That's a very loaded question.
I think that for certain cases it will replace the RDBMS. But where I depart from Mongo is, I do not believe that they're going to replace all RDBMSs. For instance, when you're doing financial transactions, the world has been used to columns and rows and tables; that's a natural form for something that's very structured like that. On the other hand, when you take a look at, let's say, IoT data, or you're looking at home listings, that tends to more naturally represent itself as documents. So it's kind of like: documents are the way that you normally see the world. Relational is the way that you would structure the world.
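A tiny illustration of the contrast Tony is drawing, with hypothetical fields: the same home listing held whole as one JSON-style document, versus flattened into the parent and child rows a relational schema would use.

```python
# One listing as a single JSON-style document: the shape you "see".
listing_doc = {
    "id": 101,
    "address": "12 Main St",
    "rooms": [
        {"name": "kitchen", "sqft": 180},
        {"name": "bedroom", "sqft": 140},
    ],
}

# The same listing structured relationally: a parent row plus one child
# row per nested element, joined back together by the listing id.
listing_row = (101, "12 Main St")
room_rows = [
    (101, "kitchen", 180),
    (101, "bedroom", 140),
]
```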
>> Okay, well, I like that. But, I mean, in the early days, and even to this day, the target for Mongo has been Oracle, right? And you talk to a lot of Oracle customers, as do I, and they are running the most mission-critical applications in the world: banking, financial, and so many others. They've kind of carved out that space. But should we be rethinking the definition of mission critical? Is that changing? >> Well, number one, I think what we've traditionally associated mission-critical systems with is financial transaction systems and, to a lesser extent, systems that schedule operations. But the fact is, there are many forms of operations where, for instance, let's say you're in a social network: do you need to have the very latest update, or can you go off, let's say, a server that's eventually consistent? It's just like when you go on Twitter: do you necessarily see all the latest tweets? The system's not gonna crash for that reason. Whereas, let's say, a banking ATM system better be current. So I think there's a delineation. The fact is that in a social network, arguably, that operational system is mission critical, but it's mission critical in a different way from, let's say, a banking system. >> So, coming back to this idea of this hybrid, I think Gartner calls it HTAP, hybrid transactional/analytical processing... >> Which is changing by... >> ...the minute, right. I mean, you mentioned that in your article, but basically it's bringing analytics to transactions, bringing those roles together, right. And you're saying with Mongo it's lightweight for now. You used two other examples in your article, MySQL HeatWave, and I think you had a Google example as well, AlloyDB. Those are, you're saying, much heavier analytics. Is that correct? >> I'll put it this way: because they're coming from a relational background, and because they're coming from companies that already have analytic databases, data warehouses if you will, their analytic capabilities are gonna be much more fully rounded than what Mongo has at this point. That's not a criticism of MongoDB per se. >> Is that by design, though? Or, not necessarily, is that a function of maturity? >> I think it's a function of maturity. >> Oh, okay. >> I mean, look, to a certain extent it's also a function of design, in that with the document model it's not impossible to model for analytics, but it takes more transformation to decide which field in that document is gonna be a column. >> Now, the big thing about some of these hybrid systems is eliminating the need for two databases, right? Eliminating the need for complex ETL. Is that a value proposition that will emerge with Mongo, in your view? >> You know, I mean, put it this way: if you take a look at how Mongo has added more function to its operations, since we're talking about analytics here: adding streaming, adding search, adding time series. That's where they've eliminated the need to do transformation, ETL. But that's not for analytics per se. For analytics, I think, through replication, there's still gonna be some transformation, in terms of turning data that's formed as a document into something that's represented by columns. That said, Mongo already has some nascent capability there, but this is still at a rev 1.0 level; I expect a lot more. >> So let me ask you: you know how Amazon says, in the fullness of time, all workloads will be in the cloud, and we could certainly debate what we mean by cloud. There's a sort of analog question for Mongo that I'll ask you: in the fullness of time, will Mongo be in a position to replace data warehouses or data lakes? And we know the answer is no. But are these two worlds on a quasi collision course? >> More on a convergence course than a collision course. Because, number one, as I said, the first principle of an operational database is that the last thing you wanna do is slow it down. And all the complex modeling that, let's say, you would do in a Databricks, or the very complex analytics that you would do in a Snowflake, is going to do exactly that. No matter how much you partition the load, and yes, in Atlas you can have separate nodes, the fact is you really do not wanna burden the operational database with that. That's not what it's meant for. But what it is meant for is: can I make a smart decision on the spot? In other words, kind of close the loop on that. And so there's a form of lightweight analytics that you can perform in there. And actually, that's the same principle that, let's say, MySQL HeatWave and AlloyDB are based on. They're not predicated on replacing, whether it be Exadata or BigQuery; the idea there is to do more of the lightweight stuff and keep the operations... >> Operating. But from a practitioner's standpoint, I can and should isolate that node, you're saying, right? That's what they'll do. >> Sure. >> How does that affect... My understanding is that Mongo specifically, but I think document databases generally, will have a primary node, right? And then you can set up secondary nodes, which then you have to think about availability. But would that analytic node be sort of fenced off?
Is that part of the design? >> Well, they actually laid the groundwork for it last year, by saying that you can set up separate nodes and dedicate them to analytics. >> As a primary? >> Right, yes, for analytics. And what they are adding this year is that the separate node does not have to be the same instance class as the others. >> What does that mean? Explain. >> In other words, you could have a node for operations that's basically very IOPS-intensive, whereas you could have a node for analytics that might be more compute-intensive, or more heavily configured with memory. So the idea here is that you can tailor a node to the workload. That's what they're saying with this (I forget what they're calling it), but the idea that you can specify a different type of node, a different type of instance, for the analytic node, I think, is a major step forward.
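As a rough illustration of how a developer would steer work to such a node, here is a Python sketch using pymongo. The connection string and namespace are hypothetical; the nodeType:ANALYTICS read-preference tag is, per MongoDB's Atlas documentation, how dedicated analytics nodes are targeted, but treat the exact option names as an assumption to verify against your driver version.

```python
from pymongo import MongoClient

# Hypothetical Atlas URI. The read preference plus tag steers queries to a
# dedicated analytics node, away from the primary serving operational traffic.
client = MongoClient(
    "mongodb+srv://user:pass@cluster0.example.mongodb.net",
    readPreference="secondary",
    readPreferenceTags="nodeType:ANALYTICS",
)

# A long-running aggregation that now runs on the analytics node.
pipeline = [{"$group": {"_id": "$region", "orders": {"$sum": 1}}}]
report = list(client.shop.orders.aggregate(pipeline))
```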
>> And that's enabled by the cloud and the architecture. >> Of course, yes. Separating compute from data is the starter, and at that point you can start to go less vanilla. I think the fruition of this is going to be when they say, okay, you can run your operational nodes dedicated, but we'll let you run your analytic nodes serverless. They can't do it yet, but I've gotta believe that's on the roadmap. >> Yeah. So SQL brings a lot of overhead; that's why you get MQL. But now square this circle for me, because now you've got Mongo talking SQL. >> They had to start doing that some time. It's been a core take I've had with them from the get-go: I understand that you're looking at this as an alternative to SQL, and that's perfectly valid, but don't deny the validity of SQL or the reason why we need it. The fact is, according to the TIOBE index, JavaScript is the seventh most popular language, and SQL follows closely behind at ninth. Those people exist in the enterprise, and they're disproportionately concentrated in analytics. I mean, it's getting a little less so; now we're seeing Python on the programmatic side, but there's still a lot of SQL expertise there. It makes no sense for Mongo to ignore or overlook that audience. I think now they're taking baby steps to start reaching out to them. >> It's interesting; you see it going both ways. See, Oracle announces a MongoDB-compatible API. I mean, it's just convergence. You called it a convergence, not a collision; I love collisions, you know. >> I know. That's because you thrive on drama, and I thrive on can't-we-all-love-each-other. But the thing is, actually, I wrote about this, I forget when, I think it was 2014 or 2016. I was noting basically the rise of all these specialized databases, and Amazon, AWS, is probably the best exemplar of that. I've got 15 or 16 or however many databases, and they're all dedicated-purpose. But I also saw that, inevitably, there was gonna be some overlap. It's not that all databases were gonna become one and the same, that we're gonna merge back into a Pangea supercontinent or something like that. But you're gonna have a relational database that can do JSON, and a document database that can do relational. To me, that's a no-brainer. >> So I asked Andy Jassy one time, and I'd love to get your take on this, about those multiple data stores. At the time they had maybe a dozen; I think they're probably up to 15 now, right? Different APIs, et cetera. I said, why don't you make it easier for customers, and maybe build an abstraction, or converge these? I'm curious what your thoughts are, because he's a pretty straight shooter. His response: it's by design, because it allows us, as the market moves, to move with it, and if we give developers access to those low-level primitives and APIs, then they can move at market speed. So, again, by design. Now, we heard Mongo pooh-poohing that today; they didn't call out Amazon by name. Oracle has no compunction about specifically calling out Amazon; they do it all the time. What do you make of that? Can't Amazon have its cake and eat it too? In other words, extend some of the functionality of those specific databases without going to the Swiss Army knife? >> I'll put it this way: you're sort of killing me softly with your song there, which is that I actually went on a rant about this in my year-ahead predictions. I said, look, cloud folks, it's great that you're making individual SaaS products easy to use, but now that I have to mix and match SaaS products, the burden of integration is on my shoulders. Start making my life easier. I think a good example of this would be, for instance, something like Google BigQuery. There's no reason why I can't have a piece of that that might be paired, say, with Spanner or something like that.
It's not pure spark. It's basically their, it was their curated version of spark, but that's fine. But again, I would love to hear Microsoft talk more about that. They've been very quiet. >>Yeah. You, you, the intent is there to >>Simplify >>It exactly. And create an abstraction. Exactly. Yeah. They have been quiet about it. Yeah. Yeah. You would expect that, that maybe they're still trying to figure it out. So what's your prognosis from Mongo? I mean, since this company IP, you know, usually I, I tell and I tell everybody this, especially my kids, like don't buy a stock at IPO. You'll always get a better chance at a cheaper price to buy it. Yeah. And even though that was true with Mongo, you didn't have a big window. No. Like you did, for instance, with, with Facebook, certainly that's been the case with snowflake and sure. Alibaba, I mean, I name a zillion style was almost universal. Yeah. But, but since that, that, that first, you know, few months, period, this, this company has been on a roll. Right. And it, it obviously has been some volatility, but the execution has been outstanding. >>No question about that. I mean, the thing is, look what I, what I, and I'm just gonna talk on the product side on the sales side. Yeah. But on the product side, from the get go, they made a product that was easy for developers. Whereas let's say someone's giving an example, for instance, Cosmo CB, where to do certain operations. They had to go through multiple services in, you know, including Azure portal with Atlas, it's all within Atlas. So they've really, it's been kinda like design thinking from the start initially with, with the core Mongo DB, you know, you, the on premise, both this predates Atlas, I mean, part of it was that they were coming with a language that developers knew was just Javas script. The construct that they knew, which was JS on. So they started with that home core advantage, but they weren't the only ones doing that. But they did it with tooling that was very intuitive to developers that met developers, where they lived and what I give them, you know, then additional credit for is that when they went to the cloud and it wasn't an immediate thing, Atlas was not an overnight success, but they employed that same design thinking to Atlas, they made Atlas a good cloud experience. They didn't just do a lift and shift the cloud. And so that's why today basically like five or six years later, Atlas's most of their business. >>Yeah. It's what, 60% of the business now. Yeah. And then Dave, on the, on the earning scholar, maybe it wasn't Dave and somebody else in response to question said, yeah, ultimately this is the future will be be 90% of the business. I'm not gonna predict when. So my, my question is, okay, so let's call that the midterm midterm ATLA is gonna be 90% of the business with some exceptions that people just won't move to the cloud. What's next is the edge. A new opportunity is Mongo architecturally suited for the, I mean, it's certainly suited for the right, the home Depot store. Sure. You know, at the edge. Yeah. If you, if you consider that edge, which I guess it is form of edge, but how about the far edge EVs cell towers, you know, far side, real time, AI inferencing, what's the requirement there, can Mongo fit there? Any thoughts >>On that? I think the AI and the inferencing stuff is interesting. It's something which really Mongo has not tackled yet. I think we take the same principle, which is the lightweight stuff. 
In other words, you'll say, do let's say a classification or a prediction or some sort of prescriptive action in other words, where you're not doing some convolution, neural networking and trying to do like, you know, text, text to voice or, or, or vice versa. Well, you're not trying to do all that really fancy stuff. I think that's, you know, if you're keeping it SIM you know, kinda like the kiss principle, I think that's very much within Mongo's future. I think with the realm they have, they basically have the infrastructure to go out to the edge. I think with the fact that they've embraced GraphQL has also made them a lot more extensible. So I think they certainly do have, you know, I, I do see the edge as being, you know, you know, in, in, you know, in their, in their pathway. I do see basically lightweight analytics and lightweight, let's say machine learning definitely in their >>Future. And, but, and they would, would you agree that they're in a better position to tap that opportunity than say a snowflake or an Oracle now maybe M and a can change that. R D can maybe change that, but fundamentally from an architectural standpoint yeah. Are they in a better position? >>Good question. I think that that Mongo snowflake by virtual fact, I mean that they've been all, you know, all cloud start off with, I think makes it more difficult, not impossible to move out to the edge, but it means that, and I, and know, and I, and I said, they're really starting to making some tentative moves in that direction. I'm looking forward to next week to, you know, seeing what, you know, hearing what we're gonna, what they're gonna be saying about that. But I do think, right. You know, you know, to answer your question directly, I'd say like right now, I'd say Mongo probably has a, you know, has a head start there. >>I'm losing track of time. I could go forever with you. Tony bear DB insight with tons of insights. Thanks so much for coming back with. >>It's only one insight insight, Dave. Good to see you again. All >>Right. Good to see you. Thank you. Okay. Keep it right there. Right back at the Java center, Mongo DB world 2022, you're watching the cube.

Published Date : Jun 7 2022


Data Power Panel V3


 

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this newcomer touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or some even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is many, if not most, customers that are embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't, abandon their data warehouse estate. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight, and Doug Henschen, the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks guys. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences, there's several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you, and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and the data science. So they were adding SQL extensions; the likes of Vertica were adding SQL extensions to support the data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introducing Snowpark to try to address the data science needs.
They introduced that in 2020, and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and warehouse from a single vendor, not necessarily quite a single platform. Google very recently also jumped on the bandwagon. And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp, with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in terms of that it has been an evolution, just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was basically all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory with all the data stewards, who are basically concerned about the sprawl and the lack of control and governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counteraction. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight. Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, at Uber, Uber was trying to do a mix of Hadoop and Vertica, because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real-time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse.
The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses have each had to extend themselves. To believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in terms of that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine, and they had to essentially pretty much abandon Spark SQL, because really, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL, or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's been adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture, where you're essentially abstracting compute from data, there is no reason why, if, let's say, you are dealing with the same basic data targets, say cloud storage, cloud object storage, you might not apportion the task to different compute engines. And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform basically the types of analytics, the SQL analytics, that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time, for another part of the query, which might involve, let's say, some deep learning, just for example, you might go out to, let's say, the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example; you could generalize that to all the other cloud and third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but completely agree. It's not mature yet. Lakehouses still have a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data.
They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on, because they want the platform to handle all the data modeling, access control, performance enhancements, but there are trade-offs. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock-in. They want to transform their data into any number of use cases, especially data science, machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off. If you have thousands of tables, it takes minutes to get them loaded into your warehouse and start experimentation. Table formats are resonating far more with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks investing in something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, DaaS, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake, really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, no, you can't be best of breed at all things. I think the upshot of that, and a cogent analysis from Sanjeev there: the vendors coming out of the database tradition, they excel at the SQL. They're extending it into data science, but when it comes to unstructured data, data science, ML/AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse when it really gets to the tough SLAs. They acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines are good for ad hoc, minimizing data movement. But really when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit, Sanjeev?
Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, or how do you do your upserts, basically? So different focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg in your view something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is technically open, open for access, but I don't hear about Delta Lake anywhere but Databricks. Hearing a lot of buzz about Apache Iceberg. End users want an open performance standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR, the most closed. You had Hortonworks, the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have, and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which basically pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, but still the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see a very similar type of breakdown repeating itself here. >> So by the way, Mongo has a conference next week, another data platform that's kind of not really relevant to this discussion totally. But in a sense it is, because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way.
The SQL folks are from Venus and the data scientists are from Mars. So it really comes down to that type of perception. The fact is that traditionally, with analytics, it was very SQL oriented, and basically the quants were kind of off in their corner, where they were using SAS or where they were using Teradata. It's really a great leveler today, which is that, I mean, basically Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say, the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent, with all of Snowpark, all of the things outside of SQL. They're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. So the SQL centric companies and shops are going to do more data science on their database centric platform. The data science driven companies might be doing more BI on their lakes with those vendors. And the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev, 'cause Snowflake and Databricks are such great examples, 'cause you have the data engineering crowd trying to go into data warehousing, and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet, and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy an AtScale or a Datameer? Or is that just sort of a bandaid? What are your thoughts on that, Sanjeev? >> I think the semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, let's say, I want to update somebody's email address, and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool, and make them fungible? So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will.
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate; the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, and even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning of the way we write business logic. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps toolchain. So data is catching up to the way applications are. >> We also have translytical databases, and that's a little bit of what the story is with MongoDB next week, adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse-level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed, to Doug's earlier point. You can't be best of breed at everything. Wouldn't data mesh advocate, data lakes, do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen, and the siloed views of the world, how we redefine them. And that's why, and I want to go back to something Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now, that opens a couple of Pandora's boxes here for Snowflake, which is one, does Snowflake dare go into that space, or do they risk basically alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed? They're kind of in the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing, though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that, whether you employ data mesh practices or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh.
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say, basically mutable, that would be more evolving, basically using, let's say, machine learning, so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transactional databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's got nowhere in the market, no market share, but we've seen a lot of these benchmarks from Oracle. How real is that bringing together of those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to try to mimic him. >> I wonder if that is something, how real that is, or if it's just Oracle marketing. Anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. And I think basically, Doug and Sanjeev were kind of referring to this before, about basically kind of like the operational analytics that are basically embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market, now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementations, it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week.
I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to; of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> The consumption model is here to stay. What I would like to see, and I think it's an ideal situation, and it actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena, another business unit wants to use some other engine, Trino let's say, or Dremio. So every business unit is working on the same data set, see, that's critical. But that data set is maybe in their VPC, and they bring any compute engine, you pay for the use, shut it down. Then you're getting value, and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been sort of a victim of its own success in some ways. They made it so easy to spin up single node instances, multi node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances. >> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say, the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out, or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around, and Tony, I know you talk to them quite frequently. I think they've had quite a comprehensive offering for a long time actually. They created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right.
I think where Cloudera has been the most successful has been in the transition to the cloud, and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They also have been trying to position themselves as being the most price friendly, in terms of, we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, I think Cloudera's most powerful appeal, and it almost sounds in a way, I don't want to cast them as a legacy system, but the fact is they do have a huge landed legacy on-prem, and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that, yes, Cloudera has an operational database, or an operational data store, kind of like an outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. Nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata basically to do some machine learning, or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base, because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course, being on the quarterly shot clock was not a good place to be under the microscope for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month did a deal with Dell whereby non-native Snowflake data could access on-prem object stores from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data-related and residency requirements. There are desires to have these platforms in your own data center. And finally they capitulated. I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand, and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement, because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is, we're not going to bring the Snowflake architecture to run on-prem, because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way.
But I think it still preserves his original intent, sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'll be like, oh, we are all in, we've always been doing this. We have always supported this, and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it. And when they dropped it, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. And props to Andy Jassy, who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts. Doug, start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp. Databricks, the top customer that they could cite was unnamed; it had 32 concurrent users doing 15,000 queries per hour. That's good, but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL centric, ETL style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually, of all things, and certainly I'll also be looking for similar things to what Doug is saying, but I think, sort of kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer-the-world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with basically putting in some inline analytics? What are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. >> So I'll be at MongoDB World, Snowflake and Databricks, and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both with the HTAP use case, online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational, or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one.
And it's likely Mongo's path, or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right, and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022


Mark Lyons, Dremio | AWS Startup Showcase S2 E2


 

(upbeat music) >> Hello, everyone, and welcome to theCUBE presentation of the AWS Startup Showcase, data as code. This is season two, episode two of the ongoing series covering the exciting startups from the AWS ecosystem. Here we're talking about operationalizing the data lake. I'm your host, John Furrier, and my guest here is Mark Lyons, VP of product management at Dremio. Great to see you, Mark. Thanks for coming on. >> Hey John, nice to see you again. Thanks for having me. >> Yeah, we were talking before we came on camera, here on this showcase we're going to spend the next 20 minutes talking about the new architectures of data lakes and how they expand and scale. But we kind of were reminiscing about the old big data days and how this really changed. There's a lot of hangover from (mumbles) kind of falling through, Cloud took over, now we're in a new era, and the theme here is data as code. It really highlights that data is now in the developer cycles of operations. So infrastructure as code led the DevOps movement for Cloud programmable infrastructure. Now you got data as code, which is really accelerating DataOps, MLOps, DatabaseOps, and more developer focus. So this is a big part of it. You guys at Dremio have a Cloud platform, query engine and a data tier innovation. Take us through the positioning of Dremio right now. What's the current state of the offering? >> Yeah, sure, so happy to, and thanks for kind of introing into the space that we're headed. I think the world is changing, and databases are changing. So today, Dremio is a full database platform, a data lakehouse platform, on the Cloud. So we're all about keeping your data in open formats in your Cloud storage, but bringing the full functionality that you would want to access the data, as well as manage the data. All the functionality folks would be used to, from ANSI SQL compatibility to inserts, updates, deletes on that data, keeping that data in Parquet files in the Iceberg table format, another level of abstraction so that people can access the data in a very efficient way. And going even further than that, what we announced with Dremio Arctic, which is in public preview on our Cloud platform, is a full Git-like experience for the data. So just like you said, data as code, right? We went through waves of source code and infrastructure as code. And now we can treat the data as code, which is amazing. You can have development branches, you can have staging branches, ETL branches, which are separate from production. Developers can do experiments. You can make changes, you can test those changes before you merge back to production and let the consumers see that data. Lots of innovation on the platform, super fast velocity of delivery, and lots of customers adopting it in just the first month here since we announced Dremio Cloud generally available. The adoption's been amazing. >> Yeah, and I think we're going to dig into a lot of the architecture, but I want to highlight the point you made about the branching off and taking a branch of Git. This is what developers do, right? The developers use GitHub, Git, they make branches from code. They build on top of other code. That's open source. This is what's been around for generations. Now for the first time we're seeing data sets being taken out of production to be worked on and coded and tested, and even doing look backs or even forward looking analysis. This is data being programmed. This is data as code. This is really, you couldn't get any closer to data as code. >> Yeah.
It's all done through metadata, by the way. So there's no actual copying of these data sets, 'cause in these big data systems, Cloud data lakes and stuff, these tables are billions of records, trillions of records, super wide, hundreds of columns wide, thousands of columns wide. You have to do this all through metadata operations, so you can control what version of the data basically an individual's working with, and which version of the data the production systems are seeing, because these data sets are too big. You don't want to be moving them. You can't be moving them. You can't be copying them. It's all metadata and manifest files and pointers to basically keep track of what's going on.
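[To make the Git-for-data idea above concrete, here is a minimal sketch of a branch-based ETL session, as SQL issued from Python. The connection helper, branch name, and table names are hypothetical placeholders, and the exact branch syntax varies by engine and version; this is modeled loosely on Nessie-style branch commands, not a verbatim Dremio API.]

# Hypothetical DB-API-style connection to a lakehouse engine with a
# Nessie-backed catalog; driver, host, and names are placeholders.
from lakehouse_driver import connect  # hypothetical driver module

conn = connect(host="lakehouse.example.com", user="etl_user")
cur = conn.cursor()

# Branch off production -- a metadata-only operation: a new named
# pointer to the current table snapshots, no data files copied.
cur.execute("CREATE BRANCH etl_june AT BRANCH main")

# Writes in this session target the branch, so consumers reading
# 'main' never see half-loaded data.
cur.execute("USE BRANCH etl_june")
cur.execute("INSERT INTO sales.orders SELECT * FROM staging.orders_new")

# Sanity-check the branch, then publish atomically by merging the
# branch's commits back into main.
cur.execute("SELECT COUNT(*) FROM sales.orders")
print(cur.fetchone())
cur.execute("MERGE BRANCH etl_june INTO main")

[The operative point is the one Mark makes above: every step is a commit against manifest files, so branching a trillion-record table costs no more than branching a small one.]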
What made that work in your opinion? >> Yeah, it really comes down to the metadata, which I already mentioned once. But we went through these waves. John you saw we did the EDWs to the data lakes and then the Cloud data warehouses. I think we're at the start of a cycle back to the data lake. And it's because the data lakes this time around with the Apache iceberg table format, with project (mumbles) and what Dremio's working on around metadata, these things aren't going to become data swamps anymore. They're actually going to be functional systems that do inserts updates into leads. You can see all the commits. You can time travel them. And all the files are actually managed and optimized so you have to partition the data. You have to merge small files into larger files. Oh, by the way, this is stuff that all the warehouses have done behind the scenes and all the housekeeping they do, but people weren't really aware of it. And the data lakes the first time around didn't solve all these problems so that those files landing in a distributed file system does become a mess. If you just land JSON, Avro or Parquet files, CSV files into the HDFS, or in S3 compatible, object store doesn't matter, if you're just parking files and you're going to deal with it as schema and read instead of schema and write, you're going to have a mess. If you don't know which tool changed the files, which user deleted a file, updated a file, you will end up with a mess really quickly. So to take care of that, you have to put a table format so everyone's looking at Apache iceberg or the data bricks Delta format, which is an interesting conversation similar to the Parquet and org file format that we saw play out. And then you track the metadata. So you have those manifest files. You know which files change when, which engine, which commit. And you can actually make a functional system that's not going to become a swamp. >> Another trend that's extending on beyond the data lake is other data sources, right? So you have a lot of other data, not just in data lakes so you have to kind of work with that. How do you guys answer the question around some of the mission critical BI dashboards out there on the latency side? A lot of people have been complaining that these mission critical BI dashboards aren't getting the kind of performance as they add more data sources and they try to do more. >> Yeah, that's a great question. Dremio does actually a bunch of interesting things to bring the performance of these systems up because at the end of the day, people want to access their data really quickly. They want the response times of these dashboards to be interactive. Otherwise the data's not interesting if it takes too long to get it. To answer a question, yeah, a couple of things. First of all, from a data source's side, Dremio is very proficient with our Parquet files in an object store, like we just talked about, but it also can access data in other relational systems. So whether that's a Postgres system, whether that's a Teradata system or an Oracle system. That's really useful if you have dimensional data, customer data, not the largest data set in the world, not the fastest moving data set in the world, but you don't want to move it. We can query that where it resides. Bringing in new sources is definitely, we all know that's a key to getting better insights. It's in your data, is joining sources together. And then from a query speed standpoint, there's a lot of things going on here. 
Yeah, that's a great question. Dremio does a bunch of interesting things to bring the performance of these systems up, because at the end of the day, people want to access their data really quickly. They want the response times of these dashboards to be interactive; otherwise the data's not interesting — it takes too long to get it. To answer the question: yeah, a couple of things. First of all, from a data source side, Dremio is very proficient with Parquet files in an object store, like we just talked about, but it can also access data in other relational systems. So whether that's a Postgres system, whether that's a Teradata system or an Oracle system — that's really useful if you have dimensional data, customer data: not the largest data set in the world, not the fastest moving data set in the world, but you don't want to move it. We can query it where it resides. Bringing in new sources is definitely — we all know that's a key to getting better insights; it's in your data, it's joining sources together. And then from a query speed standpoint, there's a lot going on here. Everything from the Apache Arrow project, which is an in-memory format complementing Parquet, so you're not serializing and de-serializing the data back and forth, to what we call reflections, which is basically a re-indexing or pre-computing of the data. But we leave it in Parquet format — in an open format, in the customer's account — so that aggregates and other things that are really popular in these dashboards are pre-computed. So millisecond response, lightning fast — tricks that a warehouse would do, that the warehouses have been doing forever, right? >> Yeah, more data coming in. And obviously the architecture — we'll get into that — now has to handle the growth. And as your customers and practitioners see the volume and the variety and the velocity of the data coming in, how are they adjusting their data strategies to respond to this? Again, Cloud is clearly the answer, not the data warehouse, but what are they doing? What's the strategy adjustment? >> It's interesting — when we start talking to folks, I think sometimes it's a really big shift in thinking about data architectures and data strategies when you look at the Dremio approach. It's very different from what most people are doing today around ETL pipelines, then bringing stuff into a warehouse, and oh, the warehouse is too overloaded, so let's build some cubes and extracts into the next tier of tools to speed up those dashboards for those tools. And Dremio has totally flipped this on its head and said, no, let's not do all those things. That's time consuming, it's brittle, it breaks. And actually your agility and the scope of what you can do with your data decrease: you go from all your data and all your data sources to smaller and smaller. We actually call it the perimeter of doom, and a lot of people look at this and say, yeah, that kind of looks like how we're doing things today. So from a Dremio perspective, it's really about no copies — trying to keep as much data in one place, keeping it in one open format, and less data movement. And that's a very different approach for people. I think they don't realize how much you can accomplish that way. And your latency shrinks down too — your actual latency from data created to insight is much shorter. And it's not because of the query response time; that latency is mostly because of data movement and copies and all these things. So you really want to shrink your time to insight. It's not about getting a faster query from a few seconds down, it's about changing the architecture. >> The data drift, as they say — interesting there. I've got to ask you, on the personnel side, the team side: you've got the technical side, you've got the non-technical consumers of the data, and data science or data engineering is ramping up. We mentioned earlier that data engineering being Agile is a key innovation here. As you blend the two personas of technical and non-technical people playing with data, coding with data, where are the bottlenecks in this process today? How can data teams overcome these bottlenecks? >> I think we see a lot of bottlenecks in the process today: a lot of data movement, a lot of change requests. Update this dashboard — oh, well, that dashboard update requires an ETL pipeline update, which requires a column to be added to this warehouse. So then you've got these personas, like you said, some more technical, some less technical — the data consumers, the data engineers. Well, the data engineers are getting totally overloaded with requests and work.
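To make the reflection idea tangible, here is a rough analogue expressed as SQL strings: pre-compute a popular dashboard aggregate once, keep it in an open format, and answer repeat queries from the small result instead of the huge base table. This is a generic materialization sketch under invented names, not Dremio's actual reflection engine, which maintains such structures automatically and transparently.

# One-time (or scheduled) pre-computation over the big table
materialize = """
    CREATE TABLE lake.marts.daily_revenue AS
    SELECT order_date, region, SUM(amount) AS revenue
    FROM   lake.sales.orders
    GROUP  BY order_date, region
"""

# The dashboard's repeat query now scans a tiny aggregate, not the raw table
dashboard_query = """
    SELECT region, revenue
    FROM   lake.marts.daily_revenue
    WHERE  order_date = CURRENT_DATE
"""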
And it's not even super value-add work for the business. It's not really driving big changes in their culture, insights and new use cases for data. It's churning through kind of small changes, but it's taking too much time. It's taking days, if not weeks, for these organizations to manage small changes. And then the data consumers, the less technical folks — they can't get the answers they want. They're waiting and waiting and waiting, and they don't understand why things are so challenging, how things could take so much time. So from a Dremio perspective, it's amazing to watch these organizations unleash their data. Get the data engineers' productivity up. Stop dealing with some of the last mile ETL and small changes to the data. And Dremio actually says, hey, data consumers, here's a really nice GUI. You don't need to be a SQL expert — the tool will write the joins for you. You can click on a column and say, hey, I want to calculate a new field, and calculate that field. And it's all done virtually, so it's not changing the physical data sets. The actual data engineering team doesn't even really need to care at that point. So you get happier data consumers at the end of the day. They're doing things more self-service. They're learning about the data, and the data engineering teams can go do value-add things. They can re-architect the platform for the future. They can do POCs to test out new technologies that could support new use cases and bring those into the organization — things that really add value, instead of just churning through backlogs of, hey, can we get a column added, or can we change... Everyone's doing app development, AB testing, and those developers are king. Those pipelines stream all this data down, and when the JSON files change, you need agility. And if you don't have that agility, you just get this endless backlog that you never... >> This is data as code in action. You're committing data back into the main branch once it's been tested. That's what developers do. So this is really kind of the next step function. I've got to put the customer hat on for a second and ask you kind of the pessimist question. Okay, we've had data lakes — I've got data lakes, data lakes have been around, I've got query engines here and there, they're all over the place — what's missing? What's been missing from the architecture to fully realize the potential of a data lakehouse? >> Yeah, I think that's a great question. Customers say exactly that, John. They say, "I've got 22 databases — you've got to be kidding me. You showed up with another database." Or, hey, let's talk about a Cloud data lake or a data lake — again, I did the data lake thing. I had a data lake, and it wasn't everything I thought it was going to be. >> It was bad. It was a data swamp. >> Yeah, so customers really think this way, and you say, well, what's different this time around? Well, the Cloud. In the original data lake world — and I'm just going to focus on data lakes — everything was still direct attached storage, so you had to scale your storage and compute out together. And we built these huge systems, thousands and thousands of HDFS nodes and stuff. Well, the Cloud brought separated compute and storage, but data lakes had never seen separated compute and storage until now. We went from the data lake with direct attached storage to the Cloud data warehouse with separated compute and storage.
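The "all done virtually" point earlier in that answer — the consumer's calculated field living in a view rather than in the files — reduces to something like the following sketch. Schema and column names are hypothetical; the mechanism is a plain SQL view, which Dremio calls a virtual dataset.

virtual_dataset = """
    CREATE VIEW analytics.orders_enriched AS
    SELECT o.*,
           o.amount * (1 - o.discount) AS net_amount  -- consumer-defined field
    FROM   lake.sales.orders o
"""
# The physical Parquet files never change, so the data engineering team has
# nothing new to pipeline, backfill or babysit.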
So the Cloud architecture, getting compute and storage separated, is a huge shift in the data lake world. And there's the agility of: well, I'm only going to apply the compute that I need for this question, for this answer, right now — not keep 5,000 servers of compute sitting around for some peak moment, or 5,000 compute servers just because I have five petabytes or 50 petabytes of data that need to be stored on the disks attached to them. So I think the Cloud architecture, separating compute and storage, is the first thing that's different this time around about data lakes. But more important than that is the metadata tier — the data tier, and having sufficient metadata to have the functionality people need on the data lake. Whether that's for governance and compliance standpoints, to actually be able to do a delete on your data lake, or for productivity — treating that data as code, like we're talking about today, and being able to time travel it, version it, branch it. And now these data lakes — the data lakes back in the original days were getting to 50 petabytes. Now think about how big these Cloud data lakes could be. Even larger, and you can't move that data around, so we have to be really intelligent and really smart about the data operations: versioning all that data, knowing which engine touched the data, who made the last commit, and being able to track all that is ultimately what's going to make this successful. Because if you don't have the governance in place these days with data, the projects are going to fail. >> Yeah, and I think separating the query layer, or SQL layer, and the data tier is another innovation that you guys have. Also it's a managed Cloud service — Dremio Cloud now. And you've got the open source angle too, which is also going to open up more standardization around some of these awesome features, like you mentioned the joins, and I think you guys built on top of Parquet and some other cool things. And you've got a community developing, so you get the Cloud and community kind of coming together. So it's the real world that is coming to light, saying, hey, I need real world applications, not the theory of old school. So what use cases do you see suited for this kind of new way — new architecture, new community, new programmability? >> Yeah, I see people doing all sorts of interesting things, and I'm sure with what we've introduced with Dremio Arctic and the data as code, it's going to open up a whole new world of things that we don't even know about today. But generally speaking, we have customers doing very interesting, very data application things — building really high performance data into use cases, whether that's a supply chain and manufacturing use case, whether that's a pharma or biotech use case, a banking use case, and really unleashing that data right into an application. We also see a lot of traditional data analytics use cases, more in the traditional business intelligence or dashboarding vein. That stuff is totally achievable, no problems there. But I think the most interesting stuff is companies really figuring out how to bring that data — when we offer the flexibility that we're talking about, and the agility that we're talking about — back into the apps, into the work streams, into the places where the business gets more value out of it. Not in a dashboard that some person, or a set of people, might have access to.
So even in the Dremio Cloud announcement, the press release, there was a customer in Europe called Garvis AI, and they do AI for supply chains. It's an intelligent application, and it's showing customers transparently how they're getting to these predictions. And they stood this all up in a very short period of time, because it's a Cloud product. They don't have to deal with provisioning, management, upgrades. I think they had their stuff going in like 30 minutes or something — super quick, which is amazing. The data was already there, and for a lot of organizations, their data's already in these Cloud storages. And if that's the case... >> If they have data, they're a use case. This is agility. This is agility coming to the data engineering field, making data programmable, enabling the data applications, the data ops for everybody, for coding... >> For everybody. And for so many more use cases at these companies. These data engineering teams, these data platform teams — whether they're in marketing or ad tech or FinServ or Telco — they have a list, a roadmap of use cases that they're waiting to get to. And if they're drowning underwater in the current tooling, barely keeping it alive — and oh, by the way, John, you can't go hire 30 new data engineers tomorrow and bring on the team to get capacity — you have to innovate at the architecture level to unlock more data use cases, because you're not going to go triple your team. That's not possible. >> It's going to unlock a tsunami of value, because everyone's clogged in the system and it's painful, right? >> Yeah. >> They've got delays, you've got bottlenecks, you've got people complaining it's hard — scar tissue. So now I think this brings ease of use and speed to the table. >> Yeah. >> I think that's what we're all about: making the data super easy for everyone. This should be fun and easy, not really painful, really hard and risky. In a lot of these old ways of doing things, there's a lot of risk. You start changing your ETL pipeline, you add a column to the table, and all of a sudden you've got potential risk that things are going to break — and you don't even know what's going to break. >> Proprietary, not a lot of volume and usage, on-premises — versus open, Cloud, Agile. (John chuckles) Come on, which path? The curtain or the box — what are you going to take? It's a no brainer. >> Which way do you want to go? >> Mark, thanks for coming on theCUBE. Really appreciate you being part of the AWS startup showcase, data as code — great conversation. Data as code is going to enable the next wave of innovation and impact the future of data analytics. Thanks for coming on theCUBE. >> Yeah, thanks John, and thanks to the AWS team. A great partnership between AWS and Dremio too. Talk to you soon. >> Keep it right there — more action here on theCUBE. As part of the showcase, stay with us. This is theCUBE, your leader in tech coverage. I'm John Furrier, your host, thanks for watching. (downbeat music)

Published Date : Apr 26 2022


Video Exclusive: Oracle EVP Juan Loaiza Announces Lower Priced Entry Point for ADB


 

(upbeat music) >> Oracle is in the midst of an acceleration of its product cycles. It really has pushed new capabilities across its database platforms, and of course the cloud, in an effort to maintain its position as the gold standard for cloud database. We've reported pretty extensively on Exadata, most recently the X9M, which increased database IOPS and throughput. Organizations running mission critical OLTP, analytics and mixed workloads tell us that they've seen meaningfully improved performance and lower costs, which you expect in a technology cycle. I often say if Oracle calls you out by name, it's a compliment — it means you've succeeded. So just a couple of weeks ago, Oracle turned up the heat on MongoDB with a Mongo compatible API, in an effort to persuade developers to run applications in the autonomous database and on OCI, Oracle Cloud Infrastructure. There was a big emphasis by Oracle on ACID-compliant transactions and automatic scaling, as well as access to multiple data types. This caught my attention because in the early days of NoSQL there was a lot of chatter from folks about not needing ACID capability in the database anymore. Funny how that comes around. And anyway, you see Oracle investing — they spend money on R&D. We've always said that they're protecting their moat. Now on social I've seen some criticisms, like Oracle still is not adding enough new logos, and Oracle of course will dispute that and give you some examples. But to me what's most impressive is the big name customers that Oracle gets to talk about in public: Deutsche Bank, Telefónica, Experian, FedEx — I mean, dozens and dozens and dozens. I work with a lot of companies, and the quality of the customers Oracle puts in front of analysts like myself is very, very high — at the top of the list, I would say. And they're big spending customers. And as we've said many times, when it comes to mission critical workloads, Oracle is the king. One of the executives behind this success is a longtime Cube alum, Juan Loaiza, who's executive vice president of mission critical technologies at Oracle. And we've invited him back on today to talk about some news and Oracle's latest developments in database. Juan, welcome back to the show, and thanks for coming on today and talking about today's announcement. >> I'm very happy to be here today with you. >> Okay, so what are you announcing, and how does this help organizations, particularly those with existing Exadata Cloud@Customer installations? >> Yeah, the big thing we're announcing is that on our very successful Cloud@Customer platform, we're extending the capabilities of our autonomous database running on it. Specifically, we're allowing much smaller configurations, so customers can start small and grow with our autonomous database on our Cloud@Customer platform. >> So let's get into the granularity a little bit and double click on this. Can you go over how customers carve up VM clusters for different workloads? What's the tangible benefit to them? >> Yeah, so it's pretty straightforward. We deploy our Cloud@Customer system anywhere the customer wants it — let's say in their data center. And then through our cloud APIs and GUIs they can carve it up into pieces, into basically VMs. They can say, hey, I want a VM with eight CPUs to do this, I want a VM with 20 CPUs to do that, I want a 500-CPU VM to do something else. And what we call that is a VM cluster, because in Cloud@Customer it is a highly available environment.
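As a sketch of what that carving looks like programmatically, the snippet below shows the shape of such a call in Python. The endpoint path, payload fields and auth scheme are invented for illustration — the real interface is OCI's Database service API, so consult Oracle's documentation for actual names and signatures.

import json
import urllib.request

def create_vm_cluster(endpoint: str, token: str, name: str, ocpus: int) -> dict:
    payload = {"displayName": name, "cpuCoreCount": ocpus}
    req = urllib.request.Request(
        f"{endpoint}/vmClusters",  # hypothetical resource path
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# e.g. dev gets 8 CPUs, test gets 20, a production workload gets 500,
# all carved out of the same Cloud@Customer rack:
# create_vm_cluster(endpoint, token, "dev-cluster", 8)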
So you don't just get one VM, you get a cluster of highly available VMs. So you carve it up and hand it out to different parts of a company. You might have development on one, testing on another one, some production sales on one VM, marketing on a different VM. And then you run your databases in there, and that's kind of how it works. It's all done completely through our GUI, and it's very, very simple, because it's the same cloud APIs and GUIs that we use in the public cloud. >> Yeah, I was going to say, sounds like cloud. So what about prerequisites? What do customers have to do to take advantage of the new capabilities? Can they run it on an Exadata Cloud@Customer system that they installed a couple years ago? Do they have to upgrade the hardware? What migration pain is involved? >> Yeah, there's no pain, so it's just — (coughs) excuse me. I can take their existing system, they get our free software update, and they can just deploy autonomous database as a VM in their existing Exadata cloud system. >> Oh nice, okay. What's the bottom line on dollars? Our audience is always interested in cutting costs — it's one of the reasons they're moving to the cloud, for example. So how does autonomous database on VM clusters, on Exadata Cloud@Customer, help cut their costs? >> Well, it's pretty straightforward. Previous to this, a customer would have had to dedicate a system to either autonomous database or to non-autonomous database — you had to choose one or the other on a system-by-system basis: I want this thing autonomous, or I don't want it autonomous. Now you carve it into VMs and say, for this VM I want autonomous; for that VM I want to run a regular, managed database. This lets customers start small, with any size they want. They could start with two CPUs and run an autonomous database, and all they pay for is the two CPUs that they use. >> Let's talk a little about traction. I mean, I remember we covered the original Exadata announcement quite a long time ago, and it's obviously evolved and taken many forms. Look, it's hard to argue that it hasn't been a big success — it has, for Oracle and your target customers. Does this announcement make Exadata Cloud@Customer more attractive for smaller companies? In other words, does it expand the TAM for ADB? And if so, how? >> Yeah, absolutely. I mean, our Exadata cloud platform is extremely successful. We have thousands of deployments; on our Exadata platform we have almost 90% of the global Fortune 100, and thousands of smaller customers. In the cloud we now have up to 40% of the Global 100 — the hundred biggest companies in the world — running on it. So it's been an extremely successful platform, and Cloud@Customer is super key. A lot of customers can't move their data to the public cloud, so we bring the public cloud to them with our Cloud@Customer offering. So the big customers are there — the Fortune 100 — but we have thousands of smaller customers also. And the nice thing about this offering is we can start with literally two CPUs. So you can be a very small customer and still run our autonomous database on our Cloud@Customer platform. >> Well, everybody cares about security and governance — especially the big guys, but the little guys in many ways as well. They want the capabilities of the large companies, but they can't necessarily afford them.
So I want to talk about security, and in particular governance, which is especially important for mission critical apps. How does this all change the security and governance paradigm? What do customers need to know there? >> Yeah, so the beauty of autonomous database, which is the thing we're talking about today, is that Oracle deals with all the security. The OS, the hardware, firmware, VMs, the database itself, all the interfaces to the VM and to the database — all of that is done by Oracle, which is incredibly important, because there's a constant stream of security alerts coming out, and it's very difficult for customers to keep up with this stuff. I mean, it's hard for us, and we have thousands of engineers. So we take that whole burden away from customers. You just don't have to think about it; we deal with it. So once you deploy an autonomous database, it is always secure, because any time a security alert comes out, we apply it — and we do it in an online fashion also. It's particularly hard for smaller customers: to keep up with all the security you need a giant team of security experts, and even the biggest customers struggle with that, so a small customer is going to really struggle. There's just too much — you have to look at the entire stack, all the different components: switches, firmware, OS, VMs, database, everything. It's just very difficult to keep up. So we do it all, and small customers just can't do it. They really need to partner with a company like Oracle that has thousands of engineers that can keep up with this stuff. >> It's true what you say. Even large customers — their CISOs will tell you that lack of talent, lack of skill sets... they just don't have enough people, and so even the big guys can't keep up. Okay, I want you to pitch me as though I'm a developer, which I'm not, but we've got a lot of developers in our community — we'll be at KubeCon next month in Valencia. Sell me on why a developer should lean into ADB on Exadata Cloud@Customer. >> Yeah, it's very straightforward. Oracle has the most advanced database in the industry, and that's widely recognized by database analysts and experts in the field. Traditionally it's been hard for a developer to use it, because it's been hard to manage — hard to set up, install, configure, patch, back up, all that kind of stuff. Autonomous database does it all for you. So as a developer, you can just go into our console and click on creating a database. We ask you four questions — how big, how many CPUs, how much storage, and your password — and within minutes you have a database. At that point you can go crazy and just develop. And you don't have to worry about managing the database, patching the database, maintaining the security, backing it up, all that stuff. You can instantly scale it: you say, hey, I want to grow it, you just click a button and grow it to any size you want. And you get all the mission critical capabilities. It works for tiny databases, but it is stock exchange quality in terms of performance, availability and security — a rock solid database that's super trivial to use. So what used to be a very complex thing is now completely trivial for a developer. They get the best of both worlds: they get everything on the database side, and it's trivial for them to use. >> Wow, if you're doing all that stuff for 'em, what are they going to do on their weekends? Code?
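Juan's "four questions" flow, reduced to a sketch. The field names below are hypothetical stand-ins for the console form; the real provisioning interface is the OCI Database API and SDKs, whose exact signatures live in Oracle's documentation.

def provision_adb(ocpus: int, storage_tb: int, admin_password: str,
                  name: str = "devdb") -> dict:
    # In the real service this request yields a running, self-patching,
    # self-tuning database within minutes; scaling later is one more call.
    return {
        "dbName": name,
        "cpuCoreCount": ocpus,
        "dataStorageSizeInTBs": storage_tb,
        "adminPassword": admin_password,
        "isAutoScalingEnabled": True,
    }

request_body = provision_adb(2, 1, "example-Passw0rd")  # start with two CPUs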
(chuckles) >> They should be developing their application and adding value to their company — that's what they should focus on. And they can be looking at all sorts of new technologies, like JSON in the database, machine learning in the database, graph in the database. So you can build very sophisticated applications, because you don't have to worry about the database anymore. >> All right, let's talk about the competition — it's always a topic I like to bring up with you. From a competitive perspective, how is this latest instantiation of Exadata Cloud@Customer X9M different from running an AWS database service, for instance, on Outposts? Or let's say I want to run SQL Server on Azure Stack, or whatever Microsoft's calling it these days. Give us the competitive angle here. >> Yeah, there kind of is no real competition. Both Amazon and Microsoft have an at-customer solution, but they're very primitive. I mean, just to give you an example: Amazon doesn't run any of their premier database offerings at customer. Whether it's Aurora or Redshift — doesn't run. Just plain does not run. It's not that it runs badly or is limited; it just does not run. They can't run Oracle RDS on premise, and same thing with Microsoft: they can't run Azure SQL, which is their premier database, on their at-customer platform. So that kind of tells you how limited the platform is, when even their own premier offerings don't run on it. In contrast, we're running Exadata with our premier autonomous database. It's our premier platform, in use today by most of the biggest banks, telecoms, retailers, et cetera in the world, and thousands of smaller customers. So it's super mission critical, super proven, with our premier cloud database, which is autonomous database. It couldn't be more black and white — this is a case where there really is no competition in the cloud at customer space on the database side. >> Okay, but let me follow up on that, Juan, if I may. So, okay, it took you guys a while to get to the cloud; it's taken them a while to figure out on-prem. I mean, aren't they going to eventually sort of get there? What gives you confidence that you'll be able to keep ahead? >> Well, there's two things, right? One is we've been doing this for a long time. I mean, that's how Oracle initially started, as on-prem, and our Exadata platform has been available for over a decade. We have a ton of experience with this. We run the biggest banks in the world already — it's not some hope for the future, this is what runs today. And our focus has always been a combination of cloud and on-prem. Their heart's not really in the on-prem stuff. Amazon's really a public-cloud-only vendor, and you can see it from the results. They can say whatever they want, but you can see the results: their Outposts platform has been available for several years now, and it still doesn't even run their own products. So you can kind of see how hard they're trying, and how much they really care about this market. >> All right, boil it down. If you just had a few things you'd tell someone about why they should run ADB on Exadata Cloud@Customer, what would you say? >> It's pretty simple: it's the world's most sophisticated database made completely simple. That's it.
So you get a stock-exchange-level database, you can start really small and grow, and it's completely trivial to run, because Oracle has automated everything. Within our autonomous database we use machine learning and a lot of automation to automate everything around the database. So it's kind of the best of both worlds: the best possible database, starting as small as you want, and the simplest database in the world. >> So I probably should have asked you this while I was pushing the competitive question, but this may be my last question, I promise. It's the age-old debate and it rages on: you've got specialized databases — kind of a right-tool-for-the-right-job approach, which is clearly where Amazon is headed — versus what Oracle refers to as converged database. Oracle says its approach is more complete and "simpler." Take us through your thinking on this and the latest positioning, so the audience can understand it a bit better. >> Yeah, so apps aren't what they used to be — business apps, data driven apps aren't what they used to be. They used to be kind of green screens where you just entered data. Now every app is very sophisticated: they want to have location, they want to have maps, they want to have graph in there. They want machine learning built into the app. They want JSON, they want text, they want text search. All these capabilities are what a modern app has to support. And what Oracle's done is provide a single solution that gives you everything you need to build a modern app, all integrated together. It's all transactional. You have analytics built into the same thing. You have reporting built into the same thing. So it has everything you need to build a modern app. In contrast, what most of our competitors do is give you these little solutions: okay, here you do machine learning, over there you do analytics, over here you do JSON, over here you do spatial, over there you do graph. And then it's left to the developer to put an app together from all these pieces. It's like getting the pieces of a car and having to assemble it yourself — and then maintain it for the rest of your life, which is the even harder part. One part upgrades, you've got to test that. Another piece upgrades or changes, you've got to test that. You have to deal with all the security problems of all these different systems. You have to convert the data, move the data back and forth — it's extraordinarily complicated. With our converged database, the data sits in one place and all the algorithms come to the data. It is dramatically simpler. And then autonomous database is what makes managing it trivial — you don't really have to manage anything anymore, because Oracle's automated the whole thing. >> So, Juan, we've got a pretty good cadence going here. I mean, I really appreciate you coming on and giving us these little video exclusives. You can tell by that cadence how frequently you guys are making new announcements. So that's great — congrats on yet another announcement. Thanks for coming back on the program, appreciate it. >> Yeah, of course. We invest heavily in data management — that's our core — and we will continue to do that. I mean, we're investing billions of dollars a year, and we intend to stay the leaders in this market.

Published Date : Mar 16 2022


Video Exclusive: Oracle Lures MongoDB Devs With New API for ADB


 

(upbeat music) >> Oracle continues to pursue a multi-model converged database strategy. The premise of this all-in-one approach is to make life easier for practitioners and developers. And the most recent example is the Oracle Database API for MongoDB, which was announced today. Now, Oracle is not the first to come out with a MongoDB compatible API, but Oracle hopes to use its autonomous database as a differentiator and further build a moat around OCI, Oracle Cloud Infrastructure. And with us to talk about Oracle's MongoDB compatible API is Gerald Venzl, who's a distinguished Product Manager at Oracle. Gerald was a guest along with Maria Colgan on the CUBE a while back, and we talked about Oracle's converged database and the kind of Swiss army knife strategy, as I called it, of databases. This is dramatically different — it's an approach at the opposite end of the spectrum from, for instance, AWS, who, for example, goes after the world of developers with a different database for every use case. So, kind of picking up from there, Gerald, I wonder if you could talk about how this new MongoDB API adds to your converged model and the whole strategy there. Where does it fit? >> Yeah, thank you very much, Dave, and, by the way, thanks for having me on the CUBE again. A pleasure to be here. So, essentially the MongoDB API — the compatibility that we built with this API — is a continuation of the converged database story, as you said before, which is essentially bringing the many features of the many single purpose databases that people often like and use together into one technology, so that everybody can benefit from them. As such, this is just a continuation of the many other APIs and standards that we support. We have, of course, supported SQL for a long time, because we are a relational database from the get-go, and also other standards like GraphQL, SPARQL, et cetera. The MongoDB API is now essentially just the next step forward, to give developers this API that they've gotten to love and use. >> I wonder if you could talk about it from the developer angle. What do they get out of it? Obviously you're appealing to the Mongo developers out there, but you've got this Mongo compatible API, and you're touting the autonomous database on OCI. Why aren't they just going to use MongoDB Atlas on whatever cloud — Azure or AWS or Google Cloud Platform? >> That's a very good question. We believe that the majority of developers want to just worry about their application — writing the application — and not so much about the database backend that they're using. And especially in the cloud, with cloud services, the reason developers choose these services is so that they don't have to manage them. Now, autonomous database brings many top-notch, advanced capabilities to database cloud services. We firmly believe that autonomous database is essentially the next generation of cloud services, with all the self-driving features built in, and MongoDB developers writing applications against the MongoDB API should not have to miss out on these capabilities either. No developer likes to tune the database. No developer likes to take downtime when they have to rescale their database to accommodate a bigger workload. And this is really where we see the benefit here. So for the developer, ideally nothing will change: you have a MongoDB compatible API, so they can keep on using their tools.
They can build their applications the way they always have, but benefit from the best cloud database service out there, not having to worry about any of these things anymore — areas where even MongoDB Atlas still has a lot of shortcomings today, as we find. >> Of course, this is always a moving target — the technology business, that's why we love it. Everybody's moving fast and investing and shaking and jiving. But I want to ask you about — well, by the way, you're hiding the underlying complexity, that's really the big takeaway there, so that's huge for developers. But take Amazon's approach, right tool for the right job: you've got Document DB, you've got Microsoft with Cosmos — they compete with Mongo, and they've been doing so for some time. How does Oracle's API for Mongo differ from those offerings, and how are you going to attract their users to your JSON offering? >> So, first of all, we have to separate Document DB in AWS and Cosmos DB in Azure slightly — they have slightly different approaches. Document DB is essentially a document store owned and built by AWS — nothing different from MongoDB, it's a head-to-head comparison: use my document store versus the other document store. So you don't get any of the benefits of a converged database. If you ever want a different data model, want to run analytics over it, etc., you still have to use the many other services that AWS provides you. You cannot do it all in one database. Now, Cosmos DB is more interesting, because they claim to be a multi-model database. And I say claim, because what we understand as a multi-model database is different from what they understand as a multi-model database — and that's also one of the reasons why we started differentiating with the converged database. What we mean is: you should be able, regardless of what data format you want to store in the database, to leverage all the functionality of the database over that data format, with no trade-offs. Cosmos DB, when you look at it, essentially gives you modes of operation. When you connect as the application or the user, you have to decide at connection time how this database should be treated. Should it be a document store? Should it be a graph store? Should it be a relational store? Once you make that choice, you are locked into it for as long as you hold that connection. If you say, I want a document store, all you get is a document store. There's no way for you to cross-analyze with the relational data sitting in the same service, no way for you to break these boundaries. If you ever want to add some graph data and graph analytics, you essentially have to disconnect and now treat it as a graph store. So you get multiple data models in it, but you still get a one-trick pony the moment you connect, because of the mode you have to choose. And that is where we see a huge differentiation with our converged database, because we essentially say: look, one database cloud service on Oracle Cloud allows you to do anything you wish. You can start as a document store if you wish to do so. If you want to write some SQL queries on top, you can do so. If you want to add some graph data, you can do so. And at no point do you have to rewrite your application or use different libraries and frameworks to connect, et cetera, et cetera. >> Got it. Thank you for that. Do you have any data from when you talk to customers?
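For a developer, the compatibility claim boils down to swapping a connection string, roughly as below. This is a generic PyMongo sketch; the URI is a placeholder, not Oracle's real connection string format, which comes from the Autonomous Database console.

from pymongo import MongoClient

# Same driver, same code -- only the URI changes when the backend does.
client = MongoClient("mongodb://user:password@host:27017/admin")  # placeholder
orders = client["shop"]["orders"]

orders.insert_one({"order_id": 1, "status": "shipped"})
stuck = orders.count_documents({"status": "stuck"})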
I'm interested in the diversity of deployments — for instance, how many customers are using more than one data model? Do JSON users, for instance, need support for other data types, or are they happy to stay in their own little sandbox? Do you have any data on that? >> What we see from the majority of our customers is that there is no such thing as one data model that fits everything. And there again, we have to differentiate between the developer who builds a certain microservice — who may be happy to stay in the JSON world or the relational world — and the company that's trying to derive value from the data. The relational model has not gone away in the 40 years of its existence. It's still kicking strong; it's still really good at what it does. The JSON data model is really good at what it does. The graph model is really good at what it does. But all these models were built for different purposes. Try to do graph analytics on relational or JSON data — it's really tricky, but that's why you use a graph model to begin with. Try to shield yourself from the organization of the data, how it's structured — that's really easy in the relational world, not so much when you get into a document store world. And so what we see with our customers is that as they accumulate more data, they have many different applications to run their enterprises, and the question always comes back — as we have predicted for about six, seven years now — where they say, hey, we have all this different data in different data formats. We want to bring it all together, analyze it together, get value out of the data together. We have seen the whole big data trend emerge and disappear trying to answer that question, and it didn't quite do the trick. And we are basically now back to where we were in the early 2000s, when XML databases faded away because everybody just allowed you to store XML in the database. >> Got it. So let's make this real for people — maybe you could give us some examples. You've got this new API for Mongo, you have your multi-model database. Paint a picture of how customers are going to benefit in real world use cases. How does it change the customer's world, before and after, if you will? >> Yeah, absolutely. So, you know, the API — as we said before, we're going to use it to make the lives of developers easier, but also of course to assist our customers with migrations from MongoDB over to Oracle Autonomous Database. One customer we have, for example, that would have benefited from this API a couple of years ago — two, three years ago — is one of the largest logistics companies on the planet. They track every package that is being sent in JSON documents. So every tracked package is represented in a JSON document, and very early on they came in with the next question: hey, we track all these packages in JSON documents; it would be really nice to know which packages are stuck, or anywhere we have to intervene. Can we do this? Can we analyze how many packages got stuck or didn't get delivered by the end of a day, or whatever? And they struggled with this question a lot — they found this was really tricky to do back then in MongoDB. So they actually approached Oracle, they came over, they migrated over, and they rewrote their applications to accommodate that.
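Here is a hedged sketch of how that logistics question looks once the documents sit in a converged database: plain SQL over JSON, in the spirit of Oracle's JSON_TABLE, issued through the python-oracledb driver. The table, document fields and connection details are all invented for illustration.

import oracledb  # python-oracledb driver

STUCK_PACKAGES = """
    SELECT jt.tracking_id, jt.last_status, jt.last_seen
    FROM   packages p,
           JSON_TABLE(p.doc, '$'
               COLUMNS (tracking_id VARCHAR2(32) PATH '$.trackingId',
                        last_status VARCHAR2(32) PATH '$.events[last].status',
                        last_seen   TIMESTAMP    PATH '$.events[last].ts')) jt
    WHERE  jt.last_status <> 'DELIVERED'
      AND  jt.last_seen < SYSTIMESTAMP - INTERVAL '1' DAY
"""

with oracledb.connect(user="app", password="secret", dsn="adb_high") as conn:
    stuck = conn.cursor().execute(STUCK_PACKAGES).fetchall()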
And they are happy JSON users in Oracle Database. But if we had had this API for them already, they wouldn't have had to rewrite their applications — or, as we often see, they could have worried about rewriting the application later on. In migration use cases, you usually want to get the migration done, get the data over and running, and then worry about everything else. So this would be one case where they would have greatly benefited from a shorter migration window, if we'd already had this Mongo API, this compatibility layer, back then. >> That's a good use case. I mean, it's one of the most prominent and painful, so anything you can do to help that is key. I remember the early days of big data — NoSQL, of course, was the big thing, and there was a lot of confusion. People thought NoSQL meant "none" or "not only SQL," which is kind of the more widely accepted interpretation today. But really, it's talking about data that's stored in a non-relational format. So some people thought that SQL was going to fade away — some people probably still believe that. We saw the rise of NoSQL and document databases, but if I understand it correctly, a premise of your MongoDB API is that you really see SQL as a main contributor over MongoDB's document collections — for analytics, for example. Can you add some color here? What are you seeing in terms of a resurgence of SQL, or the momentum in SQL? Has it ever really waned? What's your take? >> Yeah, no, it's a very good point. I think there as well we see, to some extent, history repeating itself — this has all been tried before with object databases, XML databases, et cetera. But if we stay with the NoSQL databases, I think it speaks volumes that every NoSQL database that, as you said, started with "No SQL" — and then, well, actually we always meant "not only SQL" — has introduced a SQL-like engine or interface. The latest to join this family is MongoDB: they have just recently introduced SQL compatibility for aggregation pipelines, something where you can put in a SQL statement and it essentially works with the aggregation pipeline. So they all acknowledge that SQL is powerful. For us this was always clear. SQL is a declarative language — some argue it's the only true 4GL language out there. You don't have to code how to get the data; you just ask the question, and the rest is done for you. And has SQL ever diminished, as you said before? If you look out there, SQL has always been in demand. Look at the various developer surveys, the various top skills that are asked for — SQL has never gone away. Everybody loves, likes and wants to use SQL. So no, we don't think it has ever been going away. It has maybe just been put in the shadow by some hype. But again, we had the same discussion in the 2000s with XML databases, and the same discussions in the 90s with object databases — we have all just frankly forgotten about it. >> I love when you guys come on and let me do my thing, and I can pretty much ask any question I want, because, I've got to say, when Oracle starts talking about another company, I know that company's doing well. So I see Mongo in the marketplace, and I love that you guys are calling it out and making some moves there. So here's the thing: you guys have a large install base, and that can be an advantage, but it can also be a weight on your shoulders.
These specialized cloud databases don't have that legacy, so they can just kind of move freely about — less friction. Now, all the cloud database services are going to have more and more automation. I mean, I think that's pretty clear and inevitable. And most, if not all, of the database vendors are going to provide support for these kinds of converged data models, however they choose to do it. They might do it through the ecosystem, like what Snowflake's trying to do, or bring it in-house themselves, like a watchmaker with an in-house movement, if you will. But it's like death and taxes — you can't avoid it, it's got to happen, that's what customers want. So with all that being said, how do you see the capabilities you have today, with automation and converged capabilities, playing out? Do you think it gives you enough of an advantage? Obviously it's an advantage, but is it enough of an advantage over the specialized cloud database vendors, where there's clearly a lot of momentum today? >> I mean, honestly, yes, absolutely. With some of these databases we are 20 years ahead, and I'll give you concrete examples. Oracle has had transaction support — ACID transactions — since forever. The NoSQL players all said, oh, we don't need ACID transactions, BASE transactions are fine, yada yada yada. Then MongoDB started introducing some transaction support. It comes with some limits — cannot be longer than 60 seconds, cannot touch more than a thousand documents, et cetera — so they still have some catching up to do there. I mean, it took us a while to get there too, let's be honest, but we have been around for a long time. The same thing happened with version five: they started with a simple version of multi-version concurrency control, which comes along with ACID transactions. The interesting part is that we introduced this in Oracle version 5, which was somewhere in the 80s, before I even started using Oracle Database. So there's a lot of catching up to do. And then you look at the cloud services as well — there are a lot of things that we Oracle people have taken for granted and kind of keep forgetting. For example, our elastic scale: you want to add one CPU, you add one CPU. Should you take downtime for that? Absolutely not — that would be ridiculous. You cannot take downtime in a 24/7 backend system that runs the world. Take any of our customers. If you look at most of these cloud services, if you want to reshape or scale your cloud service, that's fine — it's just a VM under the covers. They shut everything down, give you a VM with more CPUs, and you boot it up again. Downtime, right there. So there are a lot of these things where we go, well, we solved this frankly decades ago, that these cloud vendors will run into. And just to add one more point here: one thing we see with all these migrations happening is exactly in that field. People essentially started building on MongoDB or others of these NoSQL databases or cloud databases, and eventually, as these systems grow, as they ask more difficult questions and their use cases expand, they find shortcomings — whether it's the scalability, whether it's the security aspects, or the functionality that we have. And this is essentially what drives them back to Oracle.
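For reference, the transaction support Gerald is describing looks like this on the MongoDB side — a generic PyMongo sketch against a replica set, with illustrative names. The callback form retries on transient errors, but the whole transaction must still complete within the server's lifetime limit (60 seconds by default, per the constraint mentioned above).

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
accounts = client["bank"]["accounts"]

def transfer(session):
    # Both updates commit or neither does -- a multi-document transaction.
    accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}},
                        session=session)
    accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}},
                        session=session)

with client.start_session() as session:
    session.with_transaction(transfer)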
And this is why we now see the pendulum swinging back in our direction, where people happily come back over to us to get their workloads enterprise grade, if you like. >> Well, it's true. I mean, I just reported on this recently — the momentum that you guys have in cloud, because you've got the best mission critical database. You're all about maps. I've got to tell you a quick story. I was at a Vertica conference one time, on stage with Curt Monash — I don't know if you know Curt, but he knows this space really well; he's probably forgotten more about databases than I'll ever know. I was kind of busting his chops. He was talking about ACID transactions, and I'm like, well, with NoSQL, who needs ACID transactions — just to poke him. And he was like, "Are you out of your mind?" And he said, look, everybody is going to head in this direction. It turned out it's true, so I've got to give him props for that. And so, my last question: if you had a message for, let's say, a skeptical developer out there that's using MongoDB and Atlas, what would you say to them? >> I would say go try it for yourself. If you don't believe us, we have an always free cloud tier out there. You just go to oracle.com/cloud/free, sign up for an always free tier, spin up an autonomous database, and go try it for yourself. See what's actually possible today. Don't just follow the trends on Hacker News and a case study here or there — go try it for yourself and see what it's capable of. >> All right, Gerald. Hey, thanks for coming into my firing line today — I really appreciate your time. >> Thank you for having me again. >> Good luck with the announcement. You're very welcome. And thank you for watching this CUBE conversation. This is Dave Vellante — we'll see you next time. (gentle music)

Published Date : Feb 10 2022


Analyst Predictions 2022: The Future of Data Management


 

[Music] In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special CUBE presentation: Analyst Predictions 2022 — The Future of Data Management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists: Sanjeev Mohan is a former Gartner analyst and principal at SanjMo; Tony Baer is principal at dbInsight; Carl Olofson is a well-known research vice president with IDC; Dave Menninger is senior vice president and research director at Ventana Research; Brad Shimmin is chief analyst for AI platforms, analytics and data management at Omdia; and Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I, as moderator, am going to call on each analyst separately, who then will deliver their prediction or megatrend, and then, in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off — you want to talk about governance, go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned — you know, with data oceans, data lakes, lakehouses, data fabrics, meshes — the common glue is metadata. If we don't understand what data we have and govern it, there is no way we can manage it. So we saw Informatica go public last year, after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely, and maybe Alation — we'll see — go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more streaming data — from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, very detailed: lineage, impact analysis, and then even expanding into data quality. We've already seen that happen with some of the tools,
So what we are going to see is that, once the data governance platforms become the key entry point into these modern architectures, the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now, if you look at BI tools, I would say there are a hundred users of a BI tool to one data catalog, and I see that evening out over a period of time. At some point data catalogs will really become, you know, the main way for us to access data. A data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool. And that is the journey I see for the data governance products. >> Excellent, thank you. Some comments? Maybe Doug, a lot of things to weigh in on there; maybe you could comment. >> Yeah, Sanjeev, I think you're spot on with a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years; it's like God, motherhood, apple pie. Everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines; these are environmental, social, and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data, more detailed and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs, but to Doug's point, I think it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that we won't, how do I put this, fade out into this nebulous nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or for data governance and cybersecurity, and that instead we have some tooling that can actually be adaptive, to gather metadata to create something. And I know this is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata, without moving your data around, about the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about.
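Since the panel keeps returning to metadata as the common glue, here is a minimal, hedged sketch of the kind of catalog entry being described: one record carrying business metadata, technical metadata, lineage, and quality rules together. Every name and field below is a hypothetical illustration, not any vendor's actual schema; the small helper shows why recorded lineage makes the impact analysis Sanjeev mentions possible.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal catalog entry: business metadata, technical
# metadata, lineage, and quality rules in one self-describing record.
@dataclass
class DatasetRecord:
    name: str                                        # business-friendly name
    owner: str                                       # accountable domain team
    schema: dict                                     # column -> type (technical metadata)
    upstream: list = field(default_factory=list)     # lineage: datasets this one reads
    quality_checks: list = field(default_factory=list)

orders = DatasetRecord(
    name="orders_cleaned",
    owner="commerce-domain",
    schema={"order_id": "string", "amount": "decimal", "ts": "timestamp"},
    upstream=["raw.kafka.orders", "dim.customers"],
    quality_checks=["null_rate(order_id) < 0.01"],
)

def impact_of(dataset: str, catalog: list) -> list:
    """Impact analysis: which registered datasets read from `dataset`?"""
    return [d.name for d in catalog if dataset in d.upstream]

print(impact_of("raw.kafka.orders", [orders]))  # ['orders_cleaned']
```

Because lineage is just edges between records, the same structure supports the downstream-impact and in-depth-lineage questions a governance suite has to answer.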
>> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. A lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene: if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was, you know, pocket change for them, instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements; data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people, for IT departments, to take a look at technical metadata, not business metadata. Today the tables have turned: CDOs are driving this initiative, regulatory compliance is beating down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here, but there's some real meat on the bone here. Sanjeev, I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck with it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though there's some skepticism there; that's something we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and, speaking of coming off of governance, I mean, wow, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there. But take it away, Tony. >> We'll put it this way: data mesh, you know, the idea, at least as proposed by ThoughtWorks, was basically unleashed a couple of years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. It was a problem when we had enterprise data warehouses; it was a problem when we had our Hadoop data clusters. It's even more of a problem now that the data's out in the cloud, where the data is not only your data lake, it's not only S3, it's all over the place, and it also includes streaming, which I know we'll be talking about later. So data mesh was a response to that: the idea that we need to debate who the folks are that really know best about governance, and it's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality, because if you do a Google search, the published work, the articles, and the databases have been largely, you know, pretty uncritical so far.
It's basically being presented as a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad, you and I met years ago when we were talking about SOA and decentralizing, and all of that was at the application level; now we're talking about it at the data level, and now we have microservices. So there's this thought of, if we manage our apps cloud-natively through microservices, why don't we think of data in the same way? My sense this year, and this has been a very active search term if you look at Google search trends, is that enterprises are going to look at this seriously, and as they look at it seriously, it's going to attract its first real hard scrutiny, its first backlash. That's not necessarily a bad thing; it means it's being taken seriously. The reason I think you'll start to see the cold, hard light of day shine on data mesh is that it's still a work in progress. This idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance we're trying to figure out how to strike the balance between consistent enterprise policy and governance on the one hand, and, on the other, the groups that understand the data and know how to work with it; how do we balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is in the self-service technologies that will help teams essentially govern data through the full life cycle: selecting the data, building the pipelines, determining your access control, looking at quality, looking at whether data is fresh or whether it's trending off course. So my prediction is that data mesh will receive its first harsh scrutiny this year. You are going to see some enterprises declare premature victory when they've built some federated query implementations. You're going to see vendors start to data-mesh-wash their products: anybody in the data management space, whether it's a pipelining tool, whether it's ELT, whether it's a catalog or a federated query tool, they're all going to be promoting how they support this. Hopefully nobody is going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata Sanjeev was talking about and the catalogs he was talking about, which is that there's going to be a renewed focus on metadata, and I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata back plane, I think that if anybody is going to get serious about data mesh, they need to look at a data fabric, because we all, at the end of the day, need to read from the same sheet of music. >> So, thank you, Tony. Dave Menninger, one of the things that people like about data mesh is that it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted. Tony said it's going to see the cold, hard light of day, and there's a problem right now in that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric (excuse me for a second, sorry about that), data virtualization, data fabric, data federation, right? I think it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified, but that's not the way I see vendors using it; I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen; they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here, but there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is, when you look at the satisfaction rates of organizations using virtualization versus those that are not, it's almost double: 79 percent of organizations that were using virtualized access expressed satisfaction with their access to the data lake; only 39 percent expressed satisfaction if they weren't using virtualized access. >> So thank you, Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with Zhamak Dehghani, who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You are muted. >> So my message to Zhamak and to the community is, as opposed to what Dave said, let's not define it. We spent the whole year defining it. There are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what the difference is between data fabric and data mesh, and I say, how can I compare the two? Data mesh is a business concept; data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So, to Tony's point, I'm on a warpath in 2022 to take it down to what a data product looks like, how we handle shared data across domains and govern it, and I think we are going to see more of that in 2022: the operationalization of data mesh.
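As a thought experiment on Sanjeev's question of what a data product actually looks like, here is a hedged sketch of a self-describing data product contract touching all four principles. Every name, path, and policy in it is a hypothetical illustration, not the ThoughtWorks specification or any vendor's format.

```python
# Hypothetical data-product descriptor: the kind of contract a domain team
# might publish so its product can be discovered and federally governed.
data_product = {
    "domain": "payments",                        # principle 1: domain ownership
    "name": "settled-transactions",              # principle 2: data as a product
    "output_ports": [                            # principle 3: self-serve platform
        {"type": "parquet", "location": "s3://payments/settled/"},  # hypothetical path
    ],
    "sla": {"freshness_minutes": 15},
    "governance": {                              # principle 4: federated governance
        "pii_columns": ["card_holder"],
        "retention_days": 365,
        "classification": "confidential",
    },
}

def violates_policy(product: dict) -> bool:
    """A toy central check: PII-bearing products must be classified confidential."""
    gov = product["governance"]
    return bool(gov["pii_columns"]) and gov["classification"] != "confidential"

assert not violates_policy(data_product)
```

The point of a descriptor like this is exactly the federated-governance balance Tony raises: the domain team fills it in, while a platform team can run consistent enterprise checks over every product.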
>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on to Carl. Carl, you're a database guy; you've been around that block for a while now. You want to talk about graph databases? Bring it on. >> Oh yeah, okay, thanks. So I regard graph databases as basically the next truly revolutionary database management technology. I'm looking forward, for the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say, to this market growing by about 600 percent over the next ten years. Now, ten years is a long time, but over the next five years we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful; it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know. A graph database organizes data according to a mathematical structure called a graph. A graph has elements called nodes and edges: a data element drops into a node, and the nodes are connected by edges; the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic property graphs, which are used to break down human language text into semantic structures; then you can search it, organize it, and answer complicated questions. A lot of AI is aimed at semantic graphs. The other kind is the property graph that I just mentioned, which has a dazzling number of use cases. As I talk about this, people are probably wondering: well, we have relational databases, isn't that good enough? A relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure, and the data drops into that structure. There's a foreign key value that relates one table to another, and that value is fixed; you don't change it. If you change it, the database becomes unstable, and it's not clear what you're looking at. In a graph database, the system is designed to handle change, so that it can reflect the true state of the things it's being used to track. So let me just give you some examples of use cases. They include entity resolution, data lineage, social media analysis, customer 360, fraud prevention. There's cybersecurity; there's supply chain, which is a big one actually; there's explainable AI, and this is going to become important too, because a lot of people are adopting AI, but they want a system, after the fact, to explain how the AI system came to that conclusion, how it made that recommendation, and right now we don't have really good ways of tracking that. Machine learning in general; social networks, which I already mentioned; and then we've got, oh gosh, data governance, data compliance, risk management; we've got recommendation, personalization, anti-money laundering, that's another big one; identity and access management; network and IT operations, which is already becoming a key one, where you actually have your operation, your data center, mapped out, and you can track what's going on as things happen there; root cause analysis; fraud detection is a huge one, and a number of major credit card companies use graph databases for fraud detection; risk analysis; tracking and tracing; churn analysis; next best action; what-if analysis; impact analysis; entity resolution. And I would add one other thing, or just a few other things, to this list: metadata management. So, Sanjeev, here you go, this is your engine, because I was in metadata management for quite a while in my past life, and one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it. But graphs can. Graphs can do things like say: this term, in this context, means this, but in that context it means that. And in fact, logistics management and supply chain as well, because it handles recursive relationships. By recursive relationships I mean objects that own other objects that are of the same type. You can do things like a bill of materials, a parts explosion. You can do an HR analysis: who reports to whom, how many levels up the chain, that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is you have to program it; it's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it, you can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> So, Carl, thank you. I wonder if we could bring Brad in. Brad, I'm sitting here wondering: okay, is this incremental to the market, or is it disruptive and replaceable? What are your thoughts on this space? >> It's already disrupted the market. Like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned, and that, I think, is its Achilles' heel in some ways. You know, it's like finding the best way to cross the Seven Bridges of Königsberg: it's always going to kind of be tied to those use cases, because it's really special and it's really unique. And because it's special and unique, it still, unfortunately, kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as the great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter, but technologically they don't know how to talk to one another; they're completely different. And you can't just stand up SQL and query them; you've got to learn, what is it, Carl, Cypher, a specialized query language, to actually get to the data in there. And if you're going to scale that graph database, especially a property graph, if you're going to do something really complex, like trying to understand all of the metadata in your organization, you might just end up with a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold, we need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases but for everything, because I'm with Carl: I think it's absolutely revolutionary. >> And I had also identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency, and that slows the system down. So that's still a problem to be solved.
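Carl's description maps directly onto code. The sketch below uses the open-source networkx library as one possible stand-in for a real graph database (a deliberately simplified assumption, not how a production graph store is built) to show a small property graph, with attributes on both nodes and edges, and a walk up the recursive reports-to relationship he describes. In SQL this query needs a recursive common table expression; on a graph it is a plain traversal.

```python
import networkx as nx  # pip install networkx

# A tiny property graph: nodes and edges both carry attributes.
g = nx.DiGraph()
g.add_node("alice", title="VP Engineering")
g.add_node("bob", title="Manager")
g.add_node("carol", title="Engineer")
g.add_edge("bob", "alice", rel="reports_to")   # recursive relationship:
g.add_edge("carol", "bob", rel="reports_to")   # objects owning objects of the same type

def chain_of_command(graph: nx.DiGraph, person: str) -> list:
    """Walk 'reports_to' edges upward: the HR analysis Carl describes."""
    chain = []
    while True:
        bosses = [v for _, v, d in graph.out_edges(person, data=True)
                  if d.get("rel") == "reports_to"]
        if not bosses:
            return chain
        person = bosses[0]
        chain.append(person)

print(chain_of_command(g, "carol"))  # ['bob', 'alice']
```

The same traversal pattern, applied to part-of edges instead of reports-to edges, gives the bill-of-materials parts explosion; nothing about the query has to be reprogrammed when the hierarchy gets deeper, which is exactly the maintenance advantage over hand-coded relational joins.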
>> Sanjeev, any quick thoughts on this? I mean, I think metadata, on the word cloud, is going to be the largest font, but what are your thoughts here? >> I want to step away from that, so people don't associate me with only metadata; I want to talk about something slightly different. DB-Engines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022 there are 381 databases on its ranked list. The largest category is RDBMS; the second largest category is actually divided into two, property graphs and RDF graphs, and these two together make up the second largest number of databases. So, talking about Achilles' heels here, this is the problem: there are so many graph databases to choose from, and they come in different shapes and forms. And to Brad's point, there are so many query languages. In RDBMS, it's SQL, end of story. Here we've got Cypher, we've got Gremlin, we've got GQL, and then your proprietary languages. So I think there's a lot of disparity in this space. >> All excellent points, Sanjeev, I must say, and that is a problem. The languages need to be sorted and standardized, and people need to have a road map as to what they can do with it. Because, as you say, you can do so many things, and so many of those things are unrelated, that you sort of say, well, what do we use this for? I'm reminded of a saying I learned a bunch of years ago, when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. >> All right, guys, we've got to move on to Dave Menninger. We've heard about streaming; your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past, but I don't mean that they're going to go away; that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next, say, three to five years, I would expect that the data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, will incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into historical databases. So historical databases don't go away, but they become a thing of the past: they store the data that occurred previously. And as data is occurring, we're going to be processing it, analyzing it, acting on it. We only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches; we processed it in batches because that was the best we could do, and it wasn't bad, and we've continued to improve and improve. But streaming data today is still the exception; it's not the rule. There are projects within organizations that deal with streaming data, but it's not the default way in which we deal with data yet. And so that's my prediction: this is going to change, and streaming data will be the default way in which we deal with data. How you label it, what you call it, you know, maybe these databases and data platforms just evolve to be able to handle it, but we're going to deal with data in a different way. And our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data, and another third are planning to use streaming technologies. So that gets us to about eight out of ten organizations that need to use this technology. That doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today, and it has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. We want to know if an item is on the shelf at our local retail store so we can go in and pick it up right now. That's the world we live in, and that's spilling over into the enterprise IT world, where we have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, and streaming data becomes the default way in which we operate with data.
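As one concrete, hedged illustration of the process-as-it-arrives model Dave Menninger describes, the sketch below uses the kafka-python client. The topic name, broker address, and event fields are hypothetical, and the plain list standing in for the historical store would be a real database in practice; the point is only the shape of the loop, acting on each event as it occurs and then letting it roll off into history.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Consume events as they arrive rather than in nightly batches.
# Topic name and broker address are hypothetical placeholders.
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

history = []  # stand-in for the historical database the stream rolls off into

for event in consumer:                       # blocks, processing in arrival order
    record = event.value
    if record.get("action") == "checkout":   # act on the data as it occurs
        print("checkout by", record.get("user"))
    history.append(record)                   # then archive for later analysis
```

The same loop is also where Dave's warehouse-as-context pattern would live: each incoming record can be compared against historical aggregates before it is archived.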
>> All right, thank you, David. Well, what say you, Carl, a guy who's followed historical databases for a long time? >> Well, one thing: actually, every database is historical, because as soon as you put data in it, it's now history; it no longer reflects the present state of things. Even if that history is only a millisecond old, it's still history. But I would say, and I know you're trying to be a little bit provocative in saying this, Dave, because you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that, and that all involves historical data. That's not going to go away, unless you want to go to jail, so you're going to have to deal with it. But as far as the leading-edge functionality, I'm totally with you on that. And I'm just kind of wondering whether this requires a change in the way that we perceive applications in order to be truly manifested, rethinking the way applications work: saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way, but again, it's not the default. And I agree 100 percent with you that we do need historical databases; that's clear. And some of those historical databases will be used in conjunction with the streaming data, right? Take the data warehouse example, where you're using the data warehouse as context and the streaming data as the present: you're saying, here's a sequence of things that's happening right now; have we seen that sequence before, what does that pattern look like in past situations, and can we learn from that? >> So, Tony Baer, I wonder if you could comment. When you think about real-time inferencing at the edge, for instance, which is something a lot of people talk about, a lot of what we're discussing here in this segment looks like it's got great potential. What are your thoughts? >> Yeah, well, I think you nailed it; you hit it right on the head there. I'm going to split this one down the middle: I don't see that streaming becomes the default. What I see is streaming and transaction databases, and analytics data, you know, data warehouses, data lakes, whatever, converging. And what allows us technically to converge is cloud-native architecture, where you can basically distribute things. So you could have a node here that's doing the real-time processing and also doing, and this leads in to what Dave was saying, maybe some of that real-time predictive analytics: taking a look at this customer journey, what's happening with what the customer is doing right now, and correlating it with what other customers are doing. So the thing is that in the cloud you can partition this, and, because of the speed of the infrastructure, you can bring these together and orchestrate them in a loosely coupled manner. The other part is that the use cases are demanding it, and this is the part that goes back to what Dave is saying: when you look at customer 360, when you look at, let's say, smart utility grids, when you look at any type of operational problem, it has a real-time component and a historical component, and it has predictives. So my sense here is that, technically, we can bring this together through the cloud, and the use case is that we can apply some real-time predictive analytics on these streams and feed that into the transactions, so that when we make a decision about what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that, to this point, we have to think of streaming very differently, because with historical databases we used to bring the data in and store it, and then we used to run rules on top, aggregations and all that. In the case of streaming, the mindset changes: the rules, the inference, all of that is fixed, but the data is constantly changing. So it's a completely reverse way of thinking, and of building applications on top of that. >> So, Dave Menninger, there seemed to be some disagreement about the default there. What kind of time frame are you thinking about? Is it the end of the decade when it becomes the default? What would you pin it at? >> I think around, you know, between five to ten years, I think this becomes the reality. It'll be more and more common between now and then, but it becomes the default. And, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data, because that's a whole other set of challenges. >> We've also talked about it in rather two dimensions, historical and streaming, and there's lots of low-latency, micro-batch, sub-second processing that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications. >> And nobody's really talking about the hardware dimension of this; that will just happen, Carl. So near real time, maybe, before you lose the customer, however you define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction?
>> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think that we've been seeing automation at play within AI for some time now, and it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development, and it's helped them to actually make AI better, because it, in some ways, provides some swim lanes, and, for example, with technologies like AutoML, it can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation, and that is that the automation that started happening for practitioners is trying to move outside of the traditional bounds of things like: I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model. It's expanding across the full life cycle of building an AI outcome, starting at the very beginning with data and continuing on to the end, which is the continuous delivery and continuous automation of that outcome, making sure it's right and it hasn't drifted, and things like that. And because it's become kind of powerful, we're starting to see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me, given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others, is that they're starting to use these same ideals, and a lot of deep learning, to basically stand up these out-of-the-box, flip-a-switch AI outcomes, at the ready for business users. And I very much think that's the way it's going to go, and what it means is that AI is slowly disappearing. I don't think that's a bad thing. I think, if anything, what we're going to see in 2022, and maybe into 2023, is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, for example, that SAP is going to roll out this quarter a thing called Adaptive Recommendation Services, which is basically a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need it to do in the line of business. So basically, you're an SAP user, you log in to your software one day, you're a sales professional, let's say, and suddenly you have a recommendation for customer churn. >> That's great. >> Well, I don't know; I think that's terrifying, in some ways. I think it is the future, that AI is going to disappear like that, but I am absolutely terrified of it, because I think what it really does is call attention to a lot of the issues we already see around AI, specific to this idea of what we like to call, at Omdia, responsible AI: how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera? That takes a lot of work to do. And so, if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that, when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. So I think we're going to see this rush to roll things out, and suddenly there are going to be a lot of problems, a lot of pushback. Some of that's going to come from GDPR and the other regulations that Sanjeev was mentioning earlier; a lot of it's going to come from internal CSR requirements within companies that are saying, hey, whoa, hold up, we can't do this all at once; let's take the slow route, let's make AI automated in a smart way, and that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI essentially disappears, it becomes invisible, maybe if I can restate that. And then, if I understand it correctly, Brad, you're saying there's a backlash in the near term: people will say, oh, slow down, let's automate what we can. Those attributes that you talked about are non-trivial to achieve; is that why you're a bit of a skeptic? >> Yeah, I think we don't have any sort of standards that companies can look to and understand, and certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, whether it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices, that every company that's going to consume this invisible AI can make use of. One of the things that Google kicked off a few years back, that's picking up some momentum, and that the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. So, for the SAP example, we know, for instance, that it's a convolutional neural network with a long short-term memory model that it's using; we know that it only works on Roman English, and therefore I, as a consumer, can say, oh, well, I know that I need to do this internationally, so I should not just turn this on today.
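Model cards are, in essence, structured metadata about a model. Below is a much-simplified, hypothetical sketch in the spirit of what Brad describes; the field names, values, and the gating check are illustrative assumptions, not Google's actual model card format or any vendor's real disclosure.

```python
# Hypothetical, much-simplified model card: enough transparency for a
# buyer of "invisible AI" to know when NOT to flip the switch.
model_card = {
    "model": "churn-recommender",            # hypothetical name
    "architecture": "CNN + LSTM",            # what the vendor discloses
    "training_data": "historical CRM records, 2015-2021",
    "intended_use": "sales churn recommendations",
    "limitations": [
        "English-language text only",
        "not evaluated for hiring or credit decisions",
    ],
    "fairness_evaluation": "demographic parity checked on gender and age",
}

def safe_to_enable(card: dict, use_case: str) -> bool:
    """Gate the flip-the-switch moment on the card's declared limitations."""
    return not any(use_case in limitation for limitation in card["limitations"])

print(safe_to_enable(model_card, "hiring"))  # False: declared out of scope
```

Even a check this crude captures the practice Brad is after: the decision to enable an automated outcome is made against written, auditable limitations rather than a marketing switch.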
>> Great, thank you. Carl, can you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC, in our Future of Intelligence group, regarding, in particular, the moral and legal implications of having a fully automated, AI-driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get. So if they get data that pushes them in a certain direction, there can be trouble; I think there was a story last week about an HR system that was recommending promotions for white people over black people, because, in the past, white people were promoted and rated as more productive than black people. But it had no context as to why, which is that black people were being historically discriminated against, and the system doesn't know that. So you have to be aware of that, and I think that, at the very least, there should be controls: when a decision has either a moral or a legal implication, when you really need a human judgment, the system could lay out the options for you, but a person actually needs to authorize the action. And I also think that we will always have to be vigilant regarding the kind of data we use to train our systems, to make sure that it doesn't introduce unintended biases. And to some extent it always will, so we'll always be chasing after them. >> That's absolutely right, Carl. I think what you have to bear in mind, as a consumer of AI, is that it is a reflection of us, and we are a very flawed species. If you look at all the really fantastic, magical-looking super models we see, like GPT-3, and the GPT-4 that's coming out, they're xenophobic and hateful, because the data they're built upon, and the algorithms, and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased because humans are biased. All right, great. Okay, let's move on. Doug Henschen: you know, a lot of people said that data lake, the term, was not going to live on, but it appears to have some legs here. You want to talk about lakehouse? Bring it on. >> Yes, I do. My prediction is that lakehouse, this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering; that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now, heading into 2021, we already had Cloudera, Databricks, Microsoft, and Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric, virtualization, and mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured, and semi-structured information, and it addresses both the analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is: does your organization really have a center of data gravity, or is the data highly distributed, multiple data warehouses, multiple data lakes, on premises, in the cloud? If it's very distributed, and you have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value for you. The fabric and virtualization vendors, the mesh idea: if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lakehouse offerings, consolidating, simplifying, bringing things together onto a single platform: you have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks and Microsoft, with Azure Synapse, really new to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, the user and query concurrency requirements, those tight SLAs. And then, on the other hand, you have Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lakehouse offerings from the warehouse crowd managing that unstructured information in columns and rows, and some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really have to look at a lakehouse offering and make sure that it meets both the warehouse and the data lake requirement.
>> Well, thank you, Doug. Tony, if those two worlds are going to come together, as Doug was saying, the analytics world and the data science world, does there need to be some kind of semantic layer in between? I don't know; weigh in on this topic, if you would. >> Oh, didn't we talk about data fabrics before, a common metadata layer? Actually, I'm almost tempted to say, let's declare victory and go home, in that this has actually been going on for a while. I actually agree with much of what Doug is saying. I remember, as far back as, I think it was 2014, I was doing a study, I was still at Ovum, the predecessor of Omdia, looking at all these specialized databases that were coming up, and seeing that there was overlap at the edges, but yet there was still going to be a reason, at the time, that you would have, let's say, a document database for JSON, a relational database for transactions and for the data warehouse, and, at that time, something that resembled Hadoop for what we're now considering a data lake. And what I was saying at the time is that you're seeing a blurring, a sort of blending at the edges; that was about five or six years ago. The lakehouse is essentially the current manifestation of that idea. There is a dichotomy, in terms of the old argument: do we centralize this all in a single place, or do we virtualize? And I think it's always going to be a yin and yang; there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised: what do you need for your performance characteristics? Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute your processing as far as possible, to essentially take a kind of brute-force approach? All these approaches are valid, based on the use case. I just see that the lakehouse, a relatively new term introduced by Databricks a couple of years ago, is the culmination of what's been a long-time trend. And what we see in the cloud is that we're starting to see data warehouses treat this as a checkbox item: say, hey, we can source data in cloud storage, in S3, in Azure Blob Store, whatever, as long as it's in certain formats, like Parquet or CSV or something like that. I see that as becoming kind of a checkbox item. So, to that extent, I think the lakehouse, depending on how you define it, is already a reality, and in some cases maybe new terminology, but not a whole heck of a lot new under the sun.
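Tony's checkbox item is easy to demonstrate. The hedged sketch below uses DuckDB, one example of a SQL engine that queries open file formats in place rather than loading them first; the file path and column names are hypothetical, and querying remote object storage such as s3:// paths would additionally need DuckDB's httpfs extension.

```python
import duckdb  # pip install duckdb

# Warehouse-style SQL directly over open-format files sitting in a lake.
# 'lake/orders/*.parquet' is a hypothetical local path; point it at real
# Parquet files to run this.
result = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('lake/orders/*.parquet')
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 10
""").fetchall()

for customer_id, total in result:
    print(customer_id, total)
```

Whether this pattern counts as a lakehouse or just a warehouse with a lake-shaped checkbox is exactly the definitional argument the panel is having; the code is the same either way.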
>> Yeah, and Dave Menninger: thank you, Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people try to co-opt the term; we talked about data mesh washing. What are your thoughts on this? >> Yeah, so I used the term data platform earlier, and part of the reason I use that term is that it's more vendor-neutral. We've tried to stay out of the vendor terminology patenting world. Whether the term lakehouse is what sticks or not, the concept is certainly going to stick, and we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it, so they consider their data lake and their data warehouse one and the same. A little less, but about a quarter, of organizations feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together. The need is there, the need is apparent, and the technology is going to continue to converge. I like to talk about it this way: you've got data lake people over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff on a server and ignore it, right? That's not what a data lake is. So you've got data lake people over here, and you've got database and data warehouse people over here. Database vendors are adding data lake capabilities, and data lake vendors are adding data warehouse capabilities, so it's obvious that they're going to meet in the middle. Like Tony says, I think we should declare victory and go home. >> So, just to follow up on that: are you saying the specialized lake and the specialized warehouse go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live as just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of centrifugal force, this tug of war, between: do we centralize the data, or do we virtualize? And the fact is, I don't think there's ever going to be a single answer. In terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh on, basically, a data warehouse; it's just that, in that case, you use the same physical data store, but everybody's logically governing it differently. A data mesh is not a technology; it's a process, a governance process. So, essentially, as I was saying before, this is the culmination of a long-time trend. We're seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake, where I'm running a system that's just brute-forcing very fast file scanning, that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing a confluence of requirements: we need, essentially, the abilities of a data lake and a data warehouse, and these need to come together. >> So I think what we're likely to see is organizations look for a converged platform that can handle both sides, for their center of data gravity. And the mesh vendors and the fabric and virtualization vendors are all on board with the idea of this converged platform, and they're saying, hey, we'll handle all the edge cases, the stuff that isn't in that center of data gravity, that is distributed off in a cloud or at a remote location. So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> Bingo. As Dave basically said, people are happy when they virtualize data. I think yes, at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data, so now we are literally splitting hairs here. Now, what Databricks is saying is that, aha, it's easier to go from a data lake to a data warehouse than it is from a data warehouse to a data lake. So I think we're getting into semantics, but we've already seen these two converge. >> So does that take something like AWS, which has got, what, 15 data stores? Are they going to have 15 converged data stores? That's going to be interesting to watch. All right, guys, I'm going to go down the list and do one word each, and, each of you analysts, if you wouldn't mind, just add a very brief sort of course correction for me. So, Sanjeev: governance is going to be, maybe it's the dog that wags the tail now; I mean, it's coming to the fore with all this ransomware stuff, which we really didn't talk much about on the security side, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream, okay. Tony Baer: mesh washing is what I wrote down; that's what we're going to see in 2022, a little reality check. You want to add to that? >> The reality check is, I hope that no vendor jumps the shark and calls their offering a data mesh product. >> Yeah, let's hope that doesn't happen; if they do, we're going to call them out. Carl: graph databases. Thank you for sharing some high-growth metrics; I know it's early days, but magic is what I took away from that. It's the magic database. >> Yeah, I've said this to people too: I kind of look at it as the Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. It's definitely the case that if you're managing things that are in a fixed schematic relationship, a relational database is probably a better choice, and there are times when a document database is a better choice. It can handle those things, but it may not be the best choice for those use cases. But for a great many, especially the new, emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger: thank you, by the way, for bringing the data in; I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say: think fast, right? That's the world we live in; you've got to think fast. >> Fast, love it. And Brad Shimmin: I love it. I mean, on the one hand, I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues; there's a potential backlash there. So give us your bumper sticker. >> Yeah, I would say, going with Dave: think fast, and also think slow, to talk about the book that everyone talks about. I would say, really, this is all about trust: trust in the idea of automation, and of a transparent, invisible AI across the enterprise, but verify. Verify before you do anything.
>> And then, Doug Henschen: look, I think the trend is your friend here on this prediction, with lakehouse really becoming dominant. I liked the way you set up that notion of the data warehouse folks coming at it from the analytics perspective, and then the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts; we'll give you the last word. >> Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that, with Hadoop platforms moving toward the cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's for a lake or a warehouse. And that second point: I think ESG mandates are going to come in alongside GDPR and things like that, to up the ante for good governance. >> Yeah, thank you for calling that out. Okay, folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions; really appreciate your time. >> Enjoyed it. >> Thank you. >> Now, in addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante; be well, and we'll see you next time. (bright music)

Published Date : Jan 8 2022


Predictions 2022: Top Analysts See the Future of Data


 

(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations they accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 in the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is former Gartner Analyst and Principal at SanjMo. Tony Baer, principal at dbInsight, Carl Olofson is well-known Research Vice President with IDC, Dave Menninger is Senior Vice President and Research Director at Ventana Research, Brad Shimmin, Chief Analyst, AI Platforms, Analytics and Data Management at Omdia and Doug Henschen, Vice President and Principal Analyst at Constellation Research. Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I as moderator, I'm going to call on each analyst separately who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you Dave. I believe that data governance which we've been talking about for many years is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data oceans, data lakes, lake houses, data fabric, meshes, the common glue is metadata. If we don't understand what data we have and how we are governing it, there is no way we can manage it. So we saw Informatica went public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like Spark jobs, Python, even Airflow. We're going to see more streaming data. So from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. 
So the governance suite is going to be very comprehensive, very detailed lineage, impact analysis, and then even expand into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now if you look at BI tools, I would say there are a hundred users of a BI tool to one data catalog. And I see that evening out over a period of time, and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on about a lot of the trends, the one disagreement being that I think it's really still far from mainstream. As you say, we've been talking about this for years, it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines, these are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best of breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex, the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this? Fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or like data governance and cybersecurity. And instead we have some tooling that can actually be adaptive to gather metadata to create something. And I know it's important to you, Sanjeev, and that is this idea of observability.
If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot. So to help with the governance that Doug is talking about. >> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes Oxley had come onto the scene, if a bank did not do Sarbanes Oxley, they were very happy to pay a million dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the flood gates opened. Now, you know, California, you know, has CCPA, but even CCPA is being outdated with CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance regulatory requirements, data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore. So whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though with some skepticism there, that's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer, you want to talk about data mesh, and speaking, you know, coming off of governance. I mean, wow, you know the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> Well, put it this way, data mesh, you know, the idea, at least as proposed by ThoughtWorks, basically came out at least a couple of years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that basically Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had Hadoop data clusters, it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, it's not only in S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea that we need to, you know, ask who are the folks that really know best about governance? It's the domain experts. So data mesh was basically an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality.
Because if you do a Google search, basically the published work, the articles on data mesh have been largely, you know, pretty uncritical so far, basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this. Brad, now you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought: if we're deconstructing apps in cloud native to microservices, why don't we think of data in the same way? My sense this year, and this has been a very active search term if you look at Google search trends, is that now companies, like enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. The reason why I think that you'll start to see basically the cold hearted light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old and there's still some pretty major gaps. The biggest gap is in the area of federated governance. Now federated governance itself is not a new issue. With federated governance decisions, we're still figuring out how to strike the balance between, let's say, basically consistent enterprise policy, consistent enterprise governance, and yet the groups that understand the data, you know, how do we basically balance the two? There's a huge gap there in practice and knowledge. Also to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full life cycle: from selecting the data, to, you know, building the pipelines, to, you know, determining your access control, looking at quality, looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive the first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data mesh wash their products: anybody in the data management space, whether it's basically a pipelining tool, whether it's ELT, whether it's a catalog or a federated query tool, they're all going to start promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this. And this harks back to the metadata and the catalogs that Sanjeev was talking about. Which is that there's going to be a new focus, a renewed focus, on metadata. And I think that's going to spur interest in data fabrics. Now data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata back plane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because we all, at the end of the day, need to read from the same sheet of music.
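As a concrete aside: below is a minimal sketch, in Python, of the kind of self-describing metadata a data mesh "data product" might carry so that a common metadata back plane has something consistent to read. The field names are illustrative assumptions, not any standard, and as Tony notes, data mesh itself is a process, not a technology.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Illustrative descriptor for a data mesh 'data product'.

    Every field name here is an assumption made for the sketch,
    not a published specification.
    """
    name: str                  # e.g. "orders.daily_summary"
    domain: str                # the owning business domain
    owner: str                 # the accountable team or person
    output_port: str           # where consumers read it (table, topic, API)
    schema_ref: str            # pointer into a catalog or schema registry
    freshness_sla: str         # the promise consumers can rely on
    policies: list[str] = field(default_factory=list)  # federated governance rules

product = DataProduct(
    name="orders.daily_summary",
    domain="sales",
    owner="sales-data-team@example.com",
    output_port="s3://lake/sales/orders_daily/",
    schema_ref="catalog://sales/orders_daily/v3",
    freshness_sla="updated by 06:00 UTC daily",
    policies=["pii:none", "retention:400d", "region:eu-only"],
)
print(product)
```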
>> So thank you Tony. Dave Menninger, I mean, one of the things that people like about data mesh is that it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. But this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double: 68% of organizations, I'm sorry, 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh thank you Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking, or maybe you've already spoken, on a panel with (indistinct) who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not define it. We spent a whole year defining it, there are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh? And I'm like, I can't compare the two, because data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to what does a data product look like? How do we handle shared data across domains and governance? And I think we are going to see more of that in 2022, the "operationalization" of data mesh. >> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move on. Let's go to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah. Okay thanks.
So I regard graph databases as basically the next truly revolutionary database management technology. I'm putting out a forecast for the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough? So a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure, there's a value, a foreign key value, that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity; there's supply chain, that's a big one actually. There is explainable AI, and this is going to become important too, because a lot of people are adopting AI. But they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social networks, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti money laundering, that's another big one, identity and access management, network and IT operations is already becoming a key one where you actually have mapped out your operation, you know, whatever it is, your data center, and you can track what's going on as things happen there, root cause analysis, fraud detection is a huge one.
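To make the node-and-edge model concrete, here is a minimal property graph sketch in Python using the networkx library, a general-purpose graph toolkit standing in for a real graph database; the accounts, device, and fraud-style query are invented for illustration.

```python
import networkx as nx

# A property graph: nodes and edges both carry attribute dictionaries.
g = nx.DiGraph()
g.add_node("acct:1001", kind="account", owner="Alice")
g.add_node("acct:2002", kind="account", owner="Bob")
g.add_node("dev:77", kind="device", ip="203.0.113.9")
g.add_edge("acct:1001", "dev:77", rel="LOGGED_IN_FROM")
g.add_edge("acct:2002", "dev:77", rel="LOGGED_IN_FROM")

# Relationship query: which accounts share a device? In a relational
# schema this is a self-join on a fixed foreign key; here it is a
# simple neighborhood traversal over edges that can change freely.
shared = [n for n in g.predecessors("dev:77")
          if g.nodes[n]["kind"] == "account"]
print(shared)  # ['acct:1001', 'acct:2002'] -> a classic fraud-ring signal
```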
A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what-if analysis, impact analysis, entity resolution, and I would add one other thing, or just a few other things, to this list: metadata management. So Sanjeev, here you go, this is your engine. Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships, and by recursive relationships I mean objects that own other objects that are of the same type. You can do things like bills of materials, you know, like a parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and replacement? What are your thoughts on this space? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg. You know, it's always going to kind of be tied to those use cases, because it's really special and it's really unique, and because it's special and it's unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another, they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl? Cypher? SPARQL? Yeah, thank you, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter like we had the AI winter, simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold.
We need to equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here? >> I want to (indistinct) So people don't associate me with only metadata, so I want to talk about something slightly different. dbengines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about Achilles' heels, this is a problem. The problem is that there are so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there are so many query languages. In RDBMS, it's SQL, I know the story, but here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem, that the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of the saying I learned a bunch of years ago. Somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, that the data platforms will incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean, we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches, but we processed it in batches because that was the best we could do.
And it wasn't bad, and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so that's my prediction, is that this is going to change, we're going to have streaming data be the default way in which we deal with data, however you label it and whatever you call it. You know, maybe these databases and data platforms just evolve to be able to handle it. But we're going to deal with data in a different way. And our research shows that already: about half of the participants in our analytics and data benchmark research are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store, and we can go in and pick it up right now. You know, that's the world we live in, and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction: historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right thank you David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical, because as soon as you put data in it, it's now history. It no longer reflects the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this, Dave, 'cause you know, as well as I do, that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested, a rethinking of the way applications work. Saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default. And I agree 100% with you that we do need historical databases, you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example, where you're using the data warehouse as the context and the streaming data as the present, and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that?
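A minimal sketch of the pattern Dave and Carl are describing: score each event as it arrives against historical context, then roll it off into the historical store. Plain Python stands in for a real stream processor here, and the event shape and alert threshold are invented for illustration.

```python
from collections import deque

historical = []               # stands in for the historical database
window = deque(maxlen=5)      # recent events: the "present"

def baseline(field):
    """Average of a field over history; the context for real-time decisions."""
    vals = [e[field] for e in historical]
    return sum(vals) / len(vals) if vals else 0.0

def on_event(event):
    window.append(event)
    # Real-time decision made against historical context.
    if historical and event["amount"] > 3 * baseline("amount"):
        print("alert: unusual amount", event)
    historical.append(event)  # then the event rolls off into history

for e in [{"amount": 10}, {"amount": 12}, {"amount": 11}, {"amount": 95}]:
    on_event(e)   # the last event triggers the alert
```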
>> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real-time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it right, you know, you hit it right on the head there. What I'm seeing is that, essentially, I'm going to split this one down the middle: I don't see that streaming becomes the default. What I see is streaming and transaction databases and analytic data stores, you know, data warehouses, data lakes, whatever, converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing, and this is where it leads in, maybe some of that real-time predictive analytics, to take a look at, well look, we're looking at this customer journey, what's happening with what the customer is doing right now, and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this, and because of basically the speed of the infrastructure, you can basically bring these together and kind of orchestrate them in sort of a loosely coupled manner. The other part is that the use cases are demanding it, and part of this goes back to what Dave is saying. Which is that, you know, when you look at Customer 360, when you look at, let's say, smart utility products, when you look at any type of operational problem, it has a real-time component and it has a historical component, and a predictive one. And so, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real-time sort of predictive analytics on these streams and feed this into the transactions, so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that, to Dave's point, you know, we have to think of streaming very differently, because in the historical databases, we used to bring the data and store the data and then we used to run rules on top, aggregations and all. But in the case of streaming, the mindset changes, because the rules, the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking of and building applications on top of that. >> So Dave Menninger, there seems to be some disagreement about the default. What kind of timeframe are you thinking about? Is it end of decade that it becomes the default? What would you pin? >> I think around, you know, between five to 10 years, I think this becomes the reality. >> I think it's... >> It'll be more and more common between now and then, but it becomes the default. And I also want, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole other set of challenges. >> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro batch, sub-second, that's not quite streaming, but in many cases it's fast enough, and we're seeing a lot of adoption of near real-time, not quite real-time, as good enough for many applications.
(indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline, people feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things, especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development, and it's helped them to actually make AI better. 'Cause it, you know, in some ways provides some swim lanes, and for example, technologies like AutoML can auto-document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that we've had the automation that started happening for practitioners, and it's trying to move outside of the traditional bounds of things like, I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model, and it's expanding across that full life cycle of building an AI outcome, to start at the very beginning with the data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome, to make sure it's right and it hasn't drifted and stuff like that. And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me, given the data that I pull in. But what's starting to happen, and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others, is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out-of-the-box, flip-a-switch solutions, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go, and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter this thing called adaptive recommendation services, which basically is a cold start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you need to do in the line of business. So basically, you're an SAP user, you turn on your software one day, you're a sales professional let's say, and suddenly you have a recommendation for customer churn. Boom! There it goes, that's great. Well, I don't know, I think that's terrifying.
In some ways I think it is the future, that AI is going to disappear like that, but I'm absolutely terrified of it, because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that it's auditable, et cetera, et cetera, et cetera, et cetera. It'd take a lot of work to do. And so if you imagine a customer that's just a Salesforce customer, let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out, and suddenly there's going to be a lot of problems, a lot of pushback, that we're going to see. And some of that's going to come from GDPR and the others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. Let's take the slow route, let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappears, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. You'd be able to say, oh, slow down. Let's automate what we can. Those attributes that you talked about are non-trivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand. And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices, that every company that's going to consume this invisible AI can make use of. And one of the things, you know, that Google kicked off a few years back, that's picking up some momentum, and that the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's a convolutional neural network with a long short-term memory model that it's using, we know that it only works on Roman English, and therefore I as a consumer can say, "Oh, well I know that I need to do this internationally. So I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC in our Future of Intelligence group, regarding, in particular, the moral and legal implications of having a fully automated, you know, AI driven system. Because we already know, and we've seen, that AI systems are biased by the data that they get, right?
So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted more and rated as more productive than Black people, but it had no context as to why, which is, you know, because Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that. And I think that, at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems, to make sure that it doesn't introduce unintended biases. To some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all of the really fantastic, magical-looking super models we see, like GPT-3 and 4, that's coming out, they're xenophobic and hateful, because the data they're built upon, and the algorithms, and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased 'cause humans are biased. All right, great. All right let's move on. Doug, you mentioned, you know, a lot of people said that data lake, that term, was not going to live on, but here it is, we have some lake houses here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering, that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents; in 2021, SAP, Oracle, and several of these fabric/virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information. And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity, or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed, and you'd have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing it together to a single platform. You have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse.
They're really new to the data warehouse space, and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements. Meet those tight SLAs. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, is really relying on partners for the data science needs. So you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you Doug. Well Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does there need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean, I remember as far back as, I think it was like 2014, I was doing a study. I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we consider your data lake. Fast forward, and the thing is, what I was seeing at the time was that they were sort of blending at the edges. That was, say, about five to six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument, do we centralize this all, you know, in a single place, or do we virtualize? And I think it's always going to be a union, and there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. You know, what do you need for your performance characteristics? Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute the processing, you know, as far as possible, to essentially do a kind of brute force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of all of this. It's a relatively new term, introduced by Databricks a couple of years ago, but this is the culmination of basically what's been a long-time trend. And what we see in the cloud is that, as we start seeing data warehouses, as a checkbox item, say, "Hey, we can basically source data in cloud storage, in S3, Azure Blob Store, you know, whatever, as long as it's in certain formats, like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item.
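That checkbox item, warehouse-style SQL directly over open-format files, looks roughly like the sketch below. It uses DuckDB, pandas and a local Parquet file purely for illustration (pyarrow is assumed to be installed for Parquet support); this is not any particular vendor's lake house implementation.

```python
import duckdb
import pandas as pd

# Write a small Parquet file, standing in for data sitting in lake storage.
pd.DataFrame({
    "region": ["east", "west", "east"],
    "amount": [120.0, 80.0, 45.0],
}).to_parquet("orders.parquet")

# Warehouse-style SQL directly over the open file format, no load step.
result = duckdb.sql("""
    SELECT region, SUM(amount) AS total
    FROM 'orders.parquet'
    GROUP BY region
    ORDER BY total DESC
""").df()
print(result)
```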
So to that extent, I think that the lake house, depending on how you define it, is already a reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean, a lot of this, thank you Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term, we talked about, you know, data mesh washing, what are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it. So they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here; database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> As hell. So just a follow-up on that, so are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners would say, or advocates would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we gonna see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force, or this tug of war, between do we centralize the data, do we virtualize? And the fact is, I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. It's just that, you know, the difference being that you use the same physical data store, but everybody's logically, you know, basically governing it differently. Data mesh, in essence, is not a technology, it's processes, it's governance process. So essentially, you know, I basically see that, as I was saying before, this is basically the culmination of a long-time trend; we're essentially seeing a lot of blurring. But there are going to be cases where, for instance, if I need, let's say, upserts, or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake.
And, you know, if I'm doing a system where I'm just doing really brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements, that we essentially need the abilities of the data lake and the data warehouse, these need to come together, so I think. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity; the mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases of the stuff that isn't in that center of data gravity, but that is distributed in a cloud or at a remote location." So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualize data. >> I think we have at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data. So obviously we're literally splitting hairs here. Now what Databricks is saying is that, "aha, but it's easier to go from data lake to data warehouse than it is from database to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS, they've got what, 15 data stores? Are they going to 15 converged data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do like one word each, and you guys, each of the analysts, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, which you really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check, you want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims they're offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that, so, magic database. >> Yeah, I would actually, I've said this to people too. I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when a document database is a better choice. It can handle those things, but maybe not. It may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you, by the way, for bringing the data in, I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will, what would you add?
>> Yeah, I would say think fast, right? That's the world we live in, you got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great. I'm afraid I might get disrupted by one of these internet giants who are AI experts. I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction, with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective, and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but for your final thoughts, we'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms, and moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)

Published Date : Jan 7 2022

Thomas Hazel, ChaosSearch | JSON Flex on ChaosSearch


 

[Thomas Hazel] - Hello, this is Thomas Hazel, founder and CTO here at ChaosSearch. And tonight I'm going to demonstrate a new feature we are offering this quarter called JSON Flex. If you're familiar with JSON datasets, they're wonderful ways to represent information. You know, they're multidimensional, they have the ability to set up arrays as attributes, but those arrays are really problematic when you need to expand them or flatten them to do any type of elastic search or relational access, particularly when you're trying to do aggregations. And so the common process is to exclude those arrays or pick and choose that information. But with this new JSON Flex capability, our system can uniquely index that data horizontally in a very small and efficient representation, and then, with our Chaos Refinery, expand each attribute as you wish vertically, so you can do all the basic and natural constructs you would have done if you had, you know, a more straightforward two-dimensional, three-dimensional type representation. So without further ado, let me get into this presentation of JSON Flex. Now, in this case, I've already set up the service to point to a particular S3 account that has CloudTrail data, a dataset that is pretty problematic when it comes down to flattening data. And again, if you know CloudTrail, one row can become 10,000 as data gets flattened. So without further ado, let me jump right in. When you first log into the ChaosSearch service, you'll see a tab called 'Storage'. This is the S3 account, and I have a variety of buckets. I have the refinery, it's a data refinery; this is where we create views or lenses into these index streams so that you can do analysis, published via the elastic API as an index pattern or as a relational table in SQL. Now, a particular bucket I have here holds a whole bunch of demonstration datasets that we use to show off our capabilities and our offering. In this bucket, I have CloudTrail data, and I'm going to create what we call an 'object group'. An object group is an entry point, a filter of which files I want to index. Now, that data can be statically there or live streaming. These object groups have the ability to say what type of data you want to index on. Now, through our wizard, you can type in, you know, a prefix; in this case, I want to type in CloudTrail, and you see here, I have a whole bunch of CloudTrail. I'm going to choose one file to make it quick and easy, but this particular CloudTrail data will expand, and we can show the capability of this horizontal-to-vertical expansion. So I walk through the wizard, and as you can see here, we discovered JSON, it's a gzip file. Leave flattening unlimited, 'cause we want to be able to expand infinitely. But in this case, instead of using the default virtual setting, I'm going to horizontally represent this information. And this uniquely compresses the data in a way that can be stored efficiently on disk but then expanded in our data refinery upon query or search requests. So I'm going to create this object group. Now I'm going to call this, you know, 'JSON Flex test', and I could set up live indexing, the SQS option pops up, but I'm going to skip that, skip retention, and just create it. Once this object group is created, you can kind of think of it as a virtual bucket, 'cause it does filter the data, as you can see here. When I look at the view, I just see CloudTrail, but within the console, I can say start indexing. Now, this is static data, but it could be a live stream, and we set up workers to index this data.
Whether it's one file, a million files, one terabyte or one petabyte, we index the data. We discover all the schema, and as you see here, we discovered 104 columns. Now, what's interesting is that we represent this expansion in a horizontal way. You know, if you know CloudTrail: records zero, record one, record two. This can expand pretty dramatically if you fully flatten it, but in this case we horizontally represent it in the index. So when I go into the data refinery, I can create a view. Now, if you know the data refinery of ChaosSearch, you can bring multiple data streams together, you can do transformations virtually, you can do correlations, but in this case, I'm just going to take this one particular index stream we call 'JSON Flex', walk through a wizard (we try to simplify everything), and select a particular attribute to expand. Now, again, we represent this in one row, but if you had arrays and did all the permutations, it could go from one to 100 to 10,000. We had one JSON audit that went from one row to 1 million rows. Now, clearly you don't want to create all those permutations when you're trying to put this into a database. With our unique index technology, you can do it virtually and store it horizontally. So let me just select 'Virtual' and walk through the wizard. Now, as I mentioned, we can do all these different transformations, change schema; we're going to skip all that, select the record time and records event, and say 'create this'. I'm going to call it, you know, 'JSON Flex View'; I can set up caching and do a variety of things, but I'm going to skip that. And once I create this, it's now available in the elastic API as an index pattern, as well as in SQL via our Presto dialect API, and you can use Looker, Tableau, et cetera. But in this case, we go to this 'Analytics' tab, where we've built in the Kibana, OpenSearch tooling that is Apache-licensed. I click on discovery here, and I'm going to select that particular view. Again, it looks like, oops, it looks like an index pattern, and I'm going to choose, let's see here, let's choose 15 years, past and present, and make sure I capture where the actual timestamps fall. And what you'll see here is, you know, sure, it's just one particular dataset with a variety of columns, but you see here that unlike that record zero, records one representation, it's now expanded. And so it has been expanded like the vertical flattening you would traditionally do if you wanted to do anything in an elastic or relational construct, you know, anything that fits into a table format. Now, the advantage of JSON Flex: you don't have that data stored as a blob reached through proprietary JSON APIs. You can use your native elastic API or your native SQL tooling to get access to it naturally, without the expense of that explosion, and without the complexity of ETLing it and picking and choosing before you actually put it into the database. That completes the demonstration of ChaosSearch's new JSON Flex capability. If you're interested, come to ChaosSearch.io and set up a free trial. Thank you.
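
To make the flattening problem concrete, here is a minimal Python sketch of why a fully flattened JSON document explodes: every combination of array elements becomes its own row, so the row count is the product of the array lengths. The CloudTrail-like document shape and field names are illustrative assumptions, not ChaosSearch's actual index format.

    import itertools

    # One JSON document with two 100-element arrays. Fully flattening it
    # yields 100 x 100 = 10,000 rows: the "one row can become 10,000"
    # effect described in the demo.
    doc = {
        "eventName": "AssumeRole",
        "records": [{"id": i} for i in range(100)],
        "resources": [{"arn": f"arn:aws:iam::{j}"} for j in range(100)],
    }

    def flatten(d):
        # Separate scalar fields from array fields.
        arrays = {k: v for k, v in d.items() if isinstance(v, list)}
        scalars = {k: v for k, v in d.items() if not isinstance(v, list)}
        keys = list(arrays)
        # Emit one row per combination of array elements.
        for combo in itertools.product(*(arrays[k] for k in keys)):
            row = dict(scalars)
            for key, elem in zip(keys, combo):
                for child_key, child_val in elem.items():
                    row[f"{key}.{child_key}"] = child_val
            yield row

    print(sum(1 for _ in flatten(doc)))  # prints 10000

JSON Flex defers exactly this expansion: the index stores the compact horizontal form, and the refinery view performs the vertical expansion only at query time.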

Published Date : Nov 15 2021


Priya Rajagopal, Couchbase | Couchbase Connect ONLINE


 

>> Welcome to theCUBE's coverage of Couchbase Connect online 2021. I'm Lisa Martin. I have a first-timer here on theCUBE: Priya Rajagopal, the director of product management from Couchbase, joins me next. Priya, welcome to the program. >> Thank you, Lisa. Thanks for having me here, and glad to be here. First-timer, so really excited. >> Yeah. Well, we'll make sure that you're going to have fun. We're going to talk about edge computing, and what I'd love to get is your perspective on what's going on, and the evolution in the last 18 months. I'm sure so much has changed, but talk to me about edge computing, what's going on? >> Sure. There's a lot of literature out there and different interpretations. The way we see it at Couchbase, it's a distributed computing paradigm that brings compute and storage to the edge. And what is the edge? The edge is the location where data is generated or consumed. And so the edge, again, the taxonomy is varied, but it's really a continuum. So it's not a thing, right? It's a location. It could be a single device, or it could actually be a data center. And it's getting a lot of traction with the proliferation of applications around AR, VR, IoT, and mobile devices and mobile applications, because it delivers on the promise of ultra-low-latency access to data (because, you know, the edge is where the data is generated and consumed), data privacy, governance, residency, resiliency to network disruptions, and low bandwidth usage. So to your question on how mobile fits into the space of edge computing: in my view, mobile applications and mobile devices are a classic example of edge computing, because think about mobile devices, right? They're generating data, they're processing data, applications are processing data right there on the devices. You can store data in offline mode on those devices. So it is a classic edge device. And of course, the data doesn't have to be generated on the device itself. Mobile applications could sort of be gateways to other external wearables, for instance, and other IoT devices, which can connect to these mobile applications, and these mobile applications could process that data. >> Got it. So thank you for sharing Couchbase's definition, and it's a good point to do that, as so many times there are so many different terms and solutions and technologies that can be interpreted and explained many different ways. Let's now go through Couchbase's role in edge computing. Help the audience understand where you fit into that. >> Sure. So if you recap the definition, right, edge computing is all about bringing storage and compute to the edge. So clearly a database has a key role to play in this model, right, or in this paradigm, because when you think about it, a classic application architecture has three tiers: you've got an application tier, which includes your business logic and some of the UI elements (that's optional); you've got your database tier, which drives the application (if the application needs data, it's driven by the database tier); and then you've got the infrastructure tier, which includes your network, storage and compute. Now, when you're talking about an edge computing architecture, you're talking about distributing all these three tiers: your application tier, your database tier, as well as your infrastructure tier. And Couchbase is a fully distributed NoSQL database solution, so it fits right into this paradigm of edge computing.
Now, when we are talking about distributing our storage, that's just one aspect of it, right? You have to distribute it to these edge devices, you may have to distribute it to edge data centers, and you need to be able to sync or move data between these distributed cloud environments, right? So data synchronization is a key component of edge computing architectures. And then finally, there's data management. That's all about enforcement of policies when it comes to data privacy: you know, what data needs to be resident at the edge, what data needs to be filtered, what needs to be aggregated. So you need a solution that provides the hooks that allow you to enforce those policies. So a database like Couchbase has a critical role to play: a solution that can be deployed in the cloud, or it can be deployed at the edge. And again, the edge could be a data center or it could be a device. So what about devices? We have an embedded database solution for mobile, desktop and embedded platforms. And then of course, data movement: comprehensive data synchronization technology. >> Let's go through specifically some of the database capabilities that are required for businesses in any industry to be successful in edge computing. >> Sure, absolutely. Right, to sort of reiterate or reinforce the three concepts: data storage, data movement, data management, right? The Couchbase stack maps to those. It consists of Couchbase Server, our flagship, fully distributed NoSQL data platform. It can be containerized, it can be deployed in any public or private cloud, and it can be deployed at the edge cloud. And then you have Couchbase Lite, again a NoSQL embedded database, full featured, right? Anything that you can do with a standalone database, you can do with the embedded database. You can embed that within your mobile applications, within your other embedded applications or desktop applications. And that's great, right? That's the data storage part of it, and that's one part of it. But what about the data movement? That's where you've got our data synchronization technology, where we facilitate high-throughput, high-performance, highly scalable data synchronization between the edge and the cloud. And of course, as I mentioned, data management is a critical aspect of all this, right? And so the synchronization technology has components that allow you to set filters and access control policies, and there are a lot of hooks when it comes to data governance. So for instance, if an edge goes out of commission, or if there's a security breach and you want to isolate the edge, you can do that. For the data that was previously synchronized to that edge, you want to be able to purge that data; we have options to automatically purge the data if the device is no longer in the hands of the right recipient, for instance. Those are the critical aspects. Of course, the overarching theme is security, right? And that goes hand in hand with encryption of the data at rest, encryption of the data in motion, then authentication, authorization, access control. >> Security is even more important given the events of the last 18 months, where we've seen a massive rise in ransomware, where we've seen a huge rise in DDoS attacks.
Let's double-click more on the security aspect of what Couchbase is delivering. >> Sure, absolutely. So when it comes to security of data at rest, right, even with Couchbase Lite, which is our embedded database, your entire database is encrypted with AES-256 data encryption. And then, when data leaves the device through our data synchronization protocol, everything is encrypted. And of course, when it goes to a Sync Gateway (the Sync Gateway is, as I mentioned, the middle-tier component responsible for data synchronization between the embedded devices and Couchbase Server), that entity is responsible for enforcement of access control policies. So you are guaranteed that only users who should have access to those documents or data are granted access to them. And in fact, we are a NoSQL JSON database, which means everything is modeled in the form of documents, JSON documents. And so when we're talking about read/write access control, read access is at the granularity of a document, and write access can be enforced at the granularity of a property within the document. So you may have access to an entire document, but you may only be allowed to update a certain property within the document. So, as I mentioned, when it comes to distributed computing architectures like edge computing, security is even more paramount, right? You have devices going offline, coming back online, and you might have a breach point at one edge environment, whether it is a data center or an edge device; you need to be able to ensure that you have isolated all the other edge components from that breach. And as I mentioned, when it comes to data governance and so on, data retention, for instance: even if it is not a security breach, let's say that for some reason the owner of a device should no longer have access to that content (their role has changed, they have transitioned to a different company, for instance), then you will have a way of automatically purging all that data that was previously synchronized to the user's device. >> Got it. Okay, let's continue talking about the events of the last year and a half, because we saw this massive scattering 18 months ago, an explosion at the edge, when a lot of people went from the office to this work-from-home, work-from-anywhere environment in which we're still in. So how have the pandemic and the events related to it changed mobile apps and edge computing, and what are some of the new requirements that customers have? >> Sure. Well, as you rightly said, if anything, the relevance of mobile devices and applications has just grown in significance through the pandemic. And it's kind of interesting: there are some surveys that have suggested that through the pandemic, people have been using their mobile devices as their primary device for accessing the internet. And it's kind of interesting because you'd think, well, everyone is cooped up in their homes, they probably have access to other forms of data consumption, but no, it's mobile devices that they have primarily been using. So with that, there is also a new range of use cases and applications, driven in large part by the events of the pandemic. But I think that's just made things much more efficient. Customer satisfaction, user experience is paramount, is number one. And I think a lot of that is here to stay even post-pandemic, because it's just made things a lot more efficient.
And we've seen that through different industries, right? Healthcare: there was always telemedicine, but now, for non-essential care, it's always telemedicine. Of course, specific to the pandemic, there were the contact-tracing applications, right, enabled through technologies like Bluetooth and GPS, to track the whereabouts of infected persons. But then, even when you arrive at your doctor's office, you wait in your car and get notified when the doctor's ready to check you in. And then the retail sector, e-commerce, right? Of course everything was going online, and it's been overwhelming; people are shopping online through their mobile devices more than through traditional web-based applications. And you order on your phone, you pick up at the store, right? So curbside pickup: you pull into the store, the store clerk is notified of your arrival, and they come out to the curb with your order. And here's the interesting bit, you know: it's kind of intuitive that e-commerce applications got a huge boost through the pandemic, but interestingly, even the in-store retail experience has undergone a transformation, because at first it was all about how to make the process very efficient, so customers are in and out of the store really quick, right? There was a reason for that. But now it translates into making the whole shopping experience much easier. So you walk into the store, you meet a sales associate who can bring up information about the catalog and inventory right there on the iPad, and so if you have any questions, whether something is available in the store, or an accessory you're looking for, they can give you answers immediately, right? And of course there are companies like Walmart that have been rolling out mobile scan-and-go sort of applications, which is all about, you know, scanning items as you walk through the aisle and doing a self-checkout, a totally contactless experience. And the list goes on, right? We talked about healthcare and retail; it's the same thing in a restaurant, right? Curbside delivery and pickup: you can now track your delivery order, because there's been a huge surge in order deliveries. And then the same curbside pickup concept: you arrive in your car, the kitchen is notified of your arrival, and they come out with your order. Very streamlined drive-throughs: you've got people now coming to your car, taking the order right there from the car on their tablets that sync your order in real time to the back-end kitchen, and you get notified when your order is ready. So I think all this is about making things a lot more efficient. It's about customer experience, and edge computing has a big role to play in that. And so I think, if anything, this has just propelled the growth of mobile applications and use cases. >> Yeah, that propulsion is something that we've been hearing a lot about, the acceleration in the last year and a half. You did a great job there of painting a picture of some of the positives that come out of this: accelerating the efficiencies that we all, as consumers and in our business lives, expect to have, and this explosion at the edge that's really become even more of a lifeline for millions and millions and billions of people globally. I've got to ask you, from a connectivity perspective, that's another area where we have this expectation: as consumers, or in our business lives, we have connectivity.
Where does all that talk about 5G fit? How does 5G fit into edge computing? >> Sure, that's a good question, because 5G and edge computing sort of go hand in hand, so much so that they are being used synonymously in some cases, and that's inaccurate. Okay? Every time people talk about edge computing, there are folks talking about 5G in the same breath, right? But really, 5G, as we all know, is a next-generation cellular technology that promises ultra-low latency and very high bandwidth. Now, we talked about this huge surge, right, in mobile applications and new sorts of use cases where a lot of the data is generated at the edge. IoT applications are data-intensive applications, right, gaming apps and so on. And all of these applications demand ultra-low latency, right? They're generating a lot of data, and all that data needs to be processed in real time. So if you have to send all of that data back to the cloud before you get the responses, that's a really bad experience. That's what 5G is here to solve, right? I mean, it's low latency, high bandwidth, high concurrency. Now, that's all great, but the coverage of 5G terminates at the edge of the mobile operator network. So you have this massive influx of devices generating all that data, and all that data is transmitted under very low latency conditions over a 5G network. But then, if all that data has to be backhauled from the mobile operator network to the internet, to the back-end servers, you kind of defeat the whole purpose of ultra-low-latency applications. So that's where edge computing comes into play, because edge computing is really an architecture, right? It's a distributed architecture. So now what mobile operators are doing is deploying what are effectively micro data centers at the edge of the mobile operator network. So you have all this data coming in over the 5G network, great; it gets analyzed and processed locally at the edge of the mobile operator network, and you get real-time responses. And of course, as needed, that data, in aggregated or filtered form, goes back to the cloud. And so that's where the two relate. So in my view, edge computing architectures are important to deliver on the promise of 5G, and 5G has propelled the relevance and importance of edge computing as a solution, as a deployment architecture. Very interrelated. >> Very interrelated, very symbiotic. And of course, the need for real-time data and real-time analytics in every industry became very prominent in the last year and a half. We have this expectation that we're going to be able to understand things in real time, and that's often a huge differentiator for businesses. We're out of time, but I want to ask you one more question, Priya, and that is: where can customers go to get started with Couchbase? >> Oh, absolutely. So Couchbase Server and Sync Gateway, you can deploy those; they're available as software, you can download them from our website. Couchbase Lite is available for all your mobile applications; it is available as a download, but you also have the classic package management systems through which you can get Couchbase Lite. And then of course, as I said, you can deploy this standalone, but you can also deploy it in the cloud. We have marketplace offerings for both Couchbase Server and Sync Gateway, so if you want to deploy on AWS, Azure or Google, you can do that as well. >> Excellent.
Priya, thank you so much for joining me on the program, talking about Couchbase, the evolution, the changes, the opportunities with edge computing and mobile, and how Couchbase is involved. I appreciate your time. >> Thank you very much. And thanks for having me. >> For Priya Rajagopal, I'm Lisa Martin. You're watching theCUBE's coverage of Couchbase Connect online 2021.
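
To ground the access-control model Priya describes (read grants at document granularity, write grants at property granularity), here is an illustrative Python sketch of the two checks involved. This is a hand-rolled model, not the Sync Gateway API; real Sync Gateway access control is configured through a JavaScript sync function, and the channel and property names here are hypothetical.

    def can_read(user_channels, doc):
        # Document-level read: the user must share at least one channel
        # with the document.
        return bool(set(user_channels) & set(doc.get("channels", [])))

    def validate_write(old_doc, new_doc, writable_props):
        # Property-level write: reject the revision if any property
        # outside the user's writable set has changed.
        for key in set(old_doc) | set(new_doc):
            if key == "channels":
                continue  # channel routing handled separately in this sketch
            if old_doc.get(key) != new_doc.get(key) and key not in writable_props:
                raise PermissionError(f"property '{key}' is read-only for this user")
        return new_doc

    old = {"channels": ["store-42"], "status": "open", "owner": "priya"}
    new = {"channels": ["store-42"], "status": "closed", "owner": "priya"}

    assert can_read(["store-42"], old)    # document-level read allowed
    validate_write(old, new, {"status"})  # only 'status' changed: allowed
    # Changing 'owner' with the same grants would raise PermissionError.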

Published Date : Oct 26 2021


Ravi Mayuram, Senior Vice President of Engineering and CTO, Couchbase


 

>> Welcome back to theCUBE's coverage of Couchbase Connect online, where the theme of the event is 'modernize now'. Yes, let's talk about that. And with me is Ravi Mayuram, who's the senior vice president of engineering and the CTO at Couchbase. Ravi, welcome. Great to see you. >> Thank you so much. I'm so glad to be here with you. >> I want to ask you what the new requirements are around modern applications. I've seen some of your comments: you've got to be flexible, distributed, multimodal, mobile, edge. Those are all the very cool sort of buzzwords, smart applications. What does that all mean, and how do you put that into a product and make it real? >> Yeah, I think what has basically happened is that so far it's been a transition of sorts, and now we have come to a tipping point, and that tipping point has come more because of COVID. COVID has pushed us to a world where we are living in a sort of occasionally connected manner, where our digital interactions precede our physical interactions, in one sense. So it's a world where we do a lot more stuff in a digital manner, as opposed to sort of making more specific human contact. That has really been the accelerant to this 'modernize now' theme. In this process, what has happened is that all the databases and all the data infrastructure that we have built historically are very centralized. They're all sitting behind: they used to be in mainframes, from where they came to your own data centers, where we used to run hundreds of servers, to where they're going now, where the computing model has marvelously changed to consumption-based computing, which is all cloud-oriented. But they are all still centralized, whereas our engagement with the data happens at the edge, at your point of convenience, at your point of consumption, not where the data is actually sitting. So this has led to, you know, all those buzzwords, as you said: oh, we need a distributed data infrastructure, where is the edge? But it basically comes down to the fact that the data needs to be there if you are engaging with it. And that means if you are doing it on your mobile phone, or doing something while you're traveling, whether you're in a subway, a plane or a ship, wherever, the data needs to come to you and be available, as opposed to you going to the data every time, sitting centrally in some place. And that is the fundamental shift in how modern architecture needs to think when it comes to digital transformation and transitioning old applications to the modern infrastructure, because that's what's going to define your customer experiences and your personalized experiences. Otherwise, people are basically waiting for that circle of death that we all know, and blaming the networks and other pieces. The problem is actually that the data is not where you are engaging with it; it's got to be fetched, you know, seven seas away. And that is the problem that we are basically solving in this modernization of the data infrastructure. >> I love this conversation, and I love the fact that there's a technical person that can kind of educate us on this, because data by its very nature is distributed.
It's always been distributed, but a distributed database has always been incredibly challenging: whether it was a global Sysplex or eventual consistency, getting recovery right for a distributed architecture has been extremely difficult. You know, I hate to use this terrible term, lots of ways to skin a cat, but you've been the visionary behind this notion of optionality, of how to solve technical problems in different ways. So how do you solve that problem of a super rock-solid database that can handle, you know, distributed data? >> Yes. So there are two issues that you alluded to over there. The first is the optionality piece of it, which is that the same data that you have requires different types of processing on it. It's almost like fractional distillation. It is like your crude flowing through the system: you can start from petrol and end up with Vaseline and rayon on the other end, but the raw material, that's our data, in one sense. So far, we never treated the data that way; that's part of the problem. It has always been very purpose-built, cast to the first problem, and so you just basically have to recast it every time you want to look at the data differently. The first thing that we have done is make data that fluid. So when you have the data, you can first look at it to perform, let's say, a simple operation that we call a key-value store operation: given my ID, give me my password kind of scenarios. There are customers of ours who have billions of user IDs under management, so things get slower. How do you make it fast and easily available? Login should not take more than five milliseconds. This is a class of problem that we solve. On that same data, now, without you ever having to recast it into a different database, you can run solid queries: our classic SQL queries, which is our next magic. We are a NoSQL database, but we have a fully functional SQL. SQL has been the language that has talked to data for 40-odd years successfully. Every other database has come along and tried to implement its own query language, but they've all failed; only SQL has stood the test of 40-odd years. Why? Because there's solid mathematics behind it, called relational calculus. And what that gives you is basically that any which way you look at the data, it will come back in a format that you can consume. That's the guarantee it gives you, in one sense. And because of that, you can now do some really complex database science, what we call predicate logic, on top of that. And that gives you the ability to do the classic relational-type queries, select-star-from-where kind of stuff, because at an English level it becomes easy. So with the same data, you didn't have to go move it to another database and do your sort of transformation of the data and all that stuff; it's the same data that you do this on. Now, that's where the optionality comes in. Now you can do another piece of logic on top of this, which we call search. This is built on the concept of the inverted index and TF-IDF, the classic Google-style tokenized search, in very simple terms. You can do that on the same data without you ever having to move the data to a different format.
And then on top of it, you can do what is known as eventing, your own custom logic, which we do in a programming language called JavaScript. And finally, analytics. Analytics is your ability to query the operational data in a different way. Talk about querying: what were my sales of this widget, year over year, in the first week of December? That's a very complex question to ask, and it takes a lot of different types of processing. So that's optionality: different types of processing on the same data, without you having to go to five different systems, without you having to recast the data in five different ways and apply different application logic. You put them all in one place. Now, to your second question: this has got to be distributed and made available in multiple clouds, in your data center, all the way to the edge, which is the operational side of the database management system. And that's where the distributed platform that we have built enables us to get the data to where you need it to be. In the classic way, we call it CDN'ing the data, as in content delivery networks. So far those have done a sort of static movement of static content to the edges; now we can actually dynamically move the data. Now imagine the richness of applications you can develop. >> And on the first part of the answer to my question, are you saying you could do this without a schema, with no schema, right? And then you can apply those techniques. >> Fantastic question. Yes. The brilliance of this database is that, so far, databases have classically always demanded that you first define a schema before you can write a single byte of data. Couchbase is one of the rare databases (I, for one, don't know any other one, but there could be, let's give the benefit of the doubt) which writes data first and then late-binds to schema, as we call it. It's a schema-on-read thing. So, because there is no schema, it is just a JSON document that is sitting inside. And JSON is the lingua franca of the web, as you very well know by now. So it's just JSON that we manage. You can do key-value lookups of the JSON. You can do full query capability, like a classic relational database; we even have cost-based optimizers and other sophisticated pieces of technology behind it. You can do searching on it using the full textual analysis pipeline. You can do ad hoc querying on the analytics side, and you can write your own custom logic on it using our eventing capabilities. That's what it allows, because we keep the data in the native form of JSON. It's not a data structure or a data schema imposed by a database; it is how the data is produced. And on top of it, we bring different types of logic, five different types. The philosophy is bringing logic to data, as opposed to moving data to logic. That is what we have been doing for the last 40 years, because we developed various database systems and data processing systems at various points in our history: we had key-value stores, we had relational systems, we had search systems, we had analytical systems, we had queuing systems. With all these systems, if you want to use any one of them, the answer has always been: just move the data to that system. Versus we are saying: do not move the data. As we get bigger and bigger, just moving this data is going to be a humongous problem.
If you're going to be moving petabytes of data for this, it's not going to fly. Instead, bring the logic to the data, right? So you can now apply different types of logic to the data. I think that's, in one sense, the optionality piece of this. >> But as you know, there are plenty of schema-less data stores. They're just, they're called data swamps. I mean, that's what they became, right? So this is some interesting magic that you're applying here. >> Yes. I mean, the one problem with the data swamps, as you call them, is that they were a little too open-ended, because the data format itself could change, and then everything became like a game of data recasting, because at the end of the day it still required you to have a set schema, in one sense, for certain types of processing. So there were a lot of gaps there; it did not really, how do you say, keep to the promise of what it was actually meant to be. That's why it was a swamp, I mean, because it was fundamentally not managing the data. The data was sitting in some file system, and then you were doing something with it, whereas this is a classic database where the data is managed and you create indexes to manage it, and you create different types of indexes to manage it. You distribute the index, you distribute the data, and, like we were discussing, you have ACID semantics on top of it. When you put all these things together, it's a tough proposition, but we have solved some really tough problems, good computer science problems, that we had to solve to bring this to bear, to bring this to the market. >> So you predicted the trend around multimodal and converged databases; you kind of led Couchbase through that. I always ask this question, because it's clearly a trend in the industry, and it definitely makes sense from a simplification standpoint, so that I don't have to keep switching databases. But the flip side of that, Ravi, and I wonder if you could give me your opinion on this, is kind of the right tool for the right job. So I often say, isn't that the Swiss army knife approach, where you have a little teeny scissors and a knife that's not that sharp? How do you respond to that? >> A great one. My answer is always, I use another analogy to tackle that: have you ever accused a smartphone of being a Swiss army knife? - No. No. >> Nobody does that, because it is actually 40 functions in one; that is what a smartphone becomes. You never call your iPhone or your Android phone a Swiss army knife, and here's the reason: you can use that same device at its full capacity. That's what optionality is. It's not like your good old phone where there's a keyboard hiding half the screen, and you can do everything only through the keyboard, without touching and stuff like that. No, the whole device is available to you to do one type of processing when you want it; when you're done with that, it can do a completely different type of processing. Right? As in, one moment it could be a TomTom, telling you all the directions; the next one, it's your PDA. Third one, it's a fantastic phone. Four, it's a beautiful camera which can do your f-stop management and give you a nice SLR-quality picture, right? And the next moment, it's the video camera; people are shooting movies with this thing in Hollywood these days, for God's sake.
So it gives you the full power of what you want to do when you want it. And now, if you just thought that the iPhone, or any smartphone, is a great device because you can do five things in one, or 50 things in one, at a certain level you've missed the point, because what that device really enabled is not just these five things in one place. It becomes easy to consume and easy to operate. It actually started the app-based economy. That's the brilliance of bringing so many things into one place, because in the morning, you know, I get an alert saying that today you've got to leave home at 8:15 for your nine o'clock meeting, and the next day it might actually say 8:45 is good enough, because it knows where the phone is sitting, the geo-position of it; it knows from my calendar where the meeting is actually happening; it can do a traffic calculation because it's got my map and all of the routes; and then it's got this notification system, which eventually pops up on my phone to say, hey, you've got to leave at this time. Now, five different systems have to come together, and they can, because the data is in one place. Without that, you couldn't even do this simple function in a sort of predictable manner, in a manner that's useful to you. So I believe a database which gives you this optionality of doing multiple types of data processing on the same set of data will allow you to build a class of products which you have so far been struggling to build, because half the time you're running sideline to sideline, just, you know, integrating data from one system to the other. >> So I love the analogy with the smartphone. I want to continue it and double-click on it. So I used this camera, you know, when my kid had a game: I would bring the big camera, the 35 millimeter. I don't use that anymore, no way, but my wife does; she still uses the DSLR. So is there a similar analogy here? And by the way, the camera shop in my town went out of business, you know. But is that fair? In other words, with those specialized databases, they say there still is a place for them, but they're getting... >> Absolutely, absolutely. Great analogy, and a great extension to the question. That's the contrarian side of it, in one sense: hey, if everything can just be done in one, do you have a need for the other things? I mean, you gave a camera example, where it's sort of a slippery slope. Let me give you another one, which actually gets straighter to the point. Just because I listen to half of my music on the iPhone doesn't stop me from having my full digital receiver and, you know, my Harman Kardon speakers at home, because they produce a kind of sound, an immersive experience, that this teeny little speaker was never in its lifetime intended to produce, right? It's the convenience, yes, it's the convenience of convergence, that I can put my earphones on and listen to all the great music. Yes, it's 90% there, or 80% there; it depends on the audiophile-ness of your experience. Super-specialized ones do not go away. You know, there are places where the specialized use cases will demand a separate system to exist. But even there, it has got to be, how do you say, a close binding or late binding: I should be able to stream that song from my phone to that receiver so I can get it from those speakers.
You can't say that, oh, there's a digital divide between these two things, and I can only play CDs on that one. That's not how it's going to work going forward. This is the connected world, right? As in, if I'm listening to a song in my car and then step out of the car and walk into my living room, that same song should continue playing on my living room speakers. Then it's a connected world, because it knows my preference and what I'm doing, and that all happens only because of this data flowing between all these systems. >> I love that example too. When I was a kid, we used to go to Tweeter, et cetera, and we used to play around with, and take home, big four-foot speakers. Those stores are out of business too. Absolutely. And now we just plug into Sonos. So is the debate between relational and non-relational databases over, Ravi? >> I believe so, because I think what had happened was that relational systems were the norm; they ruled the roost, if you will, for the last 40-odd years, and then came this NoSQL movement, which was almost like a rebellion against the relational world we all inhabited, because it was very restrictive. It had the schema definition and the schema evolution, as we call it, all those things; they required a committee, they required your DBA and your data architect, and you had to call them just to add one column and stuff like that. And the world had moved on. This was a world of blogs and tweets and, you know, mashups, and a different generation of digital behavior. There are digital-native people now who are operating in these consumer-facing applications; we are living in this world, and yet the enterprise systems were still living on the other side of the divide. So out came this solution to say that we don't need SQL. Actually, the problem was never SQL. 'NoSQL' was, you know, a best approximation, a good marketing name, but from a technologist's perspective, the problem was never the query language; SQL was not the problem. The schema limitations and the inability of the systems to scale were. The relational systems were built like airplanes, which is to say: if there is a San Francisco-Boston flight route that's so popular that you want to add 50 more seats to it, the only way you can do that is to go back to Boeing and ask them to get you a jet that goes from a 737 to a 777, or whatever it is. And they'll stick you with a billion-dollar bill, on the assumption that you'll somehow pay for it by, you know, either flying more people or raising the rates, or whatever you have to do. These are all vertically scaling systems. So relational systems are vertically scaling, and they are expensive. Versus what we have done in this modern world is make the system horizontally scaling, which is more like a train: if it's a train that is going from San Francisco to Boston and you need room for 50 more people, be my guest, I'll add one more coach to it, one more car to it. And the better part of the way we have done this here, and we are super specialized in that, is that if this route actually requires three dining cars and only 10 sleeper cars, or whatever, then just pick those and attach them; for the next route, you can choose to say, I need only one dining car, that's good enough. So the way you scale the train can also be customized based on the route: along one route, more dining capability; on a shorter route, not as much capability.
You can attach the kinds of coaches you need; we call this multidimensional scaling. Not only do we scale horizontally, we can scale to different types of workloads by adding different types of coaches, right? So that's the beauty of this architecture. Now, why is that architecture important? Where we land, eventually, is the ability to do operational and analytical work in the same place. This is another thing that didn't happen in the past, because you would say: I cannot run this analytical query, because my operational workload will suffer, my front end will slow down, and millions of customers will be impacted. That problem is solved: on the same data, you can do both an analytical query and an operational query, because they're separated by these cars, right? As in, we fence the resources so that one doesn't impede the other. So you can, at the same time, have 10 million key-value or query operations per second happening at microsecond latency, and yet you can run this analytical query, which might take a couple of minutes, the one not impeding the other. So that's, in one sense, part of the problem that we have solved here: the relational versus NoSQL portion of it. These are the kinds of problems we had to solve. We solved those, and then we put the same query language back on top. Why? It's like Tesla, in one sense: underneath the surface is where all the stuff that had to change changed, the gasoline, the internal combustion engine; those were the issues they really wanted to solve. So solve that, change the engine out; you don't need to change the steering wheel or the gas pedal or, you know, the paddle shifters or whatever else you need over there, your gear shifters. Those need to remain in the same place, otherwise people won't buy it; otherwise it does not even look like a car to people. So even when you give people the most advanced technology, it's got to be accessible to them in a manner that they can consume. Only in software do we forget this first design principle, and we go and say: well, I've got a car here, you've got to blow harder to go fast and lean back to apply the brakes. That's how we seem to design software. Instead, we should be designing it in the manner that is easiest for our audience, which is developers, to consume, and they've been using SQL for 30 or 40 years. And so we give them the steering wheel and the gas pedal and the gear shifters by putting SQL back on top. Underneath the surface, we have completely solved the relational limitations of schema as well as scalability. In that way, and by bringing back the classic ACID capabilities that relational systems were counted on for, and being able to do that with the SQL programming language (we call it multi-statement SQL transactions, so to say, which is the classic way all enterprise software was built), by putting that back, I can now say that the debate between relational and non-relational is over, because this has truly extended the database to solve the problems that relational systems had to grow up to solve in modern times, rather than getting pedantic about whether we have NoSQL or SQL or NewSQL, or, you know, any of that sort of jargon-oriented debate.
These are the debates of computer science that actually had to be settled, and we have solved them with the latest release, 7.0, which we released a few months ago. >> Right, right, last July. Ravi, we've got to leave it there. I love the examples and the analogies. I can't wait to be face-to-face with you; I want to hang with you at the cocktail party, because I've learned so much, and I really appreciate your time. Thanks for coming to theCUBE. >> Fantastic. Thanks for the time and the opportunity. I mean, very insightful questions, really appreciate it. - Thank you. >> Okay, this is Dave Volante. We're covering Couchbase Connect online; keep it right there for more great content on theCUBE.
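
Here is a small sketch of the optionality Ravi describes: the same JSON document reachable through a key-value lookup and through SQL, with no recasting into a second database. It uses Couchbase Python SDK 3.x-style imports (exact module paths vary by SDK version); the connection string, credentials, bucket name and document shape are placeholders, not from the interview.

    from couchbase.cluster import Cluster, ClusterOptions
    from couchbase.auth import PasswordAuthenticator

    cluster = Cluster("couchbase://localhost",
                      ClusterOptions(PasswordAuthenticator("user", "password")))
    orders = cluster.bucket("orders").default_collection()

    # Key-value path: the "given my ID, give me my record" class of lookup.
    orders.upsert("order::1001",
                  {"type": "order", "widget": "w-7", "qty": 3, "year": 2021})
    doc = orders.get("order::1001").content_as[dict]

    # Relational path: classic SQL (N1QL/SQL++) over the very same
    # documents, the "what were my sales of this widget" style question.
    result = cluster.query(
        "SELECT o.widget, SUM(o.qty) AS total "
        "FROM orders o "
        "WHERE o.type = 'order' AND o.year = 2021 "
        "GROUP BY o.widget")
    for row in result:
        print(row)

The search, eventing and analytics services Ravi mentions operate on these same documents as well; this sketch shows only the first two access paths.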

Published Date : Oct 1 2021


Vasanth Kumar, MongoDB Principal Solutions Architect | Io-Tahoe Episode 7


 

>> Okay. We're here with Vasanth Kumar, who's the Principal Solutions Architect for MongoDB. Vasanth, welcome to "theCube." >> Thanks Dave. >> Hey, listen, I feel like you were born to be an architect in technology. I mean, you've worked for big SIs, you've worked with many customers, you have experience in financial services and banking. Tell us, the audience, a little bit more about yourself, and what you're up to these days. >> Yeah, hi. Thanks for inviting me for this discussion. I'm based out of Bangalore, India, having around 18 years of experience in the IT industry, building enterprise products for different domains and verticals: finance, where I built enterprise banking applications, IoT platforms, digital experience solutions. I've now been with MongoDB nearly two years, working in the partner team as a principal solutions architect, especially working with ISVs on best practices for handling data and embedding the right database as part of their product. I also work with technology partners to integrate compatible technologies with MongoDB, and with private cloud providers to provide database as a service. >> Got it. So, you know, I have to say, Vasanth, I think Mongo kind of nailed it. They were early on with the trends of managing unstructured data, making it really simple. There was always a developer appeal, which has lasted, and they did so with an architecture that scales out. And back in the early days when Mongo was founded, I remember those days, I mean, digital transformation wasn't a thing, it wasn't a buzzword, but it just so happens that Mongo's approach dovetails very nicely with a digital business. So I wonder if you could talk about that: talk about the fit, and how MongoDB thinks about accelerating digital transformation, and why you're different from a traditional RDBMS. >> Sure, exactly, yeah. You have the right understanding; let me elaborate on it. We all know that customer expectations change day by day, because of business agility, functionality changes, how they want to experience the applications or the apps; that changes, okay? And obviously this adds to the agility of the information, which moves between multiple systems or layers. And to achieve this, obviously, the way of architecting or developing the product has taken a completely different shift, maybe moving from the monolith to microservices or event-based architecture and so on, and obviously the database has to be apt for this environment: to adopt these changes, to adopt the scale of load, and the rest. Okay? And also, we see that the common protocol for information exchange is JSON, and a database which adopts it natively is a perfect fit. Okay? So that's where MongoDB fits perfectly for building or transforming modern applications, because it's a general-purpose database which accepts JSON as a payload and stores it in BSON format. Suppose you want to develop a particular application or transform an existing application: typically people look at what effort is required, what cost is involved, and how quickly it can be done. That's the main important thing, without disturbing the functionality. Here, since it is a multimodal database in a JSON format, you can easily build an application. Okay?
You don't need a lot of transformation. In the case of an RDBMS, you get the JSON payload, you transform it into a tabular structure or a different format, and then you probably build an ORM layer, map it, and save it. There's a lot of work involved, a lot of components that need to be written in between. But in the case of MongoDB, you get the information from the multiple sources and, as is, you can put it in the DB, or you can transform it based on the access patterns, and then you can store it quickly.

>> Dave: Got it.

>> And I'll tell you, Dave: today you have context data, which is a selected set of information. Probably tomorrow the particular customer has more information to put in. So how do you capture that? In the case of an RDBMS, you need to change the schema, and once you change the schema, your application breaks down. But here it magically adapts: you pass the extra information, it's open for extension, and it takes it easily. You don't need to redeploy, change the schema, or do anything like that.

>> Right. That's the genius of Mongo. And then of course, you know, in the early days people would say, oh, you know, Mongo, it won't scale. And then of course came the cloud. And I follow Atlas very closely, I look at the numbers every quarter. I mean, overall cloud adoption is increasing like crazy; our Wikibon analyst team has the big four cloud vendors, just in IaaS, growing beyond 115 billion this year. That's 35% on top of, you know, 80-90 billion last year. So talk more about how MongoDB fits with the cloud and how it helps with the whole migration story, 'cause you're killing it in that space.

>> Yeah, sure. Just to add one more point on the previous question: continuously, for the past four to five years, we have been the number one most-wanted database.

>> Dave: Right.

>> Okay. That's how the popularity has grown, that's how the adoption has happened.

>> Dave: Right.

>> Coming back to your question-

>> Yeah, let's talk about the cloud and database as a service. You guys have actually packaged that very nicely, I have to say.

>> Yeah. So we have spent a lot of effort and time in developing Atlas, our managed database as a service, which lets the customer concentrate on their application rather than maintaining and managing the whole database or scaling the infrastructure; all of that work is taken care of. You don't need to be a DB expert when you're using Atlas. We provide the managed database on the three major cloud providers, AWS, GCP, and Azure, and it's purely multicloud: you can have a primary in AWS and have the replicated nodes in GCP or Azure. So you don't have cloud lock-in. If you decide what's right for your business and think, I need to move to GCP, you don't need to worry; you can easily migrate to GCP. No vendor lock-in, no cloud lock-in.

>> So Vasanth, maybe you could talk a little bit more about Atlas and some of the differentiated features, things that you can do with Atlas that maybe people don't know about.

>> Yeah, sure Dave. Atlas is not just a managed database as a service, you know, it's a complete data platform, and it provides many features.
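Continuing the sketch above, the open-for-extension point Vasanth makes might look like this: a later document simply carries an extra field, with no schema migration and no redeploy. The field names remain hypothetical.

```python
# A newer document carries an attribute older documents never had.
orders.insert_one({
    "orderId": 1002,
    "customer": {"name": "Ravi", "city": "Chennai"},
    "loyaltyTier": "gold",        # new attribute: no ALTER TABLE, no redeploy
})

# Older documents simply lack the field; queries can account for that.
for doc in orders.find({"loyaltyTier": {"$exists": True}}):
    print(doc["orderId"], doc["loyaltyTier"])
```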
Like for example, you build an application, and probably three years down the line the data you captured three years back might be old data. How do you handle that? There's no need for you to manually purge or do anything: we have an online archival feature where you configure the data, so that, say, data older than two years is automatically purged from the cluster and archived. That way you have the hot data kept in the Atlas cluster and the cold data moved off to an archive. And we also have a data lake where you can run federated queries. For example, you've done an archival, but what if people want to access that data? With the data lake, on a single connection, you can run federated queries over both the active and the archived data. That's the beauty: you archive the data, but you can still query it. And we also have Charts, where you can build visualizations on top of the data you have captured. You can build graphs and also embed those graphs as part of your application, or share them with customers, with the CXOs and other teams.

>> Dave: Got it.

>> It's a complete data platform.

>> Okay. Well, speaking of data platform, let's talk about Io-Tahoe's data RPA platform, and coupling that with MongoDB. So maybe you could help us understand how you're helping with process automation, which is a very hot topic, and just this whole notion of modern application development.

>> Sure. See, process automation is more with respect to the data: how you manage this data, what you derive from it, and how you build a business process on top of it. I see there are two parts to it. One is the source of data: how do you identify the data, how do you discover it, how do you enrich the context or transform it, give a business context to it. And then you build business rules or act on it, and then you store the data, or derive the insights, enrich it, and store it in the DB. The first part is completely handled by Io-Tahoe, where you can tag the data from the multiple data sources. For example, if we take a customer 360 view, you grab the data from multiple data sources using Io-Tahoe, and you discover this data, you can tag it, you can label it, and you build a view of the complete customer context. Then you use a Realm webhook, and the data is ingested back into Mongo. So it's all in a more or less serverless fashion; you can build this customer 360 view, for example. And just to explain the Realm piece I mentioned: Realm is a backend API layer that you can create on top of the data in a Mongo cluster, which is available in Atlas. Once you run it, the APIs are ready. You build it as data as a service, with fully secured APIs available. These APIs can be integrated within a mobile app or a web application to build a modern application. All that's left is to build the UI artifacts and integrate these APIs.

>> Yeah, I mean, we live in this API economy. Companies, people throw that out as sort of a buzz phrase, but Mongo lives it. I mean, that's why developers really like Mongo. So what's your take on DevOps? Maybe you could talk a little bit about, you know, your perspective there, how you help devs and data engineers build faster pipelines.

>> Yeah, sure. Okay, this is my favorite topic.
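From the application side, the archive-plus-federated-query flow just described might look like the sketch below, assuming the separate federated connection string that Atlas exposes for querying live and archived data together; the URI and names are hypothetical.

```python
# One ordinary query against a federated endpoint spans the hot Atlas
# cluster and the cold archive; the driver code does not change.
from pymongo import MongoClient

federated = MongoClient("mongodb://cluster-archive.example.mongodb.net/?ssl=true")
events = federated["shop"]["events"]       # hypothetical federated namespace

total_checkouts = events.count_documents({"type": "checkout"})
print(total_checkouts)                     # counts live and archived documents
```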
DevOps has become a buzzword, you know, with everyone moving away from traditional deployment. So we support deployment automation in multiple ways, and also provide diagnostics under the hood. We have two options in MongoDB. One is the enterprise option, which is more the on-prem version, and Atlas is the cloud one, the managed database service. In the case of Enterprise Advanced, we have Ops Manager and the Kubernetes operator. Ops Manager handles all sorts of deployment automation and upgrades, and provides diagnostics, both with respect to the hardware and with respect to MongoDB: it gives you profiling and slow-running queries, so you get context on what's happening with the data. Using the enterprise operator, you can integrate with an existing Kubernetes cluster, either in a different namespace or an existing namespace, and orchestrate the deployment. And in the case of Atlas, we have an Atlas Kubernetes operator, which helps you integrate from your Kubernetes cluster, so you don't need to leave Kubernetes. And we have also worked with the cloud providers; for example, we have CloudFormation templates where, in one click, you can roll out an Atlas cluster with the complete platform. So we are continuously working and evolving on the DevOps side, whether that's a Helm chart or the operator, which has a standard (indistinct) for different types of deployments.

>> You know, some really important themes here. Obviously, any time you talk about Mongo, simplicity comes in, automation, you know, that big push that Io-Tahoe is making. What you said about data context was interesting, because a lot of data systems, organizations, lack context, and context is very important. So auto-classification and things like that. And the other thing you said about federated queries, I think, fits very well into the trend toward decentralized data architecture. So very important there. And of course, hybridisity, I call it hybridisity: on-prem, cloud, abstracting that complexity away and allowing people to really focus on their digital transformations. I tell ya, Vasanth, it's great stuff. It's always a pleasure chatting with Io-Tahoe partners and really getting into the tech with folks like yourself. So thanks so much for coming on theCube.

>> Thanks. Thanks, Dave. Thanks for having a nice discussion with you.

>> Okay. Stay right there. We've got one more quick session that you don't want to miss.
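The data-as-a-service pattern Vasanth describes, where a mobile or web app calls a secured HTTP endpoint (for example, a Realm webhook) instead of the database directly, might look like this from the client side; the URL, header, and payload are all hypothetical.

```python
# Call a hypothetical generated data-service endpoint rather than the DB.
import requests

resp = requests.post(
    "https://example-realm-app.example.com/api/customer360",  # hypothetical
    headers={"api-key": "SECRET"},                            # hypothetical auth
    json={"customerId": "C-123"},
)
resp.raise_for_status()
print(resp.json())                # the assembled customer-360 view
```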

Published Date : Aug 10 2021

Maria Colgan & Gerald Venzl, Oracle | June CUBEconversation


 

(upbeat music)

>> Developers have become the new kingmakers in the world of digital and cloud. The rise of containers and microservices has accelerated the transition to cloud-native applications. A lot of people will talk about application architecture and the related paradigms and the benefits they bring for the process of writing and delivering new apps. But a major challenge continues to be the how and the what when it comes to accessing, processing and getting insights from the massive amounts of data that we have to deal with in today's world. And with me are two experts from the data management world who will share with us how they think about the best techniques and practices, based on what they see at large organizations who are working with data and developing so-called data-driven apps. Please welcome Maria Colgan and Gerald Venzl, two distinguished product managers from Oracle. Folks, welcome, thanks so much for coming on.

>> Thanks for having us Dave.

>> Thank you very much for having us.

>> Okay, Maria, let's start with you. So, we throw around this term data-driven, data-driven applications. What are we really talking about there?

>> So data-driven applications are applications that work on a diverse set of data: anything from spatial to sensor data, document data, as well as your usual transaction processing data. And what they're going to do is generate value from that data in very different ways to a traditional application. So for example, they may use machine learning to do product recommendations in the middle of a transaction. Or we could use graph to identify an influencer within the community, so we can target them with a specific promotion. They could also use spatial data to help find the nearest store to a particular customer. And because these apps are deployed on multiple platforms, everything from mobile devices to standard browsers, they need a data platform that's going to be secure, reliable and scalable.

>> Well, so when you think about how the workloads are shifting, I mean, we're not talking about a world of just your ERP or your HCM or your CRM, you know, kind of the traditional operational systems. You really are seeing an explosion of these new data-oriented apps. You're seeing, you know, modeling in the cloud, and you are going to see more and more inferencing, inferencing at the edge. But Maria, maybe you could talk a little bit about the benefits that customers are seeing from developing these types of applications. I mean, why should people care about data-driven apps?

>> Oh, for sure, there are massive benefits to them. Probably the most obvious one for any business, regardless of the industry, is that they not only allow you to understand what your customers are up to, but they allow you to anticipate those customers' needs. So that helps businesses maintain that competitive edge and retain their customers. But it also helps them make data-driven decisions in real time, based on actual data, rather than on somebody's gut feeling or on historical data. So for example, you can do real-time price adjustments on products based on demand and so forth, that kind of thing. So it really changes the way people do business today.

>> So Gerald, you think about the narrative in the industry: everybody wants to be a platform player; all your customers, they are becoming software companies, they are becoming platform players.
Everybody wants to be like, you know, name a company that has a huge trillion-dollar market cap or whatever, and those are data-driven companies. And so it would seem to me that with data-driven applications, there's nobody, no company really, that shouldn't be data-driven. Do you buy that?

>> Yeah, absolutely. I mean, naturally the whole industry is data-driven, right? We all have information technologies for processing data and deriving information out of it. But when it comes to app development, I think there is a big push of, we have to do machine learning in our applications, we have to get insights from data. And when you actually take a step back, you see that there are of course many different kinds of applications out there as well, and that's not to be forgotten, right? There are the usual front-end user interfaces where really all the application does is enter some piece of information that's stored somewhere, or perhaps a microservice that's not attached to a database at all but just receives or makes calls (indistinct). So I think it's not necessarily so important for every developer to jump on the bandwagon of having to be data-driven. But I think it's equally important for those developers that build the applications that drive the business, that make business-critical decisions, as Maria mentioned before. Those folks should take a really close look into what data-driven apps mean and what the database can actually give to them. Because what we also see happening a lot is that things that are well known, out there, and ready to use are being reimplemented in the applications. And those applications essentially just end up spending more time writing code that already exists, and then having to maintain and debug that code, rather than just going to market faster.

>> Gerald, can you talk to the prevailing approaches that developers take to build data-driven applications? What are the ones that you see? Let's dig into that a little bit more and maybe differentiate the different approaches and talk about that?

>> Yeah, absolutely. I think right now the industry is in two camps; it's sort of a religious war going on, as you'll often see with different architectures and so forth. So we have single-purpose databases, or data management technologies, which are, as the name suggests, built around a single purpose. A typical example would be your ordinary key-value store. All a key-value store does is allow you to store and retrieve a piece of data, whatever that may be, really, really fast, but it doesn't really go beyond that. And the other side of the house, the other camp, would be multimodel databases, multimodel data management technologies. Those are technologies that allow you to store different types of data, different formats of data, in the same technology, in the same system, alongside each other. And when you look at the landscape of technology out there, pretty much any relational database, any database really, has evolved into such a multimodel database. Whether that's MySQL, which allows you to store JSON alongside relational, or even MongoDB, which gives you native graph support (mumbles) alongside the JSON support.

>> Well, it's clearly a trend in the industry.
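A toy contrast of the two camps Gerald describes, with Redis standing in for a single-purpose key-value store and SQLite (with its JSON1 functions) standing in for a multimodel database. The data and keys are made up, and a local Redis server is assumed; this is a sketch of the idea, not a benchmark.

```python
# Single-purpose camp: get/put an opaque value very fast, nothing more.
import json, sqlite3
import redis

kv = redis.Redis()                              # assumes a local Redis
kv.set("user:1", json.dumps({"name": "Ana", "city": "Lisbon"}))
blob = json.loads(kv.get("user:1"))             # fast, but opaque to the store

# Multimodel camp: the same JSON sits alongside relational data and
# can be queried inside with SQL (requires SQLite's JSON1 functions).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (doc TEXT)")
db.execute("INSERT INTO users VALUES (?)",
           (json.dumps({"name": "Ana", "city": "Lisbon"}),))
row = db.execute("SELECT json_extract(doc, '$.city') FROM users").fetchone()

print(blob["name"], row[0])
```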
We've talked about this a lot in The Cube. We know where Oracle stands on this. I mean, you just mentioned MySQL, but Oracle Database you've been extending: you've mentioned JSON, we've got blockchain now in there, you're infusing ML and AI into the database, graph database capabilities, on and on and on. We've compared that to Amazon, which is kind of the right-tool-for-the-right-job approach. So maybe you could talk about your point of view, the benefits for developers of using that converged database, if I can use that word, approach, being able to store multiple data formats. Why do you feel that's a better approach?

>> Yeah, I think on a high level it comes down to complexity: you are actually avoiding additional complexity, right? Not every use case you have necessarily warrants yet another data management technology, or a specially built technology for managing that data. Many of the use cases we see out there just want to store a piece of JSON, a document, in a database and then perhaps retrieve it again afterwards, or write some simple queries over it. And you really don't have to bring a new database technology or a NoSQL database into the mix, if you already have one, just to fulfill that exact use case. You could just happily store that information in the database you already have. And what it really comes down to is the learning curve for developers, right? If you use the same technology to store other types of data, you don't have to learn a new technology, you don't have to familiarize yourself with new drivers, you don't have to find new frameworks, and you don't have to know how to operate, or how to best model your data for, that new database. You can essentially just reuse your knowledge of the technology, as well as the libraries and code you have already built in house, perhaps in another application, perhaps a framework that you used against the same technology, because it is still the same technology. So it all comes down again to avoiding complexity rather than fragmenting across, you know, the many different technologies we have. If you were to look at the different data formats that are out there today, you would end up with many different databases just to store them, if you were to religiously follow the single-purpose, best-built-technology-for-every-use-case paradigm. And then you would just end up having to manage many different databases, rather than actually focusing on your app and getting value to your business or to your user.

>> Okay, so I get that, and I buy that, by the way. I mean, especially if you're a larger organization and you've got all these projects going on. But before we go back to Maria, Gerald, I want to push on that a little bit, because the counter to that argument would be an analogy, and I'd love for you to knock this analogy off the blocks. The counter would be: okay, Oracle is the Swiss Army knife, and it's got, you know, all in one. But sometimes I need that specialized long screwdriver, and I go into my toolbox and I grab that. It's better than the screwdriver in my Swiss Army knife. Are you the Swiss Army knife of databases? Or are you the all-in-one that has that best-of-breed screwdriver for me? How do you think about that?

>> Yeah, that's a fantastic question, right?
And I think, first of all, you have to separate between Oracle the company, which actually has multiple data management technologies and databases out there, as you said before, and Oracle Database. And I think Oracle Database is definitely a Swiss Army knife; it has gained many capabilities over the last 40 years. We've seen object support come along, and that's still in the Oracle Database today. We have seen XML come along, it's still in the Oracle Database; graph, spatial, et cetera. So you have many different ways of managing your data, and then, on top of that, going into the converged idea, not only do we allow you to store the different data models in there, we actually also allow you to apply all the security policies and so forth on top of it; Maria can talk more about the mission around the converged database. I would also argue, though, that for some aspects we do actually have that screwdriver you talked about as well. Especially in the relational world, people get very quickly hung up on this idea that if you do rows and columns, well, that's what you put down on disk. And that was never true; the relational model is actually a logical model. What's actually put down on disk is blocks that align themselves nicely with block storage, and it always has been. So that allows you to model and process the data differently. And one good example, which we introduced a couple of years ago, was when columnar databases were very strong, and, you know, the competition came and said, yeah, we have in-memory column stores now, they're so much better. And we were like, well, orienting the data row-based or column-based really doesn't matter, in the sense that we store them as blocks on disk. And so we introduced the In-Memory technology, which gives you an in-memory columnar representation of your data alongside your relational one. So there is an example where, if you have this use case of columnar analytics all in memory, I would argue Oracle Database is also that screwdriver you want to reach for, and it gives you that capability. Because it not only gives you the columnar representation but also, which many people forget, all the analytic power on top of SQL. It's one thing to store your data columnar; it's a completely different story to actually be able to run analytics on top of that, with all the built-in functionality and the things you want to do with the data as you analyze it.

>> You know, that's a great example, the columnar, 'cause I remember there was a lot of hype around it. Oh, it's the Oracle killer, you know: Vertica. Vertica is still around, but, you know, it never really hit escape velocity. But you know, good product, good company, whatever. Netezza kind of got buried inside of IBM. ParAccel kind of became, you know, Redshift with that deal, so that kind of went away. Teradata bought a company, I forget which company it bought, but. So that hype kind of dissipated, and now it's like, oh yeah, columnar. It's kind of like in-memory: we've had in-memory databases ever since we've had databases. It's kind of a feature, not a sector. But anyway, Maria, let's come back to you. You've got a lot of customer experience, and you speak with a lot of companies, you know, during your time at Oracle.
What else are you seeing in terms of the benefits to this approach that might not be so intuitive and obvious right away?

>> I think one of the biggest benefits to having a multimodel, multiworkload, or, as we call it, converged database is the fact that you can get greater data synergy from it. In other words, you can utilize all these different techniques and data models to get better value out of that data. So, things like being able to do real-time machine learning fraud detection inside a transaction, or being able to do a product recommendation by accessing three different data models. For example, if I'm trying to recommend a product for you, Dave, I might use graph analytics to figure out your community: not just your friends, but other people on our system who look and behave just like you. Once I know that community, I can go over and see what products they bought by looking up our product catalog, which may be stored as JSON. And then, on top of that, I can use the key-value data to see which products inside that catalog those community members gave a five-star rating to. So that way I can really pinpoint the right product for you. And I can do all of that in one transaction inside the database, without having to transform that data into different models or, God forbid, access different systems to get all of that information. So it really simplifies how we can generate value from the data. And of course, the other thing our customers love is that deploying data-driven apps on a converged database is much simpler, because it is that standard data platform. You're not having to manage multiple independent single-purpose databases. You're not having to implement the security and the high-availability policies across a bunch of different diverse platforms. All of that can be done much more simply with a converged database, because the DBA team, of course, is going to just use that standard set of tools to manage, monitor and secure those systems.

>> Thank you for that. And you know, it's interesting, you talk about simplification, and you are in Juan's organization, so you have a big focus on mission critical. And one of the things that I think is often overlooked, well, we talk about it all the time, is recovery. And if things are simpler, recovery is faster and easier. And so it's kind of the hallmark of Oracle: the gold standard for the toughest apps, the most mission-critical apps. But I wanted to get to the cloud, Maria. Everything is going to the cloud, right? Not all workloads are going to the cloud, but everybody is talking about the cloud, everybody has a cloud-first mentality, and yes, it's a hybrid world. But the natural next question is: how do you think the cloud fits into this world of data-driven apps?

>> I think, just like with any app you're developing, the cloud helps to accelerate that development, and of course the deployment, of these data-driven applications. 'Cause if you think about it, the developer is instantly able to provision a converged database that Oracle will automatically manage and look after for them. But what's great about doing something like that, if you use our Autonomous Database service, is that it comes in different flavors. You can get autonomous transaction processing, data warehousing, or autonomous JSON, so the developer is going to get a database that's been optimized for their specific use case, whatever they are trying to solve.
And it's also going to contain all of that great functionality and the capabilities that we've been talking about. So what that really means to the developer is that, as the project evolves and the business needs inevitably change a little, there's no need to panic when one of those changes comes in, because your converged database, your autonomous database, has all of those additional capabilities. You can simply utilize them to address those evolving changes in the project. 'Cause let's face it, none of us normally knows exactly what we need to build right at the very beginning. And on top of that, developers also get a built-in buddy in the cloud, especially in the Autonomous Database. And that buddy comes in the form of built-in workload optimizations. With the Autonomous Database we do things like automatic indexing, where we're using machine learning to be that buddy for the developer. What it'll do is monitor the workload and see what kind of queries are being run on that system. And then it will actually determine if there are indexes that should be built to help improve the performance of that application. And not only does it build those indexes, it verifies that they help improve the performance before publishing them to the application. So by the time the developer is finished with that app and it's ready to be deployed, it's actually also been optimized by the developer's buddy, the Oracle Autonomous Database. So, you know, it's a really nice helping hand for developers when they're building any app, especially data-driven apps.

>> I like how you sort of gave us, you know, the truth here: you don't always know where you're going when you're building an app. It goes from "build it and they will come" to "start building it and we'll figure out where it's going to go." With Agile, that's kind of how it works. But so I wonder, can you give some examples of customers, maybe genericized if you need to, of data-driven apps in the cloud where customers were able to drive more efficiency, where the cloud buddy allowed the customers to do more with less?

>> Oh, we have tons of these, but I'll try and keep it to just a couple. One that comes to mind straight away is Retraced. These folks built a blockchain app in the Oracle Cloud that allows manufacturers to actually share the supply chain with the consumer. So the consumer can see exactly who made their product, using what raw materials, where they were sourced from, how it was done. All of that is visible to the consumer. And in order to share that, they had to work on a very diverse set of data: everything from JSON documents to images, as well as your traditional transactions. They store all of that information inside the Oracle Autonomous Database, and they were able to build their app, deploy it on the cloud, and do all of that very, very quickly. So, you know, that ability to work on multiple different data types in a single database really helped them build that product and get it to market in a very short amount of time. Another customer that's doing something really, really interesting is MindSense. These guys operate the largest mines in Canada, Chile, and Peru. What they do is they put x-ray devices on the massive mechanical shovels that are right at the mine face. And what that does is sense the contents of the buckets inside these mining machines.
And it's looking at that content to see how it can optimize the processing of the ore inside that bucket. So they're looking to minimize the amount of power and water that it's going to take to process it, and also, of course, minimize the amount of waste that's going to come out of that project. So all of that sensor data is sent into an Autonomous Database, where it's processed by a whole host of different users: everyone from the mine engineers to the geoscientists, to even their own data scientists, utilizes that data to drive their business forward. And what I love about these guys is they're not happy with building just one app. MindSense actually uses our built-in low-code development environment, APEX, that comes as part of the Autonomous Database, and they produce applications constantly for different aspects of their business using that technology. And they're actually able to accelerate those new apps to the business: it now takes them just a couple of days or weeks to produce an app, instead of months or years.

>> Great, thank you for that, Maria. Gerald, I'm going to push you again. So, I said upfront and talked about microservices and the cloud and containers, and, you know, anybody in the developer space follows that very closely. But some of the things we've been talking about here, people might look at and say, well, they're kind of antithetical to microservices; this is Oracle's monolithic approach. But when you think about the benefits of microservices, people want freedom of choice, and technology choice is seen as a big advantage of microservices and containers. How do you address such an argument?

>> Yeah, that's an excellent question, and I get it quite often. With the microservices architecture in general, as with other architectures, Linux distributions, et cetera, there's kind of always a bit of an academic approach and a pragmatic approach. And when you look at the microservices, the original definitions that came out in the early 2010s, they actually never said that each microservice has to have a database. And they also never said that, if a microservice has a database, you have to use a different technology for each microservice, just like they never said you have to write each microservice in a different programming language, right? So where I'm going with this is: yes, sometimes when you look at some vendors out there, some niche players, they push this message, they jump on this academic approach of each microservice having the best tool at hand, using a different database for each purpose, et cetera, which often comes across as, you know, wanting to stay part of the conversation. Nothing stops a developer from using a multimodel database for the microservice and just using it as a document store, right? Or just using it as a relational database. And, you know, something actually happened yesterday that was really interesting; I don't know whether you followed it, Dave, or not. But Facebook had an outage yesterday, right? And Facebook is one of those companies that are seen as the Silicon Valley, they-know-how-to-do-microservices companies. And when you read through the outage, well, what happened, right? Some unfortunate logical error with configuration that took a database cluster down.
So, you know, there you have it, where you go, well, maybe not every microservice is actually, in fact, talking to its own database, or its own special-purpose database. I think, rather than focusing on this argument of which technology to use and what's the right tool for the job, the industry should focus much more on asking: what business problem are we actually trying to solve, and therefore what's the right approach and the right technology for it? And so, just as I said before, multimodel databases do have strong benefits. They have many built-in functionalities that are already there, and they allow you to reduce the complexity of having to know many different technologies, right? And it's not only being able to store different data models, either, you know, treating a multimodel database as a JSON document store or as a relational database; most databases have been multimodel for 20-plus years. It's also that, if you store that data together, you can perhaps derive additional value from it for somebody else, even if not for your application. For example, if you were to use Oracle Database, you can actually write queries on top of all of that data. It doesn't really matter to our query engine whether the data is formatted as JSON or formatted in rows and columns; you can just query over it. And that's actually very powerful for the folks who have to get the reporting done at the end of the day or the end of the week, and for the data scientists who want to figure out, you know, which product performed really well, or whether we can tweak something here and there. When you look into that space, you still see a huge divergence between the people who put data in, kind of the OLTP style, and the people who try to derive new insights. And there's still a lot of ETL going around, and, you know, we have big data technologies, some of which came and went, and some that are still around, like Apache Spark, which is still, like, a SQL engine on top of any of your data, kind of going back to the same concept. And so I would say that, for developers, when we look at microservices, it's like: first of all, is the argument you are making because the vendor or the technology you want to use tells you this argument, or because you kind of want to have an argument to use a specific technology? Or is it really because it is the best technology to use for this given use case, for this given application that you have? And if so, there's of course also nothing wrong with using a single-purpose technology, right?

>> Yeah, I mean, whenever I talk about Oracle, I always come back to the most important applications, the mission critical. It's very difficult to architect databases with microservices and containers; you have to be really, really careful. And so, again, it comes back to what we were talking about before with Maria: the complexity and the recovery. But Gerald, I want to stay with you for a minute. So there are other data management technologies popping up out there. I mean, I've seen some people saying, okay, just leave the data in an S3 bucket, we've got some magic sauce to query that. So why are you optimistic about, you know, traditional database technology going forward?

>> I would say because of the history of databases.
So one thing that struck me when I came to Oracle, and got to meet great people like Juan Loaiza and Andy Mendelsohn, who have been here for a long, long time, is the realization that relational databases have been around for about 45 years now. And, you know, I was like, I'm too young to have been around then, right? So I wondered, what else has been around for 45 years? What does the rest of the tech stack we have today look like in comparison? Well, Linux only came out in '93; databases predate Linux by a lot. And as I started digging, I saw a lot of technologies come and go, right? You mentioned before the data management systems that came and went, like the columnar databases, or XML databases, object databases. And even before relational databases, before Codd gave us the relational model, there were these network stores, network databases, which to some extent look very similar to JSON documents: they stored data in a hierarchical format. And, you know, when you then actually read the Codd paper and dive a little bit deeper into the relational model, there is, I think, one important crux in there that most of the industry keeps forgetting, or hasn't been around long enough to know. And that is that when Codd created the relational model, he focused not so much on the application putting the data in, but on future users and applications still being able to make sense of the data, right? As I said before, we had those network models, we had XML databases, you have JSON document stores, and the one thing they all have in common is that the application that puts the data in decides the structure of the data. And that's all well and good while you have the application, and the developer writing the application, around. It can become really tricky when, 10 years later, you still want to look at that data and the developer of the application is no longer around, and then you go: what does this all mean? Where is the structure defined? What is this attribute? What does it mean? How does it correlate to the others? And the one thing that people tend to forget is that it's actually the data that's here to stay, not the application. Ideally, every company wants to store every single byte of data that they have, because there might be future value in it. Economically that may not always have made sense, but it's now much more feasible than it was just years ago. And if you could, why wouldn't you want to store all your data, right? And sometimes you actually have to store the data for seven years or whatever, because the law requires you to. So, coming back 10 years later, looking at the data and making sense of it can become a lot more difficult and a lot more challenging than first figuring out how to store the data for general use. And that is what the relational model was all about. We decompose the data structures into tables and columns, with relationships between each other. A typical example would be: you store some purchases from your web store, right? There's a customer attribute in it, there's some credit card payment information in it, and some product information on what the customer bought.
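A sketch of the decomposition Gerald is setting up, with hypothetical tables; SQLite stands in here for any relational database.

```python
# The web-store purchase, decomposed into customers, products, and
# payments with relationships, so future users can query it without
# knowing how the original application structured its documents.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments  (id INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers(id),
                            product_id  INTEGER REFERENCES products(id),
                            paid_on     TEXT, amount REAL);
""")
db.execute("INSERT INTO customers VALUES (1, 'Ana')")
db.execute("INSERT INTO products  VALUES (1, 'Widget')")
db.execute("INSERT INTO payments  VALUES (1, 1, 1, '2021-06-21', 9.99)")

# Products sold on a given day: only payments and products are touched.
rows = db.execute("""
    SELECT p.name, SUM(pay.amount)
    FROM payments pay JOIN products p ON p.id = pay.product_id
    WHERE pay.paid_on = '2021-06-21'
    GROUP BY p.name
""").fetchall()
print(rows)
```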
Well, in the relational model, if you just want to figure out which products were sold on a given day or week, you just query the payments and products tables to get that; you don't need to touch the customer and so forth. And with the hierarchical model, you have to first sit down and understand the structure: what is the customer, where is the payment? You know, does the document start with the payment, or does it start with the customer? Where do I find this information? And in the very early days, those databases even struggled to avoid scanning all the documents to get the data out. So, coming back to your question a bit, and I apologize for going on here: relational databases have been around for 45 years. I'd actually argue it's one of the most successful software technologies we have, when you look at the overall industry, right? 45 years in IT terms is like a star being born and going supernova. As you said before, many technologies came and went, right? And one more really interesting example, by the way, is Hadoop and HDFS. They kind of gave us this additional promise in the 2010s, like 2012, 2013, the hype of Hadoop and so forth and (mumbles) and HDFS. And people were like, just put everything into HDFS and worry about the data later, right? We can query it, MapReduce it, whatever. And we had customers actually coming to us saying: great, we have half a petabyte of data on an HDFS cluster, and we have no clue what's stored in there. How do we figure this out? What are we going to do now? Now you had a big data-cleansing problem. And so I think that is why databases, and also data modeling, are something that will not go away anytime soon, and why databases and database technologies are here to stay for quite a while: many people don't think about what's happening to the data five years from now. And many of the niche players, and frankly even Amazon, you know, following this single-purpose thing of just use the right tool for the job for your application, just pull the data in there the way you want it. It's like, okay, so you use technologies all over the place, and then five years from now your data is fragmented everywhere, in different formats, with inconsistencies, and, and, and. And when you come back to these data-driven, business-critical decision applications, that is the worst-case scenario you can have, right? Because now you need an army of people to actually do data cleansing. And it's no coincidence that data science has become very, very popular in recent years, as we went on with this proliferation of different database and data management technologies, some of which are not even databases. But I'll leave it at that.

>> It's an interesting talk track, because you're right. I mean, schema-on-read was alluring, but it definitely created some problems. It also created, as you referenced, an entire set of hyper-specialized roles and the data cleansing component. I mean, maybe technology will eventually solve that problem, but it hasn't, at least not up to now. Okay, last question. Maria, maybe you could start off, and Gerald, if you want to chime in as well, that'd be great. I mean, it's interesting to watch this industry; Oracle sort of won the top database mantle. I mean, I watched it, I saw it.
It was, remember, it was Informix, and it was (indistinct) too, and of course Microsoft, you've got to give them credit with SQL Server, but Oracle won the database wars. And then everything got kind of quiet for a while; database was sort of boring. And then it exploded: you know, NoSQL and the key-value stores and the cloud databases, and this is really a hot area now. And when we looked at Oracle, we said, okay, Oracle, it's all about Oracle Database. But we've seen the kind of resurgence in MySQL, which everybody thought, you know, once Oracle bought Sun, they were going to kill. But now we see you investing in HeatWave, TimesTen, we talked about in-memory databases before. So where do those fit in, Maria, in the grand scheme? How should we think about Oracle's database portfolio?

>> So there are lots of places where you'd use those different things. 'Cause just like in any other industry, there are going to be new and boutique use cases that are going to benefit from a more specialized product, a single-purpose product. Good examples, off the top of my head, of the kinds of systems that would benefit from that would be things like a stock exchange system or a telephone exchange system. Both of those are latency-critical transaction processing applications, where they need microsecond response times. And that's going to exceed, perhaps, what you might normally get or deploy with a converged database. And so Oracle's TimesTen database, our in-memory database, is perfect for those kinds of applications. But there's also a host of MySQL applications out there today, and, you said it yourself there, Dave, HeatWave is a great place to provision and deploy those kinds of applications, because it's going to run 100 times faster than AWS (mumbles). So, you know, there really is a place in the market, and in our customers' systems and the needs they have, for all of these different members of our database family here at Oracle.

>> Yeah, well, the internet is basically running on the LAMP stack, so I don't see MySQL going away. All right, Gerald, we'll give you the final word; bring us home.

>> Oh, thank you very much. Yeah, I mean, as Maria said, I think it comes back to what we discussed before. There are obviously still needs for special technologies, or different technologies than a relational database or a multimodel database. Oracle actually has many more databases than people may first think of: not only the three that we have already mentioned, but also, you know, Oracle's NoSQL database. And, you know, on a high level, Oracle is a data management company, right? And we want to give our customers the best tools and the best technology to manage all of their data. So there has to be a part of the business that also focuses on these highly specialized systems and these highly specialized technologies that address those use cases. And I think it makes perfect sense. When a customer comes to Oracle, they're not only getting "take this one product, and if you don't like it, your problem"; you actually have choice, right? And choice allows you to make a decision based on what's best for you, and not necessarily what's best for the vendor you're talking to.

>> Well guys, really appreciate your time today and your insights. Maria, Gerald, thanks so much for coming on The Cube.

>> Thank you very much for having us.

>> And thanks for watching this Cube conversation. This is Dave Vellante, and we'll see you next time. (upbeat music)
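To ground the converged-database thread that ran through this conversation, here is a sketch of one SQL statement spanning relational columns and a JSON document column, using Oracle's documented JSON_TABLE. The connection details and the orders table are hypothetical; this illustrates the idea, not any specific customer schema.

```python
# One query over relational columns and JSON content together,
# with no ETL step between "models". Uses the python-oracledb driver.
import oracledb

conn = oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb1")
cur = conn.cursor()
cur.execute("""
    SELECT o.order_id, jt.sku, jt.qty
    FROM   orders o,
           JSON_TABLE(o.doc, '$.items[*]'
               COLUMNS (sku VARCHAR2(20) PATH '$.sku',
                        qty NUMBER       PATH '$.qty')) jt
    WHERE  o.status = 'OPEN'
""")
for order_id, sku, qty in cur:
    print(order_id, sku, qty)       # relational key joined to JSON fields
```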

Published Date : Jun 24 2021

Maria Colgan & Gerald Venzl, Oracle | June CUBEconversation


 

(upbeat music) >> It'll be five, four, three and then silent two, one, and then you guys just follow my lead. We're just making some last minute adjustments. Like I said, we're down two hands today. So, you good Alex? Okay, are you guys ready? >> I'm ready. >> Ready. >> I got to get get one note here. >> So I noticed Maria you stopped anyway, so I have time. >> Just so they know Dave and the Boston Studio, are they both kind of concurrently be on film even when they're not speaking or will only the speaker be on film for like if Gerald's drawing while Maria is talking about-- >> Sorry but then I missed one part of my onboarding spiel. There should be, if you go into gallery there should be a label. There should be something labeled Boston live switch feed. If you pin that gallery view you'll see what our program currently being recorded is. So any time you don't see yourself on that feed is an excellent time to take a drink of water, scratch your nose, check your notes. Do whatever you got to do off screen. >> Can you give us a three shot, Alex? >> Yes, there it is. >> And then go to me, just give me a one-shot to Dave. So when I'm here you guys can take a drink or whatever >> That makes sense? >> Yeah. >> Excellent, I will get my recordings restarted and we'll open up when Dave's ready. >> All right, you guys ready? >> Ready. >> All right Steve, you go on mute. >> Okay, on me in 5, 4, 3. Developers have become the new king makers in the world of digital and cloud. The rise of containers and microservices has accelerated the transition to cloud native applications. A lot of people will talk about application architecture and the related paradigms and the benefits they bring for the process of writing and delivering new apps. But a major challenge continues to be, the how and the what when it comes to accessing, processing and getting insights from the massive amounts of data that we have to deal with in today's world. And with me are two experts from the data management world who will share with us how they think about the best techniques and practices based on what they see at large organizations who are working with data and developing so-called data-driven apps. Please welcome Maria Colgan and Gerald Venzl, two distinguish product managers from Oracle. Folks, welcome, thanks so much for coming on. >> Thanks for having us Dave. >> Thank you very much for having us. >> Okay, Maria let's start with you. So, we throw around this term data-driven, data-driven applications. What are we really talking about there? >> So data-driven applications are applications that work on a diverse set of data. So anything from spatial to sensor data, document data as well as your usual transaction processing data. And what they're going to do is they'll generate value from that data in very different ways to a traditional application. So for example, they may use machine learning, they are able to do product recommendations in the middle of a transaction. Or we could use graph to be able to identify an influencer within the community so we can target them with a specific promotion. It could also use spatial data to be able to help find the nearest stores to a particular customer. And because these apps are deployed on multiple platforms, everything from mobile devices as well as standard browsers, they need a data platform that's going to be both secure, reliable and scalable. 
>> Well, so when you think about how the workloads are shifting I mean, we're not talking about, you know it's not anymore a world of just your ERP or your HCM or your CRM, you know kind of the traditional operational systems. You really are seeing an explosion of these new data oriented apps. You're seeing, you know, modeling in the cloud, you are going to see more and more inferencing, inferencing at the edge. But Maria maybe you could talk a little bit about sort of the benefits that customers are seeing from developing these types of applications. I mean, why should people care about data-driven apps? >> Oh, for sure, there's massive benefits to them. I mean, probably the most obvious one for any business regardless of the industry, is that they not only allow you to understand what your customers are up to, but they allow you to be able to anticipate those customer's needs. So that helps businesses maintain that competitive edge and retain their customers. But it also helps them make data-driven decisions in real time based on actual data rather than on somebody's gut feeling or basing those decisions on historical data. So for example, you can do real-time price adjustments on products based on demand and so forth, that kind of thing. So it really changes the way people do business today. >> So Gerald, you think about the narrative in the industry everybody wants to be a platform player all your customers they are becoming software companies, they are becoming platform players. Everybody wants to be like, you know name a company that is huge trillion dollar market cap or whatever, and those are data-driven companies. And so it would seem to me that data-driven applications, there's nobody, no company really shouldn't be data-driven. Do you buy that? >> Yeah, absolutely. I mean, data-driven, and that naturally the whole industry is data-driven, right? It's like we all have information technologies about processing data and deriving information out of it. But when it comes to app development I think there is a big push to kind of like we have to do machine learning in our applications, we have to get insights from data. And when you actually look back a bit and take a step back, you see that there's of course many different kinds of applications out there as well that's not to be forgotten, right? So there is a usual front end user interfaces where really the application all it does is just entering some piece of information that's stored somewhere or perhaps a microservice that's not attached to a data to you at all but just receives or asks calls (indistinct). So I think it's not necessarily so important for every developer to kind of go on a bandwagon that they have to be data-driven. But I think it's equally important for those applications and those developers that build applications, that drive the business, that make business critical decisions as Maria mentioned before. Those guys should take really a close look into what data-driven apps means and what the data to you can actually give to them. Because what we see also happening a lot is that a lot of the things that are well known and out there just ready to use are being reimplemented in the applications. And for those applications, they essentially just ended up spending more time writing codes that will be already there and then have to maintain and debug the code as well rather than just going to market faster. >> Gerald can you talk to the prevailing approaches that developers take to build data-driven applications? 
What are the ones that you see? Let's dig into that a little bit more and maybe differentiate the different approaches and talk about that? >> Yeah, absolutely. I think right now the industry is in two camps, it's like sort of a religious war going on, as you'll often see happening with different architectures and so forth. So we have single purpose databases or data management technologies, which are technologies that are, as the name suggests, built around a single purpose. So, you know, a typical example would be your ordinary key-value store. And a key-value store, all it does is allow you to store and retrieve a piece of data, whatever that may be, really, really fast, but it doesn't really go beyond that. And then the other side of the house, or the other camp, would be multimodal databases, multimodal data management technologies. Those are technologies that allow you to store different types of data, different formats of data, in the same technology, in the same system, alongside each other. And, you know, when you look at the landscape of technology out there, pretty much any relational database, or any database really, has evolved into such a multimodal database. Whether that's MySQL, which allows you to store JSON alongside relational, or even a MongoDB that gives you native graph support since (mumbles) as well, alongside the JSON support. >> Well, it's clearly a trend in the industry. We've talked about this a lot in The Cube. We know where Oracle stands on this. I mean, you just mentioned MySQL, but I mean, Oracle Database you've been extending, you've mentioned JSON, we've got blockchain now in there, you're infusing, you know, ML and AI into the database, graph database capabilities, you know, on and on and on. We talked a lot about that, and we compared it to Amazon, which is kind of the right-tool-for-the-right-job approach. So maybe you could talk about, you know, your point of view, the benefits for developers of using that converged database, if I can use that word, approach, being able to store multiple data formats? Why do you feel like that's a better approach? >> Yeah, I think on a high level it comes down to complexity. You are actually avoiding additional complexity, right? So not every use case that you have necessarily warrants having yet another data management technology, or yet another specially built technology for managing that data, right? It's like, many use cases that we see out there just want to store a JSON document in a database and then perhaps retrieve it again afterwards, or write some simple queries over it. And you really don't have to get a new database technology or a NoSQL database into the mix if you already have one, just to fulfill that exact use case. You could just happily store that information as well in the database you already have. And what it really comes down to is the learning curve for developers, right? So it's like, as you use the same technology to store other types of data, you don't have to learn a new technology, you don't have to familiarize yourself with and learn new drivers. You don't have to find new frameworks, and you don't have to know how to operate or best model your data for that database. You can essentially just reuse your knowledge of the technology as well as the libraries and code you have already built in house, perhaps in another application, perhaps, you know, a framework that you used against the same technology, because it is still the same technology. So it kind of all comes down again to avoiding complexity and not fragmenting, you know, across the many different technologies we have. If you were to look at the different data formats that are out there today, you know, you would end up with many different databases just to store them, if you were to fully religiously follow the single purpose, best built technology for every use case paradigm, right? And then you would just end up having to manage many different databases, rather than actually focusing on your app and getting value to your business or to your user.
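To make Gerald's point concrete, here is a minimal, hedged sketch of storing and querying a JSON document in a general-purpose relational database you already have. SQLite, via Python's built-in sqlite3 module, stands in for any multimodal database; it assumes a SQLite build with the JSON functions available (the default in recent releases), and the table and document are hypothetical.

```python
import sqlite3

# Store a JSON document in an ordinary relational table, no separate
# document store, new driver, or new framework required.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO orders (doc) VALUES (?)",
    ('{"customer": "alice", "items": [{"sku": "A1", "qty": 2}], "total": 19.90}',),
)

# Plain SQL over the document, using the same engine you already know.
row = conn.execute(
    "SELECT json_extract(doc, '$.customer'), json_extract(doc, '$.total') FROM orders"
).fetchone()
print(row)  # ('alice', 19.9)
```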
>> Okay, so I get that, and I buy that by the way. I mean, especially if you're a larger organization and you've got all these projects going on. But before we go back to Maria, Gerald, I want to push on that a little bit, because the counter to that argument would be an analogy. And I'd love for you to, you know, knock this analogy off the blocks. The counter would be, okay, Oracle is the Swiss Army knife, and it's got, you know, all in one. But sometimes I need that specialized long screwdriver, and I go into my toolbox and I grab that. It's better than the screwdriver in my Swiss Army knife. Are you the Swiss Army knife of databases? Or are you the all-in-one that has that best-of-breed screwdriver for me? How do you think about that? >> Yeah, that's a fantastic question, right? And I think, first of all, you have to separate between Oracle the company, which actually has multiple data management technologies and databases out there, as you said before, right? And Oracle Database. And I think Oracle Database is definitely a Swiss Army knife; it has gained many capabilities over the last 40 years. You know, we've seen object support coming, that's still in the Oracle Database today. We have seen XML coming, it's still in the Oracle Database; graph, spatial, et cetera. And so you have many different ways of managing your data, and then on top of that, going into the converged side, not only do we allow you to store the different data models in there, but we actually also allow you to apply all the security policies and so forth on top of it, something Maria can talk more about with the mission around the converged database. I would also argue, though, that for some aspects we do actually have that screwdriver you talked about as well. So especially in the relational world, people very quickly get hung up on this idea that, oh, if you only do rows and columns, well, that's kind of what you put down on disk. And that was never true; the relational model is actually a logical model. What's actually being put down on disk is blocks that align themselves nicely with block storage, and it always has been. So that allows you to actually model and process the data somewhat differently. And one good example that we have, that we introduced a couple of years ago, was when columnar databases were very strong and, you know, the competition came and it was like, yeah, we have In-Memory column stores now, they're so much better. And we were like, well, orienting the data row-based or column-based really doesn't matter in the sense that we store it as blocks on disk.
And so we introduced the In-Memory technology, which gives you an In-Memory columnar representation of your data as well, alongside your relational one. So there is an example where you go like, well, actually, you know, if you have this use case of columnar analytics all In-Memory, I would argue Oracle Database is also that screwdriver you want to go to, and it gives you that capability. Because not only does it give you the columnar representation, but also, which many people then forget, all the analytic power of SQL on top. It's one thing to store your data columnar; it's a completely different story to actually be able to run analytics on top of that, having all the built-in functionality and the stuff that you want to do with the data as you analyze it. >> You know, that's a great example, the columnar, 'cause I remember there was like a lot of hype around it. Oh, it's the Oracle killer, you know, Vertica. Vertica is still around, but, you know, it never really hit escape velocity. But you know, good product, good company, whatever. Netezza kind of got buried inside of IBM. ParAccel kind of became, you know, Redshift with that deal, so that kind of went away. Teradata bought a company, I forget which company it bought, but... So that hype kind of dissipated, and now it's like, oh yeah, columnar. It's kind of like In-Memory: we've had In-Memory databases ever since we've had databases, you know, it's kind of a feature, not a sector. But anyway, Maria, let's come back to you. You've got a lot of customer experience, and you speak with a lot of companies, you know, during your time at Oracle. What else are you seeing in terms of the benefits to this approach that might not be so intuitive and obvious right away? >> I think one of the biggest benefits to having a multimodel, multiworkload, or as we call it, converged database, is the fact that you can get greater data synergy from it. In other words, you can utilize all these different techniques and data models to get better value out of that data. So things like being able to do real-time machine learning fraud detection inside a transaction, or being able to do a product recommendation by accessing three different data models. So for example, if I'm trying to recommend a product for you, Dave, I might use graph analytics to be able to figure out your community. Not just your friends, but other people on our system who look and behave just like you. Once I know that community, then I can go over and see what products they bought by looking up our product catalog, which may be stored as JSON. And then on top of that, I can then see, using the key-value, what products inside that catalog those community members gave a five star rating to. So that way I can really pinpoint the right product for you. And I can do all of that in one transaction inside the database, without having to transform that data into different models or, God forbid, access different systems to be able to get all of that information. So it really simplifies how we can generate that value from the data. And of course, the other thing our customers love is when it comes to deploying data-driven apps, when you do it on a converged database it's much simpler, because it is that standard data platform. So you're not having to manage multiple independent single purpose databases. You're not having to implement the security and the high availability policies, you know, across a bunch of different diverse platforms. All of that can be done much more simply with a converged database, 'cause the DBA team, of course, is going to just use that standard set of tools to manage, monitor and secure those systems.
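Maria's three-model recommendation can be sketched, in heavily simplified form, inside a single engine. The hypothetical snippet below again uses Python's sqlite3 as a stand-in, not Oracle's actual implementation: the "graph" is a friendship edge table walked with a recursive CTE, the product catalog is a JSON document, and the ratings table plays the key-value role. Every table, name, and value is invented; the point is that one SQL statement, in one transaction, can span all three models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (person TEXT, friend TEXT);                  -- graph edges
CREATE TABLE catalog (sku TEXT PRIMARY KEY, doc TEXT);            -- JSON documents
CREATE TABLE ratings (person TEXT, sku TEXT, stars INTEGER);      -- key-value style

INSERT INTO friends VALUES ('dave','amy'), ('amy','bob');
INSERT INTO catalog VALUES
  ('A1', '{"name": "Noise-cancelling headphones", "price": 199}'),
  ('B2', '{"name": "USB microphone", "price": 89}');
INSERT INTO ratings VALUES ('amy','A1',5), ('bob','A1',5), ('bob','B2',3);
""")

# Walk Dave's community (friends of friends) with a recursive CTE, then find
# products that community rated five stars, pulling display fields from JSON.
rows = conn.execute("""
WITH RECURSIVE community(person) AS (
    SELECT 'dave'
    UNION
    SELECT f.friend FROM friends f JOIN community c ON f.person = c.person
)
SELECT json_extract(cat.doc, '$.name') AS product,
       json_extract(cat.doc, '$.price') AS price
FROM ratings r
JOIN community m ON m.person = r.person
JOIN catalog cat ON cat.sku = r.sku
WHERE r.stars = 5
GROUP BY r.sku
""").fetchall()
print(rows)  # [('Noise-cancelling headphones', 199)]
```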
>> Thank you for that. And you know, it's interesting, you talk about simplification, and you are in Juan's organization, so you've got a big focus on mission critical. And so one of the things that I think is often overlooked, well, we talk about it all the time, is recovery. And if things are simpler, recovery is faster and easier. And so it's kind of the hallmark of Oracle, like the gold standard of the toughest apps, the most mission critical apps. But I wanted to get to the cloud, Maria. Because everything is going to the cloud, right? Not all workloads are going to the cloud, but everybody is talking about the cloud. Everybody has a cloud first mentality, and so yes, it's a hybrid world. But the natural next question is, how do you think the cloud fits into this world of data-driven apps? >> I think just like any app that you're developing, the cloud helps to accelerate that development, and of course the deployment, of these data-driven applications. 'Cause if you think about it, the developer is instantly able to provision a converged database that Oracle will automatically manage and look after for them. But what's great about doing something like that, if you use, say, our autonomous database service, is that it comes in different flavors. So you can get autonomous transaction processing, data warehousing or autonomous JSON, so the developer is going to get a database that's been optimized for their specific use case, whatever they are trying to solve. And it's also going to contain all of that great functionality and the capabilities that we've been talking about. So what that really means to the developer, though, is that as the project evolves and inevitably the business needs change a little, there's no need to panic when one of those changes comes in, because your converged database, or your autonomous database, has all of those additional capabilities. So you can simply utilize those to be able to address those evolving changes in the project. 'Cause let's face it, none of us normally know exactly what we need to build right at the very beginning. And on top of that, they also kind of get a built-in buddy in the cloud, especially in the autonomous database. And that buddy comes in the form of built-in workload optimizations. So with the autonomous database we do things like automatic indexing, where we're using machine learning to be that buddy for the developer. So what it'll do is monitor the workload and see what kind of queries are being run on that system. And then it will actually determine if there are indexes that should be built to help improve the performance of that application. And not only does it build those indexes, but it verifies that they help improve the performance before publishing them to the application. So by the time the developer is finished with that app and it's ready to be deployed, it's actually also been optimized by the developer's buddy, the Oracle autonomous database. So, you know, it's a really nice helping hand for developers when they're building any app, especially data-driven apps. >> I like how you sort of gave us, you know, the truth here, which is you don't always know where you're going when you're building an app. It's like it goes from 'build it and they will come' to 'start building it and we'll figure out where it's going to go.' With Agile, that's kind of how it works.
But so I wonder, can you give some examples of maybe customers, or maybe genericize them if you need to, of data-driven apps in the cloud where customers were able to drive more efficiency, where the cloud buddy allowed the customers to do more with less? >> Oh, we have tons of these, but I'll try and keep it to just a couple. One that comes to mind straight away is Retraced. These folks built a blockchain app in the Oracle Cloud that allows manufacturers to actually share the supply chain with the consumer. So the consumer can see exactly: who made their product? Using what raw materials? Where were they sourced from? How was it done? All of that is visible to the consumer. And in order to be able to share that, they had to work on a very diverse set of data. So they had everything from JSON documents to images, as well as your traditional transactions in there. And they store all of that information inside the Oracle autonomous database, they were able to build their app and deploy it on the cloud, and they were able to do all of that very, very quickly. So, you know, that ability to work on multiple different data types in a single database really helped them build that product and get it to market in a very short amount of time. Another customer that's doing something really, really interesting is MineSense. So these guys operate at the largest mines in Canada, Chile, and Peru. What they do is they put these x-ray devices on the massive mechanical shovels that are at the mine face. And what that does is sense the contents of the buckets inside these mining machines, and it's looking at that content to see how it can optimize the processing of the ore inside that bucket. So they're looking to minimize the amount of power and water that it's going to take to process it, and also, of course, minimize the amount of waste that's going to come out of that project. So all of that sensor data is sent into an autonomous database, where it's going to be processed by a whole host of different users. So everything from the mine engineers to the geoscientists, to even their own data scientists, utilize that data to drive their business forward. And what I love about these guys is they're not happy with building just one app. MineSense actually use our built-in low-code development environment, APEX, that comes as part of the autonomous database, and they actually produce applications constantly for different aspects of their business using that technology. And it's actually able to accelerate those new apps to the business. It takes them now just a couple of days or weeks to produce an app, instead of months or years to build those new apps. >> Great, thank you for that Maria. Gerald, I'm going to push you again. So, I said upfront and talked about microservices and the cloud and containers, and you know, anybody in the developer space follows that very closely. But some of the things that we've been talking about here, people might look at and say, well, they're kind of antithetical to microservices. This is Oracle's monolithic approach. But when you think about the benefits of microservices, people want freedom of choice, technology choice, seen as a big advantage of microservices and containers. How do you address such an argument? >> Yeah, that's an excellent question, and I get that quite often. The microservices architecture in general, like, as I said before with other architectures, Linux distributions, et cetera.
It's kind of always a bit like there's an academic approach and there's a pragmatic approach. And when you look at microservices, the original definitions that came out in the early 2010s actually never said that each microservice has to have a database. And they also never said that if a microservice has a database, you have to use a different technology for each microservice. Just like they never said you have to write a microservice in a different programming language, right? So where I'm going with this is like, yes, you know, sometimes when you look at some vendors out there, some niche players, they push this message, or they jump on this academic approach, of like, each microservice has the best tool at hand, or use a different database for your purpose, et cetera. Which often comes across like, you know, they want to stay part of the conversation. Nothing stops a developer from, you know, using a multimodal database for the microservice and just using that as a document store, right? Or just using that as a relational database. And, you know, something actually happened that was really interesting yesterday, I don't know whether you followed it Dave or not. Facebook had an outage yesterday, right? And Facebook is one of those companies that are seen as the Silicon Valley, you know, know-how-to-do-microservices companies. And when you read through the outage, well, what happened, right? Some unfortunate logical error with configuration, of course, that took a database cluster down. So, you know, there you have it, where you go like, well, maybe not every microservice is actually in fact talking to its own database or its own special purpose database. I think there, rather than focusing so much on this argument of which technology to use, what's the right tool for the job, the industry should ask itself much more: what business problem are we actually trying to solve? And therefore, what's the right approach and the right technology for this? And so, just as I said before, you know, multimodal databases do have strong benefits. They have many built-in functionalities that are already there, and they allow you to reduce this complexity of having to know many different technologies, right? And it's not only being able to store different data models, either, you know, to treat a multimodal database as a JSON document store or a relational database; most databases have been multimodal for 20-plus years. It's also that, if you store that data together, you can perhaps actually derive additional value out of it for somebody else, though perhaps not for your application. Like, for example, if you were to use Oracle Database, you can actually write queries on top of all of that data. It doesn't really matter to our query engine whether the data is formatted as JSON or the data is formatted in rows and columns, you can just query over it. And that's actually very powerful for those guys that have to, you know, get the reporting done at the end of the day, at the end of the week. And for those guys, the data scientists, that want to figure out, you know, which product performed really well, or can we tweak something here and there. When you look into that space, you still see a huge divergence between the guys that put data in, kind of the OLTP style, and the guys that try to derive new insights. And there's still a lot of ETL going around and, you know, we have big data technologies, some of which came and went, and some that came in and are still around, like Apache Spark, which is basically a SQL engine on top of any of your data, kind of going back to the same concept. And so I would say that, you know, for developers, when we look at microservices, it's like, first of all, is the argument you are making because the vendor or the technology you want to use tells you this argument, or, you know, you kind of want to have an argument to use a specific technology? Or is it really because it is the best technology to use for this given use case, for this given application that you have? And if so, there's of course also nothing wrong with using a single purpose technology either, right?
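Gerald's "one query engine over both formats" point is easy to demonstrate. The sketch below is illustrative only, with SQLite standing in for a multimodal database and an invented schema: a relational column and a JSON document live side by side in one table, and a single ordinary SQL statement reports across both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL, details TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("east", 120.0, '{"channel": "web",    "returning": true}'),
        ("east",  80.0, '{"channel": "mobile", "returning": false}'),
        ("west", 200.0, '{"channel": "web",    "returning": true}'),
    ],
)

# End-of-week reporting: group by a JSON attribute, aggregate a relational
# column. The engine doesn't care which format each field came from.
rows = conn.execute("""
SELECT json_extract(details, '$.channel') AS channel, SUM(amount) AS total
FROM events
GROUP BY channel
ORDER BY channel
""").fetchall()
print(rows)  # [('mobile', 80.0), ('web', 320.0)]
```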
>> Yeah, I mean, whenever I talk about Oracle I always come back to the most important applications, the mission critical. It's very difficult to architect databases with microservices and containers. You have to be really, really careful. And so again, it comes back to what we were talking about before with Maria, the complexity and the recovery. But Gerald, I want to stay with you for a minute. So there are other data management technologies popping up out there. I mean, I've seen some people saying, okay, just leave the data in an S3 bucket. We can query that, we've got some magic sauce to do that. And so why are you optimistic about, you know, traditional database technology going forward? >> I would say because of the history of databases. One thing that struck me when I came to Oracle, and got to meet great people like Juan Loaiza and Andy Mendelsohn who have been here for a long, long time, is the realization that relational databases have been around for about 45 years now. And, you know, I was like, I'm too young to have been around then, right? So I was like, what else has been around for 45 years? Just look at the tech stack that we have today, how does it look? Well, Linux only came out in '93; databases pre-date Linux by a lot. And as I started digging, I saw a lot of technologies come and go, right? And you mentioned before the data management systems that we had that came and went, like the columnar databases or XML databases, object databases. And even before relational databases, before Codd gave us the relational model, there were these network stores, network databases, which to some extent look very similar to JSON documents: a way of storing data in a hierarchical format. And, you know, when you then start actually reading the Codd paper and diving a little bit more into the relational model, there's I think one important crux in there that most of the industry keeps forgetting, or hasn't been around long enough to even know. And that is that when Codd created the relational model, he actually focused not so much on the application putting the data in, but on future users and applications still being able to make sense out of the data, right? And as I said before, we had those network models, we had XML databases, you have JSON document stores. And the one thing that they all have in common is that the application that puts the data in decides the structure of the data. And that's all well and good while you have the application, and the developer who wrote the application, around.
It can become really tricky when, 10 years later, you still want to look at that data and the application or the developer is no longer around. Then you go like, what does this all mean? Where is the structure defined? What is this attribute? What does it mean? How does it correlate to others? And the one thing that people tend to forget is that it's actually the data that's here to stay, not so much the applications. Ideally, every company wants to store every single byte of data that they have, because there might be future value in it. Economically it may not always make sense, but that's now much more feasible than just years ago. And if you could, why wouldn't you want to store all your data, right? And sometimes you actually have to store the data for seven years or whatever, because the laws require you to. And so coming back then, you know, 10 years from now, looking at the data and making sense of that data can actually become a lot more difficult and a lot more challenging than having first figured out how to store this data for general use. And that is kind of what the relational model was all about: we decompose the data structures into tables and columns, with relationships between each other. So that, therefore, if somebody wants to, you know, a typical example would be, well, you store some purchases from your web store, right? There's a customer attribute in it. There's some credit card payment information in it, and some product information on what the customer bought. Well, in the relational model, if you just want to figure out which products were sold on a given day or week, you would just query the payments and products tables to get the sense out of it. You don't need to touch the customer and so forth. And with the hierarchical model, you have to first sit down and understand the structure: what is the customer? Where is the payment? You know, does the document start with the payment or does it start with the customer? Where do I find this information? And then, in the very early days, those databases even struggled to not have to scan all the documents to get the data out. So coming back to your question a bit, and I apologize for going on here, but you know, relational databases have been around for 45 years. I actually argue it's one of the most successful software technologies that we have out there when you look at the overall industry, right? 45 years in IT terms is like a star being born and going supernova. As you said before, many technologies came and went, right? And just to add one more really interesting example, by the way: Hadoop and HDFS, right? They kind of gave us this additional promise in the 2010s, like 2012, 2013, the hype of Hadoop and so forth and (mumbles) and HDFS. And people were just like, just put everything into HDFS and worry about the data later, right? And we can query it and MapReduce it and whatever. And we had customers actually coming to us, and they were like, great, we have half a petabyte of data on an HDFS cluster and we have no clue what's stored in there. How do we figure this out? What are we going to do now? Now you had a big data cleansing problem. And so I think that is why databases, and also data modeling, is something that will not go away anytime soon. And I think databases and database technologies are here for quite a while to stay. Because many of those people don't think about what's happening to the data five years from now. And many of the niche players, and frankly even Amazon, you know, following with this single purpose thing, it's like, just use the right tool for the job for your application, right? Just pull in the data there the way you want it. And it's like, okay, so you use technologies all over the place, and then five years from now you have your data fragmented everywhere in different formats and, you know, inconsistencies, and, and, and. And when you come back to these data-driven, business critical, business decision applications, that's the worst case scenario you can have, right? Because now you need an army of people to actually do data cleansing. And it's not a coincidence that data science has become very, very popular in recent years, as we went on with this proliferation of different database or data management technologies, some of which are not even databases. But I think I'll leave it at that.
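Gerald's web-store example maps directly onto a few lines of SQL. This is a hypothetical sketch of the decomposition he describes, again via Python's sqlite3, with all tables and rows invented: purchases are split into customers, payments, and products, so the "which products sold on a given day?" question never touches the customer data or depends on any one application's document layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE payments  (id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER, paid_on TEXT);

INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO products  VALUES (10, 'headphones'), (11, 'microphone');
INSERT INTO payments  VALUES
  (100, 1, 10, '2021-06-01'),
  (101, 2, 10, '2021-06-01'),
  (102, 2, 11, '2021-06-02');
""")

# Products sold on a given day: only payments and products are involved,
# and no knowledge of any document hierarchy is needed.
rows = conn.execute("""
SELECT p.name, COUNT(*) AS sold
FROM payments pay JOIN products p ON p.id = pay.product_id
WHERE pay.paid_on = '2021-06-01'
GROUP BY p.name
""").fetchall()
print(rows)  # [('headphones', 2)]
```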
>> It's an interesting talk track because you're right. I mean, no schema on write was alluring, but it definitely created some problems. It also created entire, you know, you referenced the hyper-specialized roles, and the data cleansing component. I mean, maybe technology will eventually solve that problem, but it hasn't, at least as of tonight. Okay, last question. Maria, maybe you could start off, and Gerald, if you want to chime in as well, that'd be great. I mean, it's interesting to watch this industry. Oracle sort of won the top database mantle. I mean, I watched it, I saw it. Remember, it was Informix, and it was (indistinct) too, and of course Microsoft, you've got to give them credit with SQL Server, but Oracle won the database wars. And then everything got kind of quiet for a while; database was sort of boring. And then it exploded, you know, NoSQL and the key-value stores and the cloud databases, and this is really a hot area now. And when we looked at Oracle we said, okay, Oracle, it's all about Oracle Database, but we've seen the kind of resurgence in MySQL, which everybody thought, you know, once Oracle bought Sun they were going to kill MySQL. But now we see you investing in HeatWave, TimesTen, we talked about In-Memory databases before. So where do those fit, Maria, in the grand scheme? How should we think about Oracle's database portfolio? >> So there's lots of places where you'd use those different things. 'Cause just like any other industry, there are going to be new and boutique use cases that are going to benefit from a more specialized product or single purpose product. So good examples off the top of my head of the kind of systems that would benefit from that would be things like a stock exchange system or a telephone exchange system. Both of those are latency critical transaction processing applications where they need microsecond response times, and that's going to exceed perhaps what you might normally get or deploy with a converged database. And so Oracle's TimesTen database, our In-Memory database, is perfect for those kinds of applications. But there's also a host of MySQL applications out there today, and you said it yourself there Dave, HeatWave is a great place to provision and deploy those kinds of applications, because it's going to run 100 times faster than AWS (mumbles).
So, you know, there really is a place in the market, and in our customers' systems and the needs they have, for all of these different members of our database family here at Oracle. >> Yeah, well, the internet is basically running on the LAMP stack, so I don't see MySQL going away. All right Gerald, we'll give you the final word, bring us home. >> Oh, thank you very much. Yeah, I mean, as Maria said, I think it comes back to what we discussed before. There are obviously still needs for special technologies, or different technologies, than a relational database or multimodal database. Oracle actually has many more databases than people may first think of. Not only the three that we have already mentioned, but there's also, for example, Oracle's NoSQL database. And, you know, on a high level, Oracle is a data management company, right? And we want to give our customers the best tools and the best technology to manage all of their data. And therefore there has to be a need, or there should be a part of the business, that also focuses on these highly specialized systems and these highly specialized technologies that address those use cases. And I think it makes perfect sense. It's like, you know, when the customer comes to Oracle, they're not only getting this take-this-one-product, you know, and if you don't like it, that's your problem, but actually you have choice, right? And choice allows you to make a decision based on what's best for you, and not necessarily best for the vendor you're talking to. >> Well guys, really appreciate your time today and your insights. Maria, Gerald, thanks so much for coming on The Cube. >> Thank you very much for having us. >> And thanks for watching this Cube conversation, this is Dave Vellante, and we'll see you next time. (upbeat music)

Published Date : Jun 24 2021

Breaking Analysis: Chasing Snowflake in Database Boomtown

(upbeat music) >> From theCUBE studios in Palo Alto, in Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Database is the heart of enterprise computing. The market is both exploding and evolving. The major forces transforming the space include Cloud and data, of course, but also new workloads, advanced memory and IO capabilities, new processor types, a massive push towards simplicity, new data sharing and governance models, and a spate of venture investment. Snowflake stands out as the gold standard for operational excellence and go-to-market execution. The company has attracted the attention of customers, investors, and competitors, and everyone from entrenched players to upstarts wants in on the act. Hello everyone, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we'll share our most current thinking on the database marketplace and dig into Snowflake's execution, some of its challenges, and we'll take a look at how others are making moves to solve customer problems and try to get a piece of the growing database pie. Let's look at some of the factors that are driving market momentum. First, customers want lower license costs. They want simplicity. They want to avoid database sprawl. They want to run anywhere and manage new data types. These needs often are divergent, and they pull vendors and technologies in different directions. It's really hard for any one platform to accommodate every customer need. The market is large and it's growing. Gartner has it at around 60 to 65 billion, with a CAGR of somewhere around 20% over the next five years. But the market as we know it is being redefined. Traditionally, databases have served two broad use cases: OLTP, or transactions, and reporting, like data warehouses. But a diversity of workloads and new architectures and innovations have given rise to a number of new types of databases to accommodate all these diverse customer needs. Many billions have been spent over the last several years in venture money, and it continues to pour in. Let me just give you some examples. Snowflake, prior to its IPO, raised around 1.4 billion. Redis Labs has raised more than 1/2 billion dollars so far, Cockroach Labs, more than 350 million, Couchbase, 250 million, SingleStore, formerly MemSQL, 238 million, Yellowbrick Data, 173 million. And if you stretch the definition of database a little bit to include low-code or no-code, Airtable has raised more than 600 million. And that's by no means a complete list. Now, why is all this investment happening? Well, in large part, it's due to the TAM. The TAM is huge and it's growing, and it's being redefined. Just how big is this market? Let's take a look at a chart that we've shown previously. We used this chart to size Snowflake's TAM, and it focuses mainly on the analytics piece, but we'll use it here to really underscore the market potential. So the actual database TAM is larger than this, we think. Cloud and Cloud-native technologies have changed the way we think about databases. Virtually 100% of the database players that are in the market have pivoted to a Cloud first strategy. And many, like Snowflake, are pretty dogmatic and have a Cloud only strategy. Databases have historically been very difficult to manage, and they're really sensitive to latency, so that means they require a lot of tuning.
Cloud allows you to throw virtually infinite resources, on demand, at performance problems and scale very quickly, minimizing the complexity and tuning nuances. This idea of a layer of data as a service, we think of it as a staple of digital transformation. It's this layer that's forming to support things like data sharing across ecosystems and the ability to build data products or data services. It's a fundamental value proposition of Snowflake and one of the most important aspects of its offering. Snowflake tracks a metric called edges, which are external connections in its data Cloud. And it claims that 15% of its total shared connections are edges, and that's growing at 33% quarter on quarter. This notion of data sharing is changing the way people think about data. We use terms like data as an asset. This is the language of the 2010s. We don't share our assets with others, do we? No, we protect them, we secure them, we even hide them. We absolutely don't want to share those assets, but we do want to share our data. I had a conversation recently with Forrester analyst Michelle Goetz, and we both agreed we're going to scrub data as an asset from our phraseology. Increasingly, people are looking at sharing as a way to create, as I said, data products or data services, which can be monetized. This is an underpinning of Zhamak Dehghani's concept of a data mesh: make data discoverable, shareable and securely governed so that we can build data products and data services that can be monetized. This is where the TAM just explodes and the market is being redefined, and we think it's in the hundreds of billions of dollars. Let's talk a little bit about the diversity of offerings in the marketplace. Again, databases used to be either transactional or analytic, the bottom line and the top line. And this chart here describes those two, but the types of databases, as you can see in the middle, it just mushrooms. Just looking at this list: blockchain is of course a specialized type of database, and it's also finding its way into other database platforms; Oracle is notable here. Document databases that support JSON, and graph data stores that assist in visualizing data and inferring from multiple different sources; that's one of the ways in which adtech has taken off and been so effective. Key-value stores, log databases that are purpose-built, machine learning to enhance insights, spatial databases to help build the next generation of products, the next automobile, streaming databases to manage real time data flows, and time series databases. We might've missed a few, let us know if you think we have, but this is a pretty comprehensive list that is somewhat mind boggling when you think about it. And these unique requirements have spawned tons of innovation and companies. Here's a small subset on this logo slide, and this is by no means an exhaustive list. You have these companies here which have been around forever, like Oracle and IBM and Teradata and Microsoft; these are kind of the tier one relational databases that have matured over the years. And they've got properties like atomicity, consistency, isolation, durability, what's known as the ACID properties, ACID compliance. Some others that you may or may not be familiar with: Yellowbrick Data, we talked about them earlier, is going after the best price performance in analytics and optimizing to take advantage of both hybrid installations and the latest hardware innovations.
SingleStore, as I said, formerly known as MemSQL, is a very high end analytics and transaction database. It supports mixed workloads at extremely high speeds; we're talking about trillions of rows per second that can be ingested and queried. Couchbase with hybrid transactions and analytics; Redis Labs, open source NoSQL, doing very well, as is Cockroach with distributed SQL; MariaDB with its managed MySQL; Mongo, the document database, has a lot of momentum; EDB, which supports open source Postgres. And if you stretch the definition a bit, Splunk, for log database, why not? ChaosSearch, a really interesting startup that leaves data in S3 and is going after simplifying the ELK stack. New Relic, they have a purpose-built database for application performance management, and we probably could have even put Workday in the mix, as it developed a specialized database for its apps. Of course, we can't forget about SAP, with HANA trying to pry customers off of Oracle. And then the big three Cloud players, AWS, Microsoft and Google, with extremely large portfolios of database offerings. The spectrum of products in this space is very wide. You've got AWS, which I think is up to like 16 database offerings, all the way to Oracle, which has like one database to do everything, notwithstanding MySQL, which it owns, got that through the Sun acquisition. And recently it made some innovations there around the HeatWave announcement. But essentially, Oracle is investing to make its database, Oracle Database, run any workload, while AWS takes the approach of the right tool for the right job and really focuses on the primitives for each database. A lot of ways to skin a cat in this enormous and strategic market. So let's take a look at the spending data for the names that make it into the ETR survey. Not everybody we just mentioned will be represented, because they may not have quite the market presence of the Ns in the survey, but ETR captures a pretty nice mix of players. So this chart here, it's one of the favorite views that we like to share quite often. It shows the database players across the 1500 respondents in the ETR survey this past quarter, and it measures their net score, that's spending momentum, shown on the vertical axis, and market share, which is the pervasiveness in the data set, on the horizontal axis. Snowflake is notable because it's been hovering around 80% net score since the survey started picking them up. Anything above 40%, that red line there, is considered by us to be elevated. Microsoft and AWS also stand out, because they have both market presence and spending velocity with their platforms. Oracle is very large, but it doesn't have the spending momentum in the survey, because nearly 30% of Oracle installations are spending less, whereas only 22% are spending more. Now as a caution, this survey doesn't measure dollars spent, and Oracle will be skewed toward the big customers with big budgets, so you've got to consider that caveat when evaluating this data. IBM is in a similar position, although its market share is not keeping up with Oracle's. Google, they've got great tech, especially with BigQuery, and it has elevated momentum. So not a bad spot to be in, although I'm sure it would like to be closer to AWS and Microsoft on the horizontal axis, so it's got some work to do there. And some of the others we mentioned earlier, like MemSQL and Couchbase: MemSQL is shown here; they're now SingleStore.
Couchbase, Redis, Mongo, MariaDB, all very solid scores on the vertical axis. Cloudera just announced that it was selling to private equity, and that will hopefully give it some time to invest in its platform and get off the quarterly shot clock. MapR was acquired by HPE, and it's part of HPE's Ezmeral platform, their data platform, which doesn't yet have the market presence in the survey. Now, something that is interesting in looking at Snowflake's earnings last quarter is its laser focus on large customers. This is a hallmark of Frank Slootman and Mike Scarpelli, who, I know they don't have a playbook, but they certainly know how to go whale hunting. So this chart isolates the data that we just showed you to the Global 1000. Note that both AWS and Snowflake go up higher on the vertical axis, meaning large customers are spending at a faster rate for these two companies. The previous chart had an N of 161 for Snowflake, and a 77% net score. This chart shows the Global 1000, where the N for Snowflake is 48 accounts, and the net score jumps to 85%. We're not going to show it here, but when you isolate the ETR data, and it's nice, you can just cut it, when you isolate it on the Fortune 1000, the N for Snowflake goes to 59 accounts in the data set, and Snowflake jumps another 100 basis points in net score. When you cut the data by the Fortune 500, the Snowflake N goes to 40 accounts, and the net score jumps another 200 basis points to 88%. And when you isolate on the Fortune 100, there are only 18 accounts, but it's still 18, and their net score jumps to 89%, almost 90%. So it's very strong confirmation that there's a proportional relationship between larger accounts and spending momentum in the ETR data set. So Snowflake's large account strategy appears to be working. And because we think Snowflake is sticky, this probably is a good sign for the future. Now, we've been talking about net score. It's a key measure in the ETR data set, so we'd like to just quickly remind you what that is, using Snowflake as an example. This wheel chart shows the components of net score. That lime green is new adoptions: 29% of the customers in the ETR dataset are new to Snowflake. That's pretty impressive. 50% of the customers are spending more, that's the forest green; 20% are flat, that's the gray; and only 1%, the pink, are spending less. And 0%, zero, are replacing Snowflake, no defections. What you do here to get net score is you subtract the red from the green, and you get a net score of 78%. Which is pretty sick, as in good sick, and has been steady for many, many quarters. So that's how the net score methodology works. And remember, it typically takes Snowflake customers many months, like six to nine months, to start consuming its services at the contracted rate. So those 29% new adoptions, they're not going to kick into high gear until next year, and that bodes well for future revenue.
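The net score arithmetic described above is simple enough to pin down in a few lines. A quick sketch, using the Snowflake percentages cited in this episode; the function is just the subtract-the-red-from-the-green rule:

```python
# Net score, as described above: the share of customers adopting or
# increasing spend (the greens) minus the share decreasing or replacing
# (the reds). Flat spenders don't move the needle.
def net_score(new, more, flat, less, replacing):
    assert new + more + flat + less + replacing == 100  # percentages
    return (new + more) - (less + replacing)

# Snowflake's wheel-chart components cited in this episode.
print(net_score(new=29, more=50, flat=20, less=1, replacing=0))  # 78
```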
Now, it's worth taking a quick look at Snowflake's most recent quarter. There's plenty of stuff out there that you can google and get a summary, but let's just do a quick rundown. The company's product revenue run rate is now at 856 million; they'll surpass $1 billion on a run rate basis this year. The growth is off the charts, with very high net revenue retention. We've explained that before: with Snowflake's consumption pricing model, they have to account for retention differently than a SaaS company would. Snowflake added 27 net new $1 million accounts in the quarter and claims to have more than a hundred now. It also is just getting its act together overseas. Slootman says he's personally going to spend more time in Europe, given his belief that the market is huge and they can disrupt it, and of course he's from the continent; he was born there and lived there. And gross margins expanded, due in large part to a renegotiation of its Cloud costs. We'll come back to that in a moment. Snowflake is also moving from a product-led growth company to one that's more focused on core industries. Interestingly, media and entertainment is one of the largest, along with financial services and several others. To me, this is really interesting, because Disney is an example that Snowflake often puts in front of its customers as a reference. And it seems to me to be a perfect example of using data and analytics to both target customers and also build so-called data products through data sharing. Snowflake has to grow its ecosystem to live up to its lofty expectations, and indications are that large SIs are leaning in big time. Deloitte crossed $100 million in deal flow in the quarter. And the balance sheet's looking good, thank you very much, with $5 billion in cash. The snarks are going to focus on the losses, but this is all about growth. This is a growth story. It's about customer acquisition, it's about adoption, it's about loyalty, and it's about lifetime value. Now, as I said at the IPO, and I always say this to young people, don't buy a stock at the IPO. There are probably almost always going to be better buying opportunities ahead. I'm not always right about that, but I often am. Here's a chart of Snowflake's performance since IPO. And I have to say, it's held up pretty well. It's trading above its first day close, and as predicted, there were better opportunities than day one. But if you have to make a call from here, I mean, don't take my stock advice, do your research. Snowflake is priced to perfection, so any disappointment is going to be met with selling. You saw that the day after they beat their earnings last quarter, because their guidance on revenue growth wasn't in the triple digits; it sort of moderated down to the 80% range. And they pointed to a new storage compression feature that will lower customer costs and, consequently, is going to lower their revenue. I swear, I think that before earnings calls, Scarpelli sits back and says, okay, what kind of creative way can I introduce to dampen enthusiasm for the guidance? Now, I'm not saying lower storage costs won't translate into lower revenue for a period of time. But look at dropping storage prices: customers are always going to buy more, that's the way the storage market works. And they did allude to that, in all fairness. Let me introduce something that people in Silicon Valley are talking about, and that is the Cloud paradox for SaaS companies. And what is that? I was in a Clubhouse room with Martin Casado of Andreessen when I first heard about this. He wrote an article with Sarah Wang calling into question the merits of SaaS companies sticking with Cloud at scale. Now, the basic premise is that for startups in the early stages of growth, the Cloud is a no brainer for SaaS companies, but at scale, the cost of Cloud, the Cloud bill, approaches 50% of the cost of revenue, and it becomes an albatross that stifles operating leverage.
Their conclusion ended up saying that perhaps as much as, and this was back of the napkin, they admitted that, but perhaps as much as 1/2 a trillion dollars in market cap is being vacuumed away by the hyperscalers that could go to the SaaS providers as cost savings from repatriation, and that Cloud repatriation is an inevitable path for large SaaS companies at scale. I was particularly interested in this, as I had recently put out a post on the Cloud repatriation myth. I think in this instance there's some merit to their conclusions, but I don't think it necessarily bleeds into traditional enterprise settings. But for SaaS companies, maybe ServiceNow has it right, running their own data centers, or maybe a hybrid approach to hedge bets and save money down the road is prudent. What caught my attention in reading through some of the Snowflake docs, like the S-1 and its most recent 10-K, were comments regarding long-term purchase commitments and non-cancelable contracts with Cloud companies. In the company's S-1, for example, there was disclosure of $247 million in purchase commitments over a five-plus year period. In the company's latest 10-K report, that same line item jumped to 1.8 billion. Now, Snowflake is clearly managing these costs, as it alluded to on its earnings call. But one has to wonder, at some point, will Snowflake follow the example of, say, Dropbox, which Andreessen used in the blog, and start managing its own IT? Or will it stick with the Cloud and negotiate hard? Snowflake certainly has the leverage. It has to be one of Amazon's best partners and customers, even though it competes aggressively with Redshift. But on the earnings call, CFO Scarpelli said that Snowflake was working on a new chip technology to dramatically increase performance. What the heck does that mean? Snowflake is not becoming a hardware company, is it? So I'm going to have to dig into that a little bit and find out what it means. I'm guessing it means it's taking advantage of ARM-based processors like Graviton, which many ISVs are allowing their software to run on as a lower cost platform. Or maybe there's some deep, dark, in-the-weeds secret going on inside Snowflake, but I doubt it. We're going to leave all that there for now and keep following this trend. So it's clear, just in summary, that Snowflake is the pace setter in this new, exciting world of data, but there's plenty of room for others. And they still have a lot to prove. For instance, one customer in an ETR CTO roundtable expressed skepticism that Snowflake will live up to its hype, because its success is going to lead to more competition from well-established players. This is a common theme; you hear it all the time. It's pretty easy to reach that conclusion. But my guess is this is the exact type of narrative that fuels Slootman and sucked him back into this game of thrones. That's it for now, everybody. Remember, these episodes are all available as podcasts, wherever you listen. All you've got to do is search Breaking Analysis podcast, and please subscribe to the series. Check out ETR's website at etr.plus. We also publish a full report every week on wikibon.com and siliconangle.com. You can get in touch with me: email is David.vellante@siliconangle.com, you can DM me at DVelante on Twitter, or comment on our LinkedIn posts. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week everybody, be well, and we'll see you next time. (upbeat music)

Published Date : Jun 5 2021
