How to Make a Data Fabric Smart: A Technical Demo With Jess Jowdy
(inspirational music) (music ends) >> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities, including data exploration, business intelligence, natural language processing, and machine learning directly within the fabric makes it faster and easier for organizations to gain new insights and power intelligent, predictive, and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries, from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare-focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi, yeah, thank you so much for having me. And so for this demo, we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements, and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo, and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see, and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry.
This could be, for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software, or adverse reaction warnings from a clinical risk grouping application, and so much more. So I'm really going to be simulating a patient logging in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here, and I'm going to be looking for information where the last name of this patient is Simmons, and their medical record number or their patient identifier in the system is 32345. And so as you can see, I have this single JSON payload that showed up here of, just, relevant clinical information for my patient whose last name is Simmons, all within a single response. So fantastic, right? Typically though, when we see responses that look like this there is an assumption that this service is interacting with a single backend system, and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture, we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. 
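As a rough sketch of the request the Postman demo issues, the lookup is just a parameterized call against the patient data retrieval API. The endpoint URL and parameter names below are assumptions for illustration; the demo does not show the actual route:

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the demo's actual patient data retrieval API
# route and parameter names are not shown on screen.
BASE_URL = "https://fabric.example.org/api/patient-data"

def build_summary_url(last_name: str, mrn: str) -> str:
    """Build the lookup the demo performs: patient last name plus MRN."""
    return f"{BASE_URL}?{urlencode({'lastName': last_name, 'mrn': mrn})}"

print(build_summary_url("Simmons", "32345"))
# https://fabric.example.org/api/patient-data?lastName=Simmons&mrn=32345
```

Sending that GET is what returns the single aggregated JSON payload described above.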
And in the middle here, we have our data fabric coordinator, which is going to be in charge of this refinement and analysis, those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service, and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end, we do also support full life cycle API management within this platform. When you're dealing with APIs, I always like to make a little shout out on this, that you really want to make sure you have a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today, we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what context. >> Can I just interrupt you for a second, Jess? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean, you could have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry, and API security is, like, a really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So, there's interactions with the API itself. So can I even see this API and leverage it?
And then within an API call, you then have to deal with all right, which endpoints or what kind of interactions within that API am I allowed to do? What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So, the way that we handle that is, like I said, same thing at different layers. There is access to a particular API, which can happen within the IRIS product, and also we see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So, that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of the security. >> And that's been designed in, it's not a bolt on as they like to say. >> Absolutely. >> Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly, each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FHIR. Interactions with a homegrown enterprise data warehouse, for instance, may use SQL. Cloud-based solutions managed by a vendor may only allow you to use web service calls to pull data. So it's really important that the data fabric platform that you're using has the flexibility to connect to all of these different systems and applications. And I'm about to log out, so I'm going to (chuckles) keep my session going here.
So therefore it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources, and all these different kinds of formats and over all of these different kinds of protocols. So let's think back on our example here. I had four different applications that I was requesting information from to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is that it has an embedded interoperability platform. So there's all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST, or SOAP, or SQL, or FTP, regardless of that protocol, there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as. In healthcare we have HL7, we have FHIR, we have CCDs; across the industry, JSON is, you know, really hitting the market strong now, along with XML payloads and flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these, when I select a particular connection on the right side panel, I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application I'm using a SQL based connection. When I'm connecting to my EMR, I'm leveraging a standard healthcare messaging format known as FHIR, which is a REST-based protocol. And then when I'm working with my health record management system, I'm leveraging a standard HTTP adapter.
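To make the FHIR piece concrete: FHIR exposes clinical resources over plain REST, so a patient lookup is just an HTTP GET against a search URL. This is a hedged sketch; the base URL is hypothetical, and in the demo the actual connection settings live in the adapter panel rather than hand-written code:

```python
from urllib.parse import urlencode

# Hypothetical FHIR R4 base URL; the demo's EMR endpoint is not shown.
FHIR_BASE = "https://emr.example.org/fhir/r4"

def fhir_patient_search_url(family: str, identifier: str) -> str:
    """FHIR search over REST: GET [base]/Patient?family=...&identifier=..."""
    query = urlencode({"family": family, "identifier": identifier})
    return f"{FHIR_BASE}/Patient?{query}"

print(fhir_patient_search_url("Simmons", "32345"))
# https://emr.example.org/fhir/r4/Patient?family=Simmons&identifier=32345
```

The server would answer that GET with a FHIR Bundle resource, which is the "bundle" referenced later in the visual trace.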
So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly, and be able to do it in a reliable and quick way. Because if you think about it, you could have hundreds of these different kinds of applications built out and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance my patient's last name and their MRN, and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. So why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond what's an out-of-the box or black box approach to be able to develop things that are specific to their data fabric, or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you not only get an opportunity to view how we're establishing these connections or how we're building out these processes, but you have the opportunity to inject your own kind of processing, your own kinds of pipelines into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up and coming language, right? We see more and more developers turning towards Python to do their development. 
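A hedged illustration of that idea, not actual IRIS code: a platform-provided adapter does the fetching, and a user-written Python function injected into the pipeline adds organization-specific logic on top. The class and function names here are invented for the sketch:

```python
class TurnkeyAdapter:
    """Stand-in for a platform-provided adapter that fetches raw records."""
    def fetch(self, mrn: str) -> dict:
        # A real adapter would call out over SOAP/SQL/FHIR/HTTP here.
        return {"mrn": mrn, "allergies": ["penicillin"]}

def enrich_with_flags(record: dict) -> dict:
    """Organization-specific customization injected into the pipeline."""
    record["has_allergy_alert"] = bool(record.get("allergies"))
    return record

result = enrich_with_flags(TurnkeyAdapter().fetch("32345"))
print(result["has_allergy_alert"])  # True
```

The point of the pattern is the mix: the out-of-the-box adapter handles the connection, while the injected function carries logic specific to one organization's fabric.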
So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see, actually calls out to our turnkey adapters. So we see a combination of out-of-the-box code that is provided in this data fabric platform from IRIS, combined with organization specific or user specific customizations that are included in this Python method. So it's a nice little combination of how do we bring the developer experience in and mix it with out-of-the-box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. (laughs) >> It's a lot here. You know, actually- >> I can pause. >> If I could, if we just want to sort of play that back. So we went to the connect and the collect phase. >> Yes, we're going into refine. So it's a good place to stop. >> So before we get there, so we heard a lot about fine grain security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know, the ability to bring in different dev tools. We heard about FHIR, which of course is big in healthcare. And that's the standard, and then SQL for traditional kind of structured data, and then web services like HTTP you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely. And I think that's really important when you're dealing with a smart data fabric because what you're effectively doing is you're consolidating all of your processing, all of your collection, into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refinement. >> We're going into refinement. Exciting. (chuckles) So how do we actually do refinement? Where does refinement happen?
And how does this whole thing end up being performant? Well, the key to all of that is this SDF coordinator, which stands for Smart Data Fabric coordinator. And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's collecting that information, it's aggregating it, and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like. And as you can see, it follows a flowchart-like structure. So there's a start, there is an end, and then there are these different arrows that point to different activities throughout the business process. And so there's all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and we're consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL logic into this or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator.
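The fan-out-then-sync pattern the coordinator implements can be sketched in plain Python with asyncio. The source names are illustrative, and the real coordinator is assembled visually in IRIS rather than hand-coded; this just shows the shape of the orchestration:

```python
import asyncio

async def query_source(name: str) -> dict:
    """Stand-in for one outbound call to a downstream clinical system."""
    await asyncio.sleep(0)  # stands in for a network round trip
    return {name: f"data from {name}"}

async def coordinate() -> dict:
    # Fan out to every silo concurrently...
    sources = ["emr", "transcription", "risk_grouping", "health_records"]
    responses = await asyncio.gather(*(query_source(s) for s in sources))
    # ...then the "sync" step: merge only after all calls have returned.
    payload: dict = {}
    for r in responses:
        payload.update(r)
    return payload

payload = asyncio.run(coordinate())
print(sorted(payload))
# ['emr', 'health_records', 'risk_grouping', 'transcription']
```

The sync step is what guarantees the caller gets one complete payload rather than a partial response.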
So like I said, we have a really very simple process here that's just calling out, grabbing all those different data elements from all those different data sources and consolidating it. We'll look back at this coordinator in a second when we introduce, or we make this data fabric a bit smarter, and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen 'cause I always like to go back and take a look at everything that we've seen. We have our initial API connection, we have our connections to our individual data sources and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There's all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging, 'cause you need to be able to know, you know, if there was an issue, where did that issue happen in which connected process, and how did it affect the other processes that are related to it? In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request from when it initially came into the smart data fabric, to when data was sent back out from that smart data fabric. So I didn't record the time, but I bet if you recorded the time it was this time that we sent that request in and you can see my patient's name and their medical record number here, and you can see that that instigated four different calls to four different systems, and they're represented by these arrows going out. 
So we sent something to chart script, to our health record management system, to our clinical risk grouping application, into my EMR through their FHIR server. So every request, every outbound application gets a request and we pull back all of those individual pieces of information from all of those different systems, and we bundle them together. And for my FHIR lovers, here's our FHIR bundle that we got back from our FHIR server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now we change this into a JSON format before we deliver it, but this is those data elements brought together. And this screen would also be used for being able to see things like error trapping, or errors that were thrown, alerts, warnings, developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one stop shop for understanding what's happening behind the scenes with your data fabric. >> Sure, who did what, when, and where; what did the machine do; what went wrong, and where did it go wrong? Right at your fingertips. >> Right. And I'm a visual person, so a bunch of log files to me is not the most helpful. Being able to see that this happened at this time in this location gives me the understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How people are using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric, is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right?
It's bringing data in, it's analyzing that information, it's transforming that data if it's in a format that your consumer's not going to understand. It's doing any additional injection of custom logic. So really your coordinator or that orchestrator that sits in the middle is the brains behind your smart data fabric. >> And this is available today? It all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well, we can keep going. I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this, but essentially if we go back to our coordinator here, we can see here's that original pipeline that we saw where we're pulling data from all these different systems and we're collecting it and we're sending it out. But then we see two more at the end here, which involves getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric, but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at, and we're running it through a machine learning model that exists within the smart data fabric pipeline, and producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days, which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world, is we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform.
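As a hedged sketch of what that create/train/execute flow looks like: InterSystems documents an IntegratedML syntax along these lines (CREATE MODEL, TRAIN MODEL, then PREDICT inside a query), though the table and column names below are invented for illustration:

```sql
-- Define a model that predicts a readmission flag from encounter data
-- (table and column names are hypothetical).
CREATE MODEL ReadmissionModel PREDICTING (WillReadmit) FROM PatientEncounters;

-- Train it directly inside the platform, with no data shuffling.
TRAIN MODEL ReadmissionModel;

-- Score new encounters to get a readmission risk per patient.
SELECT MRN, PREDICT(ReadmissionModel) AS ReadmitRisk FROM NewEncounters;
```

Because the statements run where the data already lives, the model can sit as a step inside the coordinator's pipeline rather than as an external service.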
So there's no shuffling of data, there's no external connections to make this happen. And it doesn't really require having a PhD in data science to understand how to do that. It leverages really basic SQL-like syntax to be able to construct and execute these predictions. So, it's going one step further than the traditional data fabric example to introduce this ability to define actionable insights to our users based on the data that we've brought together. >> Well, that readmission probability is huge, right? Because it directly affects the cost for the provider and the patient, you know. So if you can anticipate the probability of readmission and either do things at that moment, or, you know, as an outpatient perhaps, to minimize the probability then that's huge. That drops right to the bottom line. >> Absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day, which is what makes this so exciting. >> Awesome demo. >> Thank you! >> Jess, are you cool if people want to get in touch with you? Can they do that? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy, and we'd love to hear from you. I always love talking about this topic so we'd be happy to engage on that. >> Great stuff. Thank you, Jessica, appreciate it. >> Thank you so much. >> Okay, don't go away because in the next segment, we're going to dig into the use cases where data fabric is driving business value. Stay right there. (inspirational music) (music fades)
Daren Brabham & Erik Bradley | What the Spending Data Tells us About Supercloud
(gentle synth music) (music ends) >> Welcome back to Supercloud 2, an open industry collaboration between technologists, consultants, analysts, and of course practitioners to help shape the future of cloud. At this event, one of the key areas we're exploring is the intersection of cloud and data. And how building value on top of hyperscale clouds and across clouds is evolving, a concept of course we call "Supercloud". And we're pleased to welcome our friends from Enterprise Technology Research, Erik Bradley and Darren Brabham. Guys, thanks for joining us, great to see you. We love to bring the data into these conversations. >> Thank you for having us, Dave, I appreciate it. >> Yeah, thanks. >> You bet. And so, let me do the setup on what is Supercloud. It's a concept that we've floated before re:Invent 2021, based on the idea that cloud infrastructure is becoming ubiquitous, incredibly powerful, but there's a lack of standards across the big three clouds. That creates friction. So we defined over the period of time, you know, better part of a year, a set of essential elements, deployment models for so-called supercloud, which create this common experience for specific cloud services that, of course, again, span multiple clouds and even on-premise data. So Erik, with that as background, I wonder if you could add your general thoughts on the term supercloud, maybe play proxy for the CIO community, 'cause you do these round tables, you talk to these guys all the time, you gather a lot of amazing information from senior IT DMs that compliment your survey. So what are your thoughts on the term and the concept? >> Yeah, sure. I'll even go back to last year when you and I did our predictions panel, right? And we threw it out there. And to your point, you know, there's some haters. Anytime you throw out a new term, "Is it marketing buzz? Is it worth it? Why are you even doing it?"
But you know, from my own perspective, and then also speaking to the IT DMs that we interview on a regular basis, this is just a natural evolution. It's something that's inevitable in enterprise tech, right? The internet was not built for what it has become. It was never intended to be the underlying infrastructure of our daily lives and work. The cloud also was not built to be what it's become. But where we're at now is, we have to figure out what the cloud is and what it needs to be to be scalable, resilient, secure, and have the governance wrapped around it. And to me that's what supercloud is. It's a way to define, operantly, what the next generation, the continued iteration and evolution of the cloud and what it needs to be. And whether you want to call it metacloud, supercloud, it doesn't matter. The point is that we're trying to define the next layer, the next future of work, which is inevitable in enterprise tech. Now, from the IT DM perspective, I have two interesting call outs. One is from basically a senior developer IT architecture and DevSecOps who says he uses the term all the time. And the reason he uses the term is that multi-cloud has a stigma attached to it when he is talking to his business executives. (David chuckles) The stigma is because it's complex and it's expensive. So he switched to supercloud to better explain to his business executives and his CFO and his CIO what he's trying to do. And we can get into more later about what it means to him. But the inverse of that, of course, is a good CSO friend of mine for a very large enterprise says the concern with Supercloud is the reduction of complexity. And I'll explain, he believes anything that takes the requirement of specific expertise out of the equation, even a little bit, as a CSO worries him.
So as you said, David, always two sides to the coin, but I do believe supercloud is a relevant term, and it is necessary because the cloud is continuing to be defined. >> You know, that's really interesting too, 'cause you know, Darren, we use Snowflake a lot as an example, sort of early supercloud, and you think from a security standpoint, we've always pushed Amazon and, "Are you ever going to kind of abstract the complexity away from all these primitives?" and their position has always been, "Look, if we produce these primitives, and offer these primitives, we can move as the market moves. When you abstract, then it becomes harder to peel the layers." But Darren, from a data standpoint, like I say, we use Snowflake a lot. I think of like Tim Berners-Lee when Web 2.0 came out, he said, "Well this is what the internet was always supposed to be." So in a way, you know, supercloud is maybe what multi-cloud was supposed to be. But I mean, you think about data sharing, Darren, across clouds, it's always been a challenge. Snowflake always, you know, obviously trying to solve that problem, as are others. But what are your thoughts on the concept? >> Yeah, I think the concept fits, right? It is reflective of, it's a paradigm shift, right? Things, as a pendulum have swung back and forth between needing to piece together a bunch of different tools that have specific unique use cases and they're best in breed in what they do. And then focusing on the duct tape that holds 'em all together and all the engineering complexity and skill, it shifted from that end of the pendulum all the way back to, "Let's streamline this, let's simplify it. Maybe we have budget crunches and we need to consolidate tools or eliminate tools." And so then you kind of see this back and forth over time. And with data and analytics for instance, a lot of organizations were trying to bring the data closer to the business. That's where we saw self-service analytics coming in.
And tools like Snowflake, what they did was they helped point to different databases, they helped unify data, and organize it in a single place that was, you know, in a sense neutral, away from a single cloud vendor or a single database, and allowed the business to kind of be more flexible in how it brought stuff together and provided it out to the business units. So Snowflake was an example of one of those times where we pulled back from the granular, multiple points of the spear, back to a simple way to do things. And I think Snowflake has continued to kind of keep that mantle to a degree, and we see other tools trying to do that, but that's all it is. It's a paradigm shift back to this kind of meta abstraction layer that kind of simplifies what is the reality, that you need a complex multi-use case, multi-region way of doing business. And it sort of reflects the reality of that. >> And you know, to me it's a spectrum. As part of Supercloud 2, we're talking to a number of of practitioners, Ionis Pharmaceuticals, US West, we got Walmart. And it's a spectrum, right? In some cases the practitioner's saying, "You know, the way I solve multi-cloud complexity is mono-cloud, I just do one cloud." (laughs) Others like Walmart are saying, "Hey, you know, we actually are building an abstraction layer ourselves, take advantage of it." So my general question to both of you is, is this a concept, is the lack of standards across clouds, you know, really a problem, you know, or is supercloud a solution looking for a problem? Or do you hear from practitioners that "No, this is really an issue, we have to bring together a set of standards to sort of unify our cloud estates." >> Allow me to answer that at a higher level, and then we're going to hand it over to Dr. Brabham because he is a little bit more detailed on the realtime streaming analytics use cases, which I think is where we're going to get to. 
But to answer that question, it really depends on the size and the complexity of your business. At the very large enterprise, Dave, yes, a hundred percent. This needs to happen. There is complexity, there is not only complexity in the compute and actually deploying the applications, but the governance and the security around them. But for lower end or, you know, business use cases, and for smaller businesses, it's a little less necessary. You certainly don't need to have all of these. Some of the things that come to mind from the interviews that Darren and I have done are, you know, financial services, if you're doing real-time trading, anything that has real-time data metrics involved in your transactions is going to be necessary. And another use case that we hear about is in online travel agencies. So I think it is very relevant, the complexity does need to be solved, and I'll allow Darren to explain a little bit more about how that's used from an analytics perspective. >> Yeah, go for it. >> Yeah, exactly. I mean, I think any modern, you know, multinational company that's going to have a footprint in the US and Europe, in China, or works in different areas like manufacturing, where you're probably going to have on-prem instances that will stay on-prem forever, for various performance reasons. You have these complicated governance and security and regulatory issues. So inherently, I think, large multinational companies and/or companies that are in certain areas like finance or in, you know, online e-commerce, or things that need real-time data, they inherently are going to have a very complex environment that's going to need to be managed in some kind of cleaner way. You know, they're looking for one door to open, one pane of glass to look at, one thing to do to manage these multi points. And, streaming's a good example of that. 
I mean, not every organization has a real-time streaming use case, and may not ever, but a lot of organizations do, a lot of industries do. And so there's this need to use, you know, they want to use open-source tools, they want to use Apache Kafka for instance. They want to use different megacloud vendors' offerings, like Google Pub/Sub or, you know, Amazon Kinesis Firehose. They have all these different pieces they want to use for different use cases at different stages of maturity or proof of concept, you name it. They're going to have to have this complexity. And I think that's why we're seeing this need, to have sort of this supercloud concept, to juggle all this, to wrangle all of it. 'Cause the reality is, it's complex and you have to simplify it somehow. >> Great, thank you guys. All right, let's bring up the graphic, and take a look. Anybody who follows the breaking analysis, which is co-branded with ETR Cube Insights powered by ETR, knows we like to bring data to the table. ETR does amazing survey work every quarter, 1,200-plus, 1,500 practitioners that answer a number of questions. The vertical axis here is net score, which is ETR's proprietary methodology, which is a measure of spending momentum, spending velocity. And the horizontal axis here is overlap, the presence, pervasiveness in the dataset; the Ns in that table insert on the bottom right show you how the dots are plotted, the net score and then the Ns in the survey. And what we've done is we've plotted a bunch of the so-called supercloud suspects. Let's start in the upper right, the cloud platforms. Without these hyperscale clouds, you can't have a supercloud. And as always, Azure and AWS, up and to the right. It's amazing we're talking about, you know, an 80-plus-billion-dollar company in AWS. Azure's business, if you just look at the IaaS, is in the 50 billion range. I mean, it's just amazing to me, the net scores here. Anything above 40% we consider highly elevated. 
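ETR's exact net score formula is proprietary, but it has been publicly described as the share of survey respondents adopting a vendor or increasing spend, minus the share decreasing spend or replacing the vendor. A rough sketch of that calculation (response labels here are illustrative, not ETR's actual schema):

```python
from collections import Counter

def net_score(responses):
    """Approximate a net score from spending-intention responses.

    Sketch of the publicly described approach: percentage of respondents
    adopting or increasing spend, minus percentage decreasing or replacing.
    Flat spend counts toward neither side.
    """
    counts = Counter(responses)
    positive = counts["adopting"] + counts["increasing"]
    negative = counts["decreasing"] + counts["replacing"]
    return 100.0 * (positive - negative) / len(responses)

# 100 hypothetical respondents for one vendor.
survey = (["adopting"] * 10 + ["increasing"] * 45 +
          ["flat"] * 30 + ["decreasing"] * 10 + ["replacing"] * 5)
print(net_score(survey))  # 40.0, right at the "highly elevated" threshold
```

With this formula a vendor clears the 40% "highly elevated" mark only when positive spending intentions outnumber negative ones by at least 40 points.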
And you got Azure and you got Snowflake, Databricks, HashiCorp, we'll get to them. And you got AWS, you know, right up there at that size, it's quite amazing. With really big Ns as well, you know, 700-plus Ns in the survey. So, you know, kind of half the survey actually has these platforms. So my question to you guys is, what are you seeing in terms of cloud adoption within the big three cloud players? I wonder if you could comment, maybe Erik, you could start. >> Yeah, sure. Now we're talking data, now I'm happy. So yeah, we'll get into some of it. Right now, the January, 2023 TSIS is approaching 1,500 survey respondents. One caveat, it's not closed yet, it will close on Friday, but with an N that big, we are well over statistical significance. We also recently did a cloud survey, and there's a couple of key points on that I want to get into before we get into individual vendors. What we're seeing here is that annual spend on cloud infrastructure is expected to grow at almost a 70% CAGR over the next three years. The percentage of those workloads on cloud infrastructure is expected to grow over 70% in three years as well. And as you mentioned, Azure and AWS are still dominant. However, we're seeing some share shift spreading around a little bit. Now to get into the individual vendors you mentioned, yes, Azure is still number one, AWS is number two. What we're seeing, which is incredibly interesting, CloudFlare is number three. It's actually beating GCP. That's the first time we've seen it. What I do want to state is this is on net score only, which is our measure of spending intentions. When you talk about actual pervasion in the enterprise, it's not even close. But from a spending velocity intention point of view, CloudFlare is now number three above GCP, and even Salesforce is creeping up to be at GCP's level. 
So what we're seeing here is a continued domination by Azure and AWS, but some of these other players that maybe might fit into your moniker. And I definitely want to talk about CloudFlare more in a bit, but I'm going to stop there. But what we're seeing is some of these other players that fit into your Supercloud moniker are starting to creep up, Dave. >> Yeah, I just want to clarify. So as you also know, we track IaaS and PaaS revenue and we try to extract, so AWS reports in its quarterly earnings, you know, they're just IaaS and PaaS, they don't have a SaaS play, a little bit maybe, whereas Microsoft and Google include their applications, and so we extract those out, and if you do that, AWS is bigger. But in the surveys, you know, customers, they see cloud, SaaS to them as cloud. So that's one of the reasons why you see, you know, Microsoft as larger in pervasion. If you bring up that survey again, Alex, the survey results, you see them further to the right and they have higher spending momentum, which is consistent with what you see in the earnings calls. Now, interesting about CloudFlare, because the CEO of CloudFlare actually, and CloudFlare itself, uses the term supercloud, basically saying, "Hey, we're building a new type of internet." So what are your thoughts? Do you have additional information on CloudFlare, Erik, that you want to share? I mean, you've seen them pop up. I mean, this is a really interesting company that is pretty forward thinking and vocal about how it's disrupting the industry. >> Sure, we've been tracking 'em for a long time, even from the disruption of just a traditional CDN, where they took on Akamai and what they're doing. But for me, the definition of a true supercloud provider can't just be one instance. You have to have multiple. So it's not just the cloud, it's the networking aspect on top of it, it's also security. And to me, CloudFlare is the only one that has all of it. 
They actually have the ability to offer all of those things. Whereas you look at some of the other names, they're still piggybacking on the infrastructure or platform as a service of the hyperscalers. CloudFlare does not need to; they actually have the cloud, the networking, and the security all themselves. So to me that lends credibility to their own internal usage of that moniker Supercloud. And also, again, just what we're seeing right here, that their net score is now creeping above GCP's, really does state it. And then just one real last thing, one of the other things we do in our surveys is we track adoption and replacement reasoning. And when you look at CloudFlare's adoption rate, which is extremely high, it's based on technical capabilities, the breadth of their feature set; it's also based on what we call the ability to avoid stack alignment. So those are, again, really supporting reasons that make CloudFlare a top candidate for your moniker of supercloud. >> And they've also announced an object store (chuckles) and a database. So, you know, that's going to be, it takes a while as you well know, to get database adoption going, but you know, they're ambitious and going for it. All right, let's bring the chart back up, and I want to focus Darren in on the ecosystem now. And really, we've identified Snowflake and Databricks, it's always fun to talk about those guys, and there are a number of other, you know, data platforms out there, but we use those two as really proxies for leaders. We got a bunch of the backup guys, the data protection folks, Rubrik, Cohesity, and Veeam. They're sort of in a cluster, although Rubrik, you know, is ahead of those guys in terms of spending momentum. And then VMware Tanzu and Red Hat as sort of the cross-cloud platform. But I want to focus, Darren, on the data piece of it. We're seeing a lot of activity around data sharing, governed data sharing. 
Databricks is using Delta Sharing as their sort of play; Snowflake's is sort of this walled garden, like the App Store. What are your thoughts on, you know, in the context of Supercloud, cross-cloud capabilities for the data platforms? >> Yeah, good question. You know, I think Databricks is an interesting player because they sort of have made some interesting moves with their Data Lakehouse technology. So they're trying to kind of complicate, or not complicate, they're trying to take away the complications of, you know, the downsides of data warehousing and data lakes, and trying to find that middle ground, where you have the benefits of a managed, governed, you know, data warehouse environment, but you have sort of the lower cost, you know, capability of a data lake. And so, you know, Databricks has become really attractive, especially to data scientists, right? We've been tracking them in the AI machine learning sector for quite some time here at ETR, attractive for a data scientist because it looks and acts like a lake, but can have some managed capabilities like a warehouse. So it's kind of the best of both worlds. So in some ways I think you've seen sort of a data science driver for the adoption of Databricks that has now become a little bit more mainstream across the business. Snowflake, maybe the other direction, you know, it's a cloud data warehouse that, you know, is starting to expand its capabilities and add on new things, like Streamlit is a good example in the analytics space, with apps. So you see these tools starting to branch and creep out a bit, but they offer that sort of neutrality, right? We heard one IT decision maker we recently interviewed that referred to Snowflake and Databricks as the, quote unquote, Switzerland of what they do. And so there's this desirability from an organization to find these tools that can solve the complex multi-headed use case of data and analytics, which every business unit needs in different ways. 
And figure out a way to do that, an elegant way that's governed and centrally managed, that federated kind of best of both worlds that you get by bringing the data close to the business while having a central governed instance. So these tools are incredibly powerful and I think there's only going to be room for growth, for those two especially. I think they're going to expand and do different things and maybe, you know, join forces with others, and a lot of the power of what they do well is trying to define these connections and find these partnerships with other vendors, and try to be seen as the nice add-on to your existing environment that plays nicely with everyone. So I think that's where those two tools are going, but they certainly fit this sort of label of, you know, trying to be that supercloud neutral, you know, layer that unites everything. >> Yeah, and if you bring the graphic back up, please, there's obviously big data plays in each of the cloud platforms, you know, Microsoft, big database player; AWS has, you know, 11, 12, 15 data stores. And of course, you know, BigQuery and other, you know, data platforms within Google. But you know, I'm not sure the big cloud guys are going to go hard after so-called supercloud, cross-cloud services. Although, we see Oracle getting in bed with Microsoft and Azure, with a database service that is cross-cloud, certainly Google with Anthos, and you know, you never say never with AWS. I guess what I would say, guys, and I'll leave you with this, is that, you know, just like all players today are cloud players, I feel like anybody in the business, or most companies, are going to be so-called supercloud players. In other words, they're going to have a cross-cloud strategy, they're going to try to build connections if they're coming from on-prem, like a Dell or an HPE, you know, or Pure, or, you know, many of these other companies, Cohesity is another one. 
They're going to try to connect to their on-premises estates, of course, and create a consistent experience. It's natural that they're going to have sort of some consistency across clouds. You know, the big question is, what's that spectrum look like? I think on the one hand you're going to have some, you know, maybe some rudimentary, you know, instances of supercloud, or maybe they just run on the individual clouds, versus where Snowflake and others, and even beyond that, are trying to go with a single global instance, basically building out what I would think of as their own cloud, and importantly their own ecosystem. I'll give you guys the last thought. Maybe you could each give us, you know, closing thoughts. Maybe Darren, you could start, and Erik, you could bring us home on just this entire topic, the future of cloud and data. >> Yeah, I mean I think, you know, two points to make on that. This question of these, I guess what we'll call legacy on-prem players, these mega vendors that have been around a long time, have big on-prem footprints, and a lot of people have them for that reason. I think it's foolish to assume that a company, especially a large, mature, multinational company that's been around a long time, it's foolish to think that they can just uproot and leave on-premises entirely, full scale. There will almost always be an on-prem footprint from any company that was not, you know, natively born in the cloud after 2010, right? I just don't think that's reasonable anytime soon. I think there's some industries that need on-prem, things like, you know, industrial manufacturing and so on. So I don't think on-prem is going away, and I think vendors that are going to, you know, go very cloud forward, very big on the cloud, if they neglect having at least decent connectors to on-prem legacy vendors, they're going to miss out. 
So I think that's something that these players need to keep in mind, is that they continue to reach back to some of these players that have big footprints on-prem, and make sure that those integrations are seamless and work well, or else their customers will always have a multi-cloud or hybrid experience. And then I think a second point here about the future is, you know, we talk about the three big, you know, cloud providers, the Google, Microsoft, AWS, as sort of the opposite of, or different from, this new supercloud paradigm that's emerging. But I want to kind of point out that they will always try to make a play to become that, and I think, you know, we'll certainly see someone like Microsoft trying to expand their licensing and expand how they play in order to become that supercloud provider for folks. So also don't want to downplay them. I think you're going to see those three big players continue to move, and take over what players like CloudFlare are doing, and try to, you know, cut them off before they get too big. So, keep an eye on them as well. >> Great points. I mean, I think you're right, the first point, if you're Dell, HPE, Cisco, IBM, your strategy should be to make your on-premises estate as cloud-like as possible, and you know, make those differences as minimal as possible. And you know, if you're a customer, then the business case is going to be low for you to move off of that. And I think you're right. I think the cloud guys, if this is a real problem, the cloud guys are going to play in there, and they're going to make some money at it. Erik, bring us home please. >> Yeah, I'm going to revert back to our data and this on the macro side. So to kind of support this concept of a supercloud, right now, you know, Dave, you and I know, we check overall spending, and what we're seeing right now is total year spend is expected to only be 4.6%. We ended 2022 at 5%, even though it began at almost eight and a half. 
So this is clearly declining, and in that environment, we're seeing the top two strategies to reduce spend are actually vendor consolidation, with 36% of our respondents saying they're actively seeking a way to reduce their number of vendors and consolidate into one. That's obviously supporting a supercloud type of play. Number two is reducing excess cloud resources. So when I look at both of those combined, with a drop in the overall spending reduction, I think you're on the right thread here, Dave. You know, the overall macro view that we're seeing in the data supports this happening. And if I can real quick, a couple of names we did not touch on that I do think deserve to be in this conversation. One is HashiCorp. HashiCorp is the number one player in our infrastructure sector, with a 56% net score. It does multiple things within infrastructure and it is completely agnostic to your environment. And if we're also speaking about something that's just a singular feature, we would look at Rubrik for data backup, storage, recovery. They're not going to offer you your full cloud or your networking, of course, but if you are looking for your backup, recovery, and storage, Rubrik, also number one in that sector with a 53% net score. Two other names that deserve to be in this conversation as we watch it move and evolve. >> Great, thank you for bringing that up. Yeah, we had both of those guys in the chart and I failed to focus in on HashiCorp. And clearly a Supercloud enabler. All right guys, we got to go. Thank you so much for joining us, appreciate it. Let's keep this conversation going. >> Always enjoy talking to you, Dave, thanks. >> Yeah, thanks for having us. >> All right, keep it right there for more content from Supercloud 2. This is Dave Vellante for John Furrier and the entire Cube team. We'll be right back. (gentle synth music) (music fades)
Is Data Mesh the Killer App for Supercloud | Supercloud2
(gentle bright music) >> Okay, welcome back to our "Supercloud 2" event, live coverage here at stage performance in Palo Alto, syndicating around the world. I'm John Furrier with Dave Vellante. We've got exclusive news and a scoop here for SiliconANGLE and theCUBE. Zhamak Dehghani, creator of data mesh, has formed a new company called NextData, at NextData.com. She's a Cube alumni and contributor to our Supercloud initiative, as well as our coverage and breaking analysis with Dave Vellante on data, the killer app for Supercloud. Zhamak, great to see you. Thank you for coming into the studio, and congratulations on your newly formed venture and continued success on the data mesh. >> Thank you so much. It's great to be here. Great to see you in person. >> Dave: Yeah, finally. >> John: Wonderful. Your contributions to the data conversation have been well-documented, certainly by us and others in the industry. Data mesh is taking the world by storm. Some people are debating it, throwing, you know, cold water on it. Some think it's the next big thing. Tell us about the data mesh super data apps that are emerging out of cloud. >> I mean, data mesh, as you said, it's, you know, the pain points that it surfaced were universal. Everybody said, "Oh, why didn't I think of that?" You know, it was just an obvious next step and people are approaching it, implementing it. I guess the last few years, I've been involved in many of those implementations, and I guess Supercloud is somewhat a prerequisite for it, because data mesh and building applications using data mesh is about sharing data responsibly across boundaries. And those boundaries include organizational boundaries, cloud technology boundaries and trust boundaries. >> I want to bring that up because your venture, NextData, which is new, just formed. Tell us about that. What wave is that riding? What specifically are you targeting? What's the pain point? >> Zhamak: Absolutely, yes. 
So NextData is the result of, I suppose, the pains that I suffered from implementing data mesh for many of the organizations. Basically, a lot of organizations that I've worked with, they want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs, yet they want discoverability and governance. So they want to have policies implemented, they want to govern that data, they want to be able to discover that data, and yet they want to decentralize it. And we do that with a developer experience that is easy and native to a generalist developer. So we try to find, I guess, the common denominator that solves those problems and enables that developer experience for data sharing. >> John: Since you just announced the news, what's been the reaction? >> Zhamak: I just announced the news right now, so what's the reaction? >> John: But people in the industry that know you, you did a lot of work in the area. What has been some of the feedback on the new venture in terms of the approach, the customers, problem? >> Yeah, so we've been in stealth mode, so we haven't publicly talked about it, but folks that have been close to us in fact have reached out. We already have implementations of our pilot platform with early customers, which is super exciting. And we're going to have multiple of those. Of course, we're a tiny, tiny company, we can't have many of those, but we are going to have multiple pilots, implementations of our platform in the real world, with real global large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now. >> Dave: When I think about your role at ThoughtWorks, you had a very wide observation space with a number of clients, helping them implement data mesh and other things as well prior to your data mesh initiative. 
But when I look at data mesh, at least the ones that I've seen, they're very narrow. I think of JPMC, I think of HelloFresh. They're generally, obviously not surprising. They don't include the big vision of inclusivity across clouds, across different data stores. But it seems like people are having to go through some gymnastics to get to, you know, the organizational reality of decentralizing data, and at least pushing data ownership to the line of business. How are you approaching, or are you approaching, solving that problem? Are you taking a narrow slice? What can you tell us about NextData? >> Zhamak: Sure, yeah, absolutely. Gymnastics, a cute word to describe what the organizations have to go through. And one of those problems is that, you know, the data, as you know, resides on different platforms. It's owned by different people, it's processed by pipelines that somebody else owns. So there's this very disparate and disconnected set of technologies that were very useful for when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem, the cost of integration of these technologies in a cohesive developer experience is what's missing. And we want to focus on that cohesive end-to-end developer experience to share data responsibly in these autonomous units, we call them data products, I guess, in data mesh, right? That constitutes computation, that governs that data, policies, discoverability. So I guess, I heard this expression in the last talks, that you can have your cake and eat it too. So we want people to have their cakes, which is, you know, data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates data, computation, APIs to get to it, in a technology-agnostic way, in an open way. 
And then sit on top and use existing tech, you know, Snowflake, Databricks, whatever exists, you know, the millions of dollars of investments that companies have made. Sit on top of those, but create this cohesive, integrated experience where data product is a first-class primitive. And that's really key here, that the language and the modeling that we use is really native to data mesh, is that I will make a data product, I'm sharing a data product, and that encapsulates, I'm providing metadata about this, I'm providing computation that's constantly changing the data, I'm providing the API for that. So we're trying to kind of codify and create a new developer experience based on that. And developer, both from provider side and user side, connected to peer-to-peer data sharing with data product as a primitive first-class concept. >> Okay, so the idea would be developers would build applications leveraging those data products, which are discoverable and governed. Now, today you see some companies, you know, take Snowflake for example. >> Zhamak: Yeah. >> Attempting to do that within their own little walled garden. They even, at one point, used the term "mesh." I dunno if they pulled back on that. And then they sort of became aware of some of your work. But a lot of the things that they're doing within their little insulated environment, you know, support that, you know, governance, they're building out an ecosystem. What's different in your vision? >> Exactly. So we realize that, you know, and this is a reality, like you go to organizations, they have a Snowflake and half of the organization happily operates on Snowflake. And on the other half, oh, we are on, you know, bare infrastructure on AWS, or we are on Databricks. These are the realities, you know. This Supercloud that's written up here, it's about working across boundaries of technology. So we try to embrace that. 
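The data product container Zhamak describes, data, the computation that shapes it, governing policies, and discovery metadata behind one technology-agnostic API, could be sketched roughly as follows. All names here are illustrative assumptions, not NextData's actual API:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class DataProduct:
    """Illustrative data product container: one unit that packages the
    data, the transform that maintains it, the policies that govern
    every access, and metadata for discoverability."""
    name: str
    owner: str
    metadata: dict = field(default_factory=dict)       # for catalog/discovery
    policies: list = field(default_factory=list)       # callables: (user, action) -> bool
    transform: Callable[[Any], Any] = lambda rows: rows  # computation over incoming data
    _storage: list = field(default_factory=list)       # stand-in for Snowflake, S3, etc.

    def write(self, user: str, rows: list) -> None:
        self._enforce(user, "write")
        self._storage.extend(self.transform(rows))

    def read(self, user: str) -> list:
        self._enforce(user, "read")
        return list(self._storage)

    def _enforce(self, user: str, action: str) -> None:
        # Policies travel with the product and run on every read or write.
        for policy in self.policies:
            if not policy(user, action):
                raise PermissionError(f"{user} denied {action} on {self.name}")

# Usage: a product anyone may read, but only the owning team may write.
team_only = lambda user, action: action == "read" or user.endswith("@claims-team")
claims = DataProduct(name="claims.monthly", owner="claims-team",
                     metadata={"domain": "insurance"}, policies=[team_only])
claims.write("ana@claims-team", [{"id": 1}])
print(claims.read("anyone"))  # [{'id': 1}]
```

The point of the sketch is the packaging: the consumer never touches `_storage` directly, so swapping the backing platform does not change the product's contract.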
And even for our own technology, with the way we're building it, we say, "Okay, nobody's going to use just the NextData mesh operating system. People will have different platforms." So you have to build with openness in mind, and in the case of Snowflake, I think, you know, they have, I'm sure, very happy customers, as long as customers can be on Snowflake. But once you cross that boundary of platforms, then that becomes a problem. And we try to keep that in mind in our solution. >> So, it's worth reviewing that basically, the concept of data mesh is that, whether you're a data lake or a data warehouse, an S3 bucket, an Oracle database as well, they should all be inclusive inside of the data mesh. >> We did a session with AWS on the startup showcase, data as code. And remember, I wrote a blog post in 2007 called "Data's the new developer kit." Back then, they used to call 'em developer kits, if you remember. And we said at that time, whoever can code data >> Zhamak: Yes. >> Will have a competitive advantage. 
Operating systems, you know, have shown us how you can kind of abstract what's complex and take care of, you know, a lot of complexities, but yet provide an open and, you know, dynamic enough interface. So we think about it that way. We try to solve the problem of policies living with the data. And enforcement of the policies happens at the most granular level, which is, in this concept, the data product. And that would happen whether you read, write, or access a data product. But we can never imagine what all of these policies could be. So our thinking is, okay, we should have an open policy framework that can allow organizations to write their own policy drivers and policy definitions, and encode it and encapsulate it in this data product container. But I'm not going to fool myself to say that, you know, that's going to solve the problem that you just described. I think we are in this, I don't know, if I look into my crystal ball, what I think might happen is that right now, the primitives that we work with to train machine-learning models are still bits and bytes of data. They're fields, rows, columns, right? And that creates quite a large surface area, an attack area, for, you know, for privacy of the data. So perhaps one of the trends that we might see is this evolution of data APIs to become more and more computationally aware, to bring the compute to the data, to reduce that surface area, so you can really leave the control of the data to the sovereign owners of that data, right? So that data product. So I think the evolution of our data APIs perhaps will become more and more computational. So you describe what you want, and the data owner decides, you know, how to manage the- >> John: That's interesting, Dave, 'cause it's almost like we just talked about ChatGPT in the last segment with you, and machine learning that's really been around the industry. It's almost as if you're starting to see reason come into the data, reasoning. 
It's like you're starting to see not just metadata, but using the data to reason, so that you don't have to expose the raw data. It's almost like a, I won't say curation layer, but an intelligence layer. >> Zhamak: Exactly. >> Can you share your vision on that? 'Cause that seems to be where the dots are connecting. >> Zhamak: Yes, this is perhaps further into the future, because just from where we stand, we still have to create that bridge of familiarity between that future and the present. So we are still in that bridge-making mode. However, by just the basic notion of saying, "I'm going to put an API in front of my data, and that API today might be as primitive as a level of indirection, as in: you tell me what you want, tell me who you are, let me go process all the policies and lineage and insert all of this intelligence that needs to happen, and then, today, I will still give you a file." But by just defining that API and standardizing it, now we have this amazing extension point where we can say, "Well, in the next revision of this API, you don't just tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and compute via your API?" And you can kind of evolve that, right? Now you have a point of evolution to this very futuristic, I guess, future where you just describe the question that you're asking from the chat. >> Well, this is the Supercloud, Dave. >> I have a question from a fan, I've got to get it in. It's George Gilbert. And so his question is: you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has, and he wants your feedback on this: "Do the data product app devs get exposed to more complexity with respect to moving data between data products, or maybe its attributes between data products? How do you respond to that? How do you see it, is that a problem, or is that something that is overstated, or do you have an answer for that?" 
>> Zhamak: Absolutely. So I think there's a sweet spot in getting data product developers closer to the app, but yet not burdening them with the complexity of the application and application logic, and yet reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating. Because what's happening right now is that data engineers, and I have a ton of empathy for them for the high threshold of pain that they can, you know, deal with, have been centralized. They've been put into the data team and given this unbelievable task of: make meaning out of data, put semantics over it, curate it, clean it, and so on. So what we are saying is: get those folks embedded into the domain, closer to the application developers. These are still separately moving units. Your app and your data products are independent, but yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application, but yet have them separate from the app, because the app provides a very different service: transactional data for my e-commerce transaction. The data product provides a very different service: longitudinal data for the, you know, variety of intelligent analysis that I can do on the data. But yet, it's all within the domain of e-commerce or sales or whatnot. >> So a lot of decoupling and coupling to create that cohesiveness. >> Zhamak: Absolutely. >> Architecture. So I have to ask you, this is an interesting question 'cause it came up on theCUBE all last year. Back in the old server, data center days and cloud, Google coined the term "Site Reliability Engineer," SRE, for someone to look over the hundreds of thousands of servers. We asked a question of the data engineering community, who have been suffering, by the way, I agree. 
Is there an SRE-like role for data? Because in a way, data engineering, that platform engineer, they are like the SRE for data. In other words, managing the large scale to enable automation and self-service. What are your thoughts and reaction to that? >> Zhamak: Yes, exactly. So maybe we go through the history of how SRE came to be. So we had the first DevOps movement, which was: remove the wall between dev and ops and bring them together. So you have one cross-functional unit of the organization that's responsible for, you build it, you run it, right? So then there is no "I'm going to just shoot my application over the wall for somebody else to manage it." So we did that, and then we said, "Okay, as we decentralized and had this many microservices running around, we had to create a layer that abstracted a lot of the complexity around running, monitoring, and observing a lot of services, while giving autonomy to this cross-functional team." And that's where the SRE, a new generation of engineers, came to exist. So I think if I just look- >> Hence Borg, hence Kubernetes. >> Hence, hence, exactly. Hence chaos engineering, hence embracing the complexity and messiness, right? And putting engineering discipline in to embrace that and yet give a cohesive and high-integrity experience of those systems. So I think, if we look at that evolution, perhaps something like that is happening by bringing data and apps closer and making them these domain-oriented data product teams, or domain-oriented cross-functional teams, full stop, and still having a very advanced, maybe at the platform infrastructure level, kind of operational team that's not busy doing two jobs, which is taking care of domains and the infrastructure, but is building infrastructure that embraces that complexity and interconnectivity of this data process. >> John: So you see similarities. >> Absolutely, but I feel like we're probably still in the early days of that movement. 
>> So it's a data DevOps kind of thing happening, where scale is happening. Good things are happening, yet, eh, a little bit fast and loose with some complexities to clean up. >> Yes, yes. This is a different restructure. As you said, you know, the job of this industry as a whole, of architects, is decompose, recompose, decompose, recompose in a new way, and now we're decomposing the centralized team and recomposing them as domains and- >> John: So is data mesh the killer app for Supercloud? >> You had to do this to me. >> Dave: Sorry, I couldn't- (John and Dave laughing) >> Zhamak: What do you want me to say, Dave? >> John: Yes. >> Zhamak: Yes, of course. >> I mean, Supercloud, I think really the terminology is Supercloud, Opencloud. But I think, in the spirit of it, this embracing of diversity and giving autonomy for people to make decisions for what's right for them, and not locking them in yet. I think just embracing that is baked into how data mesh assumes the world would work. >> John: Well, thank you so much for coming on Supercloud 2, really appreciate it. Data has driven this conversation. Your success with data mesh has really opened up the conversation and exposed the slow-moving data industry. >> Dave: Been a great catalyst. (John laughs) >> John: That's now going well. We can move faster, so thanks for coming on. >> Thank you for hosting me. It was wonderful. >> Okay, Supercloud 2, live here in Palo Alto. Our stage performance, I'm John Furrier with Dave Vellante. We're back with more after this short break. Stay with us all day for Supercloud 2. (gentle bright music)
Today’s Data Challenges and the Emergence of Smart Data Fabrics
(intro music) >> Now, as we all know, businesses are awash with data, from financial services to healthcare to supply chain and logistics and more. Our activities, and increasingly, actions from machines are generating new and more useful information in much larger volumes than we've ever seen. Now, meanwhile, our data-hungry society's expectations for experiences are increasingly elevated. Everybody wants to leverage and monetize all this new data coming from smart devices and innumerable sources around the globe. All this data, it surrounds us, but more often than not, it lives in silos, which makes it very difficult to consume, share, and make valuable. These factors, combined with new types of data and analytics, make things even more complicated. Data from ERP systems to images, to data generated from deep learning and machine learning platforms, this is the reality that organizations are facing today. And as such, effectively leveraging all of this data has become an enormous challenge. So, today, we're going to be discussing these modern data challenges and the emergence of so-called "Smart Data Fabrics" as a key solution to said challenges. To do so, we're joined by thought leaders from InterSystems. This is a really creative technology provider that's attacking some of the most challenging data obstacles. InterSystems tells us that they're dedicated to helping customers address their critical scalability, interoperability, and speed-to-value challenges. And in this first segment, we welcome Scott Gnau, he's the global Head of Data Platforms at InterSystems, to discuss the context behind these issues and how smart data fabrics provide a solution. Scott, welcome. Good to see you again. >> Thanks a lot. It's good to be here. >> Yeah. So, look, you and I go back, you know, several years and, you know, you've worked in Tech, you've worked in Data Management your whole career. You've seen many data management solutions, you know, from the early days. 
And then we went through the whole Hadoop era together, and you've come across a number of customer challenges that sort of changed along the way. And they've evolved. So, what are some of the most pressing issues that you see today when you're talking to customers? And, you know, put on your technical hat if you want to. >> (chuckles) Well, Dave, I think you described it well. It's a perfect storm out there. You know, there's just data everywhere, and it's coming off devices, it's coming from new, different kinds of processing paradigms, and people are trying to capture and harness the value from this data. At the same time, you talked about silos, and I've talked about data silos through my entire career. And I think the interesting thing about it is, for so many years we've talked about, "We've got to reduce the silos, and we've got to integrate the data, we've got to consolidate the data." And that was a really good paradigm for a long time. But frankly, with the perfect storm that you described, the sources are just too varied. The required agility for a business unit to operate and manage their customers is creating enormous pressure, and I think ultimately, silos aren't going away. So there's a realization that, "Okay, we're going to have these silos, we want to manage them, but how do we really take advantage of data that may live across different parts of our business and in different organizations?" And then of course, the expectation of the consumer is at an all-time high, right? They expect that we're going to treat them and understand their needs, or they're going to find some other provider. So, you know, pulling all of this together really means that, you know, our customers and businesses around the world are struggling to keep up, and it's forcing a real, a new paradigm shift in underlying data management, right? 
We started, you know, many, many years ago with data marts and then data warehouses and then we graduated to data lakes, where we expanded beyond just traditional transactional data into all kinds of different data. And at each step along the way, we help businesses to thrive and survive and compete and win. But with the perfect storm that you've described, I think those technologies are now just a piece of the puzzle that is really required for success. And this is really what's leading to data fabrics and data meshes in the industry. >> So what are data fabrics? What problems do they solve? How do they work? Can you just- >> Yeah. So the idea behind it is, and this is not to the exclusion of other technologies that I described in data warehouses and data lakes and so on, but data fabrics kind of take the best of those worlds but add in the notion of being able to do data connectivity with provenance as a way to integrate data versus data consolidation. And when you think about it, you know, data has gravity, right? It's expensive to move data. It's expensive in terms of human cost to do ETL processes where you don't have known provenance of data. So, being able to play data where it lies and connect the information from disparate systems to learn new things about your business is really the ultimate goal. You think about in the world today, we hear about issues with the supply chain and supply and logistics is a big issue, right? Why is that an issue? Because all of these companies are data-driven. They've got lots of access to data. They have formalized and automated their processes, they've installed software, and all of that software is in different systems within different companies. But being able to connect that information together, without changing the underlying system, is an important way to learn and optimize for supply and logistics, as an example. And that's a key use case for data fabrics. 
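As a rough illustration of that supply-and-logistics use case (hypothetical names and records, not InterSystems' actual API): query each silo in place, combine on demand, and tag every field with its source so provenance survives the combination.

```python
# Two "silos" that stay where they are -- no ETL copy is made.
erp_orders = {"PO-7": {"status": "shipped", "qty": 100}}          # silo 1: ERP
carrier_feed = {"PO-7": {"vessel": "MV Aurora", "eta_days": 12}}  # silo 2: logistics

def fabric_lookup(order_id: str) -> dict:
    """Read each system in place and return a combined, provenance-tagged view."""
    combined = {}
    for source_name, silo in [("erp", erp_orders), ("carrier", carrier_feed)]:
        record = silo.get(order_id, {})
        for key, value in record.items():
            # every field remembers which operational system it came from
            combined[key] = {"value": value, "source": source_name}
    return combined

view = fabric_lookup("PO-7")
print(view["eta_days"])  # {'value': 12, 'source': 'carrier'}
```

The underlying ERP and carrier systems are never modified; the combined view is assembled only when asked for, which is the connect-with-provenance idea in miniature.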
Being able to connect, have provenance, not interfere with the operational system, but glean additional knowledge by combining multiple different operational systems' data together. >> And to your point, data is by its very nature, you know, distributed around the globe, it's on different clouds, it's in different systems. You mentioned "data mesh" before. How do data fabrics relate to this concept of data mesh? Are they competing? Are they complementary? >> Ultimately, we think that they're complementary. And we actually like to talk about smart data fabrics as a way to kind of combine the best of the two worlds. >> What is that? >> The biggest thing really is that there's a lot around data fabric architecture that talks about centralized processing, and in data meshes, it's more about distributed processing. Ultimately, we think a smart data fabric will support both and have them be interchangeable and be able to be used where it makes the most sense. There are some things where it makes sense to process, you know, for a local business unit, or even on a device for real-time kinds of implementations. There are some other areas where centralized processing of multiple different data sources makes sense. And what we're saying is, "Your technology and the architecture that you define behind that technology should allow for both where they make the most sense." >> What's the bottom-line business benefit of implementing a data fabric? What can I expect if I go that route? >> I think there are a couple of things, right? Certainly, being able to interact with customers in real time and being able to manage through changes in the marketplace is certainly a key concept. Time-to-value is another key concept. You know, if you think about the supply and logistics discussion that I had before, right? No company is going to rewrite their ERP operational system. It's how they manage and run their business. 
But being able to glean additional insights from that data, combined with data from a partner, combined with data from a customer, or combined with algorithmic data, where, you know, you may create some sort of forecast that you want to fit in, and being able to combine that together without interfering with the operational process and get those answers quickly, is an important thing. So, seeing through the silos and being able to do the connectivity, being able to have interoperability, and then combining that with flexibility on the analytics and flexibility on the algorithms you might want to run against that data. Because in today's world, of course, you know, certainly there's the notion of predictive modeling and relational theory, but also now adding in machine learning and deep learning algorithms, and having all of those things be interchangeable is another important concept behind data fabric. So you're not relegated to one type of processing. You're saying, "It's data, and I have multiple different processing engines, and I may want to interchange them over time." >> So, I know, well actually, you know, when you said "real time," I infer from that, I don't have a zillion copies of the data and it's not in a bunch of silos. Is that a correct premise? >> You try to minimize your copies of the data. >> Yeah. Okay. >> There's certainly, there's a nirvana that says, "There's only ever one copy of data." That's probably impossible. But you certainly don't want to be forced into making multiple copies of data to support different processing engines unnecessarily. >> And so, you've recently made some enhancements to the data fabric capability that take it, you know, ostensibly to the next level. Is that the smart piece? Is that machine intelligence? Can you describe what's in there? >> Well, you know, ultimately, the business benefit is to be able to have a single source of the truth for a company. 
And so, what we're doing is combining multiple technologies in a single set of software that makes that software agile and supportable, and not fragile, for deployment of applications. At its core, what we're saying is, you know, we want to be able to consume any kind of data, and I think your data fabric architecture is predicated on the fact that you're going to have relational data, you're going to have document data, you may have key-value store data, you may have images, you may have other things, and you want to not be limited by the kind of data that you want to process. And so that certainly is what we build into our product set. And then, you want to be able to have any kind of algorithm, where appropriate, run against that data without having to do a bunch of massive ETL processes or make another copy of the data and move it somewhere else. And so, to that end, we've taken our award-winning engine, which, you know, provides traditional analytic and relational capabilities, and we've now integrated machine learning. So you basically can bring machine learning algorithms to the data without having to move the data to the machine learning algorithm. What does that mean? Well, number one, your application developer doesn't have to think differently to take advantage of the new algorithm. So that's a really good thing. The other thing that happens is, if you're applying that algorithm where the data actually exists from your operational system, that means the round trip from running the model, to inferring some decision you want to make, to actually implementing that decision, can happen instantaneously, as opposed to, you know, other kinds of architectures where you may want to make a copy of the data and move it somewhere else. That takes time, latency. Now the data gets stale, and your model may not be as efficient because you're running against stale data. 
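The "bring the model to the data" round trip can be sketched generically. The weights and threshold below are invented for illustration, and a real smart data fabric would execute this inside the engine, next to the live records, rather than in application code.

```python
# Live operational records (in a real fabric, rows in the engine, not a copy).
patients = [
    {"id": 1, "age": 71, "readmissions": 3},
    {"id": 2, "age": 34, "readmissions": 0},
]

def risk_score(row: dict) -> float:
    # stand-in for a trained model's inference step (invented weights)
    return 0.01 * row["age"] + 0.2 * row["readmissions"]

def score_in_place(table: list, model, threshold: float) -> list:
    # scoring runs where the data lives, so the decision always sees fresh
    # rows and the round trip from inference to action is immediate
    return [row["id"] for row in table if model(row) >= threshold]

print(score_in_place(patients, risk_score, 0.5))  # [1]
```

Contrast this with the copy-and-move architecture Scott describes: there, the export step adds latency and the model scores rows that may already be stale.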
We've now taken all of that off the table by being able to pull that processing inside the data fabric, inside of the single source of truth. >> And you got to manage all that complexity. So you got one system, so that makes it, you know, cost-effective, and you're bringing modern tooling to the platform. Is that right? >> That's correct. >> How can people learn more and maybe continue the conversation with you if they have other questions? (both chuckle) >> Call or write. >> Yeah. >> Yeah, I mean, certainly, check out our website. We've got a lot of information about the different kinds of solutions, the different industries, the different technologies. Reach out: scottg@intersystems.com. >> Excellent. Thank you, Scott. Really appreciate it and great to see you again. >> Good to see you. >> All right, keep it right there. We have a demo coming up next. You want to see smart data fabrics in action? Stay tuned. (ambient music)
Applying Smart Data Fabrics Across Industries
(upbeat music) >> Today, more than ever before, organizations are striving to gain a competitive advantage, deliver more value to customers, reduce risk, and respond more quickly to the needs of the business. Now, to achieve these goals, organizations need easy access to a single view of accurate, consistent and, very importantly, trusted data. If it's not trusted, nobody's going to use it. And all in near real time. However, the growing volumes and complexities of data make this difficult to achieve in practice, not to mention the organizational challenges that have evolved as data becomes increasingly important to winning in the marketplace. Specifically, as data grows, so does the prevalence of data silos, making integrating and leveraging data from internal and external sources a real challenge. Now, in this final segment, we'll hear from Joe Lichtenberg, who's the global head of product and industry marketing, and he's going to discuss how smart data fabrics can be applied to different industries. And by way of these use cases, we'll probe Joe's vast knowledge base and ask him to highlight how InterSystems, which touts a next-gen approach to Customer 360, leverages a smart data fabric to provide organizations of varying sizes and sectors, in financial services, supply chain, logistics, and healthcare, with a better, faster, and easier way to deliver value to the business. Joe, welcome, great to have you here. >> Thank you, it's great to be here. That was some intro. I could not have said it better myself, so thank you for that. >> Thank you. Well, we're happy to have you on this show now. I understand- >> It's great to be here. >> You've made a career helping large businesses with technology solutions, small businesses, and then scaling those solutions to meet whatever needs they had. And of course, you're a vocal advocate, as is your company, of data fabrics. 
We talked to Scott earlier about data fabrics, how it relates to data mesh, big discussions in the industry. So tell us more about your perspective. >> Sure, so first I would say that I have been in this industry for a very long time, so I've been, like you, I'm sure, for decades working with customers and with technology, really to solve these same kinds of challenges. So for decades, companies have been working with lots and lots of data and trying to get business value to solve all sorts of different challenges. And I will tell you that I've seen many different approaches and different technologies over the years. So, early on, point-to-point connections with custom coding, and I've worked with integration platforms; 20 years ago, with the advent of web services and service-oriented architectures, exposing endpoints with WSDL and getting access to disparate data from across the organization. And more recently, obviously, with data warehouses and data lakes, and now moving workloads to the cloud with cloud-based data marts and data warehouses. Lots of approaches that I've seen over the years, but yet still challenges remain in terms of getting access to a single trusted real-time view of data. And so, recently, we ran a survey of more than 500 different business users across different industries, and 86% told us that they still lack confidence in using their data to make decisions. That's a huge number, right? And if you think about all of the work and all of the technology and approaches over the years, that is a surprising number, and drilling into why that is, there were three main reasons. One is latency. So, the amount of time that it takes to access the data, process the data, and make it fit for purpose; by the time the business has access to the data and the information that they need, the opportunity has passed. >> Elapsed time, not speed of light, right? But that too, maybe. 
>> But it takes a long time, if you think about these processes: you have to take the data, copy it, run ETL processes, and prepare it. So that's one. The second is the amount of data that's disparate, in data silos. So, still struggling with data that is dispersed across different systems in different formats. And the third is data democratization. So the business really wants to have access to the data so that they can drill into the data, ask ad hoc questions, and the next question, and drill into the information and see where it leads them, rather than having sort of pre-structured data and pre-structured queries and having to go back to IT, put the request back on the queue again, and wait. >> So it takes too long, the data's too hard to get to 'cause it's in silos, and the data lacks context because it's technical people that are serving up the data to the business people. >> Exactly. >> And there's a mismatch. >> Exactly right. So they call that data democratization, or giving the business access to the data and the tools that they need to get the answers that they need in the moment. >> So the skeptic in me, 'cause you're right, I have seen this story before, and the problems seem like they keep coming up, year after year, decade after decade. But I'm an optimist, and so. >> As am I. >> And so I sometimes say, okay, same wine, new bottle, but it feels like it's different this time around with data fabrics. You guys talk about smart data fabrics; from your perspective, what's different? >> Yeah, it's very exciting, and it's a fundamentally different approach. So if you think about all of these prior approaches, and by the way, all of these prior approaches have added value, right? It's not like they were bad, but there are still limitations, and the business still isn't getting access to all the data that they need in the moment, right? 
So data warehouses are terrific if you know the questions that you want answered, and you take the data and you structure the data in advance. And so now you're serving the business with sort of pre-planned answers to pre-planned queries, right? The data fabric, what we call a smart data fabric, is fundamentally different. It's a fundamentally different approach in that, rather than, sort of in batch mode, taking the data and making it fit for purpose, with all the complexity and delays associated with it, with a data fabric we're accessing the data on demand, as it's needed, as it's requested, either by the business, or by applications, or by the data scientists, directly from the source systems. >> So you're not copying it necessarily; you're not FTPing it, for instance. I've got it, you take it; you're basically using the same source. >> You're pulling the data on demand as it's being requested by the consumers. And then all of the data management processes that need to be applied for integration and transformation, to get the data into a consistent format, and business rules and analytic queries, and, as Jess showed, machine learning and predictive and prescriptive analytics, all sorts of powerful capabilities, are built into the fabric, so that as you're pulling the data on demand, right, all of these processes are being applied, and the net result is you're addressing these limitations around latency and silos that we've seen in the past. >> Okay, so you've talked about, you have a lot of customers, InterSystems does, in different industries: supply chain, financial services, manufacturing. We heard from Jess about healthcare. What are you seeing in terms of applications of smart data fabrics in the real world? >> Yeah, so we see it in every industry. So InterSystems, as you know, has been around now for 43 years, and we have tens of thousands of customers in every industry. 
And this architectural pattern now is providing value for really critical use cases in every industry. So I'm happy to talk to you about some that we're seeing. I could actually spend like three hours here, and there are all sorts of exciting ones; I'm very passionate about working with customers. >> What are some of your favorites? >> So, obviously, supply chain right now is going through a very challenging time. So, the combination of what's happening with the pandemic and disruptions, and now I understand eggs are difficult to come by, I just heard on NPR. >> Yeah, and it's in part a data problem, a big part a data problem, is that fair? >> Yeah, and so, in supply chain, first there's supply chain visibility. So organizations want a real-time or near real-time expansive view of what's happening across the entire supply chain, from supply all the way through distribution, right? So that's only part of the issue, but that's a huge sort of real-time data silos problem. So if you think about your extended supply chain, it's complicated enough with all the systems and silos inside your firewall, before all of your suppliers, even just thinking about your tier-one suppliers, let alone tier two and tier three. And then building on top of real-time visibility is what the industry calls a control tower, what we call the ultimate control tower. And so it has built-in analytics to be able to sense disruptions and exceptions as they occur, and predict the likelihood of these disruptions occurring, and then having data-driven and analytics-driven guidance in terms of the best way to deal with these disruptions. So for example, an order is missing line items, or a cargo ship is stuck off port somewhere. What do you do about it? Do you reroute a different cargo ship, right? Do you take an order that's en route to a different client and reroute that? What's the cost associated? What's the impact associated with it? 
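A control tower's sense-and-respond loop, as just described, might be toy-sketched like this. The thresholds, costs, and field names are invented for illustration, not any product's actual logic.

```python
shipments = [
    {"id": "S1", "expected_items": 5, "received_items": 5, "delay_days": 0},
    {"id": "S2", "expected_items": 5, "received_items": 3, "delay_days": 9},
]

def sense_disruptions(rows):
    """Flag exceptions: missing line items or a shipment running late."""
    flagged = []
    for s in rows:
        if s["received_items"] < s["expected_items"] or s["delay_days"] > 7:
            flagged.append(s["id"])
    return flagged

def best_response(options):
    # data-driven guidance: pick the lowest-impact remediation
    return min(options, key=lambda o: o["cost"])

print(sense_disruptions(shipments))  # ['S2']
print(best_response([
    {"action": "reroute other vessel", "cost": 120_000},
    {"action": "divert en-route order", "cost": 45_000},
])["action"])  # divert en-route order
```

A real control tower would replace the fixed threshold with a predictive model and the cost field with a full impact analysis, but the sense-predict-respond shape is the same.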
So that's a huge issue right now around control towers for supply chain. So that's one. >> Can I ask you a question about that? Because you and I have both seen a lot, but we've never seen, at least I haven't, the economy completely shut down like it was in March of 2020, and now we're seeing this sort of slingshot effect, almost like when you're driving on the highway: sometimes you don't know why, but all of a sudden you slow down, and then you speed up, you think it's okay, then you slow down again. Do you feel like you guys can help get a handle on that? Because it goes both ways: sometimes you can't get the product, and sometimes there's too much of a product as well, and that's not good for business. >> Yeah, absolutely. You want to smooth out the peaks and valleys. >> Yeah. >> And that's a big business goal, business challenge for supply chain executives, right? So you want to make sure that you can respond to demand, but you don't want to overstock, because there's cost associated with that as well. So how do you optimize the supply chain? It's very much a data silo and a real time challenge, so it's a perfect fit for this new architectural pattern. >> All right, what else? >> So if we look at financial services, we have many, many customers in financial services, and that's another industry where they have many different sources of data that all have information that organizations can use to really move the needle, if they could just get to that single source of truth in real time. So we bucket many different implementations and use cases that we do around what we call Business 360 and Customer 360. For Business 360, there's all sorts of ways to add business value in terms of having a real-time operational view across all of the different geos and parts of the business, especially in these very large global financial services institutions like capital markets and investment firms and so forth. 
So around Business 360, that's having a real-time view of risk, operational performance, regulatory compliance, things like that. For Customer 360, there's a whole set of use cases around hyper-personalization of customers and real-time next best action: looking to see how you can sell more, increase share of wallet, cross-sell, upsell to customers. We also do a lot in terms of predicting customer churn. So if you have all the historical data, what's the likelihood of a customer churning, so you can proactively intercede, right? It's much more cost effective to keep assets under management and keep clients than to go and get new clients to come to the firm. A very interesting use case from one of our customers in Latin America: Banco do Brasil, the largest bank in all of Latin America, and they have a very innovative CTO who's always looking for new ways to move the needle for the bank. And so one of their ideas, and we're working with them to do this, is how can they generate net new revenue streams by bringing new business to the bank? And so they identified a large percentage of the population in Latin America that does no banking. So they have no banking history, not only with Banco do Brasil, but with any bank. So there's a fair amount of risk associated with offering services to this segment of the population that's not associated with any banks or financial institutions. >> There's no historical data on them. >> So it's a data challenge. And so they're bringing in data from a variety of different sources, social media, open source data that they find online, and so forth, and with us, running risk models to identify the citizens that carry acceptable risk, to offer them their services. >> It's going to be a huge market of unbanked people in Latin America. >> Wow, that's interesting. >> Yeah, yeah, totally. >> And if you can lower the risk, you can tap that market and be first. >> And they are, yeah. 
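As a toy illustration of the churn use case Joe describes, here is a hypothetical scoring function in Python that combines historical behavioral features into a churn probability, so the firm can intercede before losing the client. The feature names and weights are invented for the example; they are not a real model or an InterSystems API.

```python
import math

# Invented weights for illustrative behavioral features; a real model would
# be trained on the firm's historical data.
WEIGHTS = {"months_inactive": 0.8, "complaints": 0.6, "balance_decline_pct": 0.03}
BIAS = -3.0

def churn_probability(features):
    """Logistic score: higher means this client is more likely to churn."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

low_risk = churn_probability(
    {"months_inactive": 0, "complaints": 0, "balance_decline_pct": 0})
high_risk = churn_probability(
    {"months_inactive": 6, "complaints": 2, "balance_decline_pct": 40})
```

An engaged client scores near zero, while an inactive, complaining client with a shrinking balance scores near one, which is the signal that triggers proactive outreach.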
>> Yeah. >> So very exciting. Manufacturing: we know Industry 4.0, which is about taking the OT data, so the data from the MES systems and the real-time streaming data from the machine controllers, and integrating it with the IT data, so your data warehouses and your ERP systems and so forth, to have not only a real-time view of manufacturing from supply and source all the way through demand, but also predictive maintenance and things like that. So that's very big right now in manufacturing. >> Kind of cool to hear these use cases beyond healthcare, which is obviously your wheelhouse. Scott defined this term of smart data fabrics, different than data fabrics, I guess. So when we think about these use cases, what's the value-add of so-called smart data fabrics? >> Yeah, it's a great question. So we did not define the term data fabric or enterprise data fabric. The analysts now are all over it. They're all saying it's the future of data management. It's a fundamentally different architectural approach, to be able to access the data on demand. The canonical definition of a data fabric is to access the data where it lies and apply a set of data management processes, but it does not include analytics, interestingly. And we firmly believe that most of these use cases gain value from having analytics built directly into the fabric. So whether that's business rules, or predictive analytics to predict the likelihood of a customer churning or a machine on the shop floor failing, or prescriptive analytics: if there's a problem in the supply chain, what's the guidance for the supply chain managers to take the best action, right? Prescriptive analytics based on data. 
So rather than taking the data in the data fabric and moving it to another environment to run those analytics, where you have complexity and latency, having all of those analytics capabilities built directly into the fabric, which is why we call it a smart data fabric, brings a lot of value to our customers. >> So it simplifies the whole data lifecycle, the data pipelining, the hyper-specialized roles that you have to have; you can really just focus on one platform, is that right? >> Exactly, basically, yeah. And it's a simplicity of architecture and faster speed to production. So a big differentiator for our technology, for InterSystems IRIS, is that most if not all of the capabilities that are needed are built into one engine, right? So you don't need to stitch together 10 or 15 or 20 different data management services for a relational database and a non-relational database and a caching layer and a data warehouse and security and so forth. And you can do that; there are many ways to build this data fabric architecture, right? InterSystems is not the only way. >> Right. >> But if you can speed and simplify the implementation of the fabric by having most of what you need in one engine, one product, that gets you to where you need to go much, much faster. >> Joe, how can people learn more about smart data fabrics and some of the use cases that you've presented here? >> Yeah, come to our website, intersystems.com. If you go to intersystems.com/smartdatafabric, that'll take you there. >> I know that you have probably dozens more examples, but it would be cool- >> I do. >> If people reach out to you, how can they get in touch? >> Oh, I would love that. So feel free to reach out to me on LinkedIn. It's Joe Lichtenberg, I think it's linkedin.com/joeLichtenberg, and I'd love to connect. >> Awesome. Joe, thanks so much for your time. Really appreciate it. >> It was great to be here. Thank you, Dave. >> All right, I hope you've enjoyed our program today. 
You know, we heard Scott Gnau, who helped us understand this notion of data fabrics and smart data fabrics and how they can address the data challenges faced by the vast majority of organizations today. Jess Jowdy's demo was awesome. It was really a highlight of the program, where she showed the smart data fabric in action. And Joe Lichtenberg, we just heard from him, dug into some of the prominent use cases and proof points. We hope this content was educational and inspires you to action. Now, don't forget, all these videos are available on demand to watch, rewatch, and share. Go to theCUBE.net, check out siliconangle.com for all the news and analysis, and we'll summarize the highlights of this program. And go to intersystems.com, because there are a ton of resources there. In particular, there's a knowledge hub where you'll find some excellent educational content and online learning courses. There's a resource library with analyst reports, technical documentation, videos, some great freebies. So check it out. This is Dave Vellante. On behalf of theCUBE and our supporter, InterSystems, thanks for watching, and we'll see you next time. (upbeat music)
Today’s Data Challenges and the Emergence of Smart Data Fabrics
(upbeat music) >> Now, as we all know, businesses are awash with data, from financial services to healthcare to supply chain and logistics and more. Our activities, and increasingly, actions from machines, are generating new and more useful information in much larger volumes than we've ever seen. Now, meanwhile, our data hungry society's expectations for experiences are increasingly elevated. Everybody wants to leverage and monetize all this new data coming from smart devices and innumerable sources around the globe. All this data, it surrounds us, but more often than not, it lives in silos, which makes it very difficult to consume, share, and make valuable. These factors combined with new types of data and analytics make things even more complicated. Data from ERP systems to images, to data generated from deep learning and machine learning platforms, this is the reality that organizations are facing today. And as such, effectively leveraging all of this data has become an enormous challenge. So today, we're going to be discussing these modern data challenges and the emergence of so-called smart data fabrics as a key solution to said challenges. To do so, we're joined by thought leaders from InterSystems. This is a really creative technology provider that's attacking some of the most challenging data obstacles. InterSystems tells us that they're dedicated to helping customers address their critical scalability, interoperability, and speed to value challenges. And in this first segment, we welcome Scott Gnau, the global head of data platforms at InterSystems, to discuss the context behind these issues and how smart data fabrics provide a solution. Scott, welcome, good to see you again. >> Thanks a lot. It's good to be here. >> Yeah, so look, you and I go back, you know, several years and you've worked in tech. You've worked in data management your whole career. You've seen many data management solutions, you know, from the early days. 
And then we went through the Hadoop era together. And you've come across a number of customer challenges that sort of changed along the way, and they've evolved. So what are some of the most pressing issues that you see today when you're talking to customers, and, you know, put on your technical hat if you want to? >> Well, Dave, I think you described it well. It's a perfect storm out there, you know, combined with, there's just data everywhere. And it's coming up on devices, it's coming from new different kinds of paradigms of processing and people are trying to capture and harness the value from this data. At the same time, you talked about silos, and I've talked about data silos through my entire career. And I think the interesting thing about it is for so many years we've talked about we've got to reduce the silos, and we've got to integrate the data, we've got to consolidate the data. And that was a really good paradigm for a long time. But frankly, the perfect storm that you described, the sources are just too varied. The required agility for a business unit to operate and manage their customers is creating an enormous pressure. And I think, ultimately, silos aren't going away. So there's a realization that, okay, we're going to have these silos, we want to manage them, but how do we really take advantage of data that may live across different parts of our business and in different organizations? And then, of course, the expectation of the consumer is at an all-time high, right? They expect that we're going to treat them and understand their needs, or they're going to find some other provider. So, you know, pulling all of this together really means that, you know, our customers and businesses around the world are struggling to keep up, and it's forcing a new paradigm shift in underlying data management, right? 
We started, you know, many, many years ago with data marts and then data warehouses, and then we graduated to data lakes where we expanded beyond just traditional transactional data into all kinds of different data. And at each step along the way, we help businesses to thrive and survive and compete and win. But with the perfect storm that you've described, I think those technologies are now just a piece of the puzzle that is really required for success. And this is really what's leading to data fabrics and data meshes in the industry. >> So what are data fabrics? What problems do they solve? How do they work? Can you just add- >> Yeah, so the idea behind it is, and this is not to the exclusion of other technologies that I described in data warehouses and data lakes and so on. But data fabrics kind of take the best of those worlds, but add in the notion of being able to do data connectivity with provenance as a way to integrate data versus data consolidation. And when you think about it, you know, data has gravity, right? It's expensive to move data. It's expensive in terms of human cost to do ETL processes where you don't have known provenance of data. So being able to play data where it lies and connect the information from disparate systems to learn new things about your business is really the ultimate goal. You think about in the world today, we hear about issues with the supply chain, and supply and logistics is a big issue, right? Why is that an issue? Because all of these companies are data driven. They've got lots of access to data. They have formalized and automated their processes. They've installed software. And all of that software is in different systems within different companies. But being able to connect that information together without changing the underlying system is an important way to learn and optimize for supply and logistics, as an example. 
And that's a key use case for data fabrics being able to connect, have provenance, not interfere with the operational system, but glean additional knowledge by combining multiple different operational systems' data together. >> And to your point, data is by its very nature, you're distributed around the globe, it's on different clouds, it's in different systems. You mentioned data mesh before. How do data fabrics relate to this concept of data mesh? Are they competing? Are they complimentary? >> Ultimately, we think that they're complimentary. And we actually like to talk about smart data fabrics as a way to kind of combine the best of the two worlds. >> What is that? I mean, the biggest thing really is there's a lot around data fabric architecture that talks about centralized processing. And in data meshes, it's more about distributed processing. Ultimately, we think a smart data fabric will support both and have them be interchangeable and be able to be used where it makes the most sense. There are some things where it makes sense to process, you know, for a local business unit, or even on a device for real time kinds of implementations. There are some other areas where centralized processing of multiple different data sources make sense. And what we're saying is your technology and the architecture that you define behind that technology should allow for both where they make the most sense. >> What's the bottom line business benefit of implementing a data fabric? What can I expect if I go that route? >> I think there are a couple of things, right? Certainly being able to interact with customers in real time and being able to manage through changes in the marketplace is certainly a key concept. Time to value is another key concept. You know, if you think about the supply and logistics discussion that I had before, right? No company is going to rewrite their ERP operational system. It's how they manage and run their business. 
But being able to glean additional insights from that data combined with data from a partner, combined with data from a customer, or combined with algorithmic data that, you know, you may create some sort of forecast and that you want to fit into. And being able to combine that together without interfering with the operational process and get those answers quickly is an important thing. So seeing through the silos and being able to do the connectivity being able to have interoperability, and then combining that with flexibility on the analytics and flexibility on the algorithms you might want to run against that data. Because in today's world, of course, certainly there's the notion of predictive modeling and relational theory, but also now adding in machine learning, deep learning algorithms, and have all of those things kind of be interchangeable is another important concept behind data fabrics. So you're not relegated to one type of processing. You're saying it's data, and I have multiple different processing engines and I may want to interchange them over time. >> So, I know, well actually, when you said real time, I infer from that I don't have a zillion copies of the data and it's not in a bunch of silos. Is that a correct premise? >> You try to minimize your copies of the data. There's a nirvana that says there's only ever one copy of data. That's probably impossible. But you certainly don't want to be forced into making multiple copies of data to support different processing engines unnecessarily. >> And so you've recently made some enhancements to the data fabric capability that takes it, you know, ostensibly to the next level. Is that the smart piece, is that machine intelligence? Can you describe what's in there? >> Well, you know, ultimately the business benefit is be able to have a single source of the truth for a company. 
And so what we're doing is combining multiple technologies in a single set of software that makes that software agile and supportable and not fragile for deployment of applications. At its core, what we're saying is, we want to be able to consume any kind of data, and I think your data fabric architecture is predicated on the fact that you're going to have relational data you're going to have document data, you may have key value store data, you may have images, you may have other things, and you want to be able to not be limited by the kind of data that you want to process. And so that certainly is what we build into our product set. And then you want to be able to have any kind of algorithm where appropriate run against that data without having to do a bunch of massive ETL processes or make another copy of the data and move it somewhere else. And so to that end, we have taken our award-winning engine, which, you know, provides traditional analytic capabilities and relational capabilities. We've now integrated machine learning. So you basically can bring machine learning algorithms to the data without having to move data to the machine learning algorithm. What does that mean? Well, number one, your application developer doesn't have to think differently to take advantage of the new algorithms. So that's a really good thing. The other thing that happens is if you're playing that algorithm where the data actually exists from your operational system, that means the roundtrip from running the model to inferring some decision you want to make to actually implementing that decision can happen instantaneously. As opposed to, you know, other kinds of architectures where you may want to make a copy of the data and move it somewhere else. That takes time, latency. Now the data gets stale. Your model may not be as efficient because you're running against stale data. 
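To make that contrast concrete, here is a rough, hypothetical Python sketch of "bringing the algorithm to the data": the model is registered with the store that holds the live rows and scored in place, so there is no export step whose copy could go stale. The store and model here are toy stand-ins for illustration, not the actual IRIS engine or its API.

```python
class InPlaceStore:
    """Toy single source of truth that can run registered models in place."""

    def __init__(self, rows):
        self.rows = rows          # the live operational records
        self.models = {}

    def register_model(self, name, fn):
        self.models[name] = fn    # ship the algorithm to the data

    def predict(self, name):
        """Score every live row immediately, with no ETL round trip,
        so decisions are always made against fresh data."""
        fn = self.models[name]
        return [{**row, "score": fn(row)} for row in self.rows]

store = InPlaceStore([{"id": 1, "readings": 42}, {"id": 2, "readings": 97}])
# An invented threshold rule standing in for a trained predictive model.
store.register_model("fail_risk", lambda r: r["readings"] > 90)
scored = store.predict("fail_risk")
```

The application developer calls `predict` the same way it queries anything else, which is the point Scott makes: no separate ML environment, no second copy, no latency between inference and action.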
We've now taken all of that off the table by being able to pull that processing inside the data fabric, inside of the single source of truth. >> And you've got to manage all that complexity. So you've got one system, so that makes it cost effective, and you're bringing modern tooling to the platform. Is that right? >> That's correct. >> How can people learn more and maybe continue the conversation with you if they have other questions? >> (Scott laughs) Call or write. Yeah, I mean, certainly check out our website. We've got a lot of information about the different kinds of solutions, the different industries, the different technologies. Reach out at scottg@intersystems.com. >> Excellent, thank you, Scott. Really appreciate it. And great to see you again. >> Good to see you. >> All right, keep it right there. We have a demo coming up next. If you want to see smart data fabrics in action, stay tuned. (upbeat music)
How to Make a Data Fabric "Smart": A Technical Demo With Jess Jowdy
>> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities, including data exploration, business intelligence, natural language processing, and machine learning directly within the fabric makes it faster and easier for organizations to gain new insights and power intelligent, predictive, and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries, from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi. Yeah, thank you so much for having me. And so for this demo we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements, and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo, and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see, and don't mind the screen, 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. 
This could be, for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care, or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software, or adverse reaction warnings from a clinical risk grouping application, and so much more. So I'm really going to be simulating a patient logging in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here, and I'm going to be looking for information where the last name of this patient is Simmons, and their medical record number, their patient identifier in the system, is 32345. And so as you can see, I have this single JSON payload that showed up here of just relevant clinical information for my patient whose last name is Simmons, all within a single response. So fantastic, right? Typically though, when we see responses that look like this, there is an assumption that this service is interacting with a single backend system, and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture, we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. 
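To sketch what that single aggregated response might look like, here is a hypothetical Python fragment in which each silo contributes its slice and the fabric packages one JSON payload for the caller. The last name and identifier come from the demo; the silo names and record contents are invented for illustration.

```python
import json

def aggregate_patient(last_name, mrn, silo_results):
    """Merge per-application results into one response body, the way the
    Postman call returns a single payload instead of one per system."""
    return {
        "patient": {"lastName": last_name, "mrn": mrn},
        "results": dict(silo_results),
    }

payload = aggregate_patient("Simmons", "32345", {
    "emr": [{"encounter": "2022-11-01"}],
    "transcription": [{"note": "follow-up in 2 weeks"}],
    "riskGrouping": [{"warning": "adverse reaction: penicillin"}],
})
body = json.dumps(payload)  # the single JSON response the caller receives
```

The caller sees one body and never needs to know that three different clinical applications were consulted behind it.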
And in the middle here we have our data fabric coordinator, which is going to be in charge of that refinement and analysis, those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service, and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end, we do also support full lifecycle API management within this platform. When you're dealing with APIs, I always like to make a little shout-out on this: you really want to make sure you have a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today, we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what context. >> Can I just interrupt you for a second? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean, you can have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry, and API security is a really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So there's interactions with the API itself. So can I even see this API and leverage it? And then within an API call, you then have to deal with, all right, which endpoints or what kind of interactions within that API am I allowed to do? 
What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So the way that we handle that is, like I said, the same thing at different layers. There is access to a particular API, which can happen within the IRIS product, and we also see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of security. >> And that's been designed in, it's not a bolt-on, as they like to say. >> Absolutely, yes. >> Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly, each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FHIR; interactions with a homegrown enterprise data warehouse, for instance, may use SQL; and cloud-based solutions managed by a vendor may only allow you to use web service calls to pull data. So it's really important that the data fabric platform you're using has the flexibility to connect to all of these different systems and applications. And I'm about to log out, so I'm going to keep my session going here. So, it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources, and all these different kinds of formats, over all of these different kinds of protocols. So let's think back on our example here. 
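Here is a toy illustration of the granular, role-based control Jess describes: which APIs a role may call, and which fields of a returned record it may see. The roles, endpoint names, and fields below are hypothetical, invented only to show the shape of the idea; they are not the IRIS security model itself.

```python
# Hypothetical policy: per role, the callable APIs and the visible fields.
ROLE_POLICY = {
    "care_team": {"apis": {"patient_data_retrieval"},
                  "fields": {"lastName", "mrn", "notes"}},
    "billing":   {"apis": {"patient_data_retrieval"},
                  "fields": {"lastName", "mrn"}},
}

def authorize(role, api, record):
    """Deny unknown roles or APIs; otherwise return only permitted fields."""
    policy = ROLE_POLICY.get(role)
    if policy is None or api not in policy["apis"]:
        raise PermissionError(f"{role!r} may not call {api!r}")
    return {k: v for k, v in record.items() if k in policy["fields"]}

record = {"lastName": "Simmons", "mrn": "32345", "notes": "clinical note text"}
visible = authorize("billing", "patient_data_retrieval", record)
```

A billing role gets the identifiers it needs but never the clinical note, which is the field-level granularity Jess calls the cornerstone of security.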
I had four different applications that I was requesting information from to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is it has an embedded interoperability platform. So there are all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST or SOAP or SQL or FTP, regardless of that protocol, there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as. In healthcare we have HL7, we have FHIR, we have CCDs across the industry. JSON is, you know, really hitting the market strong now, and XML payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these, when I select a particular connection, on the right side panel I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my ChartScript application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application, I'm using a SQL-based connection. When I'm connecting to my EMR, I'm leveraging a standard healthcare messaging format known as FHIR, which is a REST-based protocol. And then when I'm working with my health record management system, I'm leveraging a standard HTTP adapter. So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly, and to be able to do it in a reliable and quick way.
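The per-source connection settings described here, one adapter per protocol, might be modeled roughly like the following sketch. System names, endpoints, and the sanity-check helper are all hypothetical and not the actual IRIS interoperability API.

```python
# Hypothetical sketch of per-source connection settings in a data fabric.
# Names, endpoints, and adapters are illustrative only.

from dataclasses import dataclass

@dataclass
class Connection:
    name: str        # logical name of the downstream system
    protocol: str    # SOAP, SQL, FHIR (REST), HTTP, ...
    endpoint: str    # where the adapter points

# One entry per data silo, mirroring the demo's four applications.
CONNECTIONS = {
    "ChartScript": Connection("ChartScript", "SOAP", "https://chartscript.example/ws"),
    "RiskGrouping": Connection("RiskGrouping", "SQL", "jdbc:example://risk-db"),
    "EMR": Connection("EMR", "FHIR", "https://emr.example/fhir/R4"),
    "HealthRecords": Connection("HealthRecords", "HTTP", "https://hrm.example/api"),
}

def sanity_check(conn: Connection, last_name: str, mrn: str) -> bool:
    """Stand-in for the demo's connection test: a cheap check that the
    adapter configuration is complete before issuing a real request."""
    return bool(conn.endpoint and conn.protocol and last_name and mrn)
```

The `sanity_check` helper stands in for the quick connection test the demo performs against a single system.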
Because if you think about it, you could have hundreds of these different kinds of applications built out, and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance, my patient's last name and their MRN, and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric, to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. So why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond an out-of-the-box or black-box approach, to be able to develop things that are specific to their data fabric or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you not only get an opportunity to view how we're establishing these connections or how we're building out these processes, but you have the opportunity to inject your own kind of processing, your own kinds of pipelines, into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up-and-coming language, right? We see more and more developers turning towards Python to do their development. So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see, actually calls out to our turnkey adapters.
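A custom step of the kind being described might look roughly like this sketch, where the adapter call is a placeholder for a platform-provided turnkey adapter rather than real IRIS code:

```python
# Hypothetical custom pipeline step: a platform-provided adapter handles
# the transport, and user code adds organization-specific processing.

def fetch_via_adapter(system: str, last_name: str, mrn: str) -> dict:
    """Placeholder for a turnkey adapter call provided by the platform."""
    return {"system": system, "patient": f"{last_name}/{mrn}", "records": []}

def custom_step(last_name: str, mrn: str) -> dict:
    # Out of the box: let the adapter handle the connection and protocol.
    payload = fetch_via_adapter("EMR", last_name, mrn)
    # Organization-specific customization: tag and normalize the result.
    payload["source_tag"] = "emr-clinical-history"
    payload["patient"] = payload["patient"].upper()
    return payload
```

The point of the sketch is the split of responsibilities: transport and protocol handling come from the platform, while the normalization logic is the user's own code injected into the pipeline.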
So we see a combination of out-of-the-box code that is provided in this data fabric platform from IRIS, combined with organization-specific or user-specific customizations that are included in this Python method. So it's a nice little combination of how we bring the developer experience in and mix it with out-of-the-box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. >> It's a lot here. You know, actually, if I could >> I can pause. >> If I just want to sort of play that back. So we went through the connect and the collect phase. >> And the collect, yes, we're going into refine. So it's a good place to stop. >> Yeah, so before we get there, so we heard a lot about fine-grained security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know, the ability to bring in different dev tools. We heard about FHIR, which of course is big in healthcare. >> Absolutely. >> And that's the standard, and then SQL for traditional kind of structured data, and then web services like HTTP, you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely, and I think that's really important when you're dealing with a smart data fabric, because what you're effectively doing is consolidating all of your processing, all of your collection, into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refine. >> We're going into refinement, exciting. So how do we actually do refinement? Where does refinement happen, and how does this whole thing end up being performant? Well, the key to all of that is this SDF coordinator, which stands for smart data fabric coordinator.
And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's collecting that information, it's aggregating it, and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform, we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like, and as you can see, it follows a flow chart-like structure. So there's a start, there is an end, and then there are these different arrows that point to different activities throughout the business process. And so there are all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL logic into this, or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. So like I said, we have a really very simple process here that's just calling out, grabbing all those different data elements from all those different data sources, and consolidating it.
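Conceptually, the coordinator's fan-out and sync step could be sketched like this. The names are hypothetical, and the real coordinator is built with IRIS's web-based tooling rather than asyncio; this is just the pattern:

```python
# Conceptual sketch of the coordinator's fan-out/fan-in: call each
# downstream system concurrently, wait for all responses (the "sync"
# step), then refine them into a single payload. Illustrative only.

import asyncio

async def call_source(name: str, last_name: str, mrn: str) -> dict:
    await asyncio.sleep(0)  # stands in for network I/O to the silo
    return {"source": name, "patient": f"{last_name}/{mrn}"}

async def coordinate(last_name: str, mrn: str) -> dict:
    sources = ["ChartScript", "RiskGrouping", "EMR", "HealthRecords"]
    # Fan out to all four systems; gather() is the sync point that waits
    # for every response before packaging the aggregate.
    results = await asyncio.gather(
        *(call_source(s, last_name, mrn) for s in sources)
    )
    return {"patient": f"{last_name}/{mrn}", "sections": list(results)}
```

The `gather()` call plays the role of the sync action at the end of the flow chart: nothing is returned to the caller until every downstream response has arrived.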
We'll look back at this coordinator in a second when we make this data fabric a bit smarter and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen, 'cause I always like to go back and take a look at everything that we've seen: we have our initial API connection, we have our connections to our individual data sources, and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There are all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging, 'cause you need to be able to know, you know, if there was an issue, where did that issue happen, in which connected process, and how did it affect the other processes that are related to it. In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request, from when it initially came into the smart data fabric to when data was sent back out from that smart data fabric. So I didn't record the time, but I bet if you recorded the time, it was this time that we sent that request in. And you can see my patient's name and their medical record number here, and you can see that that instigated four different calls to four different systems, and they're represented by these arrows going out. So we sent something to ChartScript, to our health record management system, to our clinical risk grouping application, and to my EMR through their FHIR server.
So every outbound application gets a request, and we pull back all of those individual pieces of information from all of those different systems and we bundle them together. And for my FHIR lovers, here's our FHIR bundle that we got back from our FHIR server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now, we change this into a JSON format before we deliver it, but these are those data elements brought together. And this screen would also be used for being able to see things like error trapping, or errors that were thrown, alerts, warnings. Developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one-stop shop for understanding what's happening behind the scenes with your data fabric. >> So you can see who did what, when, where, what did the machine do? What went wrong, and where did that go wrong? >> Exactly. >> Right at your fingertips. >> Right, and I'm a visual person, so a bunch of log files to me is not the most helpful. Being able to see this happened at this time in this location gives me the understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How are people using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric, is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? It's bringing data in, it's analyzing that information, it's transforming that data if it's in a format that your consumer's not going to understand, and it's doing any additional injection of custom logic.
So really your coordinator, or that orchestrator that sits in the middle, is the brains behind your smart data fabric. >> And this is available today? This all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well, we can keep going. 'Cause right now, I mean we can, oh, we're at 18 minutes. God help us. You can cut some of this. (laughs) I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this, but essentially if we go back to our coordinator here, we can see that original pipeline that we saw, where we're pulling data from all these different systems, we're collecting it and we're sending it out. But then we see two more actions at the end here, which involve getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric, but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at and we're running it through a machine learning model that exists within the smart data fabric pipeline, producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days, which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world is we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. So there's no shuffling of data, there are no external connections to make this happen.
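As a purely illustrative sketch of scoring inside the pipeline, here is a toy logistic model attached as the final step. IRIS IntegratedML actually uses SQL-like statements rather than Python, and these feature names and weights are invented for the example:

```python
# Toy in-pipeline readmission scorer. A stand-in for an integrated ML
# step; the hand-set logistic weights below are purely illustrative.

import math

WEIGHTS = {"age": 0.03, "prior_admissions": 0.6, "num_medications": 0.1}
BIAS = -4.0

def readmission_score(features: dict) -> float:
    """Return a probability-like score in (0, 1) for 30-day readmission."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def pipeline_with_prediction(aggregated: dict, features: dict) -> dict:
    # Last step of the coordinator: attach an insight, not just data.
    aggregated["readmission_risk"] = round(readmission_score(features), 3)
    return aggregated
```

The design point is where the scoring happens: the model runs inside the same pipeline that aggregated the data, so no patient records leave the fabric to be scored.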
And it doesn't really require having a PhD in data science to understand how to do that. It leverages really basic SQL-like syntax to be able to construct and execute these predictions. So it's going one step further than the traditional data fabric example, to introduce this ability to deliver actionable insights to our users based on the data that we've brought together. >> Well, that readmission probability is huge. >> Yes. >> Right, because it directly affects the cost for the provider and the patient, you know. So if you can anticipate the probability of readmission, and either do things at that moment or, you know, as an outpatient perhaps, to minimize the probability, then that's huge. That drops right to the bottom line. >> Absolutely, absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day, which is what makes this so exciting. >> Awesome demo. >> Thank you. >> Fantastic. Are you cool if people want to get in touch with you? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy, and we'd love to hear from you. I always love talking about this topic, so I'd be happy to engage on that. >> Great stuff, thank you Jess, appreciate it. >> Thank you so much. >> Okay, don't go away, because in the next segment we're going to dig into the use cases where data fabric is driving business value. Stay right there.
Is Data Mesh the Next Killer App for Supercloud?
(upbeat music) >> Welcome back to our Supercloud 2 event, live coverage here of our stage performance in Palo Alto, syndicating around the world. I'm John Furrier with Dave Vellante. We've got exclusive news and a scoop here for SiliconANGLE and theCUBE. Zhamak Dehghani, creator of data mesh, has formed a new company called Nextdata.com, Nextdata. She's a CUBE alum and contributor to our supercloud initiative, as well as our coverage and Breaking Analysis with Dave Vellante on data, the killer app for supercloud. Zhamak, great to see you. Thank you for coming into the studio, and congratulations on your newly formed venture and continued success on the data mesh. >> Thank you so much. It's great to be here. Great to see you in person. >> Dave: Yeah, finally. >> Wonderful. Your contributions to the data conversation have been well documented, certainly by us and others in the industry. Data mesh is taking the world by storm. Some people are debating it, throwing cold water on it. Some are thinking it's the next big thing. Tell us about the data mesh super data apps that are emerging out of cloud. >> I mean, data mesh, as you said, the pain points that it surfaced were universal. Everybody said, "Oh, why didn't I think of that?" It was just an obvious next step and people are approaching it, implementing it. I guess in the last few years I've been involved in many of those implementations, and I guess supercloud is somewhat a prerequisite for it, because data mesh and building applications using data mesh is about sharing data responsibly across boundaries. And those boundaries include organizational boundaries, cloud technology boundaries, and trust boundaries. >> I want to bring that up because your venture, Nextdata, which is new, just formed. Tell us about that. What wave is that riding? What specifically are you targeting? What's the pain point? >> Absolutely.
Yes, so Nextdata is the result of, I suppose, the pains that I suffered from implementing data mesh for many of the organizations. Basically, a lot of organizations that I've worked with, they want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs, yet they want discoverability and governance. So they want to have policies implemented, they want to govern that data, they want to be able to discover that data, and yet they want to decentralize it. And we do that with a developer experience that is easy and native to a generalist developer. So we try to find the, I guess, the common denominator that solves those problems and enables that developer experience for data sharing. >> Since you just announced the news, what's been the reaction? >> I just announced the news right now, so what's the reaction? >> But people in the industry know you did a lot of work in the area. What has been some of the feedback on the new venture in terms of the approach, the customers, the problem? >> Yeah, so we've been in stealth mode, so we haven't publicly talked about it, but folks that have been close to us have in fact reached out, and we already have implementations of our pilot platform with early customers, which is super exciting. And we're going to have multiple of those. Of course, we're a tiny, tiny company, we can't have many of those, but we are going to have multiple pilot implementations of our platform in the real world, with real global large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now.
I think of JPMC, I think of HelloFresh. They're generally, obviously not surprisingly, they don't include the big vision of inclusivity across clouds, across different data storage. But it seems like people are having to go through some gymnastics to get to the organizational reality of decentralizing data and at least pushing data ownership to the line of business. How are you approaching, or are you approaching, solving that problem? Are you taking a narrow slice? What can you tell us about Nextdata? >> Yeah, absolutely. Gymnastics, the cute word to describe what the organizations have to go through. And one of those problems is that the data, as you know, resides on different platforms, it's owned by different people, it's processed by pipelines that who knows who owns. So there's this very disparate and disconnected set of technologies that were very useful when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem, the cost of integration of these technologies in a cohesive developer experience is what's missing. And we want to focus on that cohesive end-to-end developer experience to share data responsibly in these autonomous units. We call them data products, I guess, in data mesh. That constitutes computation, that governs that data, policies, discoverability. So I guess, I heard this expression in the last talks, that you can have your cake and eat it too. So we want people to have their cake, which is data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates data, computation, and APIs to get to it, in a technology-agnostic way, in an open way.
And then sit on top and use existing tech, Snowflake, Databricks, whatever exists, the millions of dollars of investments that companies have made; sit on top of those, but create this cohesive, integrated experience where the data product is a first-class primitive. And that's really key here. The language and the modeling that we use is really native to data mesh, which is that I'm building a data product, I'm sharing a data product, and that encapsulates: I'm providing metadata about this, I'm providing computation that's constantly changing the data, I'm providing the API for that. So we're trying to kind of codify and create a new developer experience based on that. And developers, both from the provider side and the user side, are connected through peer-to-peer data sharing with the data product as a primitive first-class concept.
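A data product container of the kind being described could be sketched conceptually like this. The shape is hypothetical and is not Nextdata's actual API:

```python
# Conceptual sketch of a "data product container": data, computation,
# policies, and an API bundled as one first-class unit. Illustrative only.

from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class DataProduct:
    name: str
    owner_domain: str                     # e.g. "e-commerce", "sales"
    metadata: dict = field(default_factory=dict)
    policies: list = field(default_factory=list)  # policy names/ids
    transform: Optional[Callable[[list], list]] = None  # the product's computation

    def serve(self, rows: list) -> dict:
        """The product's API: apply its own computation before sharing."""
        fn = self.transform or (lambda r: r)
        return {"product": self.name, "domain": self.owner_domain, "rows": fn(rows)}
```

The point of bundling metadata, computation, policies, and the API into one unit is that consumers always go through the product's own interface rather than reaching into the underlying storage.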
So you have to build with openness in mind, and in the case of Snowflake, I think they have, I'm sure, very happy customers, as long as customers can be on Snowflake. But once you cross that boundary of platforms, then that becomes a problem. And we try to keep that in mind in our solution. >> So it's worth reviewing that basically the concept of data mesh is that, whether you're a data lake or a data warehouse, an S3 bucket, or an Oracle database as well, they should all be inclusive inside of the data mesh. >> We did a session with AWS on the startup showcase, data as code. And remember I wrote a blog post in 2007 called "Data as the New Developer Kit," back then we used to call them developer kits, if you remember. And we said at that time, whoever can code data will have a competitive advantage. >> Aren't the machines going to be doing that? Didn't we just hear that? >> Well, we have. Hey, Siri. Hey, Cube, find me the best video for data mesh. There it is. But this is the point: what's happening is that now data has to be addressable for machines and for coding, because you need to call the data. So the question is, how do you manage the complexity of making things as promiscuous as possible, making data available, as well as then governing it? Because it's a trade-off. The more you make open, the better the machine learning. But then there's the governance issue, so you need an OS to handle this, maybe. >> Yes. So yes, well, our mental model for our platform is an OS, an operating system. Operating systems have shown us how you can abstract what's complex and take care of a lot of complexities, but yet provide an open and dynamic enough interface. So we think about it that way. We try to solve the problem of policies living with the data; enforcement of the policies happens at the most granular level, which is in this concept of the data product. And that would happen whether you read, write, or access a data product.
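The idea of policies traveling with the data product and being enforced on every access might be sketched like this. Policy names and the enforcement class are illustrative only, not a real product API:

```python
# Sketch of an open policy framework: policies live with the data product
# and are enforced at the most granular level, on every read.

def region_policy(request: dict) -> bool:
    """Example organization-written policy driver."""
    return request.get("region") == "EU"

def consent_policy(request: dict) -> bool:
    return request.get("has_consent", False)

class GovernedProduct:
    def __init__(self, name: str, rows: list, policies: list):
        self.name = name
        self._rows = rows
        self._policies = policies  # pluggable drivers, encoded with the product

    def read(self, request: dict) -> list:
        # Enforcement happens at the data product boundary, not centrally.
        if not all(policy(request) for policy in self._policies):
            raise PermissionError(f"access to {self.name} denied")
        return list(self._rows)
```

Because the policy drivers are plain callables supplied by the organization, the framework stays open: the product enforces whatever policies it was packaged with, without knowing their contents in advance.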
But we can never imagine what all of these policies could be. So our thinking is we should have an open policy framework that can allow organizations to write their own policy drivers and policy definitions, and encode them and encapsulate them in this data product container. But I'm not going to fool myself to say that that's going to solve the problem that you just described. I think we are in this, I don't know, if I look into my crystal ball, what I think might happen is that right now the primitives that we work with to train machine learning models are still bits and bytes and data. They're fields, rows, columns, and that creates quite a large surface area and attack area for the privacy of the data. So perhaps one of the trends that we might see is this evolution of data APIs to become more and more computationally aware, to bring the compute to the data to reduce that surface area. So you can really leave the control of the data to the sovereign owners of that data, that data product. So I think that evolution of our data APIs perhaps will become more and more computational. You describe what you want, and the data owner decides how to manage it. >> That's interesting, Dave, 'cause it's almost like, we just talked about ChatGPT in the last segment we had with you, and machine learning has been around the industry, but it's almost as if you're starting to see reasoning come into it. The data reasoning is starting to happen, not just metadata. Using the data to reason so that you don't have to expose the raw data. So almost like a, I won't say curation layer, but an intelligence layer. >> Zhamak: Exactly. >> Can you share your vision on that? 'Cause that seems to be where the dots are connecting. >> Yes, perhaps further into the future, because just from where we stand, we still have to create that bridge of familiarity between that future and the present. So we are still in that bridge-making mode.
However, start with just the basic notion of saying, "I'm going to put an API in front of my data." And that API today might be as primitive as a level of indirection, as in: you tell me what you want, tell me who you are, let me go process that, all the policies and lineage, and insert all of this intelligence that needs to happen. And then today, I will still give you a file. But by just defining that API and standardizing it, now we have this amazing extension point where we can say, "Well, in the next revision of this API, you not only tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and now compute via your API?" And you can evolve that. Now you have a point of evolution to this very futuristic, I guess, future, where you just describe the question that you're asking, like the question from ChatGPT. >> Well, this is the supercloud, go ahead, Dave. >> I have a question from a fan, I've got to get it in. It's George Gilbert. And so his question is: you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has, and he wants your feedback on this, is: do data product app devs get exposed to more complexity with respect to moving data between data products, or maybe it's attributes between data products? How do you respond to that? How do you see it? Is that a problem? Is that something that is overstated, or do you have an answer for that? >> Absolutely. So I think there's a sweet spot in getting data developers, data product developers, closer to the app, but yet not overburdening them with the complexity of the application and application logic, and yet reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating within. Because what's happening right now?
What's happening right now is that data engineers, with a ton of empathy for them for the high threshold of pain that they can deal with, have been centralized, they've been put into the data team, and they have been given this unbelievable task of: make meaning out of data, put semantics over it, curate it, cleanse it, and so on. So what we are saying is: get those folks embedded into the domain, closer to the application developers. These are still separately moving units. Your app and your data products are independent, but yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application, but yet have them separate from the app, because the app provides a very different service: transactional data for my e-commerce transaction. The data product provides a very different service: longitudinal data for the variety of this intelligent analysis that I can do on the data. But yet it's all within the domain of e-commerce or sales or whatnot. >> It's a lot of decoupling and coupling to create that cohesive architecture. So I have to ask you, this is an interesting question 'cause it came up on theCUBE all last year. Back in the old server data center days and cloud, Google coined the term SRE, site reliability engineer, for someone to look over the hundreds of thousands of servers. We asked the question to the data engineering community, who have been suffering, by the way, I agree. Is there an SRE-like role for data? Because in a way, data engineering, that platform engineer, they are like the SRE for data. In other words, managing the large scale to enable automation and self-service. What are your thoughts and reaction to that? >> Yes, exactly. So maybe we go through that history of how SRE came to be. So we had the first DevOps movement, which was: remove the wall between dev and ops and bring them together.
So you have one cross-functional unit of the organization that's responsible for you build it, you run it. So then there is no, I'm going to just shoot my application over the wall for somebody else to manage it. So we did that, and then we said, okay, as we decentralized and had these many microservices running around, we had to create a layer that abstracted a lot of the complexity around monitoring, observing, and running all of that, while giving autonomy to this cross-functional team. And that's where the SRE, a new generation of engineers, came to exist. So I think if I just look at. >> Hence, Kubernetes. >> Hence, hence, exactly. Hence, chaos engineering. Hence, embracing the complexity and messiness. And putting engineering discipline to embrace that and yet give a cohesive and high-integrity experience of those systems. So I think if we look at that evolution, perhaps something like that is happening by bringing data and apps closer and making them these domain-oriented data product teams, or domain-oriented cross-functional teams full stop, and still have a very advanced, maybe at the platform level, infrastructure-level operational team, so that they're not busy doing two jobs, which is taking care of domains and the infrastructure, but they're building infrastructure that is embracing that complexity and interconnectivity of this data process. >> So you see similarities? >> I see, absolutely. But I feel like we're probably in the more early days of that movement. >> So it's a data DevOps kind of thing happening where scale's happening. It's good things are happening, yet a little bit fast and loose with some complexities to clean up. >> Yes. This is a different restructure. As you said, the job of this industry as a whole, as architects, is decompose, recompose, decompose, recompose in new ways, and now we're decomposing the centralized team and recomposing them as domains. >> So is data mesh the killer app for supercloud?
>> You had to do this to me. >> Sorry, I couldn't resist. >> I know. Of course you want me to say this. >> Yes. >> Yes, of course. I mean, supercloud, I think it's really, the terminology, supercloud, open cloud, but I think in the spirit of it, this embracing of diversity and giving autonomy for people to make decisions for what's right for them, and not locking them in. I think just embracing that is baked into how data mesh assumes the world would work. >> Well, thank you so much for coming on Supercloud 2. We really appreciate it. Data has driven this conversation. Your success with data mesh has really opened up the conversation and exposed the slow-moving data industry. >> Dave: Been a great catalyst. >> That's now going well. We can move faster. So thanks for coming on. >> Thank you for hosting me. It was wonderful. >> Supercloud 2, live here in Palo Alto, our stage performance. I'm John Furrier with Dave Vellante. We'll be back with more after this short break. Stay with us all day for Supercloud 2. (upbeat music)
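The evolution Dehghani describes, a data-product API that begins as pure indirection over a file and later accepts the caller's logic, might be sketched roughly as follows. This is a minimal illustration, not any real data mesh product; all function names and data are invented:

```python
# Hypothetical sketch: a data-product API that starts as simple
# indirection (v1: identify yourself, get the file) and evolves to
# accept the computation the caller wants (v2), without breaking
# existing v1 callers.

RAW_ORDERS = [
    {"region": "emea", "amount": 120},
    {"region": "amer", "amount": 80},
    {"region": "emea", "amount": 50},
]

def get_data_v1(who: str):
    """v1: tell me who you are, receive the raw records (the 'file')."""
    assert who, "caller must identify itself for policy/lineage checks"
    return list(RAW_ORDERS)

def get_data_v2(who: str, compute=None):
    """v2: same identification, but the caller may ship its logic."""
    rows = get_data_v1(who)
    return compute(rows) if compute else rows

# A caller evolves from pulling the whole file to asking a question:
emea_total = get_data_v2(
    "mobile-app",
    compute=lambda rows: sum(
        r["amount"] for r in rows if r["region"] == "emea"))
```

The design point being illustrated is the extension point: standardizing the v1 contract is what makes the v2 evolution, where the caller states the intelligence it is after, possible without breaking anyone.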
Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps
>> Narrator: From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Enterprise tech practitioners, like most of us, want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best-of-breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top-line revenue and bottom-line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear, and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello, and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis, we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks, that's right, Supercloud 2 is here. As of this recording, it's just about four days away: 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests. Whereas Supercloud 22 in August was all about refining the definition of Supercloud, testing its technical feasibility, and understanding various deployment models, Supercloud 2 features practitioners, technologists, and analysts discussing what customers need, with real-world examples of Supercloud, and will expose thinking around a new breed of cross-cloud apps, data apps, if you will, that change the way machines and humans interact with each other.
Now the example we'd use, if you think about applications today, say a CRM system: sales reps, what are they doing? They're entering data into opportunities, they're choosing products, they're importing contacts, et cetera. And sure, the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and/or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging where data resides on different clouds, in different data stores, databases, lakehouses, et cetera. And the machine uses AI to inspect the e-commerce system, the inventory data, supply chain information, and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places, and things, like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of data mesh, is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Saks will be featured and will talk about data sharing across clouds and what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union, Ionis Pharmaceuticals, and Warner Media. We've got deep, deep technology dives with folks like Bob Muglia, David Flynn, Tristan Handy of DBT Labs, and Nir Zuk, the founder of Palo Alto Networks, focused on security. Thomas Hazel is going to talk about a new type of database for Supercloud. There are several analysts, including Keith Townsend, Maribel Lopez, George Gilbert, Sanjeev Mohan, and so many more guests; we don't have time to list them all.
They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail, starting with the Walmart Cloud Native Platform; they call it WCNP. We definitely see this as a Supercloud, and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack: "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do, they took a do-it-yourself approach to build a Supercloud for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure, and GCP. No surprise, there's no Amazon at Walmart, for obvious reasons. And what they do is create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronika Durgin of Saks thinks about data sharing across clouds. Data sharing, we think, is a potential killer use case for Supercloud. In fact, let's hear it in Veronika's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data share? You know, I work with data, you know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think Supercloud comes to me, in my mind, is like practical applications. How do we create that mesh, that network, that we can easily share data with each other?
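One hedged sketch of the data sharing being asked for here, granting a consumer read access to the live data rather than SFTP-ing a copy, might look like the following. The `SharedTable` class is purely illustrative, not any vendor's sharing API:

```python
# Hypothetical sketch of share-by-reference: instead of exporting a
# file and moving it over SFTP, the producer grants the consumer
# read access to the table itself. No data is copied or moved.

class SharedTable:
    def __init__(self, rows):
        self._rows = rows
        self._grants = set()

    def grant(self, consumer: str):
        """Record an access grant; the data itself stays put."""
        self._grants.add(consumer)

    def read(self, consumer: str):
        """Consumers with a grant always see the current data."""
        if consumer not in self._grants:
            raise PermissionError(f"{consumer} has no share grant")
        return list(self._rows)

orders = SharedTable([{"sku": "a1", "qty": 3}])
orders.grant("partner-on-other-cloud")
rows = orders.read("partner-on-other-cloud")
```

The design contrast with file moving is that revocation, freshness, and audit all fall out of the grant model: the producer deletes a grant rather than chasing down copies.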
>> Now data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani live in studio talking about what standards are missing to make this vision a reality across the Supercloud. Now one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft exec, was kind enough to spend some time looking at the community's supercloud definition, and he felt that it needed to be simplified. So in near real time, he came up with the following definition that we're showing here. I'll read it: "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition, he stressed that the Supercloud is a platform versus an architecture, implying that the platform provider, e.g., Snowflake, VMware, Databricks, Cohesity, et cetera, is responsible for determining the architecture. Now interestingly, in the shared Google doc that the working group uses to collaborate on the supercloud definition, Dr. Nelu Mihai, who is actually building a Supercloud, responded as follows to Bob's assertion: "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model, and how will users interact with Supercloud?" What does this seemingly nuanced point tell us, and why does it matter?
Well, history suggests that de facto standards will emerge more quickly to resolve real-world practitioner problems, and catch on more quickly, than consensus-based and standards-based architectures. But in the long run, the latter may serve customers better. So we'll be exploring this topic in more detail in Supercloud 2, and of course we'd love to hear what you think: platform, architecture, both? Now one of the real technical gurus that we'll have in studio at Supercloud 2 is David Flynn. He's one of the people behind the movement that enabled enterprise flash adoption, that craze. And he did that with Fusion IO, and he is now working on a system to enable read-write data access for any user, in any application, in any data center or on any cloud, anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Floyer and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that was, we thought, relevant: "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are, one, the process scheduler, and two, the file system. The strongest argument for supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So a couple of implications here that we'll be exploring with David Flynn in studio. First, we're inferring from his comment that he's in the platform camp, where the platform owner is responsible for the architecture; there are obviously trade-offs there and benefits, but we'll have to clarify that with him. And second, he's basically saying you kill the concept the further you move up the stack. So the further you move up the stack, the weaker the supercloud argument becomes, because it's just becoming SaaS.
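Muglia's phrase "programmatically consistent services hosted on heterogeneous cloud providers" can be sketched as a single interface with provider adapters behind it. The provider classes below are in-memory stand-ins for illustration only, not real cloud SDK calls:

```python
# Hypothetical sketch: one programmatic interface, heterogeneous
# providers behind it. Application code never sees which cloud it
# is talking to.

from abc import ABC, abstractmethod

class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes): ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class CloudA(ObjectStore):            # stand-in for one provider
    def __init__(self): self._d = {}
    def put(self, key, data): self._d[key] = data
    def get(self, key): return self._d[key]

class CloudB(ObjectStore):            # stand-in for another provider
    def __init__(self): self._d = {}
    def put(self, key, data): self._d[key] = data
    def get(self, key): return self._d[key]

def replicate(key: str, data: bytes, stores: list[ObjectStore]):
    """App code calls one consistent API across any provider mix."""
    for s in stores:
        s.put(key, data)

stores = [CloudA(), CloudB()]
replicate("report.csv", b"q1,q2", stores)
```

In this framing, the platform owner's job is exactly the part the sketch hand-waves: making the adapters behave identically despite genuinely different provider semantics underneath.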
Now this is something we're going to explore to better understand his thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I riffed with ahead of Supercloud 2. Tristan Handy, he's the founder and CEO of DBT Labs, and he has a highly opinionated and technical mind. Here's what he said: "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case because it involves a lot of data engineering pipeline and other work to make these available. So if you really want to make it easy to create these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to." There are a lot of implications to this statement that we'll explore at Supercloud 2. Zhamak Dehghani's data mesh comes into play here, with her critique of hyper-specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure, which Kit Colbert is likely going to touch upon. Veronika Durgin of Saks and her ideal state for data sharing, along with Harveer Singh of Western Union. They've got to deal with 200 locations around the world and data privacy issues, data sovereignty: how do you share data safely? Same with Nick Taylor of Ionis Pharmaceuticals. And not to blow your mind, but Thomas Hazel and Bob Muglia posit that to make data apps a reality across the Supercloud, you have to rethink everything. You can't just let in-memory databases and caching architectures take care of everything in a brute-force manner.
Rather, you have to get down to really detailed levels, even things like how data is laid out on disk, i.e., flash, and think about rewriting applications for the Supercloud and the ML/AI era. All of this and more at Supercloud 2, which wouldn't be complete without some data. So we pinged our friends from ETR, Eric Bradley and Darren Bramberm, to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud 2. Now, many of you are familiar with this graphic, where we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum, and on the horizontal axis is market presence, or pervasiveness, in the data. So Net Score versus what they call overlap, or N, in the data. And the table insert shows how the dots are plotted. Now, not to steal ETR's thunder, but the first point is you really can't have supercloud without the hyperscale cloud platforms, which is shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity, as do Databricks, Hashi, and Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately, due to a scheduling conflict, we weren't able to get Red Hat on the program, but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well, because backup is a likely use case across clouds and on-premises. And now one other callout that we drill down on at Supercloud 2 is Cloudflare, which actually uses the term supercloud, maybe in a different way. They look at Supercloud really as, you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud 2, along with many others. Okay, so why should you attend Supercloud 2? What's in it for me, kind of thing?
So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services, for monetizing data, how your peers are doing data sharing, how some of your peers are actually building out a Supercloud, you're going to get real-world input from practitioners. If you're a technologist trying to figure out various ways to solve problems around data, data sharing, and cross-cloud service deployment, there's going to be a number of deep technology experts that are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud, with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, are kind of throwing up their hands and saying, hey, we're going mono cloud. And we'll talk about the potential implications and dangers and risks of doing that, and also some of the benefits. You know, there's a question, right? Is Supercloud the same wine in a new bottle, or is it truly something different that can drive substantive business value? So look, go to supercloud.world; it's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor, with ChaosSearch, Proximo, and Alura as well. For contributing to the effort, I want to thank Alex Myerson, who's on production and manages the podcast. Ken Schiffman is in his supporting cast as well. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. Thank you all. Remember, these episodes are all available as podcasts. Wherever you listen, we really appreciate the support that you've given. We just saw some stats from Buzz Sprout; we hit the top 25%, and we're almost at 400,000 downloads last year. So really appreciate your participation.
All you got to do is search Breaking Analysis podcast and you'll find those. I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me, you can email me directly at David.Vellante@siliconangle.com, DM me @DVellante, or comment on our LinkedIn posts. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud 2, or next time on Breaking Analysis. (light music)
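Tristan Handy's point earlier in this episode, describe a metric once and turn it into an API so app developers never see the pipeline behind it, might be sketched minimally as follows. The metric name, data, and functions are invented for illustration; this is not dbt's actual semantic layer:

```python
# Hypothetical sketch of a metrics layer: one authoritative metric
# definition, owned by the data team, exposed through a simple API
# so application code only ever knows the metric's name.

ORDERS = [
    {"status": "complete", "amount": 100.0},
    {"status": "complete", "amount": 40.0},
    {"status": "returned", "amount": 25.0},
]

METRICS = {
    # the single place where "net revenue" is defined
    "net_revenue": lambda rows: sum(
        r["amount"] for r in rows if r["status"] == "complete"),
}

def metric_api(name: str) -> float:
    """What an app developer calls; the calculation stays hidden."""
    return METRICS[name](ORDERS)

revenue = metric_api("net_revenue")   # app code knows only the name
```

If the definition of net revenue changes, only the `METRICS` entry changes; every application calling the API picks up the new calculation automatically, which is the decoupling Handy is describing.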
Analyst Predictions 2023: The Future of Data Management
(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts: Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you guys? A data pack, like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speak)
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double-click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally, fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks go GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players (I spoke at Collibra's conference and at data.world, and work closely with Alation, Informatica, and a bunch of other companies), they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding and VC/IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check.
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this is just a topic that won't go away. I did speak with a number of folks, early adopters and non-adopters, during the year. And I did find that basically it pretty much validated what I was expecting, which was that this has now become a front-burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing a Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly hopped on that, maybe three sentences, and wrote it in about a couple minutes, saying this is hogwash, essentially. (laughs) And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw 15,000 hits on that post, which was the most hits of any single post I put up all year. And the responses were wildly pro and con. So, it pretty much validates my expectation that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then recently, (Tony laughing) I talked to Walmart, and they actually invoked Martin Fowler and said that they're working through their data mesh.
So, it takes a lot of thought, and it really, as we've talked about, is as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow on the prediction that graph databases take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really over a three-to-five-year time period that graph databases will become significant, because they still need accepted methodologies that can be applied in a business context, as well as proper tools, in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune, and Oracle providing graph support in Oracle Database this past year. Those things are, as I said, growing gradually. There are other companies, like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, and I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have, and maybe to the edge.
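The kind of multi-hop relationship question graph databases answer naturally, and which grows awkward as SQL self-joins, can be sketched in plain Python. The supply-chain graph below is invented purely for illustration:

```python
# Illustrative sketch: a multi-hop traversal ("everything downstream
# of this node, any number of hops") done with a simple BFS. A graph
# database makes exactly this kind of query a first-class operation.

from collections import deque

EDGES = {  # "who supplies whom" (hypothetical data)
    "mine": ["smelter"],
    "smelter": ["parts-maker"],
    "parts-maker": ["assembler"],
    "assembler": [],
}

def reachable(start: str) -> set[str]:
    """Every downstream node reachable from start, any depth."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream = reachable("mine")
```

In a relational store, each additional hop typically means another self-join; the graph model keeps the query the same regardless of depth, which is part of the "accepted methodologies" point Carl raises.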
>> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. Like the graph databases Carl was talking about, stream processing is another type of specialized data processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50% and continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect, perhaps in the five-year timeframe, that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. When you talk to practitioners doing this stuff, there are still some complications in the data pipeline. So I think you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant.
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this, in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. The trend was also supported by the big embrace of Apache Iceberg we saw in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability, and it also ensures performance. And that's a structured table that helps with the warehouse-side performance. Among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating it to end users. It's very cutting edge. I'd say the top, leading-edge 5% of companies have really embraced the lakehouse. I think we're now seeing the fast followers, the next 20 to 25% of firms, embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer making the announcement about Iceberg, and he asked the keynote audience for a show of hands: have any of you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there.
It was you, me, and I think two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub-predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability too, and then there's the whole catalog space. People don't like to use the word data catalog anymore, because data catalog sounds like a catalog, a museum, if you will, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like DataOps, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, and if this succeeds, go do that. It's getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do the data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security and privacy, some of these topics which are handled separately. And I'm just talking about data security and data privacy, not infrastructure security. These also need to merge into a unified metadata management piece, with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, limited in its scope.
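Sanjeev's distinction, orchestrating from metadata rather than from hard-coded if-this-fails-then-do-that rules, can be sketched in a few lines. The metadata fields, thresholds, and action names below are hypothetical, invented purely to illustrate the idea:

```python
from datetime import datetime, timedelta, timezone

def next_action(metadata):
    """Decide what to orchestrate by inspecting dataset metadata,
    not by following a fixed success/failure script."""
    age = datetime.now(timezone.utc) - metadata["last_refreshed"]
    if age > metadata["freshness_sla"]:
        return "trigger_refresh"          # stale: re-run the pipeline
    if metadata["quality_score"] < metadata["quality_threshold"]:
        return "quarantine_and_alert"     # fresh but not trustworthy
    return "publish"                      # fresh and trusted: release downstream

# Hypothetical metadata for one dataset, 30 hours old against a 24-hour SLA.
meta = {
    "last_refreshed": datetime.now(timezone.utc) - timedelta(hours=30),
    "freshness_sla": timedelta(hours=24),
    "quality_score": 0.97,
    "quality_threshold": 0.95,
}
print(next_action(meta))  # stale by six hours -> "trigger_refresh"
```

The point is that the same metadata record that drives the catalog can drive the orchestration decision, rather than the decision living in a separate, rule-coded workflow.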
It is actually the very engine, the very glue, that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind-the-scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest being data observability and customer data platforms, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things as it takes to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors, bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalogs and the BI vendors is because data catalogs are very soon not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what it is that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic?
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics, for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see those capabilities, whether embedded or separate, continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's gotten almost too modern. I don't know if it's long in the tooth, but it is getting long. The modern data stack has traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm, or the streaming realm for that matter, into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think what's important here, where we have seen a lot of progress, and this would be in the cloud, is with the SaaS services.
And especially you see that in the modern data stack, where it's not just the MongoDBs or the Oracles or the Amazons that have their database platforms. The Informaticas and the Fivetrans and all the other players there have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to, unfortunately, is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here is we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers, or all the SaaS service providers, to do is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that data platform providers have been adding services that are adjacent. And there are some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common sort of tool that the applications developed on MongoDB use, so MongoDB built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. AWS just did zero-ETL, which is a first step towards simplifying the process of going from Aurora to Redshift. You've seen the same thing with Google, with BigQuery integrating streaming pipelines. And you're seeing also a lot of movement in database machine learning.
So, there are some good moves in this direction. I expect to see more of this this year. Part of it is the SaaS platforms adding some functionality. But more importantly, you're never going to get... Asking your data team and your developers to standardize on the same tool is herding cats. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite a two-for-one, but in other words, get two products for the price of one and a half. I see a lot of potential for this. And to me, if the call is to simplify things, this is the next logical step, and I expect to see more of this here. >> Yeah, and you see in Oracle's MySQL HeatWave yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work, all these different vendors have their own stack vision. And one application group is going to use one, and another application group is going to use another. And some people will say, let's go to... like, you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system.
So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active, critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data, with data in different formats, the whole thing has... it's been like what a customer of mine used to say: "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, the Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up, is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left.
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing and retrieving data, whether it's in key-value stores or document databases and so forth. We've been getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. And for ordinary, conventional business analytics, Spark is an over-engineered solution to the problem. SQL works just great. What's happened in the past couple of years, and what's going to continue to happen, is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where, on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple of days later at the same conference, they announced they're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... your debating class, and you were forced to take one side and defend it.
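Carl's point that SQL "works just great" for ordinary, conventional business analytics is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module with an invented table and sample data; no cluster, no Spark, just a GROUP BY:

```python
import sqlite3

# Ordinary business analytics: total sales per region, in plain SQL.
# Table name and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0), ("west", 50.0)],
)
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)  # [('west', 250.0), ('east', 200.0)]
```

The same statement, give or take dialect, runs on Databricks, Snowflake, BigQuery, or a lakehouse table, which is exactly why the skills base Carl describes transfers so readily.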
So, I was at a Vertica conference one time, up on stage with Curt Monash, and I had to take the NoSQL, the-world-is-changing, paradigm-shift side. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance, and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think Doug was about a year ahead of time in his predictions, in that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I'm actually on the home stretch of doing a market landscape on the lakehouse. And the lakehouse will not replace data lakes. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. And why the lakehouse really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse, was not so much SQL; it was a need for ACID. And what was the best way to do it? It was through a relational table structure.
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to the column and row level, which you really could not do in a data lake or a file system. So, while the lakehouse can be queried in any manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for those types of data lakes, it becomes the end state. You cannot bypass that table structure, as I learned the hard way during my research. So, the bottom line I'd say here is that the lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as, I would say, the facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why it has happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity.
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming a much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. Those are designed to promote reuse and consistency across AI and ML initiatives, the elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that: any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring data from and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier, in my opinion, in this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone.
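The metric-store idea Dave describes, defining a formula over the data points once so every consumer computes it the same way instead of re-deriving it in each tool or spreadsheet, might be sketched like this. The registry, metric names, and sample data are all invented for illustration:

```python
# A toy metric store: named formulas registered once and computed
# consistently by every consumer.
METRICS = {}

def metric(name):
    """Decorator that registers a formula under a governed metric name."""
    def register(fn):
        METRICS[name] = fn
        return fn
    return register

@metric("occupancy_rate")
def occupancy_rate(rows):
    return sum(1 for r in rows if r["occupied"]) / len(rows)

@metric("revenue_per_unit")
def revenue_per_unit(rows):
    return sum(r["revenue"] for r in rows) / len(rows)

# Invented sample data: four rental units.
units = [
    {"occupied": True, "revenue": 1200.0},
    {"occupied": True, "revenue": 1100.0},
    {"occupied": False, "revenue": 0.0},
    {"occupied": True, "revenue": 1300.0},
]
print(METRICS["occupancy_rate"](units))    # 0.75
print(METRICS["revenue_per_unit"](units))  # 900.0
```

A dashboard, a planning model, and an ML pipeline can all call `METRICS["occupancy_rate"]` and get the same governed definition, which is the reuse-and-consistency point being made about both metric stores and feature stores.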
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property, the expenses, we can plan and purchase things appropriately. So, I think we need this broader purview, and I'm beginning to see some of those things happen. And the evidence today, I would say, is more focused around the metric stores and the feature stores, where we're starting to see vendors offer those capabilities. And we're starting to see the MLOps elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps that orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is, in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used, and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes.
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used, or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that it's true that we need that stuff. I think that saying it's starting to expand is probably the right way to put it. It's going to be expanding for some time. I think we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing from what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products. All this thing about vacation rentals and how it is doing, that data is coming from different sources. I'm packaging it into a data product. And to Carl's point, there's a whole set of operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this, what data products are most popular?
I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring a data products catalog, so I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just a formalization of all these aspects of a product. How do you use it? What is the SLA on it? What is the quality that I am prescribing? So, the data product, in my opinion, shifts the conversation to the consumers, or to the business people. Up to this point, Dave, when you're talking about data, all of data discovery and curation is very data producer-centric. So, I think we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there very quickly, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also, a comment to, Tony, your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction.
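A data contract in the sense Sanjeev describes, a formal statement of schema, quality, and SLA that consumers can hold a data product to, could be sketched as a simple check. Every field name, threshold, and sample record below is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class DataContract:
    """A hypothetical contract a data product publishes to its consumers."""
    required_fields: tuple   # schema the payload must satisfy
    max_null_fraction: float # quality guarantee
    freshness_hours: int     # SLA guarantee (checked elsewhere)

def validate(records, contract):
    """Return a list of contract violations; an empty list means compliant."""
    violations = []
    for field in contract.required_fields:
        if field not in records[0]:
            violations.append(f"missing field: {field}")
            continue
        nulls = sum(1 for r in records if r.get(field) is None)
        if nulls / len(records) > contract.max_null_fraction:
            violations.append(f"too many nulls in: {field}")
    return violations

contract = DataContract(("customer_id", "amount"), 0.1, 24)
good = [{"customer_id": 1, "amount": 9.5}, {"customer_id": 2, "amount": 3.0}]
bad = [{"customer_id": 1, "amount": None}, {"customer_id": 2, "amount": None}]
print(validate(good, contract))  # []
print(validate(bad, contract))   # ['too many nulls in: amount']
```

The shift Sanjeev describes is that this check runs on behalf of the consumer: the producer publishes the contract, and the consumer (or the platform) enforces it.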
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding and reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. It's always that 25% of employees who are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision, or better still, using analytics as triggers for automation and workflows, not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next-generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics, predictions, even recommendations built into these applications. And I think the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We've got to move beyond what I call swivel-chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and take action. >> And Dave Menninger, today, if you want analytics, or you want to absorb what's happening in the business, you typically have to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say that collectively, as an industry, we made a mistake.
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems so that the analytics didn't impact the operations. You don't want the analytics preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together, and organizations recognize that need to change. As Doug said, the majority of the workforce in the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. Two-thirds of organizations recognize that embedded analytics are important, and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not. As opposed to being turned loose in the wild with the data, they're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service, compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's prediction. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our Future of Intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically.
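The embedded-analytics pattern being described, acting on a prediction automatically inside the workflow and only escalating to a human when confidence is low, can be sketched in a few lines. The scoring thresholds and action names are stand-ins, not any vendor's actual API:

```python
def route_claim(claim, score, auto_threshold=0.9, reject_threshold=0.2):
    """Embed the analytic at the point of decision: the workflow acts on
    the model's score directly; only low-confidence cases reach a human."""
    if score >= auto_threshold:
        return "auto_approve"
    if score <= reject_threshold:
        return "auto_reject"
    return "human_review"  # the swivel-chair step, now the exception

# Hypothetical claims with scores from some upstream model.
print(route_claim({"id": 1}, 0.95))  # auto_approve
print(route_claim({"id": 2}, 0.55))  # human_review
print(route_claim({"id": 3}, 0.05))  # auto_reject
```

The visualization disappears from the happy path entirely; a dashboard is only consulted for the middle band, which is Doug's point about analytics as triggers for automation.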
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which happens through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations, as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline into their data lakes to do all that very exploratory work and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work, into their workspace, and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but a new era of apps. Today, most applications are about filling forms out or codifying processes and require human input.
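Carl's analytic transaction processing idea above, where a result is enacted automatically unless the score is too low and a human needs to look at it, can be sketched in a few lines of Python. This is a hypothetical illustration; the function name, labels, and the 0.8 threshold are assumptions, not taken from any vendor's product:

```python
# Hypothetical sketch of score-gated automation: enact the result
# automatically when model confidence clears a threshold, otherwise
# route the case to a human reviewer. Names and the 0.8 threshold
# are illustrative assumptions.
def route_decision(score: float, threshold: float = 0.8) -> str:
    return "auto-enact" if score >= threshold else "human-review"

# Scores attached to live transactions, routed as they arrive
routes = [route_decision(s) for s in (0.95, 0.42, 0.80)]
# routes == ["auto-enact", "human-review", "auto-enact"]
```

In practice the threshold would be tuned per decision point rather than fixed globally, which is the "relevant to that decision point" concern the panel raises.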
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, and present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, and the analytics at the point of decision have to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back, like 10 years ago, were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see it's all over the clouds and at the edge. Today, you cache- >> We need a supercloud. >> You cache that data, you're throwing it into memory. I mentioned MySQL HeatWave.
There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which, to my mind, is almost like talking about cave painting. I think that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board, and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and in real time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)
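Carl's event-driven point from the discussion above, injecting decisions at various points rather than following a script some programmer preconceived, can be made concrete with a minimal dispatcher sketch. Everything here is an illustrative assumption; no specific framework or product is implied:

```python
# Minimal event-driven sketch: handlers register per event type, so a
# new decision point is added by registering another handler rather
# than editing a scripted sequence. Event names and handlers are
# illustrative assumptions.
from collections import defaultdict

handlers = defaultdict(list)

def on(event_type):
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def emit(event_type, payload):
    # Every registered handler sees the event, in registration order.
    return [fn(payload) for fn in handlers[event_type]]

@on("order.created")
def fraud_check(order):
    # An injected analytic decision point
    return "hold" if order["amount"] > 10_000 else "approve"

@on("order.created")
def notify_warehouse(order):
    return f"notify warehouse: order {order['id']}"

results = emit("order.created", {"id": 7, "amount": 500})
# results == ["approve", "notify warehouse: order 7"]
```

The fraud check here stands in for the kind of analytic trigger the panel describes; swapping the hard-coded rule for a model score changes one handler, not the application flow.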
Harveer Singh, Western Union | When Data Moves, Money Moves
(upbeat music) >> Welcome back to Supercloud 2, which is an open industry collaboration between technologists, consultants, analysts, and of course, practitioners, to help shape the future of cloud. And at this event, one of the key areas we're exploring is the intersection of cloud and data, and how building value on top of hyperscale clouds and across clouds is evolving, a concept we call supercloud. And we're pleased to welcome Harvir Singh, who's the chief data architect and global head of data at Western Union. Harvir, it's good to see you again. Thanks for coming on the program. >> Thanks, David, it's always a pleasure to talk to you. >> So many things stand out from when we first met, and one of the most gripping for me was when you said to me, "When data moves, money moves." And that's the world we live in today, and really have for a long time. Money has moved as bits, and when it has to move, we want it to move quickly, securely, and in a governed manner. And the pressure to do so is only growing. So tell us how that trend has evolved over the past decade in the context of your industry generally, and Western Union specifically. >> Look, I always say to people that we are probably the first ones to introduce digital currency around the world because, hey, somebody around the world needs money, we move data to make that happen. That trend has actually accelerated quite a bit. If you look at the last 10 years, and you look at all these payment companies, digital companies, credit card companies that have evolved, the majority of them are working on the same principle. When data moves, money moves. When data is stale, the money goes away, right? I think that trend is continuing, and it's not just in this space, it's also continuing in other spaces, specifically around, you know, acquisition of customers, communication with customers. It's all becoming digital, and at the end of the day, it's all data being moved from one place to another.
At the end of the day, you're not seeing the customer, but you're looking at, you know, the data that he's consuming, and you're making actionable items on it, and being able to respond to what they need. So I think over 10 years, it's really, really evolved. >> Hmm, you operate, Western Union operates in more than 200 countries, and you have what I would call a pseudo-federated organization. You're trying to standardize wherever possible on the infrastructure, and you're curating the tooling and doing the heavy lifting in the data stack, which of course lessens the burden on the developers and the line-of-business consumers, so my question is, in operating in 200 countries, how do you deal with all the diversity of laws and regulations across those regions? I know you're heavily involved in AWS, but AWS isn't everywhere, you still have some on-prem infrastructure. Can you paint a picture of, you know, what that looks like? >> Yeah, a few years ago, we were primarily mostly on-prem, and one of the biggest pain points has been managing that infrastructure around the world in those countries. Yes, we operate in 200 countries, but we don't have infrastructure in 200 countries, but we do have agent locations in 200 countries. The United Nations says we only have like 183 countries, but there are countries which, you know, declare themselves countries, and we are there as well because somebody wants to send money there, right? Somebody has an agent location down there as well. So that infrastructure is obviously very hard to manage and maintain. We have to comply with numerous laws, you know. And the last few years, specifically with GDPR, CCPA, and data localization laws in different countries, it's been a challenge, right? And one of the things that we did a few years ago, we decided that we want to be in the business of helping our customers move money faster, securely, and with complete trust in us.
We don't want to be able to, we don't want to be in the business of managing infrastructure. And that's one of the reasons we started to, you know, migrate and move our journey to the cloud. AWS was obviously chosen first because it was, you know, first in the game, and has more locations and more data centers around the world where we operate. But we still have, you know, existing infrastructure in some countries, which is still localized because AWS hasn't reached there, or we don't have a comparable provider there. We still manage those. And we have to comply with those laws. Our data privacy and our data localization tech stack is pretty good, I would say. We manage our data very well, we manage our customer data very well, but it comes with a lot of complexity. You know, we get a lot of requests from the European Union, we get a lot of requests from Asia Pacific, pretty much on a weekly basis, to explain, you know, how we are taking controls and putting measures in place to make sure that the data is secured and is in the right place. So it's a complex environment. We do have exposure to other clouds as well, like Google and Azure. And as much as we would love to be completely, you know, very, very hybrid kind of an organization, it's still at a stage where we are still very heavily focused on AWS yet, but at some point, you know, we would love to see a world which is not reliant on a single provider, but is a little bit more democratized, you know, where, as and when I want to use something, I should be able to use it, and pay per use. And the concept started like that, but obviously now, again, there are like three big players in the market, and, you know, they're doing their own thing. Would love to see them come collaborate at some point. >> Yeah, wouldn't we all.
I want to double-click on the whole multi-cloud strategy, but if I understand it correctly, in a perfect world, everything on-premises would be in the cloud. First of all, is that a correct statement? Is that nirvana for you, or not necessarily? >> I would say it is nirvana for us, but I would also put a caveat: it's very tricky, because from a regulatory perspective, we are a regulated entity in many countries. The regulators would want to see some control if something happens with a relationship with AWS in one country, or with Google in another country, and it keeps happening, right? For example, Russia was a good example where we had to switch things off. We should be able to do that. But if, let's say, somewhere in Asia, a country decides that they don't want to partner with AWS, and the majority of our stuff is on AWS, where do I go from there? So we have to have some level of confidence in our own infrastructure, so we do maintain some to be able to fail back into and move things if need be. So it's a tricky question. Yes, it's a nirvana state that I don't have to manage infrastructure, but I think it's far less practical than it sounds. We will still own something that we call our own, where we have complete control, being a financial entity. >> And so do you try to, I'm sure you do, standardize between all the different on-premise systems and, in this case, the AWS cloud, or maybe even other clouds? How do you do that? Do you work with, you know, different vendors at the various places of the stack to try to do that? Some of the vendors, you know, like a Snowflake, is only in the cloud. You know, others, you know, whether it's whatever, analytics, or storage, or database, might be hybrid. What's your strategy with regard to creating as common an experience as possible between your on-prem and your clouds? >> You asked a question which I asked when I joined as well, right?
Which question? This is one of the most important questions: how soon can I fail back, if I need to fail back? And how quickly can I, because not everything that is sitting on the cloud is comparable to on-prem or is backward compatible. And the reason I say backward compatible is, you know, our on-prem cloud is obviously behind. We haven't taken enough time to kind of put it to a state where, because we started to migrate and now we have access to infrastructure on the cloud, most of the new things are being built there. But for critical applications, I would say we have technology that could be used to move back if need be. So, you know, technologies like Couchbase, technologies like PostgreSQL, technologies like Db2, et cetera. We still have and maintain a fairly large portion of it on-prem, where critical applications could potentially be serviced. I'll give you one example. We use Neo4j very heavily for our AML use cases. And that's an important one, because if Neo4j on the cloud goes down, and it's happened in the past, again, even with three clusters, having all three clusters going down with a DR, we still need some accessibility of that, because that's one of the biggest, you know, fraud and risk applications it supports. So we do still maintain some comparable technology. Snowflake is an odd one. Obviously, there is none on-prem. But then, you know, with Snowflake, I also feel it's more an analytical-based technology, not a transactional-based technology, at least in our ecosystem. So for me to replicate that, yes, it'll probably take time, but I can live with that. But my business will not stop, because our transactional applications can potentially move over if need be. >> Yeah, and of course, you know, all these big market cap companies, so the Snowflake or Databricks, which is not public yet, but they've got big aspirations. And so, you know, we've seen things like Snowflake do a deal with Dell for on-prem object store.
I think they do the same thing with Pure. And so over time, you see Mongo, you know, extending its estate. And so over time all these things are coming together. I want to step out of this conversation for a second. I just ask you, given the current macroeconomic climate, what are the priorities? You know, obviously, people are, CIOs are tapping the brakes on spending, we've reported on that, but what is it? Is it security? Is it analytics? Is it modernization of the on-prem stack, which you were saying is a little bit behind? Where are the priorities today given the economic headwinds? >> So the most important priority right now is growing the business, I would say. It's a different, I know this is not a very techy or a tech answer that, you know, you would expect, but it's growing the business. We want to acquire more customers and be able to service them as best needed. So the majority of our investment is going in the space where tech can support that initiative. During our earnings call, we released the new pillars of our organization where we will focus on, you know, omnichannel digital experience, and then one experience for the customer, whether it's retail, whether it's digital. We want to open up our own experience stores, et cetera. So we are investing in technology where it's going to support those pillars. But the spend is in a way that we are obviously taking away from the things that do not support those. So it's, I would say it's flat for us. We are not heavily investing or aggressively increasing our tech budget; it's more like, hey, switch this off because it doesn't make us money, but now switch this on because this is going to support what we can do with money, right? So that's kind of where we are heading towards. So it's not driven by technology, but it's driven by business and how it supports our customers and our ability to compete in the market.
>> You know, I think, Harvir, that's consistent with what we heard in some other work that we've done with our ETR partner, who does these types of surveys. We're hearing the same thing: that, you know, we might not be spending on modernizing our on-prem stack. Yeah, we want to get to the cloud at some point and modernize that. But if it supports revenue, you know, we'll invest in that, and get the, you know, instant ROI. I want to ask you about, you know, this concept of supercloud, this abstracted layer of value on top of hyperscale infrastructure, and maybe on-prem. But we were talking about the integration, for instance, between Snowflake and Salesforce, where you've got different data sources, and you were explaining that you had great interest in being able to, you know, have a kind of, I'll say seamless, sorry, I know it's an overused word, but integration between the data sources and those two different platforms. Can you explain that and why that's attractive to you? >> Yeah, I'm a big supporter of action where the data is, right? Because the minute you start to move, things are already lost in translation. The time is lost, you can't get to it fast enough. So, for example, for us, Snowflake and Salesforce is our actionable platform where we action, we send marketing campaigns, we send customer communication via SMS, in app, as well as via email. Now, we would like to be able to interact with our customers pretty much in, I would say, near real time, but the concept of real time doesn't work well with me, because I always feel that if you're observing something, it's not real time, it's already happened. But how soon can I react? That's the question. And given that I have to move that data all the way from our, let's say, engagement platforms like Adobe, and particles of the world into Snowflake first, and then do my modeling in some way, and be able to then put it back into Salesforce, it takes time.
Yes, you know, I can do it in a few hours, but that few hours makes a lot of difference. Somebody sitting on my website, you know, couldn't find something, walked away, how soon do you think he will lose interest? Three hours, four hours, he's probably gone, he will never come back. I think if I can react to that as fast as possible without too much data movement, I think that's a lot of good benefit that this kind of integration will bring. Yes, I can potentially take data directly into Salesforce, but then I have two copies of data, which is, again, something that I'm not a big (indistinct) of. Let's keep the source of the data simple, clean, and a single source. I think this kind of integration will help a lot if the actions can be brought very close to where the data resides. >> Thank you for that. And so, you know, it's funny, we sometimes try to define real time as before you lose the customer, so that's kind of real time. But I want to come back to this idea of governed data sharing. You mentioned some other clouds, a little bit of Azure, a little bit of Google. In a world where, let's say, you go more aggressively, and we know that, for instance, if you want to use Google's AI tools, you've got to use BigQuery. You know, today, anyway, they're not sort of so friendly with Snowflake; maybe it's different for AWS, maybe Microsoft's going to be different as well. But in an ideal world, what I'm hearing is you want to keep the data in place. You don't want to move the data. Moving data is expensive, making copies is badness. It's expensive, and it also, you know, changes the state, right? So you've got governance issues. So this idea of supercloud is that you can leave the data in place and actually have a common experience across clouds. Let's just say, let's assume for a minute Google kind of wakes up, my words, not yours, and says, "Hey, maybe, you know what, partnering with a Snowflake or a Databricks is better for our business.
It's better for the customers," how would that affect your business and the value that you can bring to your customers? >> Again, I would say that would be the nirvana state that, you know, we want to get to. Because I would say not everyone's perfect. They have great engineers and great products that they're developing, but that's where they compete as well, right? I would like to use the best of breed as much as possible. And I've been a person who has done this in the past as well. I've used, you know, tools to integrate. And the reason why this integration has worked is primarily because sometimes you do pick the best thing for that job. And Google's AI products are definitely doing really well, but, you know, that accessibility, if it's a problem, then I really can't depend on them, right? I would love to move some of that down there, but they have to make it possible for us. Azure is doing really, really well at investing, so I think they're getting closer and closer to that state, and I know they're seeking our attention more than Google at this point in time. But I think there will be a revelation moment, because more and more people that I talk to, like myself, are also talking about the same thing. I'd like to be able to use Google's AdSense, I would like to be able to use Google's advertising platform, but you know what? I already have all this data, why do I need to move it? Can't they just go and access it? That question will keep haunting them (indistinct). >> You know, I think, obviously, Microsoft has always known, you know, understood ecosystems. I mean, AWS is nailing it; when you go to re:Invent, it's all about the ecosystem. And I think they realized they can make a lot more money, you know, together than trying to go it alone, and Google's got to figure that out. I think Google thinks, "All right, hey, we got to have the best tech." And that tech, they do have the great tech, and that's our competitive advantage.
They've got to wake up to the ecosystem and what's happening in the field and the go-to-market. I want to ask you about how you see data and cloud evolving in the future. You mentioned that things that are driving revenue are the priorities, and maybe you're already doing this today, but my question is, do you see a day when companies like yours are increasingly offering data and software services? You've been around for a long time as a company, you've got, you know, first-party data, you've got proprietary knowledge, and maybe tooling that you've developed, and you're becoming more, you're already a technology company. Do you see someday pointing that at customers, or again, maybe you're doing it already, or is that not practical in your view? >> So data monetization has always been on the charts. The reason why it hasn't seen the light is regulatory pressure at this point in time. We are partnering up with certain agencies, again, you know, some pilots are happening to see the value of that and be able to offer that. But I think, you know, eventually, we'll get to a state where, because we are trying to build accessible financial services, we will be in a state that we will be offering those to partners, which could then be extended to their customers as well. So we are definitely exploring that. We are definitely exploring how to enrich our data with other data, and be able to complete a superset of data that can be used. Because frankly speaking, the data that we have is very interesting. We have trends of people migrating, we have trends of people migrating within the US, right? So if a new, let's say there's a new, like, I'll give you an example. Let's say New York City. I can tell you, at any given point of time, with my data, what is, you know, the dominant population in that area from a migrant perspective. And if I see a change in that data, I can tell you where that is moving towards. I think it's going to be very interesting.
We're a little bit, obviously, sometimes, you know, you're scared of sharing too much detail because there's too much data. But at the end of the day, I think at some point, we'll get to a state where we are confident that the data can be used for good. One simple example is, you know, pharmacies. They would love to get, you know, we've been talking to CVS and we are talking to Walgreens, and trying to figure out, if they would get access to this kind of demographic information, what could they do better? Because, you know, from a gene pool perspective, there are diseases and conditions that are very prevalent in one community versus the other. We could probably equip them with this information to be able to better, you know, let's say, staff their pharmacies or keep better inventory of products that could be used for the population in that area. Similarly, the likes of Walmarts and Krogers, they would like to have more, let's say, ethnic products in their aisles, right? How do you enable that? That data is primarily, I think we are the biggest source of that data. So we do take pride in it, but, you know, with caution, we are obviously exploring that as well. >> My last question for you, Harvir, is I'm going to ask you to do a thought exercise. So in that vein, that whole monetization piece, imagine that now, Harvir, you are running a P&L that is going to monetize that data. And my question to you is, there's a business vector and a technology vector. So from a business standpoint, the more distribution channels you have, the better. So running on the AWS cloud, partnering with Microsoft, partnering with Google, going to market with them, is going to give you more revenue. Okay, so there's a motivation for multi-cloud or supercloud. That's indisputable. But from a technical standpoint, is there an advantage to running on multiple clouds, or is that a disadvantage for you?
>> It's, I would say it's a disadvantage, because if my data is distributed, I have to combine it at some place. So the very first step that we had taken was obviously we brought in Snowflake. The reason: we wanted our analytical data and our historical data in the same place. So we are already there and ready to share. And we are actually participating in the data share, but in a private setting at the moment. So we are technically enabled to share, unless there is a significant, I would say, upside to moving that data to another cloud. I don't see any reason, because I can enable anyone to come and get it from Snowflake. It's already enabled for us. >> Yeah, or if somehow, magically, several years down the road, some standard is developed so you don't have to move the data. Maybe there's a new, Mogli is talking about a new data architecture, and, you know, that's probably years away, but, Harvir, you're an awesome guest. I love having you on, and really appreciate you participating in the program. >> I appreciate it. Thank you, and good luck (indistinct) >> Ah, thank you very much. This is Dave Vellante for John Furrier and the entire Cube community. Keep it right there for more great coverage from Supercloud 2. (uplifting music)
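Harvir's reaction-window point from this segment, that an engagement signal which takes hours to move is effectively worthless, can be captured with a tiny freshness gate. This is a hypothetical sketch; the three-hour window simply echoes his "three hours, four hours" remark and is not a Western Union figure:

```python
# Hypothetical freshness gate: only act on engagement events that are
# still inside the reaction window. The three-hour window is an
# illustrative assumption.
from datetime import datetime, timedelta

REACTION_WINDOW = timedelta(hours=3)

def still_actionable(event_time: datetime, now: datetime) -> bool:
    return now - event_time <= REACTION_WINDOW

now = datetime(2023, 1, 15, 12, 0)
events = [
    {"id": "cart-abandoned", "at": datetime(2023, 1, 15, 11, 30)},  # 30 minutes old
    {"id": "page-bounce", "at": datetime(2023, 1, 15, 7, 0)},       # five hours old
]
actionable = [e["id"] for e in events if still_actionable(e["at"], now)]
# actionable == ["cart-abandoned"]
```

Acting where the data lives, as Harvir describes with Snowflake and Salesforce, is what keeps events inside a window like this one instead of aging out in transit.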
Bob Muglia, George Gilbert & Tristan Handy | How Supercloud will Support a new Class of Data Apps
(upbeat music) >> Hello, everybody. This is Dave Vellante. Welcome back to Supercloud2, where we're exploring the intersection of data analytics and the future of cloud. In this segment, we're going to look at how the Supercloud will support a new class of applications, not just work that runs on multiple clouds, but rather a new breed of apps that can orchestrate things in the real world. Think Uber for many types of businesses. These applications, they're not about codifying forms or business processes. They're about orchestrating people, places, and things in a business ecosystem. And I'm pleased to welcome my colleague and friend, George Gilbert, former Gartner analyst, Wikibon market analyst, former equities analyst as my co-host. And we're thrilled to have Tristan Handy, who's the founder and CEO of DBT Labs, and Bob Muglia, who's the former President of Microsoft's Enterprise business and former CEO of Snowflake. Welcome all, gentlemen. Thank you for coming on the program. >> Good to be here. >> Thanks for having us. >> Hey, look, I'm going to start actually with the SuperCloud because both Tristan and Bob, you've read the definition. Thank you for doing that. And Bob, you have some really good input, some thoughts on maybe some of the drawbacks and how we can advance this. So what are your thoughts in reading that definition around SuperCloud? >> Well, I thought first of all that you did a very good job of laying out all of the characteristics of it and helping to define it overall. But I do think it can be tightened a bit, and I think it's helpful to do it in as short a way as possible. And so in the last day I've spent a little time thinking about how to take it and write a crisp definition. And here's my go at it. This is one day old, so gimme a break if it's going to change. And of course we have to follow the industry, and so that, and whatever the industry decides, but let's give this a try.
So in the way I think you're defining it, what I would say is a SuperCloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. >> Boom. Nice. Okay, great. I'm going to go back and read the script on that one and tighten that up a bit. Thank you for spending the time thinking about that. Tristan, would you add anything to that or what are your thoughts on the whole SuperCloud concept? >> So as I read through this, I fully realize that we need a word for this thing because I have experienced the inability to talk about it as well. But for many of us who have been living in the Confluent, Snowflake, you know, this world of like new infrastructure, this seems fairly uncontroversial. Like I read through this, and I'm just like, yeah, this is like the world I've been living in for years now. And I noticed that you called out Snowflake for being an example of this, but I think that there are like many folks, myself included, for whom this world like fully exists today. >> Yeah, I think that's a fair, I dunno if it's criticism, but people observe, well, what's the big deal here? It's just kind of what we're living in today. It reminds me of, you know, Tim Berners-Lee saying, well, this is what the internet was supposed to be. It was supposed to be Web 2.0, so maybe this is what multi-cloud was supposed to be. Let's turn our attention to apps. Bob first and then go to Tristan. Bob, what are data apps to you? When people talk about data products, is that what they mean? Are we talking about something more, different? What are data apps to you? >> Well, to understand data apps, it's useful to contrast them to something, and I just use the simple term people apps. I know that's a little bit awkward, but it's clear. And almost everything we work with, almost every application that we're familiar with, be it email or Salesforce or any consumer app, those are applications that are targeted at responding to people.
You know, in contrast, a data application reacts to changes in data and uses some set of analytic services to autonomously take action. So where applications that we're familiar with respond to people, data apps respond to changes in data. And they both do something, but they do it for different reasons. >> Got it. You know, George, you and I were talking about, you know, it comes back to SuperCloud, broad definition, narrow definition. Tristan, how do you see it? Do you see it the same way? Do you have a different take on data apps? >> Oh, geez. This is like a conversation that I don't know has an end. It's like been, I write a substack, and there's like this little community of people who all write substacks. We argue with each other about these kinds of things. Like, you know, as many different takes on this question as you can find, but the way that I think about it is that data products are atomic units of functionality that are fundamentally data driven in nature. So a data product can be as simple as an interactive dashboard that has like actually had design thinking put into it and serves a particular user group and has like actually gone through kind of a product development life cycle. And then a data app or data application is a kind of cohesive end-to-end experience that often encompasses like many different data products. So from my perspective there, this is very, very related to the way that these things are produced, the kinds of experiences that they're provided, that like data innovates every product that we've been building in, you know, software engineering for, you know, as long as there have been computers. >> You know, Zhamak Dehghani oftentimes uses the, you know, she doesn't name Spotify, but I think it's Spotify as that kind of example she uses. But I wonder if we can maybe try to take some examples.
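Bob's contrast here, apps that respond to people versus apps that respond to changes in data, can be sketched in a few lines of Python. The event shape, the threshold, and the actions below are illustrative assumptions, not any vendor's API:

```python
from dataclasses import dataclass

@dataclass
class InventoryChange:
    """A change event emitted by some backend system."""
    sku: str
    on_hand: int

def react(change: InventoryChange, reorder_threshold: int = 10) -> dict:
    """Respond to a change in data, not to a person submitting a form."""
    if change.on_hand < reorder_threshold:
        return {"action": "reorder", "sku": change.sku, "qty": 100}
    return {"action": "none", "sku": change.sku}

# A people app would render a UI and wait; a data app keeps consuming
# events and acting on them autonomously.
events = [InventoryChange("A-1", 4), InventoryChange("B-2", 50)]
actions = [react(e) for e in events]
```

The loop at the end is the whole "user interface": the app's inputs are events, and its outputs are actions.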
If you take, like George, if you take a CRM system today, you're inputting leads, you got opportunities, it's driven by humans, they're really inputting the data, and then you got this system that kind of orchestrates the business process, like runs a forecast. But in this data driven future, are we talking about the app itself pulling data in and automatically looking at data from the transaction systems, the call center, the supply chain and then actually building a plan? George, is that how you see it? >> I go back to the example of Uber, may not be the most sophisticated data app that we build now, but it was like one of the first where you do have users interacting with their devices as riders trying to call a car or driver. But the app then looks at the location of all the drivers in proximity, and it matches a driver to a rider. It calculates an ETA to the rider. It calculates an ETA then to the destination, and it calculates a price. Those are all activities that are done sort of autonomously that don't require a human to type something into a form. The application is using changes in data to calculate an analytic product and then to operationalize that, to assign the driver to, you know, calculate a price. Those are, that's an example of what I would think of as a data app. And my question then I guess for Tristan is if we don't have all the pieces in place for sort of mainstream companies to build those sorts of apps easily yet, like how would we get started? What's the role of a semantic layer in making that easier for mainstream companies to build? And how do we get started, you know, say with metrics? How does that, how does that take us down that path? >> So what we've seen in the past, I dunno, decade or so, is that one of the most successful business models in infrastructure is taking hard things and rolling 'em up behind APIs. 
You take messaging, you take payments, and you all of a sudden increase the capability of kind of your median application developer. And you say, you know, previously you were spending all your time being focused on how do you accept credit cards, how do you send SMS payments, and now you can focus on your business logic, and just create the thing. One of, interestingly, one of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that, you know, you would imagine that the business would be able to create applications around very easily, but in fact that's not the case. It's actually quite challenging to, and involves a lot of data engineering pipeline and all this work to make these available. And so if you really want to make it very easy to create some of these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to. >> So how rich can that API layer grow if you start with metric definitions that you've defined? And DBT has, you know, the metric, the dimensions, the time grain, things like that, that's a well scoped sort of API that people can work within. How much can you extend that to say non-calculated business rules or governance information like data reliability rules, things like that, or even, you know, features for an AIML feature store. In other words, it starts, you started pragmatically, but how far can you grow? >> Bob is waiting with bated breath to answer this question. I'm, just really quickly, I think that we as a company and DBT as a product tend to be very pragmatic. We try to release the simplest possible version of a thing, get it out there, and see if people use it. But the idea that, the concept of a metric is really just a first landing pad. 
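Tristan's point about rolling hard things up behind APIs can be sketched as a toy semantic layer: the metric formula lives in one central registry, and callers ask for it by name without ever seeing the calculation. The registry and endpoint below are hypothetical illustrations, not dbt's actual implementation:

```python
# Central metric definitions -- the one place the calculation lives.
METRICS = {
    "revenue": lambda row: row["price"] * row["qty"],
}

def query_metric(name: str, rows: list, group_by: str) -> dict:
    """Stand-in for a semantic-layer API endpoint: callers pass a metric
    name and a dimension, never the formula behind it."""
    calc = METRICS[name]
    out: dict = {}
    for row in rows:
        key = row[group_by]
        out[key] = out.get(key, 0) + calc(row)
    return out

rows = [
    {"region": "east", "price": 10, "qty": 2},
    {"region": "east", "price": 5, "qty": 1},
    {"region": "west", "price": 8, "qty": 3},
]
revenue_by_region = query_metric("revenue", rows, group_by="region")
```

If the definition of revenue changes, it changes in exactly one place, and every application developer calling the endpoint gets the new calculation for free.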
The really, there is a physical manifestation of the data and then there's a logical manifestation of the data. And what we're trying to do here is make it very easy to access the logical manifestation of the data, and metric is a way to look at that. Maybe an entity, a customer, a user is another way to look at that. And I'm sure that there will be more kind of logical structures as well. >> So, Bob, chime in on this. You know, what's your thoughts on the right architecture behind this, and how do we get there? >> Yeah, well first of all, I think one of the ways we get there is by what companies like DBT Labs and Tristan is doing, which is incrementally taking and building on the modern data stack and extending that to add a semantic layer that describes the data. Now the way I tend to think about this is a fairly major shift in the way we think about writing applications, which is today a code first approach to moving to a world that is model driven. And I think that's what the big change will be is that where today we think about data, we think about writing code, and we use that to produce APIs as Tristan said, which encapsulates those things together in some form of services that are useful for organizations. And that idea of that encapsulation is never going to go away. It's very, that concept of an API is incredibly useful and will exist well into the future. But what I think will happen is that in the next 10 years, we're going to move to a world where organizations are defining models first of their data, but then ultimately of their business process, their entire business process. Now the concept of a model driven world is a very old concept. I mean, I first started thinking about this and playing around with some early model driven tools, probably before Tristan was born in the early 1980s. And those tools didn't work because the semantics associated with executing the model were too complex to be written in anything other than a procedural language. 
We're now reaching a time where that is changing, and you see it everywhere. You see it first of all in the world of machine learning and machine learning models, which are taking over more and more of what applications are doing. And I think that's an incredibly important step. And learned models are an important part of what people will do. But if you look at the world today, I will claim that we've always been modeling. Modeling has existed in computers since there have been integrated circuits and any form of computers. But what we do is what I would call implicit modeling, which means that it's the model is written on a whiteboard. It's in a bunch of Slack messages. It's on a set of napkins in conversations that happen and during Zoom. That's where the model gets defined today. It's implicit. There is one in the system. It is hard coded inside application logic that exists across many applications with humans being the glue that connects those models together. And really there is no central place you can go to understand the full attributes of the business, all of the business rules, all of the business logic, the business data. That's going to change in the next 10 years. And we'll start to have a world where we can define models about what we're doing. Now in the short run, the most important models to build are data models and to describe all of the attributes of the data and their relationships. And that's work that DBT Labs is doing. A number of other companies are doing that. We're taking steps along that way with catalogs. People are trying to build more complete ontologies associated with that. The underlying infrastructure is still super, super nascent. But what I think we'll see is this infrastructure that exists today that's building learned models in the form of machine learning programs. 
You know, some of these incredible machine learning programs in foundation models like GPT and DALL-E and all of the things that are happening in these global scale models, but also all of that needs to get applied to the domains that are appropriate for a business. And I think we'll see the infrastructure developing for that, that can take this concept of learned models and put it together with more explicitly defined models. And this is where the concept of knowledge graphs come in and then the technology that underlies that to actually implement and execute that, which I believe are relational knowledge graphs. >> Oh, oh wow. There's a lot to unpack there. So let me ask the Columbo question, Tristan, we've been making fun of your youth. We're just, we're just jealous. Columbo, I'll explain it offline maybe. >> I watch Columbo. >> Okay. All right, good. So but today if you think about the application stack and the data stack, which is largely an analytics pipeline. They're separate. Do they, those worlds, do they have to come together in order to achieve Bob's vision? When I talk to practitioners about that, they're like, well, I don't want to complexify the application stack cause the data stack today is so, you know, hard to manage. But do those worlds have to come together? And you know, through that model, I guess abstraction or translation that Bob was just describing, how do you guys think about that? Who wants to take that? >> I think it's inevitable that data and AI are going to become closer together. I think that the infrastructure there has been moving in that direction for a long time. Whether you want to use the Lakehouse portmanteau or not. There's also, there's a next generation of data tech that is still in the like early stage of being developed. There's a company that I love that is essentially Cross Cloud Lambda, and it's just a wonderful abstraction for computing.
So I think that, you know, people have been predicting that these worlds are going to come together for a while. A16Z wrote a great post on this back in I think 2020, predicting this, and I've been predicting this since 2020. But what's not clear is the timeline, but I think that this is still just as inevitable as it's been. >> Who's that that does Cross Cloud? >> Let me follow up on. >> Who's that, Tristan, that does Cross Cloud Lambda? Can you name names? >> Oh, they're called Modal Labs.
To the point that Tristan made about machine learning and data coming together, you see that in every major data cloud provider. Snowflake certainly now supports Python and Java. Databricks is of course building their data warehouse. Certainly Google, Microsoft and Amazon are doing very, very similar things in terms of building complete solutions that bring together an analytics stack that typically supports languages like Python together with the data stack and the data warehouse. I mean, all of those things are going to evolve, and they're not going to go away because that infrastructure is relatively new. It's just being deployed by companies, and it solves the problem of working with petabytes of data if you need to work with petabytes of data, and nothing will do that for a long time. What's missing is a layer that understands and can model the semantics of all of this. And if you need to, if you want to model all, if you want to talk about all the semantics of even data, you need to think about all of the relationships. You need to think about how these things connect together. And unfortunately, there really is no platform today. None of our existing platforms are ultimately sufficient for this. It was interesting, I was just talking to a customer yesterday, you know, a large financial organization that is building out these semantic layers. They're further along than many companies are. And you know, I asked what they're building it on, and you know, it's not surprising they're using a, they're using combinations of some form of search together with, you know, textual based search together with a document oriented database. In this case it was Cosmos. And that really is kind of the state of the art right now. And yet those products were not built for this. They don't really, they can't manage the complicated relationships that are required. They can't issue the queries that are required. And so a new generation of database needs to be developed. 
And fortunately, you know, that is happening. The world is developing a new set of relational algorithms that will be able to work with hundreds of different relations. If you look at a SQL database like Snowflake or BigQuery, you know, you get tens of different joins coming together, and that query is going to take a really long time. Well, fortunately, technology is evolving, and it's possible with new join algorithms, worst-case optimal join algorithms, they're called, where you can join hundreds of different relations together and run semantic queries that you simply couldn't run. Now that technology is nascent, but it's really important, and I think that will be a requirement to have this semantically reach its full potential. In the meantime, Tristan can do a lot of great things by building up on what he's got today and solve some problems that are very real. But in the long run I think we'll see a new set of databases to support these models.
If you look today, every single SQL database uses a similar set of relational algorithms underneath that. And those algorithms actually go back to system R and what IBM developed in the 1970s. We're just, there's an opportunity for us to build something new that allows you to take, for example, instead of taking data and grouping it together in tables, treat all data as individual relations, you know, a key and a set of values and then be able to perform purely relational operations on it. If you go back to what, to Codd, and what he wrote, he defined two things. He defined a relational calculus and relational algebra. And essentially SQL is a query language that is translated by the query processor into relational algebra. But however, the calculus of SQL is not even close to the full semantics of the relational mathematics. And it's possible to have systems that can do everything and that can store all of the attributes of the data model or ultimately the business model in a form that is much more natural to work with. >> So here's like my short answer to this. I think that we're dealing in different time scales. I think that there is actually a tremendous amount of work to do in the semantic layer using the kind of technology that we have on the ground today. And I think that there's, I don't know, let's say five years of like really solid work that there is to do for the entire industry, if not more. But the wonderful thing about DBT is that it's independent of what the compute substrate is beneath it. And so if we develop new platforms, new capabilities to describe semantic models in more fine grain detail, more procedural, then we're going to support that too. And so I'm excited about all of it. 
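Bob's idea of treating all data as individual relations and composing them with purely relational operations can be illustrated with a toy natural join. This naive pairwise version is deliberately not one of the worst-case optimal algorithms he mentions; it just shows the algebra:

```python
def natural_join(r: list, s: list) -> list:
    """Join two relations (lists of dicts) on all of their shared attributes."""
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[k] == b[k] for k in shared)
    ]

# Two tiny relations, each just a key and a set of values.
lives_in = [{"person": "ada", "city": "london"}]
located = [
    {"city": "london", "country": "uk"},
    {"city": "paris", "country": "fr"},
]
joined = natural_join(lives_in, located)
```

The interesting part is what this sketch doesn't scale to: chaining many of these pairwise joins is exactly where traditional query processors blow up, which is the problem the newer join algorithms are meant to address.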
>> Yeah, so interpreting that short answer, you're basically saying, cause Bob was just kind of pointing to you as incremental, but you're saying, yeah, okay, we're applying it for incremental use cases today, but we can accommodate a much broader set of examples in the future. Is that correct, Tristan? >> I think you're using the word incremental as if it's not good, but I think that incremental is great. We have always been about applying incremental improvement on top of what exists today, but allowing practitioners to like use different workflows to actually make use of that technology. So yeah, yeah, we are a very incremental company. We're going to continue being that way. >> Well, I think Bob was using incremental as a pejorative. I mean, I, but to your point, a lot. >> No, I don't think so. I want to stop that. No, I don't think it's pejorative at all. I think incremental, incremental is usually the most successful path. >> Yes, of course. >> In my experience. >> We agree, we agree on that. >> Having tried many, many moonshot things in my Microsoft days, I can tell you that being incremental is a good thing. And I'm a very big believer that that's the way the world's going to go. I just think that there is a need for us to build something new and that ultimately that will be the solution. Now you can argue whether it's two years, three years, five years, or 10 years, but I'd be shocked if it didn't happen in 10 years. >> Yeah, so we all agree that incremental is less disruptive. Boom, but Tristan, you're, I think I'm inferring that you believe you have the architecture to accommodate Bob's vision, and then Bob, and I'm inferring from Bob's comments that maybe you don't think that's the case, but please. >> No, no, no. I think that, so Bob, let me put words into your mouth and you tell me if you disagree, DBT is completely useless in a world where a large scale cloud data warehouse doesn't exist. 
We were not able to bring the power of Python to our users until these platforms started supporting Python. Like DBT is a layer on top of large scale computing platforms. And to the extent that those platforms extend their functionality to bring more capabilities, we will also service those capabilities. >> Let me try and bridge the two. >> Yeah, yeah, so Bob, Bob, Bob, do you concur with what Tristan just said? >> Absolutely, I mean there's nothing to argue with in what Tristan just said. >> I wanted. >> And it's what he's doing. It'll continue to, I believe he'll continue to do it, and I think it's a very good thing for the industry. You know, I'm just simply saying that on top of that, I would like to provide Tristan and all of those who are following similar paths to him with a new type of database that can actually solve these problems in a much more architected way. And when I talk about Cosmos with something like Mongo or Cosmos together with Elastic, you're using Elastic as the join engine, okay. That's the purpose of it. It becomes a poor man's join engine. And I kind of go, I know there's a better answer than that. I know there is, but that's kind of where we are state of the art right now. >> George, we got to wrap it. So give us the last word here. Go ahead, George. >> Okay, I just, I think there's a way to tie together what Tristan and Bob are both talking about, and I want them to validate it, which is for five years we're going to be adding or some number of years more and more semantics to the operational and analytic data that we have, starting with metric definitions. My question is for Bob, as DBT accumulates more and more of those semantics for different enterprises, can that layer not run on top of a relational knowledge graph? And what would we lose by not having, by having the knowledge graph store sort of the joins, all the complex relationships among the data, but having the semantics in the DBT layer? 
>> Well, I think this, okay, I think first of all that DBT will be an environment where many of these semantics are defined. The question we're asking is how are they stored and how are they processed? And what I predict will happen is that over time, as companies like DBT begin to build more and more richness into their semantic layer, they will begin to experience challenges that customers want to run queries, they want to ask questions, they want to use this for things where the underlying infrastructure becomes an obstacle. I mean, this has happened in always in the history, right? I mean, you see major advances in computer science when the data model changes. And I think we're on the verge of a very significant change in the way data is stored and structured, or at least metadata is stored and structured. Again, I'm not saying that anytime in the next 10 years, SQL is going to go away. In fact, more SQL will be written in the future than has been written in the past. And those platforms will mature to become the engines, the slicer dicers of data. I mean that's what they are today. They're incredibly powerful at working with large amounts of data, and that infrastructure is maturing very rapidly. What is not maturing is the infrastructure to handle all of the metadata and the semantics that that requires. And that's where I say knowledge graphs are what I believe will be the solution to that. >> But Tristan, bring us home here. It sounds like, let me put pause at this, is that whatever happens in the future, we're going to leverage the vast system that has become cloud that we're talking about a supercloud, sort of where data lives irrespective of physical location. We're going to have to tap that data. It's not necessarily going to be in one place, but give us your final thoughts, please. >> 100% agree. I think that the data is going to live everywhere. 
It is the responsibility for both the metadata systems and the data processing engines themselves to make sure that we can join data across cloud providers, that we can join data across different physical regions and that we as practitioners are going to kind of start forgetting about details like that. And we're going to start thinking more about how we want to arrange our teams, how does the tooling that we use support our team structures? And that's when data mesh I think really starts to get very, very critical as a concept. >> Guys, great conversation. It was really awesome to have you. I can't thank you enough for spending time with us. Really appreciate it. >> Thanks a lot. >> All right. This is Dave Vellante for George Gilbert, John Furrier, and the entire Cube community. Keep it right there for more content. You're watching SuperCloud2. (upbeat music)
Jesse Cugliotta & Nicholas Taylor | The Future of Cloud & Data in Healthcare
(upbeat music) >> Welcome back to Supercloud 2. This is Dave Vellante. We're here exploring the intersection of data and analytics in the future of cloud and data. In this segment, we're going to look deeper into the life sciences business with Jesse Cugliotta, who leads the Healthcare and Life Sciences industry practice at Snowflake. And Nicholas "Nick" Taylor, who's the executive director of Informatics at Ionis Pharmaceuticals. Gentlemen, thanks for coming on theCUBE and participating in the program. Really appreciate it. >> Thank you for having us- >> Thanks for having me. >> You're very welcome. Okay, we're going to really try to look at data sharing as a use case and try to understand what's happening in the healthcare industry generally and specifically, how Nick thinks about sharing data in a governed fashion, whether tapping the capabilities of multiple clouds is advantageous long term or presents more challenges than the effort is worth. And to start, Jesse, you lead this industry practice for Snowflake, and it's a challenging and vibrant area. It's one that's hyper-focused on data privacy. So the first question is, you know, there was a time when healthcare and other regulated industries wouldn't go near the cloud. What are you seeing today in the industry around cloud adoption and specifically multi-cloud adoption? >> Yeah, for years I've heard that healthcare and life sciences has been cloud averse, but in spite of all of that, if you look at a lot of aspects of this industry today, they've been running in the cloud for over 10 years now. Particularly when you look at CRM technologies or HR or HCM, even clinical technologies like EDC or eTMF. And it's interesting that you mention multi-cloud as well, because this has always been an underlying reality, especially within life sciences.
This industry grows through acquisition, where companies are looking to boost their future development pipeline either by buying up smaller biotechs that may have a late- or mid-stage promising candidate. And what typically happens is the larger pharma could then use their commercial muscle and their regulatory experience to move it to approvals and into the market. And I think the last few decades of cheap capital certainly accelerated that trend over the last couple of years. But this typically means that these new combined institutions may have technologies that are running on multiple clouds or multiple cloud strategies in various different regions, to your point. And what we've often found is that they're not planning to standardize everything onto a single cloud provider. They're often looking for technologies that embrace this multi-cloud approach and work seamlessly across them. And I think this is a big reason why we, here at Snowflake, have seen such strong momentum and growth across this industry, because healthcare and life sciences has actually been one of our fastest growing sectors over the last couple of years. And a big part of that is in fact that we run on not only all three major cloud providers, but individual accounts within any one of them have the ability to communicate and interoperate with one another, like a globally interconnected database. >> Great, thank you for that setup. And so, Nick, tell us more about your role and Ionis Pharma, please. >> Sure. So I've been at Ionis for around five years now. You know, when I joined, the IT department was pretty small. There wasn't a lot of warehousing, there wasn't a lot of kind of big data there. We saw an opportunity with Snowflake pretty early on as a provider that would be a lot of benefit for us, you know, 'cause we're small and wanted something that was fairly hands-off.
You know, I remember the days where you had to get a lot of DBAs in to fine-tune your databases, make sure everything was running really, really well. The notion that there's, you know, no indexes to tune, right? There are very few knobs and dials you can turn on Snowflake. That was appealing, that, you know, it just kind of worked. So we found a use case to bring the platform in. We basically used it as a logging replacement, a Splunk kind of replacement, with a platform called Elysium Analytics, as a way to just get it in the door and give us the opportunity to solve a real-world use case, but also to help us start to experiment using Snowflake as a platform. It took us a while to, A, get the funding to bring it in, but B, build the momentum behind it. But, you know, as we experimented we added more data in there, we ran a few more experiments, we piloted a few more applications, and we really saw the power of the platform. And now we are becoming a commercial organization, and with that comes a lot of major datasets. And so, you know, we really see Snowflake as being a very important part of our ecosystem going forward to help us build out our infrastructure. >> Okay, and you are running, your group runs on Azure, it's kind of mono cloud, single cloud, but others within Ionis are using other clouds, though you're not currently, you know, collaborating in terms of data sharing. And I wonder if you could talk about how your data needs have evolved over the past decade. I know you came from another highly regulated industry in financial services. So what's changed? You sort of touched on this before: you had these, you know, very specialized individuals who were, you know, DBAs and, you know, could tune databases and the like, so that's evolved, but how have your needs generally evolved? Just kind of make an observation over the last, you know, five or seven years. What have you seen? >> Well, I wasn't in a group that did a lot of warehousing.
It was more like online trade capture, but, you know, it was very much on-prem. You know, being in the cloud was very much a dirty word back then. I know that's changed since I've left. But, you know, we had major, major teams of everyone who could do everything, right. As I mentioned, in the pharma organization there's a lot fewer of us. So the data needs there are very different, right? We have a lot of SaaS applications. One of the difficulties with bringing a lot of SaaS applications on board is obviously data integration, so making sure the data is the same between them. But one of the big problems is joining the data across those SaaS applications. So one of the things that we use Snowflake for is to basically take data out of these SaaS applications and load them into a warehouse so we can do those joins. So we use technologies like Boomi, we use technologies like Fivetran, like dbt, to bring this data all into one place and start to kind of join that, basically allow us to run experiments, do analysis, basically find better uses for our data that was siloed in the past. You mentioned- >> Yeah. And just to add on to Nick's point there. >> Go ahead. >> That's actually something very common that we're seeing across the industry, because a lot of these SaaS applications that you mentioned, Nick, are from vendors that are trying to build their own ecosystem in a walled garden. And by definition, many of them do not want to integrate with one another. So from a, you know, from a data platform vendor's perspective, we see this as a huge opportunity to help organizations like Ionis and others kind of deal with the challenges that Nick is speaking about, because if the individual platform vendors are never going to make that part of their strategy, we see it as a great way to add additional value to these customers. >> Well, this data sharing thing is interesting. There's a lot of walled gardens out there.
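The pattern Nick describes, landing extracts from separate SaaS applications in one warehouse so they can finally be joined, can be reduced to a minimal sketch. The table and column names below are hypothetical, and an in-memory SQLite database stands in for a warehouse like Snowflake; in practice a tool like Fivetran or Boomi would do the extract-and-load and dbt would manage the join as a versioned model:

```python
import sqlite3

# In-memory SQLite stands in for the warehouse where SaaS extracts land.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Extract from a hypothetical CRM SaaS application
cur.execute("CREATE TABLE crm_accounts (account_id TEXT, account_name TEXT)")
cur.executemany("INSERT INTO crm_accounts VALUES (?, ?)",
                [("A1", "Acme Clinic"), ("A2", "Beta Labs")])

# Extract from a hypothetical ERP SaaS application, keyed by the same ID
cur.execute("CREATE TABLE erp_invoices (customer_ref TEXT, amount REAL)")
cur.executemany("INSERT INTO erp_invoices VALUES (?, ?)",
                [("A1", 1200.0), ("A1", 300.0), ("A2", 500.0)])

# The join that was impossible while each dataset sat in its own SaaS silo
cur.execute("""
    SELECT a.account_name, SUM(i.amount)
    FROM crm_accounts a
    JOIN erp_invoices i ON i.customer_ref = a.account_id
    GROUP BY a.account_name
    ORDER BY a.account_name
""")
print(cur.fetchall())  # [('Acme Clinic', 1500.0), ('Beta Labs', 500.0)]
```

The point is only that once the silos share one engine, the cross-application join becomes a single query instead of a file-shuffling integration project.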
Oracle is a walled garden, AWS in many ways is a walled garden. You know, Microsoft has its walled garden. You could argue Snowflake is a walled garden. But what we're seeing, and the whole reason behind the notion of super-cloud, is we're creating an abstraction layer where you actually, in this case for this use case, can share data in a governed manner. Let's forget about the cross-cloud for a moment. I'll come back to that, but I wonder, Nick, if you could talk about how you are sharing data. Again, I look at Snowflake like the App Store, like Apple: we're going to control everything, we're going to guarantee with data clean rooms and governance and the standards that we've created within that platform, we're going to make sure that it's safe for you to share data in this highly regulated industry. Are you doing that today? And take us through, you know, the considerations that you have in that regard. >> So it's kind of early days for us in Snowflake in general, but certainly in data sharing, we have a couple of examples. So data marketplace, you know, that's a great invention. I've been a small IT shop again, right? The fact that we are able to just bring down terabyte-size datasets straight into our Snowflake and run analytics directly on that is huge, right? The fact that we don't have to FTP these massive files around, run jobs that may break, being able to just have that on tap is huge for us. We've recently been talking to one of our CRO organizations about getting their data feeds in. Historically, this clinical trial data comes in as an FTP file; we have to process it, take it through the platforms, put it into the warehouse. But one of the CROs that we talked to recently, when we were re-investigating what data opportunities they have, they were a Snowflake customer, and we are, I think, the first production customer of theirs to have taken that feed.
So they're basically exposing their tables of data that historically came in these FTP files directly into our Snowflake instance now. We haven't taken advantage of that; we only actually flipped the switch about three or four weeks ago. But that's pretty big for us again, right? We don't have to worry about maintaining those jobs that take those files in. We don't have to worry about the jobs that take those and shove them in the warehouse. We now have a feed that's directly there that we can use a tool like dbt to push through directly into our model. And then the third avenue that came up, actually fairly recently as well, was genetics data. So genetics data, that's highly, highly regulated. We had to be very careful with that. And we had a conversation with Snowflake about the data clean rooms practice, and we see that as a pretty interesting opportunity. We are having one organization run genetic analysis and being able to send us those genetic datasets, but then there's another organization that actually has the, in quotes, "metadata" around that, so age, ethnicity, location, et cetera. And being able to join those two datasets through some kind of mechanism would be really beneficial to the organization. Being able to build a data clean room so we can put that genetic data in a secure place, anonymize it, and then share the amalgamated data back out in a way that's able to be joined to the anonymized metadata, that could be pretty huge for us as well. >> Okay, so this is interesting. So you talk about FTP, which was the common way to share data. And so you basically, I got it, now you take it and do whatever you want with it. Now we're talking, Jesse, about sharing the same copy of live data. How common is that use case in your industry? >> It's become very common over the last couple of years. And I think a big part of it is having the right technology to do it effectively.
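The genetics clean-room idea Nick describes can be sketched at the level of the join mechanics. This is an illustrative sketch, not Snowflake's actual clean-room implementation, and all names and values are hypothetical: assume both parties tokenize the shared patient identifier with an agreed salt, so their datasets can be joined on the token while raw identifiers never leave either organization.

```python
import hashlib

def tokenize(patient_id: str, salt: str) -> str:
    """Deterministic one-way token so both parties can join on a common
    key without ever exchanging raw patient identifiers."""
    return hashlib.sha256((salt + patient_id).encode()).hexdigest()

SALT = "agreed-upon-secret"  # hypothetical shared secret

# Party A: genetic analysis results, keyed by raw patient ID
genetics = {"P001": "variant-X", "P002": "variant-Y"}

# Party B: demographic "metadata" for the same patients
metadata = {"P001": {"age": 54, "region": "EU"},
            "P002": {"age": 61, "region": "US"}}

# Each party tokenizes its own keys before contributing data
a = {tokenize(pid, SALT): v for pid, v in genetics.items()}
b = {tokenize(pid, SALT): v for pid, v in metadata.items()}

# The clean-room style join: only tokens and attributes, no raw IDs
joined = [{"token": t[:8], "variant": a[t], **b[t]} for t in a if t in b]
for row in joined:
    print(row)
```

In a real deployment the join would run inside a governed environment (for example, secure views in a data clean room) rather than in application code, and a salted hash alone is not sufficient de-identification for regulated genetic data; this only illustrates the join-without-identifiers concept.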
You know, as Nick mentioned, historically, this was done by people sending files around. And the challenge with that approach, of course, while there are multiple challenges, is that every time you send a file around you're, by definition, creating a copy of the data, because you have to pull it out of your system of record, put it into a file, and put it on some server where somebody else picks it up. And by definition, at that point you've lost governance. So this creates challenges and general hesitation to doing so. It's not that it hasn't happened, but the other challenge with it is that the data's no longer real time. You know, you're working with a copy of the data that is only as fresh as the moment it was actually extracted. And that creates limitations in terms of how effective this can be. What we're starting to see now with some of our customers is live sharing of information. And there are two aspects of that that are important. One is that you're not actually physically creating the copy and sending it to someone else; you're actually exposing it from where it exists and allowing another consumer to interact with it from their own account, which could be in another region or even running in another cloud. So this concept of super-cloud or cross-cloud could be realized here. But the other important aspect of it is that when that other entity is querying your data, they're seeing it in a real-time state. And this is particularly important when you think about use cases like supply chain planning, where you're leveraging data across various different enterprises. If I'm a manufacturer or if I'm a contract manufacturer and I can see the actual inventory positions of my clients, of my distributors, of the levels of consumption at the pharmacy or the hospital, that gives me a lot of indication as to how my demand profile is changing over time, versus working with a static picture that may have been from three weeks ago.
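The stale-copy problem Jesse describes can be reduced to a toy sketch: a consumer holding a file-style extract keeps seeing the state as of extraction time, while a consumer querying through a live share sees the current position. The store, SKU, and numbers here are hypothetical:

```python
import copy

# Hypothetical provider-side inventory: the system of record
inventory = {"sku-123": 100}

# Consumer 1 receives a file-style extract: a frozen copy of the data
extract = copy.deepcopy(inventory)

# Consumer 2 queries through a live share: a view over the source itself
def live_view(sku):
    return inventory[sku]

# Demand hits the provider after the extract was taken
inventory["sku-123"] -= 40

print(extract["sku-123"])    # 100 -- still the stale picture
print(live_view("sku-123"))  # 60  -- current position, and no copy was made
```

This is the same contrast that makes live shares valuable for supply chain planning: the extract's answer is frozen at pull time, while the shared view tracks the system of record.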
And this has become incredibly important as supply chains are becoming more constrained and the ability to plan accurately has never been more important. >> Yeah. So the race is on to solve these problems. So we started with, hey, okay, cloud, Dave, we're going to simplify database, we're going to put it in the cloud, give virtually infinite resources, separate compute from storage. Okay, check, we got that. Now we've moved into sort of data clean rooms and governance, and you've got an ecosystem that's forming around this to make it safer to share data. And then, you know, nirvana, at least near-term nirvana, is we're going to build data applications and we're going to be able to share live data, and then you start to get into monetization. Do you see, Nick, in the near future, where I know you've got relationships with, for instance, big pharma like AstraZeneca, do you see a situation where you start sharing data with them? Is that in the near term? Is that more long term? What are the considerations in that regard? >> I mean, it's something we've been thinking about. We haven't actually addressed that yet. Yeah, I could see situations where, you know, some of these big relationships where we do need to share a lot of data, it would be very nice to be able to just flick a switch and share our data assets across to those organizations. But, you know, that's a ways off for us now. We're mainly looking at bringing data in at the moment. >> One of the things that we've seen in financial services in particular, and Jesse, I'd love to get your thoughts on this, is companies like Goldman or Capital One or Nasdaq taking their stack, their software, their tooling, actually putting it on the cloud and facing it to their customers, and selling that as a new monetization vector as part of their digital or business transformation.
Are you seeing that, Jesse, at all in healthcare, or is it happening today, or do you see a day when that happens, or is healthcare just too scary to do that? >> No, we're seeing the early stages of this as well. And I think it's for some of the reasons we talked about earlier. You know, it's a much more secure way to work with a colleague if you don't have to copy your data and potentially expose it. And some of the reasons that people have historically copied that data is that they needed to leverage some sort of algorithm or application that a third party was providing. So maybe someone was predicting the ideal location to run a clinical trial for this particular rare disease category, where there are only so many patients around the world that may actually be candidates for this disease. So you have to pick the ideal location. Well, sending the dataset to do so, you know, would involve a fairly complicated process similar to what Nick was mentioning earlier. If the company who was providing the logic or the algorithm to determine that location could bring that algorithm to you and you run it against your own data, that's a much more ideal and a much safer and more secure way for this industry to actually start to work with some of these partners and vendors. And that's one of the things that we're looking to enable going into this year: you know, the whole concept should be bring the logic to your data versus your data to the logic, and the underlying sharing mechanisms that we've spoken about are actually what are powering that today. >> And so thank you for that, Jesse. >> Yes, Dave. >> And so Nick- Go ahead please. >> Yeah, if I could add to that, that's something certainly we've been thinking about. In fact, we'd started talking to Snowflake about that a couple of years ago.
We saw the power there again of the platform to be able to say, well, could we, we were thinking more of a data share, could we share our data out to, say, an AI/ML vendor, have them do the analytics, and then share the data, the results, back to us. Now, you know, there are more powerful mechanisms to do that within the Snowflake ecosystem now, but you know, we probably wouldn't need to have onsite AI/ML people, right? Some of that stuff's very sophisticated, expensive resources, hard to find. You know, it's much better for us to find a company that would be able to build those analytics, maintain those analytics for us. And you know, we saw an opportunity to do that a couple years ago, and we're kind of excited about the opportunity there that we can just basically do it with a no-op, right? We share the data out, we have the analytics done, we get the result back, and it's just fairly seamless. >> I mean, I could have a whole other Cube session on this, guys, but I mean, I just did a session with Andy Thurai of Constellation Research about how difficult it's been for organizations to get ROI because they don't have the expertise in house, so they want to either outsource it or rely on vendor R&D companies to inject that AI and machine intelligence directly into applications. My follow-up question to you, Nick, is, when you think about, 'cause Jesse was talking about, you know, let the data basically stay where it is and, you know, bring the compute to that data. If that data lives on different clouds, and maybe it's not your group, but maybe it's other parts of Ionis, or maybe it's your partners like AstraZeneca, or, you know, the AI/ML partners, and they're potentially on other clouds or that data is on other clouds. Do you see that, again, coming back to super-cloud, do you see it as an advantage to be able to have a consistent experience across those clouds? Or does that just kind of get in the way and make things more complex? What's your take on that, Nick?
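The "bring the logic to your data" pattern that Jesse and Nick describe can be sketched as follows. This is an illustrative Python sketch, not an actual Snowflake or vendor API: the vendor ships only a scoring function, the data owner applies it in place, and only an aggregate result is returned, so row-level data never leaves the owner's environment. The scoring rule and patient records are hypothetical:

```python
# Vendor side: ships only the algorithm, never sees row-level data.
def vendor_score(record):
    # Hypothetical eligibility rule for a trial-site model
    return 1.0 if record["age"] >= 50 and record["biomarker"] else 0.0

# Data owner side: applies the vendor's logic in place and returns
# only an aggregate, so patient-level records never leave.
def run_in_place(records, scoring_fn):
    scores = [scoring_fn(r) for r in records]
    return {"eligible": sum(scores), "total": len(scores)}

patients = [
    {"age": 62, "biomarker": True},
    {"age": 45, "biomarker": True},
    {"age": 71, "biomarker": False},
]

result = run_in_place(patients, vendor_score)
print(result)  # {'eligible': 1.0, 'total': 3}
```

In practice this shape shows up as, for example, shared secure functions or applications that run inside the data owner's account; the essential property is the same either way: the algorithm travels, the data does not.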
>> Well, from the vendors, so from the client side, it's kind of seamless with Snowflake for us. So we know for a fact that one of the datasets we have at the moment, Compile, which is the large multi-terabyte dataset I was talking about, they're on AWS on the East Coast and we are on Azure on the West Coast. And they had to do a few tweaks in the background to make sure the data was pushed over, but from my point of view, the data just exists, right? So for me, I think it's hugely beneficial that Snowflake supports this kind of infrastructure, right? We don't have to jump through hoops to, like, okay, well, we'll download it here and then re-upload it here. They already have the mechanism in the background to do these multi-cloud shares. So it's not important for us internally at the moment. I could see potentially at some point where we start linking across different groups in the organization that do have maybe Amazon or Google Cloud, but certainly within our providers, we know for a fact that they're on different services at the moment and it just works. >> Yeah, and we learned from Benoit Dageville, who came into the studio on August 9th for the first Supercloud in 2022, that Snowflake uses a single global instance across regions and across clouds. Yeah, whether or not you can query across, you know, big regions, it just depends, right? It depends on latency. You might have to make a copy or maybe do some tweaks in the background. But guys, we got to jump. I really appreciate your time. Really thoughtful discussion on the future of data and cloud, specifically within healthcare and pharma. Thank you for your time. >> Thanks- >> Thanks for having us. >> All right, this is Dave Vellante for theCUBE team and my co-host, John Furrier. Keep it right there for more action at Supercloud 2. (upbeat music)
Veronika Durgin, Saks | The Future of Cloud & Data
(upbeat music) >> Welcome back to Supercloud 2, an open collaborative where we explore the future of cloud and data. Now, you might recall last August at the inaugural Supercloud event we validated the technical feasibility and tried to further define the essential technical characteristics, and of course the deployment models, of so-called supercloud. That is, sets of services that leverage the underlying primitives of hyperscale clouds, but are creating new value on top of those clouds for organizations at scale. So we're talking about capabilities that fundamentally weren't practical or even possible prior to the ascendancy of the public clouds. And so today at Supercloud 2, we're digging further into the topic with input from real-world practitioners. And we're exploring the intersection of data and cloud and, importantly, the realities and challenges of deploying technology for a new business capability. I'm pleased to have with me in our studios, west of Boston, Veronika Durgin, who's the head of data at Saks. Veronika, welcome. Great to see you. Thanks for coming on. >> Thank you so much. Thank you for having me. So excited to be here. >> And so we have to say upfront, you're here, these are your opinions. You're not representing Saks in any way. So we appreciate you sharing your depth of knowledge with us. >> Thank you, Dave. Yeah, I've been doing data for a while. I try not to say how long anymore. It's been a while. But yeah, thank you for having me. >> Yeah, you're welcome. I mean, one of the highlights of this past year for me was hanging out at the airport with you after the Snowflake Summit. And we were just chatting about sort of data mesh, and you were saying, "Yeah, but." There was a yeah, but. You were saying there's some practical realities of actually implementing these things. So I want to get into some of that. And I guess starting from a perspective of how data has changed, you've seen a lot of the waves.
I mean, even if we go back to pre-Hadoop, you know, when we would shove everything into an Oracle database, or, you know, Hadoop was going to save our data lives. And the cloud came along and, you know, that was kind of a disruptive force. And, you know, now we see things like, whether it's Snowflake or Databricks or these other platforms on top of the clouds. How have you observed the change in data and the evolution over time? >> Yeah, so I started as a DBA in the data center, kind of like, you know, growing up trying to manage whatever, you know, physical limitations a server could give us. So we had to be very careful of what we put in our database because we were limited. We, you know, purchased that piece of hardware, and we had to use it for the next, I don't know, three to five years. So, you know, we focused on only the most important, critical things. We couldn't keep too much data. We had to be super efficient. We couldn't add additional functionality. And then Hadoop came along, which was like, great, we can dump all the data there, but then we couldn't get data out of it. So it was like, okay, great. Doesn't help either. And then the cloud came along, which was incredible. I was probably the most excited person. I'm lying, but I was super excited, because I no longer had to worry about what I could actually put in my database. Now I have that, you know, scalability and flexibility with the cloud. So okay, great, that data's there, and I can also easily get it out of it, which is really incredible.
I don't know how that happened, but the only time I worked for a company that had Hadoop, all I remember is that they were running jobs that were taking over 24 hours to get data out of it. And they were realizing that, you know, dumping data without any structure into this massive thing that required, you know, really skilled engineers wasn't really helpful. So what changed, and I'm kind of thinking of like, kind of like how Snowflake started, right? They were marketing themselves as a data warehouse. For me, moving from SQL Server to Snowflake was a non-event. It was comfortable, I knew what it was, I knew how to get data out of it. And I think that's the important part, right? Cloud, this like, kind of like, vague, high-level thing, magical, but the reality is cloud is the same as what we had on prem. So it's comfortable there. It's not scary. You don't need super new additional skills to use it. >> But you're saying what's different is the scale. So you can throw resources at it. You don't have to worry about depreciating your hardware over three to five years. Hey, I have an asset that I have to take advantage of. Is that the big difference? >> Absolutely. Actually, from kind of like operational perspective, which it's funny. Like, I don't have to worry about it. I use what I need when I need it. And not to take this completely in the opposite direction, people stop thinking about using things in a very smart way, right? You like, scale and you walk away. And then, you know, the cool thing about cloud is it's scalable, but you also should not use it when you don't need it. >> So what about this idea of multicloud. You know, supercloud sort of tries to go beyond multicloud. it's like multicloud by accident. And now, you know, whether it's M&A or, you know, some Skunkworks is do, hey, I like Google's tools, so I'm going to use Google. And then people like you are called on to, hey, how do we clean up this mess? 
And you know, you and I, at the airport, we were talking about data mesh. And I love the concept. Like, it doesn't matter if it's a data lake or a data warehouse or a data hub or an S3 bucket. It's just a node on the mesh. But then, of course, you've got to govern it. You've got to give people self-serve. But this multicloud is a reality. So from your perspective, from a practitioner's perspective, what are the advantages of multicloud? We talk about the disadvantages all the time. Kind of get that, but what are the advantages? >> So I think the first thing when I think multicloud, I actually think high-availability disaster recovery. And maybe it's just how I grew up in the data center, right? We were always worried that if something happened in one area, we wanted to make sure that we could bring the business up very quickly. So to me that's kind of like where multicloud comes to mind, because, you know, you put your data, your applications, let's pick on AWS for a second, and, you know, US East in AWS, which is the busiest kind of like area that they have. If it goes down, for my business to continue, I would probably want to move it to, say, Azure, hypothetically speaking, again, or Google, whatever that is. So to me, and probably again based on my background, disaster recovery high availability comes to mind as multicloud first, but now the other part of it is that there are, you know, companies and tools and applications that are being built in, you know, pick your cloud. How do we talk to each other? And more importantly, how do we data share? You know, I work with data. You know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think supercloud comes to me in my mind, is like practical applications. How do we create that mesh, that network that we can easily share data with each other?
>> So you kind of answered my next question, which is do you see use cases going beyond HADR? I mean, HADR was, remember, the original cloud use case. That and bursting, you know, for, you know, Thanksgiving or, you know, for Black Friday. So you see an opportunity to go beyond that with practical use cases. >> Absolutely. I think, you know, we're getting to a world where every company is a data company. We all collect a lot of data. We want to use it for whatever that is. It doesn't necessarily mean sell it, but use it to our competitive advantage. So how do we do it in a very smooth, easy way, which opens additional opportunities for companies? >> You mentioned data sharing. And that's obviously, you know, I met you at Snowflake Summit. That's a big thing of Snowflake's. And of course, you've got Databricks trying to do similar things with open technology. What do you see as the trade-offs there? Because Snowflake, you got to come into their party, you're in their world, and you're kind of locked into that world. Now they're trying to open up. You know, and of course, Databricks says their world is wide open. Well, we know what that means, you know. The governance. And so now you're seeing, you saw Amazon come out with data clean rooms, which was, you know, a good idea that Snowflake had several years before. It's good. It's good validation. So how do you think about the trade-offs between kind of openness and freedom versus control? Is the latter just far more important? >> I'll tell you it depends, right? It's kind of like- >> Could be insulting to that. >> Yeah, I know. It depends because I don't know the answer. It depends, I think, on the use case and application, because ultimately every company wants to make money. That's the beauty of our like, capitalistic economy, right? We're driven 'cause we want to make money. But from the use, you know, how do I sell a product to somebody who's in Google if I am in AWS, right?
It's like, we're limiting ourselves if we just do one cloud. But again, it's difficult because at the same time, every cloud provider wants you to be locked in their cloud, which is probably why, you know, everyone now has data sharing, because they want you to stay within their ecosystem. But then again, like, companies are limited. You know, there are applications that are starting to be built on top of clouds. How do we ensure that, you know, I can use that application regardless of what cloud, you know, my company is using or I just happen to like. >> You know, and it's true they want you to stay in their ecosystem 'cause they'll make more money. But as well, you think about Apple, right? Does Apple do it 'cause they can make more money? Yes, but it's also that they have more control, right? Am I correct that technically it's going to be easier to govern that data if it's all the sort of same standard, right? >> Absolutely. 100%. I didn't answer that question. You have to govern and you have to control. And honestly, it's not a nice-to-have anymore. There are compliances. There are legal compliances around data. Everybody at some point wants to ensure that, you know, and as a person, quite honestly, you know, I don't like when my data's used when I don't know how. Like, it's a little creepy, right? So we have to come up with standards around that. But then I also go back in the day. EDI, right? Electronic data interchange. That was figured out. There were standards. Companies were sending data to each other. It was pretty standard. So I don't know. Like, we'll get there. >> Yeah, so I was going to ask you, do you see a day where open standards actually emerge to enable that? And then isn't that the great disruptor to sort of kind of the proprietary stack? >> I think so. I think for us to smoothly exchange data across, you know, various systems, various applications, we'll have to agree to have standards.
>> From a developer perspective, you know, back to the sort of supercloud concept, one of the components of the essential characteristics is you've got this PaaS layer that provides consistency across clouds, and it has unique attributes specific to the purpose of that supercloud. So in the instance of Snowflake, it's data sharing. In the case of, you know, VMware, it might be, you know, infrastructure or self-serve infrastructure that's consistent. From a developer perspective, what do you hear from developers in terms of what they want? Are we close to getting that across clouds? >> I think developers always want freedom and ability to engineer. And oftentimes, (laughs) you know, just as an engineer, I always want to build something, and it's not always, you know, it's something I want to do versus what is actually applicable. I think we'll land there, but not because we are, you know, out of the kindness of our own hearts. I think as a necessity we will have to agree to standards, and that'll like, move the needle. Yeah. >> What are the limitations that you see of cloud and this notion of, you know, even cross cloud, right? I mean, this one cloud can't do it all. You know, but what do you see as the limitations of clouds? >> I mean, it's funny, I always think, you know, again, kind of probably my background, I grew up in the data center. We were physically limited by space, right? There's like, you can only put, you know, so many servers in the rack and, you know, so many racks in the data center, and then you run out of space. Earth has limited space, right? And we have so many data centers, and everybody's collecting a lot of data that we actually want to use. We're not just collecting for the sake of collecting it anymore. We truly can take advantage of it because servers have enough power, right, to crank through it. But we will run out of space. So how do we balance that?
How do we balance that data across all the various data centers? And I know I'm like, kind of maybe talking crazy, but until we figure out how to build a data center on the Moon, right, like, we will have to figure out how to take advantage of all the compute capacity that we have across the world. >> And where does latency fit in? I mean, is it as much of a problem as people sort of think it is? Maybe it depends too. It depends on the use case. But do multiple clouds help solve that problem? Because, you know, even AWS, $80 billion company, they're huge, but they're not everywhere. You know, they're doing Local Zones, they're doing Outposts, which is, you know, less functional than their full cloud. So maybe I would choose to go to another cloud. And if I could have that common experience, that's an advantage, isn't it? >> 100%, absolutely. And potentially there's some maybe pricing tiers, right? So we're talking about latency. And again, it depends on your situation. You know, if you have some sort of medical equipment that is very latency sensitive, you want to make sure that data lives there. But versus, you know, I browse on a website. If the website takes a second versus two seconds to load, do I care? Not exactly. Like, I don't notice that. So we can reshuffle that in a smart way. And I keep thinking of Waze. If we have Waze for data, where it kind of goes, oh, you are stuck in traffic, go this way. You know, reshuffle you through that data center. You know, maybe your data will live there. So I think it's totally possible. I know, it's a little crazy. >> No, I like it, though. But remember when you first found Waze, you're like, "Oh, this is awesome." And then now it's like- >> And it's like crowdsourcing, right? Like, it's smart. Like, okay, maybe, you know, going to pick on US East for Amazon for a little bit, their oldest, but also busiest data center that, you know, periodically goes down.
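The "Waze for data" idea in this exchange amounts to latency-aware routing: measure round-trip latency to each candidate region and send the request to the fastest one that fits the workload's budget. A minimal sketch follows; the region names and latency figures are invented for illustration, not real measurements.

```python
# Minimal sketch of latency-aware routing in the spirit of the "Waze for data"
# idea above: pick the lowest-latency region, optionally enforcing a budget
# for latency-sensitive workloads (like the medical-equipment example).

def pick_region(latencies_ms, max_acceptable_ms=None):
    """Return the region with the lowest measured round-trip latency.

    latencies_ms: dict mapping region name -> latency in milliseconds.
    max_acceptable_ms: optional budget; regions above it are excluded.
    """
    candidates = {
        region: ms
        for region, ms in latencies_ms.items()
        if max_acceptable_ms is None or ms <= max_acceptable_ms
    }
    if not candidates:
        raise ValueError("no region satisfies the latency budget")
    return min(candidates, key=candidates.get)

if __name__ == "__main__":
    measured = {"us-east-1": 182.0, "eu-west-1": 41.0, "other-cloud-eu": 27.0}
    print(pick_region(measured))  # -> other-cloud-eu
```

Managed DNS services offer this as a routing policy, so in practice the "Waze" layer is often configuration rather than application code; the sketch just makes the selection rule explicit.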
>> But then you lose your competitive advantage 'cause now it's like traffic socialism. >> Yeah, I know. >> Right? It happened the other day where everybody's going the same way. There's all the Wazers taking it. >> And also again, compliance, right? Every country is going down the path of where, you know, data needs to reside within that country. So it's not as like, socialist or democratic as we wish for it to be. >> Well, that's a great point. I mean, when you just think about the clouds, the limitation, now you go out to the edge. I mean, everybody talks about the edge in IoT. Do you actually think that there's like a whole new stovepipe that's going to get created? And does that concern you, or do you think it actually is going to be, you know, connective tissue with all these clouds? >> I honestly don't know. I live in a practical world of like, how does it help me right now? How does it, you know, help me in the next five years? And mind you, in five years, things can change a lot. Because if you think back five years ago, things weren't as they are right now. I mean, I really hope that somebody out there challenges things 'cause, you know, the whole cloud promise was crazy. It was insane. Like, who came up with it? Why would I do that, right? And now I can't imagine the world without it. >> Yeah, I mean a lot of it is same wine, new bottle. You know, but a lot of it is different, right? I mean, technology keeps moving us forward, doesn't it? >> Absolutely. >> Veronika, it was great to have you. Thank you so much for your perspectives. If there was one thing that the industry could do for your data life that would make your world better, what would it be? >> I think standards for like data sharing, data marketplace. I would love, love, love nothing more than to have some agreed-upon standards. >> I had one other question for you, actually. I forgot to ask you this. 'Cause you were saying every company's a data company. Every company's a software company.
We're already seeing it, but how prevalent do you think it will be that companies, you've seen some of it in financial services, but companies begin to now take their own data, their own tooling, their own software, which they've developed internally, and point that to the outside world? Kind of do what AWS did. You know, working backwards from the customer and saying, "Hey, we did this for ourselves. We can now do this for the rest of the world." Do you see that as a real trend, or is that Dave's pie in the sky? >> I think it's a real trend. Every company's trying to reinvent themselves and come up with new products. And every company is a data company. Every company collects data, and they're trying to figure out what to do with it. And again, it's not necessarily to sell it. Like, you don't have to sell data to monetize it. You can use it with your partners. You can exchange data. You know, you can create products. Capital One I think created a product for Snowflake pricing. I don't recall, but it just, you know, they built it for themselves, and they decided to kind of like, monetize on it. And I'm absolutely 100% on board with that. I think it's an amazing idea. >> Yeah, Goldman is another example. Nasdaq is basically taking their exchange stack and selling it around the world. And the cloud is available to do that. You don't have to build your own data center. >> Absolutely. Or for good, right? Like, we're talking about, again, we live in a capitalist country, but use data for good. We're collecting data. We're, you know, analyzing it, we're aggregating it. How can we use it for greater good for the planet? >> Veronika, thanks so much for coming to our Marlborough studios. Always a pleasure talking to you. >> Thank you so much for having me. >> You're really welcome. All right, stay tuned for more great content. From Supercloud 2, this is Dave Vellante. We'll be right back. (upbeat music)
Charu Kapur, NTT Data & Rachel Mushahwar, AWS & Jumi Barnes, Goldman Sachs | AWS re:Invent 2022
>> Hey everyone. Hello from Las Vegas. Lisa Martin here with you, and I'm on the show floor at re:Invent. But we have a very special program series that the Cube has been doing called Women of the Cloud. It's brought to you by AWS, and I'm so pleased to have an excellent panel of women leaders in technology and in cloud to talk about their tactical recommendations for you and what they've found where they've helped organizations be successful with cloud. Please welcome my three guests, Charu Kapur, President and Chief Revenue Officer, Consulting and Digital Transformation, NTT Data. We have Rachel Mushahwar, head of North America partner sales at AWS, and Jumi Barnes joins us as well, managing director, investment banking engineering at Goldman Sachs. It is so great to have you guys on this power panel. I love it. Thank you for joining me. >> Thank you. >> Let's start with you. Give us a little bit of, of your background at NTT Data, and I, and I understand NTT has a big focus on women in technology and in STEM. Talk to us a little bit about that and then we'll go around the table. >> Perfect, thank you. Thank you. So brand new role for me at NTT Data. I started three months back and it's a fascinating company. We are about $22 billion in size. We work across industries on multiple innovative use cases. So we are doing a ton of work on edge analytics in the cloud, and that's where we are here with AWS. We are also doing a ton of work on the private 5G that we are rolling out and essentially building out industry-wide use cases across financial services, manufacturing, tech, et cetera. Lots of women at NTT. We essentially have a women-run cloud program today. We have a gal called Nore Hanson who is our practice leader for cloud. We have Matine who's Latifa, who's our AWS cloud leader. We have Molly Ward who leads up our solutions on the cloud. We have an amazing lady in Mona who leads up our marketing programs.
So a fantastic plethora of diverse women driving amazing work at NTT on cloud. >> That's outstanding to hear, because it's one of those things that you can't be what you can't see. Right. We all talk about that. Rachel, talk a little bit about your role and some of the focus that AWS has. I know they're big on customer obsession, and I'm sure obsessed with other things as well. >> Sure. So, Rachel Mushahwar, pleased to be here again. I think this will be my third time. So a big fan of the Cube. I'm fortunate enough to lead our North America partner and channel business, and I'll tell you, I've been at AWS for a little under two years, and honestly, it's been probably the best two years of my career. Just in terms of where the cloud is, where it's headed, the business outcomes that we can deliver with our customers and with our partners is absolutely remarkable. We get to, you know, make the impossible possible every day. So I'm, I'm thrilled to be here and I'm thrilled to, to be part of this inaugural Women of the Cloud panel. >> Oh, I'm so pleased to have all three of you. One of the things, kind of pivoting off what you said, Rachel, that several of our guests have said, is that coming out of Adam's keynote this morning, it just seems limitless what AWS can do, and I love that it gives me kind of chills what they can do with cloud computing and technology, with its ecosystem of partners, with its customers like Goldman Sachs. Jumi, talk to us a little bit about you, your role at Goldman Sachs. You know, we think of Goldman Sachs as a, as a huge financial institution, but it's also a technology company. >> Yeah. I mean, since the age of 15 I've been super passionate about how we can use technology to transform business and simplify and modernize business processes. And it's, I'm so thrilled that I have the opportunity to do that at Goldman Sachs as an engineer.
I recently moved about two years ago into the investment banking business and it's, you know, it's best in class, one of the top companies in terms of mergers and acquisitions, IPOs, et cetera. But what surprised me is how technology enables all the businesses across the board. Right? And I get to be leading the digital platform for building out the digital platform for in the investment banking business where we're modernizing and transforming existing businesses. These are not new businesses. It's like sometimes I liken it to trying to change the train while it's moving, right? These are existing businesses, but now we get to modernize and transform on the cloud. Right. Not just efficiency for the business by efficiency for technologists as >>Well. Right, right. Sticking with you, Jimmy. I wanna understand, so you've been, you've been interested in tech since you were young. I only got into tech and accidentally as an adult. I'm curious about your career path, but talk to us about that. What are some of the recommendations that you would have for other women who might be looking at, I wanna be in technology, but I wanna work for some of the big companies and they don't think about the Goldman Sachs or some of the other companies like Walmart that are absolutely technology driven. What's your advice for those women who want to grow their career? >>I also, growing up, I was, I was interested in various things. I, I loved doing hair. I used to do my own hair and I used to do hair for other students at school and I was also interested in running an entertainment company. And I used actually go around performing and singing and dancing with a group of friends, especially at church. But what amazed me is when I landed my first job at a real estate agent and everything was being done manually on paper, I was like, wow, technology can bring transformation anywhere and everywhere. 
And so whilst I have a myriad of interest, there's so many ways that technology can be applied. There's so many different types of disciplines within technology. It's not, there's hands on, like I'm colder, I like to code, but they're product managers, there are business analysts, there are infrastructure specialist. They're a security specialist. And I think it's about pursuing your passion, right? Pursuing your passion and identifying which aspects of technology peak your interest. And then diving in. >>Love that. Diving in. Rachel, you're shaking your head. You definitely are in alignment with a lot of what >>Duties I am. So, you know, interesting enough, I actually started my career as a civil engineer and eventually made it into, into technology. So very similar. I saw in, you know, heavy highway construction how manual some of these processes were. And mind you, this was before the cloud. And I sat down and wrote a little computer program to automate a lot of these manual tasks. And for me it was about simplification of the customer journey and really figuring out how do you deliver value. You know, on fast forward, say 20 plus years, here I am with AWS who has got this amazing cloud platform with over 200 services. And when I think about what we do in tech, from business transformation to modernizing to helping customers think about how do they create new business models, I've really found, I've really found my sweet spot, and I'll say for anyone who wants to get into tech or even switch careers, there's just a couple words of advice that I have. And it's really two words, just start. >>Yes, >>That's it. Just start. Because sometimes later becomes never. And you know, fuel your passion, be curious, think about new things. Yes. And just >>Start, I love that. Just start, you should get t-shirts made with that. Tell me a little bit about some of your recommendations. Obviously just start is great when follow your passion. 
What would you say to those out there looking to plan the letter? >>So, you know, my, my story's a little bit like jus because I did not want to be in tech. You know, I wanted an easy life. I did well in school and I wanted to actually be an air hostess. And when I broke that to my father, you know, the standard Indian person, now he did, he, you know, he wanted me to go in and be an engineer. Okay? So I was actually push into computer engineering, graduated. But then really two things today, right? When I look back, really two pieces, two areas I believe, which are really important for success. One is, you know, we need to be competent. And the second is we need to be confident, right? Yes, yes. It's so much easier to be competent because a lot of us diverse women, diverse people tend to over rotate on knowing their technical skills, right? Knowing technical skills important, but you need to know how to potentially apply those to business, right? Be able to define a business roi. And I see Julie nodding because she wants people to come in and give her a business ROI for programs that you're executing at Goldman Sachs. I presume the more difficult part though is confidence. >>Absolutely. It's so hard, especially when, when we're younger, we don't know. Raise your hand because I guarantee you either half the people in the, in the room or on the zoom these days weren't listening or have the same question and are too afraid to ask because they don't have the confidence. That's right. Give me, let's pivot on confidence for a minute, Jim, and let's go back to how would you advise your younger self to find your confidence? >>That's, that's a tough one because I feel like even this older self is still finding exercise to, to be real. But I think it's about, I would say it's not praise. I think it's about praising yourself, like recognizing your accomplishments. 
When I think about my younger self, I think I, I like to focus more on what I didn't do or what I didn't accomplish, instead of majoring and focusing on all the accomplishments and the achievements and reminding myself of those day after day after day. And I think it's about celebrating your wins. >>I love that. Celebrating your wins. Do you agree, Rachel? >>I do. Here's the hard part, and I look around this table of amazing business leaders and I can guarantee that every single one of us sometime this year woke up and said, oh my gosh, I don't know how to do that. Oh >>Yeah. But >>What we haven't followed that by is, I don't know how to do that yet. Right. And here's the other thing I would tell my younger self is there will be days where every single one of us falls apart. There will be days when we feel like we failed at work. There will be days when you feel like you failed as a parent or you failed as a spouse. There'll be days where you have a kid in the middle of target screaming and crying while you're trying to close a big business deal and you just like, oh my gosh, is this really my life? But what I would tell my younger self is, look, the crying, the chaos, the second guessing yourself, the successes, every single one of those are milestones. And it's triumphant, it's tragic, but every single thing that we have been through is fiercely worthwhile. And it's what got us >>Here. Absolutely. Absolutely. Think of all the trials and tribulations and six and Zacks that got you to this table right now. Yep. So Terry, you brought up confidence. How would you advise the women out there won't say you're gonna know stuff. The women out there now that are watching those that are watching right there. Hi. How would you advise them to really find their, their ability to praise themselves, recognize all of the trials and the tribulations as milestones as Rachel said, and really give themselves a seat at the table, raise their hand regardless of who else is in the room? 
>>You know, it's a, it's a more complex question just because confidence stems from courage, right? Confidence also stems from the belief that you're going to be treated fairly right now in an organization for you to be treated fairly. You need to have, be surrounded by supporters that are going to promote your voice. And very often women don't invest enough in building that support system around them. Yeah. Right. We have mentors, and mentors are great because they come in and they advise us and they'll tell us what we need to go out and do. We really need a team of sponsors Yes. Who come in and support us in the moment in the business. Give us the informal channel because very often we are not plugged into the informal channel, right. So we don't get those special projects or assignments or even opportunities to prove that we can do the tough task. Yeah. So, you know, my, my advice would be to go out and build a network of sponsors. Yes. And if you don't have one, be a sponsor for someone else. That's right. I love that. Great way to win sponsorship is by extending it todos. >>And sometimes too, it's about, honestly, I didn't even know the difference between a mentor and a sponsor until a few years ago. And I started thinking, who are I? And then I started realizing who they were. That's right. And some of the conversations that we've had on the cube about women in technology, women of the cloud with some of the women leaders have said, build, and this is kind of like, sort of what you were saying, build your own personal board of directors. Yeah. And that, oh, it gives me chills. It's just, it's so important for, for not just women, but anybody, for everybody. But it's so important to do that. And if you, you think about LinkedIn as an example, you have a network, it's there, utilize it, figure out who your mentors are, who your sponsors are, who are gonna help you land the next thing, start building that reputation. 
But having that board of directors that you can kind of answer to or have some accountability towards, I think is hugely very >>Important. Yeah. >>Very important. I think, you know, just for, just for those that are listening, a really important distinction for me was mentors are people that you have that help you with, Hey, here's the situation that you were just in. They advise you on the situation. Sponsors are the people that stick up for you when you're not in the room to them. Right. Sponsors are the ones that say, Hey, I think so and so not only needs to have a seat at the table, but they need to build the table. And that's a really important delineation. Yeah. Between mentors and sponsors. And everybody's gotta have a sponsor both within their company and outside of their company. Someone that's advocating for them on their behalf when they don't even know it. Yeah. Yeah. >>I love that you said that. Build the table. It reminds me of a quote that I heard from Will I am, I know, very random. It was a podcast he did with Oprah Winfrey on ai. He's very into ai and I was doing a panel on ai, so I was doing a lot of research and he said, similar for Rachel to build the table, don't wait for a door to open. You go build a door. And I just thought, God, that is such brilliant advice. It is. It's hard to do. It is. Especially when, you know, the four of us in this room, there's a lot of women around here, but we are in an environment where we are the minority women of color are also the minority. What do you guys think where tech is in terms of de and I and really focusing on De and I as as really a very focused strategic initiative. Turner, what do you think? >>So, you know, I just, I, I spoke earlier about the women that we have at Entity Data, right? We have a fabulous team of women. And joining this team has been a moment of revelation for me coming in. I think to promote dni, we all need to start giving back, right? Yes. 
So today, I would love to announce that we at Entity would like to welcome all of you out there. You know, folks that have diverse ideas, you know, ISV, partners with diverse solutions, thought leaders out there who want to contribute into the ecosystem, right? Customers out there who want to work with companies that are socially responsible, right? We want to work with all of you, come back, reach out to us and be a part of the ecosystem because we can build this together, right? AWS has an amazing platform that gives us an opportunity to do things differently. Yes. Right. Entity data is building a women powered cloud team. And I want to really extend that out to everyone else to be a part this ecosystem, >>But a fantastic opportunity. You know, when we talk about diversity and inclusion and equity, it needs to be intentional for organization. It sounds very intentional at ntt. I know that that intention is definitely there at AWS as well. What are your thoughts on where tech is with respect to diversity? Even thought diversity? Because a lot of times we tend to go to our comfort zones. We do. And so we tend to start creating these circles of kind of like, you know, think tanks and they think alike people to go outside of that comfort zone. It's part of building the table, of building the, is the table and getting people from outside your comfort zone to come in and bring in diverse thought. Because can you imagine the potential of technology if we have true thought diversity in an organization? >>Right? It's, it's incredible. So one of the things that I always share with my team is we've got the opportunity to really change the outcome, right? As you know, you talked about Will I am I'm gonna talk about Bono from you too, right? One of, one of his favorite quotes is, we are the people we've been waiting for. Oh, I love that. And when you think about that, that is us. 
There is no one else that's gonna change the outcome and continue to deliver the business outcomes and the innovation that we do if we don't continue to raise our hand, and if we don't continue to inspire the next generation of leaders to do the same thing. And what I've found is when you start openly sharing what your innovation ideas are, or how you're leveraging your engineering background, your stories and your successes, and frankly some of your failures, become the inspiration for someone you might not even know. Absolutely. And that's the key. You're right. Inclusion, diversity, equity and accessibility, yes, have to be at the forefront of every business decision. And I think too often companies think that inclusion, diversity, equity and accessibility is one thing, and business outcomes are another. And they're not. No, they are one and the same. You can't build business outcomes without also focusing on inclusion, diversity, equity, accessibility. That's the deliberate piece. >>And it has to be deliberate. Jimmy, I wanna ask you, we only have a couple of minutes left, but you're a woman in tech, you're a woman of color. What was that like for you? You were very intentional, knowing when you were quite young what you wanted to do, but how have you navigated that? Because I can't imagine that was easy. >>It wasn't. I always tell the story, and the two things that I really wanted to emphasize today when I thought about this panel are: representation matters, and showing up matters, right? And there's a quote, I don't know who it's attributed to, but: be the change you want to see. And I remember walking through the doors of Goldman Sachs 15 years ago and not seeing a Black female engineer leader, right? And at that point in time, I had a choice. I could say, oh, there's no one that looks like me, I don't belong here. 
Or I could do what I actually did and say, well, I'm gonna be that person. >>Good. >>Right? I'm going to be the change. I'm going to show up, and I am going to have a seat at the table so that other people behind me can also have a seat at the table. And I've had the privilege to work for a company that has been inclusive, that has had the right support system and the right structures in place, so that I can be that person who is the first Black woman tech fellow at Goldman Sachs, who is one of the first Black women to be promoted up the ranks from analyst to managing director at the company. You know, that was not just because I determined that I belong here, but because the company ensured that I felt that I belonged. >>Right. >>That's a great point. They ensured that you felt that. Yeah. You need to be able to feel that. Last question, we've only got about a minute left. 2023 is just around the corner. What comes to your mind? Jimmy, we'll stick with you as you head into the new year. >>Sorry, can you repeat? >>What comes to mind, priorities for 2023 that you're excited about? >>I'm excited about the democratization of data. Yeah. I'm excited about a lot of the announcements today, and I think there is a huge shift going on with this whole concept of marketplaces and data exchanges and data sharing. And I think both internally and externally, people are coming together more. Companies are coming together more to really democratize and make data available. And data is power, and a lot of our businesses are running on insights, right? We need to bring that data together, and I'm really excited about the trends going on in cloud, in technology, to actually bring the data sets together. >>Charu, what are you most excited about as we head to 2023? >>I think I'm really excited about the possibilities that NTT Data has right here, right now. In the city of Las Vegas, we've actually rolled out a smart city project. 
So saving citizens' lives, using data, edge analytics, machine learning, being able to predict adverse incidents before they happen, and then being able to take remediation action, right? That's technology actually working in real time to give us tangible results. We also sponsor the IndyCar races. Lots of work is happening there in delivering an amazing customer experience across the platform to millions of users in real time. So I'm just excited about technology coming together, but while that's happening, I think we really need to be mindful at this time that we don't push our planet into peril, right? We need to be sustainable, we need to be responsible. >>Absolutely. Rachel, take us out. What are you most excited about going into 2023? >>So, you know, there are so many trends that we could talk about, but I'll tell you, at AWS, we're big. We impact the world, so we've gotta be really thoughtful and humble about what it is that we do. For me, what I'm most excited about is, you know, one of our leadership principles is about what broad responsibility brings: you've got to impact sustainability and many of those other things. And for me, I think it's about waking up every day for our customers, for our partners, and for the younger generations, and being better, doing better, and making better for this planet and for the future generations to come. So >>I think your tagline "just start" applies to all of that. It does. It has been an absolute pleasure and really an honor to talk to you on the program. Thank you all for joining me, sharing your experiences, what you've accomplished, and your recommendations for others who might be our same generation, or older, or younger. All really beautiful advice. Thank you so much for your time and your insights. We appreciate it. >>Thank you. Thank you. 
You're watching The Cube, the leader in live enterprise and emerging tech coverage. Thanks for watching.
Noor Faraby & Brian Brunner, Stripe Data Pipeline | AWS re:Invent 2022
>>Hello, fabulous cloud community, and welcome to Las Vegas. We are the Cube, and we will be broadcasting live from the AWS re:Invent show floor for the next four days. This is our first opening segment. I am joined by the infamous John Furrier. John, it is your 10th year being here at re:Invent. How does >>It feel? It's great to see you. It feels great. I mean, just getting ready for the next four days: this is the marathon of all tech shows. It's busy, it's crowded, it's loud, and the content and the people here are really kind of changing the game. The stories are always plentiful and deep, and it really is one of those shows where you kind of get intoxicated on the show floor and in the event, and after hours people are partying. I mean, it is the big show, and 10 years has been an amazing run. You're seeing the changing ecosystem, next-gen cloud, and you've got the classics still kind of doing their thing. So we're getting a lot of data, a lot of data stories, and our guests here are gonna talk more about that. This is the year the cloud kind of goes next gen, and you start to see the successful gen-one cloud players go to the next level. It's gonna be really fun. Fun week. >>Yes, I'm absolutely thrilled, and you can certainly feel the excitement. The show floor doors just opened, people are pouring in, the drinks are getting stacked behind us. As you mentioned, it is gonna be a marathon, and very exciting. On that note, a fantastic interview to kick us off here: we're starting the day with Stripe. Please welcome Noor and Brian. How are you both doing today? Excited to be here. >>Really happy to be here. Nice to meet you guys. Yeah, >>Definitely excited to be here. Nice to meet you. >>Yeah, you know, you were mentioning you could feel the temperature and the energy in here. It is hot, it's a hot show. We're a hot crew. Let's just be honest about that. No shame in that game. But I wanna open us up. 
You know, Stripe is serving 2 million customers according to the internet, and AWS has 1 million customers of their own, both leading companies in your industries. Just in case there's someone in the audience who hasn't heard of Stripe, what is Stripe, and how can companies use it along with AWS? Noor, why don't you start us off? >>Yeah, so Stripe started back in 2010 originally as a payments company that helped businesses accept and process their payments online. So that was something that traditionally had been really tedious and kind of difficult for web developers to set up. And what Stripe did was actually introduce a couple of lines of code that developers could really easily integrate into their websites and start accepting those payments online. So payments is super core to who Stripe is as a company. It's something that we still focus on a lot today, but we actually like to think of ourselves now as more than just a payments company, but rather financial infrastructure for the internet. And that's just because we have expanded into so many different tools and technologies that are beyond payments and actually help businesses with just about anything that they might need to do when it comes to the finances of running an online company. What I mean by that, a couple of examples being: setting up online marketplaces to accept multi-party payments, running subscriptions and recurring payments, collecting sales tax accurately and compliantly, revenue recognition, and, importantly, data and analytics on all of those things, which is what Brian and I focus on at Stripe. So yeah, since 2010 Stripe has really grown to serve millions of customers, as you said, from your small startups to your large multinational companies, to be able to not only run their payments but also run complex financial operations online. >>Interesting. Even the Cube, a customer of Stripe, it's so easy to integrate. 
You guys got your roots there, but now you've gotten bigger. I mean, you guys have massive traction and people are doing more, and you guys are gonna talk here about the data pipeline. Brian, you're the engineering manager: what has it grown to? I mean, what are some of the challenges and opportunities your customers are facing as they look at that data pipeline that you guys are talking about here at re:Invent? >>Yeah, so Stripe Data Pipeline really helps our customers get their data out of Stripe and into their data warehouse, into Amazon Redshift. And that has been something that for our customers is super important. They have a lot of other data sets that they want to join our Stripe data with, to kind of get to more complex, more enriched insights, and Stripe Data Pipeline is just a really seamless way to do that. It lets you, without any engineering, without any coding, with pretty minimal setup, just connect your Stripe account to your Amazon Redshift data warehouse. It's really secure, it's encrypted, you know, it's scalable, it's gonna meet all of the needs of a big enterprise, and it gets you all of your Stripe data. So anything in our API, and a lot of our reports, are just there for you to take, and this just overcomes a big hurdle. I mean, this is something that would take multiple engineers months to build if you wanted to do it in house. We give it to you, you know, with a couple of clicks. So it's kind of a step change for getting data out of Stripe into your data warehouse. >>Yeah, the topic of this chat is getting more out of your data from Stripe with the pipelining. This is kind of an interesting point, I want to get your thoughts. You guys are in the front lines with customers. You know, Stripe started out with its roots, a line of code, get up and running, payment gateway, whatever you wanna call it. Developers just want to get cash in the door. Thank you very much. 
Now you're kind of growing up and continuing to grow. Are you guys like a financial cloud? I mean, would you categorize yourself as that, 'cause you're on top of AWS? >>Yeah, financial infrastructure for the internet was a claim I definitely wanna touch on from earlier today. It was >>Powerful. You guys are basically a super financial cloud. >>Yeah, a supercloud, basically. The way that AWS kind of is the superstar in cloud computing, that's what we feel at Stripe that we want to put forth as financial infrastructure for the internet. So yeah, a lot of similarities. Actually it's funny, we're really glad to be at AWS. I think this is the first time that we've participated in a conference like this. But just to be able to participate and be around AWS, where we have a lot of synergies both as companies: Stripe is a customer of AWS, and for AWS users, they can easily process payments through Stripe. So a lot of synergies there. And yeah, at a company level as well, we find ourselves really aligned with AWS in terms of the goals that we have for our users: helping them scale, expand globally, all of those good things. >>Let's dig in there a little bit more. Sounds like a wonderful collaboration. We love to hear of technology partnerships like that. Brian, talk to us a little bit about the challenges that the Data Pipeline solves from Stripe for Redshift users. >>Yeah, for sure. So Stripe Data Pipeline uses Amazon Redshift's built-in data sharing capabilities, which gives you kind of an instant view into your Stripe data. If you weren't using Stripe Data Pipeline, you would have to, you know, ingest the data out of our API, kind of pull it yourself manually. And yeah, I think a big part of it really is just the simplicity with which you can pull the data. >>Yeah, absolutely. And I mean, the complexity of data and the volume of it is only gonna get bigger. 
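To make Brian's "pull it yourself manually" point concrete, here is a toy Python sketch of the cursor-paginated ingestion loop a team would otherwise have to hand-roll. The page shape and fetch function are hypothetical in-memory stand-ins, not the real Stripe API:

```python
# Toy sketch of the hand-rolled ingestion that a data-sharing approach replaces.
# The "API" is a hypothetical in-memory stand-in, not the real Stripe endpoint.

FAKE_PAGES = {
    None: {"data": [{"id": "ch_1", "amount": 500}], "next_cursor": "p2"},
    "p2": {"data": [{"id": "ch_2", "amount": 700}], "next_cursor": None},
}

def fetch_page(cursor):
    """Pretend API call that returns one page of charge records."""
    return FAKE_PAGES[cursor]

def ingest_all_charges():
    """Walk pages until the cursor runs out, collecting every record."""
    records, cursor = [], None
    while True:
        page = fetch_page(cursor)
        records.extend(page["data"])
        cursor = page["next_cursor"]
        if cursor is None:
            return records

charges = ingest_all_charges()
print(len(charges))  # 2
```

In practice this loop also needs retries, rate-limit handling, incremental sync, and schema upkeep, which is the multi-engineer maintenance burden described above.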
So tools like that, that can make things a lot easier, are what we're all looking for. >>What's the machine learning angle? 'Cause I know that's a big topic here this year: more machine learning, more AI, a lot more solutions on top of the basic building blocks and the primitives at AWS. You guys fit right into that, 'cause developers are doing more: they're either building their own or rolling out solutions. How do you guys see yourselves connecting into that with the pipeline? Because, you know, data pipelining feels like a heavy lift to people. What's the challenge there? Because when people roll their own or try to get in, it could be a lot of muck, as they say. What's the real pain point that you guys solve? >>So in terms of AI and machine learning, what Stripe Data Pipeline gives you is a lot of signals around your payments that you can incorporate into your models. We actually have a number of customers that use Stripe Radar data, so our fraud product, and they integrate it with the in-house data that they get from other sources to have a really good understanding of fraud within their whole business. So it's a way to get that data without having to go through the process of ingesting it. Your team doesn't have to think about the ingestion piece. They can just think about building models, enriching the data, getting insights on top. >>And in my interview with Adam, we call it ETL, the nasty three-letter word. And that's what we're getting to, where data is actually connecting via APIs and pipelines. Yes. Seamlessly into other data. So the data mashup: it feels like we're back in the old mashup days, now you've got data mashing up together. This integration's now a big practice; it's becoming an industry standard. What are some of the patterns that you see around how people are integrating their data? 
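The Radar example a moment ago boils down to a join. Here is a minimal sketch of the kind of enrichment a fraud team's feature pipeline does once Stripe data and in-house data sit in the same warehouse; the field names are hypothetical illustrations, not the real Radar schema:

```python
# Toy sketch: enriching in-house orders with Stripe Radar-style risk signals.
# Field names and values are hypothetical; the real Radar schema differs.

radar_signals = {
    "ch_1": {"risk_score": 12, "risk_level": "normal"},
    "ch_2": {"risk_score": 81, "risk_level": "elevated"},
}

orders = [
    {"order_id": "o-100", "charge_id": "ch_1", "amount": 500},
    {"order_id": "o-101", "charge_id": "ch_2", "amount": 700},
]

def enrich(orders, signals):
    """Attach each order's risk signal: the join a fraud model trains on."""
    return [{**order, **signals[order["charge_id"]]} for order in orders]

flagged = [o for o in enrich(orders, radar_signals) if o["risk_level"] == "elevated"]
print([o["order_id"] for o in flagged])  # ['o-101']
```

The same shape of join applies to any of the "data mashup" integrations discussed here; only the signal source changes.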
Because we all know machine learning works better when there's more data available, and people want to connect their data and integrate it without the hassle. What are some of the use cases? >>Yeah, totally. So as Brian mentioned, there are a ton of use cases for engineering teams in being able to get that data reported efficiently and correctly. And exactly like you touched on, what we're seeing nowadays is that simply having access to the data isn't enough. It's all about consolidating it correctly, accurately and effectively so that you can draw the best insights from it. So yeah, we're seeing a lot of use cases for teams across companies. A big example is finance teams. We had one of our largest users actually report that they were able to close their books faster than ever from integrating all of their Stripe revenue data for their business with the rest of their data in their data warehouse, which was traditionally something that would've taken them days, weeks, you know, having to do the manual aspect. But they were able to >>Simplify that. >>Savannah, you know, we were talking at the last event we were at, Supercomputing, where it's more speeds and feeds: as people get more compute power, they can do more at the application level with developers. And one of the things we've been noticing, I'd love to get your reaction to: as you guys have millions of customers, are you seeing customers doing more with Stripe, where they're not just customers but more of an ecosystem partner of Stripe, as people see that Stripe is not just a, a >>More comprehensive solution. >>Yeah. What's going on with the customer base? I can see the developers embedding it in, but once you get Stripe, you're the plumbing, you're the financial bloodline, if you will, for all the applications. Are your customers turning into partners, ecosystem partners? How do you see that? 
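The "close the books faster" example above comes down to a consolidation step that used to be manual. Here is a minimal sketch, with made-up record shapes and figures, of rolling Stripe revenue and in-house bookings up into one monthly view:

```python
# Toy sketch of the monthly consolidation behind the faster-close example.
# Record shapes and figures are made-up illustrations.
from collections import defaultdict

stripe_revenue = [
    {"month": "2022-10", "amount": 1200},
    {"month": "2022-11", "amount": 1500},
]
in_house_bookings = [
    {"month": "2022-10", "amount": 300},
    {"month": "2022-11", "amount": 450},
]

def monthly_totals(*sources):
    """Roll every record from every source into one per-month total."""
    totals = defaultdict(int)
    for source in sources:
        for record in source:
            totals[record["month"]] += record["amount"]
    return dict(totals)

print(monthly_totals(stripe_revenue, in_house_bookings))
# {'2022-10': 1500, '2022-11': 1950}
```

With both sources already in one warehouse, this kind of rollup is a query rather than a days-long reconciliation exercise.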
>>Yeah, so we definitely, that's what we're hoping to do. We're really hoping to be everything that a user needs when they wanna run an online business, be able to come in and maybe initially they're just using payments or they're just using billing to set up subscriptions but down the line, like as they grow, as they might go public, we wanna be able to scale with them and be able to offer them all of the products that they need to do. So Data Pipeline being a really important one for, you know, if you're a smaller company you might not be needing to leverage all of this big data and making important product decisions that you know, might come down to the very details, but as you scale, it's really something that we've seen a lot of our larger users benefit from. >>Oh and people don't wanna have to factor in too many different variables. There's enough complexity scaling a business, especially if you're headed towards IPO or something like that. Anyway, I love that the Stripe data pipeline is a no code solution as well. So people can do more faster. I wanna talk about it cuz it struck me right away on our lineup that we have engineering and product marketing on the stage with us. Now for those who haven't worked in a very high growth, massive company before, these teams can have a tiny bit of tension only because both teams want a lot of great things for the end user and their community. Tell me a little bit about the culture at Stripe and what it's like collaborating on the data pipeline. >>Yeah, I mean I, I can kick it off, you know, from, from the standpoint like we're on the same team, like we want to grow Stripe data pipeline, that is the goal. So whatever it takes to kind of get that job done is what we're gonna do. And I think that is something that is just really core to all of Stripe is like high collaboration, high trust, you know, this is something where we can all win if we work together. 
You don't need to, you know, compete with other products for resourcing to get your stuff done. It's like, no, what's the team goal here, right? We're looking for team wins, not individual wins. >>Awesome. Yeah. And at the end of the day, we have the same goal of connecting the product and the user in a way that makes sense and delivering the best product to that target user. So it's really a great collaboration, and as Brian mentioned, the culture at Stripe really aligns with that as >>Well. So you've got the engineering teams that get value out of that; you guys are dealing with them, that's your customer. But the security angle really becomes a big catalyst, I think, 'cause it's not just engineering: they gotta build stuff in, so they're always building, but the security angle's interesting, 'cause now you've got that data feeding security teams. This is becoming very security-ops oriented. >>Yeah, you know, we are really tight partners with our internal security folks. They review everything that we do; we have a really robust security team. But I think, kind of tying back to the Amazon side, Amazon Redshift is a very secure product, and the way that we share data is really secure. You know, the sharing mechanism only works between encrypted clusters. So your data is encrypted at rest, encrypted in transit and, excuse me, >>You're allowed to breathe. The audience, as well as your team at Stripe and all of us here at the Cube, would like your survival first and foremost; the knowledge we'll get to the people. >>Yeah, for sure. Where else was I gonna go? Yeah, so the other thing, like you kind of mentioned: there are these ETLs out there, but they require you to trust your data to a third party. So that's another thing here, where your data is only going from Stripe to your cluster. 
There's no one in the middle, no one else seeing what you're doing, no other security risks. So security's a big focus, and it kind of runs through the whole process, both on our side and Amazon's side. >>What's the most important story for Stripe at this event here? If you're in the elevator, what's going on with Stripe? Why now? What's so important at re:Invent for Stripe? >>Yeah, I mean, I'm gonna use this as an opportunity to plug Data Pipeline. That's what we focus on. We're here representing the product, which is the easiest way for any user of AWS, a user of Amazon Redshift, and a user of Stripe to be able to connect the dots and get their data in the best way possible, so that they can draw important business insights from it. >>Right? >>Yeah, I would double what Noor said: really grow Stripe Data Pipeline, get it to more customers, get more value for our customers by connecting them with their data and with reporting. I think my goal here is to talk to folks, kind of understand what they want to see out of their data, and get them onto Stripe Data Pipeline. >>And you know, Mike Mikela, a former AWS executive, is now over there at Stripe leading the charge; he knows a lot about Amazon here at AWS. The theme tomorrow in Adam Selipsky's keynote is gonna be a lot about data: data integration, the end-to-end data lifecycle. You're seeing more of what we call data as code. Where infrastructure as code was cloud, we're starting to see a big trend towards data as code, where it's more of an engineering opportunity with solution insights. This data as code is kinda like the next evolution. What do you guys think about that? >>Yeah, definitely there is a ton that you can get out of your data if it's in the right place and you can analyze it in the correct ways. 
You know, you look at Redshift, and you can pull data from Redshift into a ton of other products to, you know, visualize it, to get machine learning insights, and you need the data there to be able to do this. So again, Stripe Data Pipeline is a great way to take your data and integrate it into the larger data picture that you're building within your company. >>I love that you are supporting businesses of all sizes, and millions of them. Noor and Brian, thank you so much for being here and telling us more about the financial infrastructure of the internet that is Stripe. John Furrier, thanks as always for your questions and your commentary. And thank you to all of you for tuning in to the Cube's coverage of AWS re:Invent, live here from Las Vegas, Nevada. I'm Savannah Peterson, and we look forward to seeing you all week.
Breaking Analysis: re:Invent 2022 marks the next chapter in data & cloud
From the Cube Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. The ascendancy of AWS under the leadership of Andy Jassy was marked by a tsunami of data and corresponding cloud services to leverage that data. Those services mainly came in the form of primitives, i.e., basic building blocks used by developers to create more sophisticated capabilities. AWS in the 2020s, led by CEO Adam Selipsky, will be marked by four high-level trends, in our opinion: one, a rush of data that will dwarf anything we've previously seen; two, a doubling or even tripling down on the basic elements of cloud (compute, storage, database, security, etc.); three, a greater emphasis on end-to-end integration of AWS services to simplify and accelerate customer adoption of cloud; and four, significantly deeper business integration of cloud beyond IT, as an underlying element of organizational operations. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis we extract and analyze nuggets from John Furrier's annual sit-down with the CEO of AWS. We'll share data from ETR and other sources to set the context for the market and competition in cloud, and we'll give you our glimpse of what to expect at re:Invent 2022.
Now, before we get into the core of our analysis: Alibaba has announced earnings. They always announce after the big three, you know, a month later, and we've updated our Q3/November hyperscale computing forecast for the year, as seen here. We're not going to spend a lot of time on this, as most of you have seen the bulk of it already, but suffice to say Alibaba's cloud business is hitting that same macro trend that we're seeing across the board, but a more substantial slowdown than we expected, and more substantial than its peers. They're facing China headwinds, they've been restructuring their cloud business, and it's led to significantly slower growth, in the low double digits, as opposed to where we had it at 15%. This puts our year-end estimate for 2022 revenue at $161 billion, still a healthy 34% growth, with AWS surpassing $80 billion in 2022 revenue. Now, on a related note, one of the big themes in cloud that we've been reporting on is how customers are optimizing their cloud spend; it's a technique they use when the economy looks a little shaky. And here's a graphic that we pulled from AWS's website which shows the various pricing plans at a high level. As you know, they're much more granular and more sophisticated than that, but for simplicity we'll just keep it here. Basically there are four levels. The first one here is on demand, i.e., pay by the drink. Now we're going to jump down to what we've labeled as number two, spot instances; that's like the right place at the right time, I can use that extra capacity in the moment. The third is reserved instances, or RIs, where I pay up front to get a discount. And the fourth is sort of optimized savings plans, where customers commit to a one- or three-year term for a better price. Now, you'll notice we labeled the choices in a different order than AWS presented them on its website, and that's because we believe the order we chose is the natural progression for customers. They start on demand, they maybe experiment with
spot instances, they move to reserved instances when the cloud bill becomes too onerous, and if you're large enough you lock in for one or three years. Okay, the interesting thing is the order in which AWS presents them. We believe that on-demand accounts for the majority of AWS customer spending. Now, if you think about it, those on-demand customers are also at-risk customers. Yeah, sure, there are some switching costs, like egress and the learning curve, but many customers have multiple clouds and they've got experience, so they're kind of already up the learning curve, and if you're not married to AWS with a longer-term commitment, there's less friction to switch. Now, AWS here presents the most financially attractive plan second, after on demand, and it's also the plan that makes the greatest commitment from a lock-in standpoint. Now, in fairness to AWS, it's also true that there is a trend toward subscription-based pricing, and we have some data on that. This chart is from an ETR drill-down survey; the N is 300.
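To make the four pricing tiers concrete, here is a rough back-of-envelope sketch. All rates and discount percentages below are hypothetical round numbers for illustration, not real AWS prices; the point is only the shape of the progression from on demand toward committed plans.

```python
# Illustrative comparison of the four EC2 pricing tiers discussed above.
# The baseline rate and the discount percentages are made-up assumptions,
# NOT actual AWS prices.

ON_DEMAND_HOURLY = 0.10  # pay-by-the-drink baseline, dollars per hour

DISCOUNTS = {                 # assumed discount relative to on demand
    "on_demand": 0.00,        # no commitment, no discount
    "spot": 0.70,             # deep discount, but capacity can be reclaimed
    "reserved_1yr": 0.40,     # pay up front for a one-year term
    "savings_plan_3yr": 0.55, # commit to spend for three years
}

def monthly_cost(plan, hours=730):
    """Effective monthly cost of one always-on instance under a given plan."""
    rate = ON_DEMAND_HOURLY * (1 - DISCOUNTS[plan])
    return round(rate * hours, 2)

for plan in DISCOUNTS:
    print(f"{plan:>17}: ${monthly_cost(plan):.2f}/month")
```

Under these assumed numbers, the natural progression described above shows up directly in the monthly bill: each step toward a longer commitment trades flexibility for a lower effective rate.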
Pay attention to the bars on the right. The left side is sort of busy, but the pink is subscription, and you can see the trend upward; the light blue is consumption-based, or on-demand, pricing, and you can see a steady trend toward subscription. Now, we'll dig into this in a later episode of Breaking Analysis, but we'll share some tidbits from the data that ETR provides. You can select which segment, IaaS or PaaS, or you can go up the stack, etc. So when you choose IaaS and PaaS, 44% of customers either prefer or are required to use on-demand pricing, whereas around 40% of customers say they either prefer or are required to use subscription pricing; again, that's for IaaS. Now, the further you move up the stack, the more prominent subscription pricing becomes, often with 60% or more of the software-based offerings requiring or preferring subscription. And interestingly, cybersecurity tracks along with software, at around 60% that prefer subscription, likely because, as with software, you're not shutting down your cyber protection on demand. All right, let's get into the expectations for re:Invent, and we're going to start with an observation on data. In his 2018 book Seeing Digital, author David Moschella made the point that whereas most companies apply data on the periphery of their business, kind of as an add-on function, successful data companies like Google, Amazon, and Facebook have placed data at the core of their operations. They've operationalized data, and they apply machine intelligence to that foundational element. Why is this? The fact is, it's not easy to do what the internet giants have done; it takes very sophisticated engineering and cultural discipline. And this brings us to re:Invent 2022. In the future of cloud, machine learning and AI will increasingly be infused into applications. We believe the data stack and the application stack are coming together as organizations build data apps and data products. Data
expertise is moving from the domain of highly specialized individuals to everyday business people, and we are just at the cusp of this trend. This will, in our view, be a massive theme of not only re:Invent 22 but of cloud in the 2020s. The vision of data mesh (we believe Zhamak Dehghani's principles) will be realized in this decade. Now, what we'd like to do is share with you a glimpse of the thinking of Adam Selipsky from his sit-down with John Furrier. Each year John has a one-on-one conversation with the CEO of AWS; he's been doing this for years, and the outcome is a better understanding of the directional thinking of the leader of the number one cloud platform. So we're now going to share some direct quotes. I'm going to run through them with some commentary and then bring in some ETR data to analyze the market implications. Here we go. This is from Selipsky, quote: "IT in general, and data, are moving from departments into becoming intrinsic parts of how businesses function." Okay, we're talking here about deeper business integration. Let's go on to the next one, quote: "In time we'll stop talking about people who have the word analyst" (we inserted "data"; he meant data analyst) "in their title; rather, we'll have hundreds of millions of people who analyze data as part of their day-to-day job, most of whom will not have the word analyst anywhere in their title. We're talking about graphic designers and pizza shop owners and product managers, and data scientists as well." He threw that in; I'm going to come back to that, very interesting. So he's talking here about democratizing and operationalizing data. Next, quote: customers need to be able to take an end-to-end, integrated view of their entire data journey, from ingestion to storage to harmonizing the data, to being able to query it, doing business intelligence and human-based analysis, and being able to collaborate and share data; and we've been putting together (we being Amazon) a broad suite of tools, from database to analytics to
business intelligence, to help customers with that. And this last statement is true: Amazon has a lot of tools, and they're beginning to become more and more integrated, but again, under Jassy there was not a lot of emphasis on that end-to-end integrated view. We believe it's clear from these statements that Selipsky's customer interactions are leading him to underscore that the time has come for this capability. Okay, continuing, quote: "If you have data in one place, you shouldn't have to move it every time you want to analyze that data." Couldn't agree more. "It would be much better if you could leave that data in place, avoid all the ETL" (which has become a nasty three-letter word); "more and more we're building capabilities where you can query that data in place," end quote. Okay, this we see a lot in the marketplace: Oracle with MySQL HeatWave, the entire trend toward converged databases, Snowflake and Databricks extending their platforms into transactions and analytics, respectively, and so forth; a lot of the partners are doing things in that vein as well. Let's go to the next quote: "The other phenomenon is infusing machine learning into all those capabilities." Yes, the comments from the Moschella graphic come into play here: infusing AI and machine intelligence everywhere. Next one, quote: "It's not a data cloud, it's not a separate cloud; it's a series of broad but integrated capabilities to help you manage the end-to-end life cycle of your data." There you go: we, AWS, are the cloud. We're going to come back to that in a moment as well. Next set of comments around data, very interesting here, quote: data governance is a huge issue; really what customers need is to find the right balance in their organization between access to data and control, and if you provide too much access then you're nervous that your data is going to end up in places it shouldn't be, viewed by people who shouldn't be viewing it, and you feel like you lack security around that data; and by the way, what happens then
is people overreact and they lock it down so that almost nobody can see it. It's those handcuffs: there's data, and assets, and reliability; we've talked about that for years. Okay, very well put by Selipsky, but this is a gap, in our view, within AWS today, and we're hoping that they close it at re:Invent. It's not easy to share data in a safe way within AWS today outside of your organization, so we're going to look for that at re:Invent 2022. Now, all this leads to the following statement by Selipsky, quote: "Data clean rooms are a really interesting area, and I think there's a lot of different industries in which clean rooms are applicable. I think that clean rooms are an interesting way of enabling multiple parties to share and collaborate on the data while completely respecting each party's rights and their privacy mandate." Okay, again, this is a gap currently within AWS today, in our view, and we know Snowflake is well down this path, and Databricks with Delta Sharing is also on this curve; so AWS has to address this and demonstrate end-to-end data integration and the ability to safely share data, in our view. Now let's bring in some ETR spending data to put some context around these comments, with reference points in the form of AWS itself and its competitors and partners. Here's a chart from ETR that shows Net Score, or spending momentum, on the y-axis, and overlap, or pervasiveness in the survey, on the x-axis; so spending momentum by pervasiveness, or share within the data set. The table inserted there, with the reds and the greens, informs us as to how the dots are positioned: it's Net Score, and then the shared Ns are how the plots are determined. Now, we've filtered the data on the three big data segments (analytics, database, and machine learning/AI), and we've only selected one company with fewer than 100 Ns in the survey, and that's Databricks.
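As an aside, the Net Score metric plotted on the y-axis can be sketched in a few lines. This is a simplified reading of the idea (share of customers increasing spend minus share decreasing or replacing); the response categories and survey data below are made up for illustration, not actual ETR methodology or results.

```python
# Simplified sketch of an ETR-style "Net Score" (spending momentum):
# percent of respondents spending more, minus percent spending less.
# Categories and data are illustrative assumptions.

def net_score(responses):
    """Return (percent increasing spend) - (percent decreasing spend)."""
    n = len(responses)
    up = sum(r in ("adopting", "increasing") for r in responses)
    down = sum(r in ("decreasing", "replacing") for r in responses)
    return round(100 * (up - down) / n, 1)

# A made-up survey of 100 respondents for one vendor.
survey = (["increasing"] * 50 + ["flat"] * 30 +
          ["decreasing"] * 15 + ["adopting"] * 5)
print(net_score(survey))
```

With this made-up sample, the score lands at 40.0, which is exactly the "highly elevated" threshold referenced in the chart discussion: flat spenders dilute the score but do not move it.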
You'll see why in a moment. The red dotted line indicates highly elevated customer spend, at 40%. Now, as usual, Snowflake outperforms all players on the y-axis, with a Net Score of 63%, off the charts. All three big U.S. cloud players are above that line, with Microsoft and AWS dominating the x-axis; very impressive that they have such spending momentum and they're so large. And you see a number of other emerging data players, like Grafana and Datadog; MongoDB is there in the mix; and then more established data players like Splunk and Tableau. Now, you've got Cisco, which is adjacent to its core networking business but definitely into the analytics business. Then the really established players in data, like Informatica, IBM, and Oracle, all with strong presence, but you'll notice they're in the red from the momentum standpoint. Now, what you're going to see in a moment is that we put red highlights around Databricks, Snowflake, and AWS. Why? Let's bring that back up, Alex, if you would, and we'll explain. There's no way AWS is going to hit the brakes on innovating at the base service level, what we called primitives earlier. Selipsky told Furrier as much in their sit-down: AWS will serve the technical user and data science community, the traditional domain of Databricks, and at the same time address the end-to-end integration, data sharing, and business-line requirements that Snowflake is positioned to serve. Now, people often ask Snowflake and Databricks, how will you compete with the likes of AWS? And we know the answer: focus on data exclusively; and they have their multi-cloud plays. Perhaps the more interesting question is, how will AWS compete with the likes of specialists like Snowflake and Databricks? And the answer is depicted here in this chart. AWS is going to serve both the technical and developer communities and the data science audience, and through end-to-end integrations and future services that
simplify the data journey, it's going to serve the business lines as well. But the nuance is in all the other dots, in the hundreds (or hundreds of thousands) that are not shown here, and that's the AWS ecosystem. You can see AWS has earned the status of the number one cloud platform that everyone wants to partner with; as they say, it has over a hundred thousand partners, and that ecosystem, combined with the capabilities we're discussing, while perhaps behind in areas like data sharing and integrated governance, can wildly succeed by offering the capabilities and leveraging its ecosystem. Now, for their part, the Snowflakes of the world have to stay focused on the mission, build the best products possible, and develop their own ecosystems to compete and attract the mindshare of both developers and business users. And that's why it's so interesting to hear Selipsky basically say it's not a separate cloud, it's a set of integrated services; well, Snowflake is, in our view, building a supercloud on top of AWS, Azure, and Google. When great products meet great sales and marketing, good things can happen, so it will be really fun to watch what AWS announces in this area at re:Invent. All right, one other topic Selipsky talked about was the correlation between serverless and container adoption. And, you know, I don't know if this gets into their hybrid play, or maybe it starts to get into their multi-cloud; we'll see. But we have some data on this. So again, we're talking about the correlation between serverless and container adoption, but before we get into that, let's go back to 2017 and listen to what Andy Jassy said on theCUBE about serverless. Play the clip. In the very, very earliest days of AWS, Jeff used to say a lot, if I were starting Amazon today, I'd have built it on top of AWS. We didn't have all the capability and all the functionality at that very moment, but he knew what was coming, and he saw what people were still able to accomplish even with where the services were at that
point. I think the same thing is true here with Lambda, which is, I think if Amazon were starting today, it's a given they would build it on the cloud, and I think with a lot of the applications that comprise Amazon's consumer business, we would build those on our serverless capabilities. Now, we still have plenty of capabilities and features and functionality we need to add to Lambda and our various serverless services, so that may not be true from the get-go right now. But I think if you look at the hundreds of thousands of customers who are building on top of Lambda, and lots of real applications (you know, FINRA has built a good chunk of their market watch application on top of Lambda, and Thomson Reuters has built one of their key analytics apps on it), people are building real, serious things on top of Lambda, and the pace of iteration you'll see there will increase as well, and I really believe that to be true over the next year or two. So years ago Jassy gave a road map: serverless was going to be a key developer platform going forward. And Selipsky referenced the correlation between serverless and containers in the Furrier sit-down, so we wanted to test that within the ETR data set. Now, here's a screen grab of the view across 1,300 respondents from the October ETR survey, and what we've done here is isolate on the cloud computing segment; okay, so you can see right there, cloud computing segment. Now, we've taken the functions from Google, AWS Lambda, and Microsoft Azure Functions, all the serverless offerings, and we've got Net Score on the vertical axis, and on the horizontal axis we have presence in the data set, the overlap. Oh, and by the way, 40% is highly elevated; remember that, all these guys are above that 40% mark, relative to each other. Okay, so you see that. Now, this is just for serverless, and what we're going to do is turn on
containers to see the correlation and see what happens. So watch what happens when we click on containers: boom, everything moves to the right. You can see all three move to the right; Google drops a little bit, but the filtered N drops as well, so you don't have as many people aggressively leaning into both, yet all three move to the right. So watch again: containers off, and then containers on; containers off, containers on. You can see a really major correlation between containers and serverless. Okay, so to get a better understanding of what that means, I called my friend and former Cube co-host Stu Miniman. What he said was, people generally used to think of VMs, containers, and serverless as distinctly different architectures, but the lines are beginning to blur. Serverless makes things simpler for developers who don't want to worry about underlying infrastructure; as Selipsky and the data from ETR indicate, serverless and containers are coming together. But as Stu and I discussed, there's a spectrum, where on the left you have kind of native cloud VMs, in the middle you've got AWS Fargate, and the rightmost anchor is AWS Lambda. Now, traditionally in the cloud, if you wanted to use containers, developers would have to build a container image, select and deploy the EC2 images, or instances, that they wanted to use, allocate a certain amount of memory, fence off the apps in a virtual machine, then run the EC2 instances against the apps, and pay for all those EC2 resources. Now, with AWS Fargate you can run containerized apps with less infrastructure management, though you still have some things you can do with the infrastructure. So with Fargate, what you do is build the container images, then allocate your memory and compute resources, then run the app, and pay for the resources only when they're used. So Fargate lets you control the runtime environment while at the
same time simplifying the infrastructure management; you don't have to worry about isolating the app and other stuff like choosing server types and patching, AWS does all that for you. Then there's Lambda. With Lambda, you don't have to worry about any of the underlying server infrastructure; you're just running code as functions, so the developer spends their time worrying about the applications and the functions they're calling. The point is, there's a movement, and we saw it in the data, toward simplifying the development environment and allowing the cloud vendor, AWS in this case, to do more of the underlying management. Now, some folks will still want to turn knobs and dials, but increasingly we're going to see more higher-level service adoption. Now, re:Invent is always a fire hose of content, so let's do a rapid rundown of what to expect. We talked about optimizing data and the organization; we talked about cloud optimization; there'll be a lot of talk on the show floor about best practices and customer data sharing. Selipsky is leading AWS into the next phase of growth, and that means moving beyond IT transformation into deeper business integration and organizational transformation, not just digital transformation. So he's leading a multi-vector strategy: serving the traditional peeps who want fine-grained access to core services, so we'll see continued innovation in compute, storage, AI, etc., plus simplification through integration and horizontal apps further up the stack; Amazon Connect is an example that's often cited. Now, as we've reported many times, Databricks is moving from its stronghold realm of data science into business intelligence and analytics, while Snowflake is coming from its data analytics stronghold and moving into the world of data science. AWS is going down a path of Snowflake-meets-Databricks with an underlying cloud IaaS and PaaS layer, which puts these three companies on a very interesting trajectory, and you can expect AWS to go
right after the data sharing opportunity, and in doing so it will have to address data governance; they go hand in hand. Okay, price performance: that is a topic that will never go away, and it's something we haven't mentioned today. Silicon: it's an area we've covered extensively on Breaking Analysis, from Nitro to Graviton to the AWS acquisition of Annapurna, its secret weapon, and new specialized capabilities like Inferentia and Trainium. We'd expect something more at re:Invent, maybe new Graviton instances. David Floyer, our colleague, said he's expecting at some point a complete system on a chip, an SoC, from AWS, and maybe an Arm-based server, eventually including high-speed CXL connections to devices and memories, all to address next-gen, data-intensive applications with low power requirements and lower cost overall. Now, of course, every year Swami gives his usual update on machine learning and AI, building on Amazon's years of SageMaker innovation: perhaps a focus on conversational AI, or better support for vision, and maybe better integration across Amazon's portfolio of, you know, large language models, neural networks, generative AI, really infusing AI everywhere. Of course, security is always high on the list at re:Invent, and Amazon even has re:Inforce, a conference dedicated to security. Here we'd like to see more on supply chain security, and perhaps how AWS can help there, as well as tooling to make the CISO's life easier. But the key so far is that AWS is much more partner-friendly in the security space than, say, for instance, Microsoft traditionally has been, so firms like Okta and CrowdStrike and Palo Alto have plenty of room to play in the AWS ecosystem. We'd expect, of course, to hear something about ESG; it's an important topic, and hopefully not only how AWS is helping the environment, that's important, but also how they help customers save money and drive inclusion and diversity, again very important topics. And finally, coming back to it, re:Invent is an ecosystem
event; it's the Super Bowl of tech events, and the ecosystem will be out in full force. Every tech company on the planet will have a presence, and theCUBE will be featuring many of the partners from the show floor as well as AWS execs, and of course our own independent analysis. So you'll definitely want to tune in to thecube.net and check out our re:Invent coverage. We start Monday evening and then we go wall to wall through Thursday; hopefully my voice will come back. We have three sets at the show, and our entire team will be there, so please reach out or stop by and say hello. All right, we're going to leave it there for today. Many thanks to Stu Miniman and David Floyer for the input to today's episode, and of course to John Furrier for extracting the signal from the noise in his sit-down with Adam Selipsky. Thanks to Alex Meyerson, who is on production and manages the podcast; Ken Schiffman as well; Kristen Martin and Cheryl Knight, who helped get the word out on social and, of course, in our newsletters; and Rob Hof, our editor-in-chief over at SiliconANGLE, who does some great editing. Thanks to all of you. Remember, all these episodes are available as podcasts; wherever you listen, just pop in the headphones, go for a walk, and search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com, or you can email me at david.vellante@siliconangle.com, DM me @dvellante, or comment on our LinkedIn posts, and do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching; we'll see you at re:Invent, or we'll see you next time on Breaking Analysis. [Music]
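To make the EC2-to-Fargate-to-Lambda spectrum discussed in this episode concrete, here is a rough cost-model sketch contrasting an always-on instance with pay-per-invocation functions. The default rates below approximate Lambda's published GB-second and per-request prices at the time of writing, but treat every number here as an illustrative assumption rather than a pricing reference.

```python
# Always-on vs. pay-per-use: the trade-off behind the VM -> Fargate -> Lambda
# spectrum. All rates are assumptions for illustration, not a pricing quote.

def ec2_monthly(hourly, hours=730):
    """Always-on instance: you pay whether or not any requests arrive."""
    return hourly * hours

def lambda_monthly(requests, ms_per_request, memory_gb,
                   gb_second_price=0.0000166667,  # assumed $/GB-second
                   request_price=0.0000002):      # assumed $/request
    """Pay only per invocation: duration in GB-seconds plus a request fee."""
    gb_seconds = requests * (ms_per_request / 1000) * memory_gb
    return gb_seconds * gb_second_price + requests * request_price

# A spiky, low-traffic workload: 1M requests/month, 100 ms each, 512 MB.
print(f"EC2 (always on): ${ec2_monthly(0.10):.2f}/month")
print(f"Lambda (per use): ${lambda_monthly(1_000_000, 100, 0.5):.2f}/month")
```

Under these assumptions the serverless bill is a small fraction of the always-on bill for idle-heavy workloads, which is the economic pull toward higher-level services; for sustained high-volume traffic the comparison can invert, which is why the spectrum, not any single point on it, matters.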
Dell Technologies | The Future of Multicloud Data Protection is Here 11-14
>>Prior to the pandemic, organizations were largely optimized for efficiency as the best path to bottom-line profits. Many CIOs tell theCUBE privately that they were caught off guard by the degree to which their businesses required greater resiliency, beyond their somewhat cumbersome disaster recovery processes. And the lack of that business resilience has actually cost firms, because they were unable to respond to changing market forces. We've certainly seen this dynamic with supply chain challenges, and there's little doubt we're also seeing it in the area of cybersecurity generally, and data recovery specifically. Over the past 30-plus months, the rapid adoption of cloud to support remote workers and build in business resilience had the unintended consequence of expanding attack vectors, which brought an escalation of risk from cybercrime. While security in the public clouds is certainly world class, the result of multi-cloud has brought with it multiple shared responsibility models, multiple ways of implementing security policies across clouds and on-prem, and, at the end of the day, more, not less, complexity. But there's a positive side to this story. The good news is that public policy, industry collaboration, and technology innovation are moving fast to accelerate data protection and cybersecurity strategies, with a focus on modernizing infrastructure, securing the digital supply chain, and, very importantly, simplifying the integration of data protection and cybersecurity. Today there's heightened awareness that the world of data protection is not only an adjacency to, but is becoming a fundamental component of, cybersecurity strategies. In particular, in order to build more resilience into a business, data protection people, technologies, and processes must be more tightly coordinated with security operations. Hello and welcome to The Future of Multi-Cloud Data Protection, made possible by Dell in collaboration with theCUBE. 
My name is Dave Vellante and I'll be your host today. In this segment, we welcome into theCUBE two senior executives from Dell who will share details on new technology announcements that directly address these challenges.
My name is Dave Ante and I'll be your host today. In this segment, we welcome into the cube, two senior executives from Dell who will share details on new technology announcements that directly address these challenges. >>Jeff Boudreau is the president and general manager of Dell's Infrastructure Solutions Group, isg, and he's gonna share his perspectives on the market and the challenges he's hearing from customers. And we're gonna ask Jeff to double click on the messages that Dell is putting into the marketplace and give us his detailed point of view on what it means for customers. Now, Jeff is gonna be joined by Travis Vhi. Travis is the senior Vice President of product management for ISG at Dell Technologies, and he's gonna give us details on the products that are being announced today and go into the hard news. Now, we're also gonna challenge our guests to explain why Dell's approach is unique and different in the marketplace. Thanks for being with us. Let's get right into it. We're here with Jeff Padre and Travis Behill. We're gonna dig into the details about Dell's big data protection announcement. Guys, good to see you. Thanks >>For coming in. Good to see you. Thank you for having us. >>You're very welcome. Right. Let's start off, Jeff, with the high level, you know, I'd like to talk about the customer, what challenges they're facing. You're talking to customers all the time, What are they telling you? >>Sure. As you know, we do, we spend a lot of time with our customers, specifically listening, learning, understanding their use cases, their pain points within their specific environments. They tell us a lot. Notice no surprise to any of us, that data is a key theme that they talk about. It's one of their most important, important assets. They need to extract more value from that data to fuel their business models, their innovation engines, their competitive edge. 
So they need to make sure that that data is accessible, it's secure, and it's recoverable, especially in today's world with the increase in cyber attacks. >>Okay. So maybe we could get into some of those challenges. I mean, when you talk about things like data sprawl, what do you mean by that? What should people know? >>Sure. So of those big three themes, I'd say, you know, you have data sprawl, which is the big one, which is all about the massive amounts of data. It's the growth of that data, which is growing at an unprecedented rate. It's the gravity of that data and the reality of the multi-cloud sprawl. So stuff is just everywhere, right? Which increases the attack surface for cyber criminals. >>And by gravity you mean the data's there and people don't wanna move it. >>It's everywhere, right? And so when it lands someplace, I think edge, core, or cloud, it's there, and it's something we have to help our customers with. >>Okay, so it's nuanced, cuz complexity has other layers. What are those >>Layers? Sure. When we talk to our customers, they tell us complexity is one of their big themes, and specifically it's around data complexity. We talked about that growth and gravity of the data. We talk about multi-cloud complexity, and we talk about multi-cloud sprawl. So multiple vendors, multiple contracts, multiple tool chains, and none of those work together in this, you know, multi-cloud world. Then that drives their security complexity. So we talk about that increased attack surface, but this really drives a lot of operational complexity for their teams. Think about it: there's a lack of consistency through everything, so people, process, tools, all that stuff, which is really wasting time and money for our customers. >>So how does that affect the cyber strategies? I mean, I've often said the CISO now has this shared responsibility model; they have to do that across multiple clouds.
Every cloud has its own security policies and frameworks and syntax. So maybe you could double click on your perspective on that. >> Sure. I'd say the big challenge customers have seen is really inadequate cyber resiliency. And specifically, they're feeling very exposed. And today, with cyber attacks being more and more sophisticated, if something goes wrong, it is a real challenge for them to get back up and running quickly. And that's why this is such a big topic for CEOs and businesses around the world. >> You know, it's funny, I said this in my open: I think that prior to the pandemic, businesses were optimized for efficiency, and now they're like, wow, we have to actually put some headroom into the system to be more resilient. Are you hearing that? >> Yeah, we absolutely are. I mean, the customers really, they're asking us for help, right? It's one of the big things we're learning and hearing from them. And it's really about three things. One is about simplifying IT. Two, it is really helping them to extract more value from their data. And then the third big piece is ensuring their data is protected and recoverable regardless of where it is, going back to that data gravity and the multi-cloud world. Just recently, I don't know if you've seen it, but the Global Data Protection Index, the GDPI... >> Yes. Not to be confused with GDPR. >> Actually, that was released today and confirms everything we just talked about around customer challenges, but it also highlights the importance of having a very robust, cyber resilient data protection strategy. >> Yeah, I haven't seen the latest, but I want to dig into it. I think you've done this many, many years in a row. I like to look at the time series and see how things have changed. All right. 
At a high level, Jeff, can you address why Dell, from your point of view, is best suited? >> Sure. So we believe there's a better way, or a better approach, on how to handle this. We think Dell is uniquely positioned to help our customers as a one-stop shop, if you will, for their cyber resilient multi-cloud data protection needs. We take a modern, simple, and resilient approach. >> What does that mean? What do you mean by modern? >> Sure. So for modern, we talk about our software-defined architecture, right? It's really designed to meet the needs not only of today, but really into the future. And we protect data across any cloud and any workload. So we have a proven track record doing this today. We have more than 1700 customers that trust us to protect more than 14 exabytes of their data in the cloud today. >> Okay, so you said modern, simple, and resilient. What do you mean by simple? >> Sure. We wanna provide simplicity everywhere, going back to helping with the complexity challenge, and that's from deployment to consumption to management and support. So our offers will deploy in minutes. They are easy to operate and use, and we support flexible consumption models for whatever a customer may desire, so traditional subscription or as a service. >> And when you talk about resilient, I mean, I put forth that premise, but it's hard, because people say, well, that's gonna cost us more. Well, it may, but you're gonna also reduce your risk. So what's your point of view on resilience? >> Yeah, I think it's something all customers need. So we're gonna be providing a comprehensive and resilient portfolio of cyber solutions that are secure by design. We have some unique capabilities and a combination of things like built-in immutability, physical and logical isolation. We have intelligence built in with AI-powered recovery. 
And just one, I guess, fun fact for everybody: our cyber vault is the only solution in the industry that is endorsed by Sheltered Harbor, that meets all the needs of the financial sector. >> So it's interesting, when you think about the NIST framework for cybersecurity, it's all about layers. You're sort of bringing that now to data protection, correct? >> Yeah. >> All right. In a minute we're gonna come back with Travis and dig into the news. We're gonna take a short break. Keep it right there. Okay. We're back with Jeff and Travis Vigil to dig deeper into the news. Guys, again, good to see you. Travis, if you could, before we get into the news, can you set the business context for us? What's going on out there? >> Yeah, thanks for that question, Dave. To set a little bit of the context, when you look at the data protection market, Dell has been a leader in providing solutions to customers for going on nearly two decades now. We have tens of thousands of people using our appliances. We have multiple thousands of people using our latest, modern, simple PowerProtect Data Manager software. And as Jeff mentioned, we have, you know, 1700 customers protecting 14 exabytes of data in the public clouds today. And that foundation gives us a unique vantage point. We talk to a lot of customers, and they're really telling us three things. They want simple solutions, they want us to help them modernize, and they want us, as the highest priority, to maintain that high degree of resiliency that they expect from our data protection solutions. So that's the backdrop to the news today. And as we go through the news, I think you'll agree that each of these announcements delivers on those pillars. And in particular, today we're announcing the PowerProtect Data Manager appliance. We are announcing PowerProtect Cyber Recovery enhancements, and we are announcing enhancements to our Apex Data Storage Services. >> Okay, so three pieces. 
Let's dig into that. It's interesting: appliance. Everybody wants software, but then you talk to customers and they're like, well, we actually want appliances, because we just wanna put it in and it works, right? It performs great. So what do we need to know about the appliance? What's the news there? >> Well, you know, part of the reason I gave you some of those stats to begin with is that we have this strong foundation of experience, but also intellectual property, components that we've taken that have been battle-tested in the market. And we've put them together in a new, simple, integrated appliance that really combines the best of the target appliance capabilities we have with that modern, simple software. And we've integrated it, you know, sort of taking all of those pieces, putting them together in a simple, easy-to-use and easy-to-scale interface for customers. >> So the premise that I've been putting forth for, you know, months now, probably well over a year, is that data protection is becoming an extension of your cybersecurity strategies. So I'm interested in your perspective on cyber recovery, the specific news that you have there. >> Yeah, you know, in addition to simplifying things via the appliance, we are providing solutions for customers no matter where they're deploying. And cyber recovery, especially when it comes to cloud deployments, is an increasing area of interest and deployment that we see with our customers. So what we're announcing today is that we're expanding our cyber recovery services to be available in Google Cloud. With this announcement, it means we're available in all three of the major clouds, and it really provides customers the flexibility to secure their data no matter if they're running, you know, on premises, in a colo, at the edge, or in the public cloud. 
And the other nice thing about this announcement is that you have the ability to use Google Cloud as a cyber recovery vault. That really allows customers to isolate critical data, and they can recover that critical data from the vault back to on premises, or from that vault back to running their cyber protection or their data protection solutions in the public cloud. >> I always invoke my favorite Matt Baker here: it's not a zero-sum game. But this is a perfect example where there's opportunities for a company like Dell to partner with the public cloud providers. You've got capabilities that don't exist there. You've got the on-prem capabilities. We can talk about edge all day, but that's a different topic. Okay, so my other question, Travis, is how does this all fit into Apex? We hear a lot about Apex as a service; it's sort of the new hot thing. What's happening there? What's the news around Apex? >> Yeah, we've seen incredible momentum with our Apex solutions since we introduced data protection options into them earlier this year. And we're really building on that momentum with this announcement, you know, providing solutions that allow customers to consume flexibly. And so what we're announcing specifically is that we're expanding Apex Data Storage Services to include a data protection option. And as with all Apex offers, it's a pay-as-you-go solution that really streamlines the process of customers purchasing, deploying, maintaining, and managing their backup software. All a customer really needs to do is, you know, specify their base capacity, specify their performance tier, and tell us whether they want a one-year term or a three-year term, and we take it from there. We get them up and running so they can start deploying and consuming flexibly. And as with many of our Apex solutions, it's a simple user experience, all exposed through a unified Apex console. >> Okay. 
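The Apex ordering flow Travis describes, a base capacity, a performance tier, and a one- or three-year term, can be sketched as a small configuration object. This is a minimal illustration, not Dell's actual API; the class name, tier names, and validation rules here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Apex-style order parameters described above.
VALID_TIERS = {"standard", "performance"}  # assumed tier names, not Dell's
VALID_TERMS_YEARS = {1, 3}                 # one-year or three-year term

@dataclass(frozen=True)
class ApexDataProtectionOrder:
    base_capacity_tb: int
    performance_tier: str
    term_years: int

    def __post_init__(self):
        # Validate the three inputs the customer supplies.
        if self.base_capacity_tb <= 0:
            raise ValueError("base capacity must be positive")
        if self.performance_tier not in VALID_TIERS:
            raise ValueError(f"unknown tier: {self.performance_tier}")
        if self.term_years not in VALID_TERMS_YEARS:
            raise ValueError("term must be 1 or 3 years")

order = ApexDataProtectionOrder(base_capacity_tb=100,
                                performance_tier="standard",
                                term_years=3)
print(order)
```

The point of the sketch is how little the customer specifies: three fields, with everything else handled by the provider, which matches the "we take it from there" framing in the transcript.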
So you're keeping it simple, like, I think large, medium, small. You know, we hear a lot about t-shirt sizes. I'm a big fan of that, 'cause you guys should be smart enough to figure out, you know, based on my workload, what I need. How different is this? I wonder if you guys could address this. Jeff, maybe you can start. >> Sure, I'll start, and then, Travis, you jump in when I screw up here. So, awesome. So first I'd say we offer innovative multi-cloud data protection solutions that deliver the performance, efficiency, and scale that our customers demand and require. We support, as Travis said, all the major public clouds. We have a broad ecosystem of workload support. And I guess the great news is we're up to 80% more cost effective than any of the competition. >> 80%? That's a big number, right? Travis, what's your point of view on this? >> Yeah, I think, number one, end-to-end data protection. We are that one-stop shop that I talked about. Whether it's a simplified appliance, whether it's deployed in the cloud, whether it's at the edge, whether it's integrated appliances, target appliances, software, we have solutions that span the gamut, as a service. I mentioned the Apex solution as well. So really, we can provide solutions that help support customers and protect them: any workload, any cloud, anywhere that data lives, edge, core, to cloud. The other thing that we hear as a big differentiator for Dell, and Jeff touched on this a little bit earlier, is our intelligent cyber resiliency. We have a unique combination in the market where we can offer immutability, or protection against deletion, as sort of that first line of defense. But we can also offer a second level of defense, which is isolation, talking about data vaults or cyber vaults and cyber recovery. And, more importantly, the intelligence that goes around that vault. 
It can look at detecting cyber attacks, it can help customers speed time to recovery, and it really provides AI and ML to help early diagnosis of a cyber attack and fast recovery should a cyber attack occur. And, you know, if you look at customer adoption of that solution specifically in the clouds, we have over 1300 customers utilizing PowerProtect Cyber Recovery. >> So I think it's fair to say that your portfolio has obviously been a big differentiator. Whenever I talk to, you know, your finance team, Michael Dell, et cetera, there's that end-to-end capability, your ability to manage throughout the supply chain. We actually just did an event recently with you guys where you went into what you're doing to make infrastructure trusted. And so my take on that is, in a lot of respects, you're shifting, you know, the client's burden to your R&D. Now they have a lot of work to do, so it's not like they can go home and just relax, but that's a key part of the partnership that I see. Jeff, I wonder if you could give us the final thoughts. >> Sure. Dell has a long history of being a trusted partner within IT, right? So we have unmatched capabilities. Going back to your point, we have the broadest portfolio; we're a leader in every category that we participate in. We have a broad, deep breadth of portfolio. We have scale. We have innovation that is just unmatched. Within data protection itself, we are the trusted market leader, no ifs, ands, or buts. We're number one for both data protection software and appliances per IDC, and we were just named, for the 17th consecutive time, the leader in the Gartner Magic Quadrant. So the bottom line is customers can count on Dell. >> Yeah, and I think again, we're seeing the evolution of data protection. It's not like the last 10 years; it's really becoming an adjacency and really a key component of your cyber strategy. 
I think those two parts of the organization are coming together. So guys, really appreciate your time. Thanks. >> Thank you, sir. >> Thanks, Travis. >> Good to see you. >> All right, in a moment I'm gonna come right back and summarize what we learned today and what actions you can take for your business. You're watching The Future of Multi-Cloud Data Protection Made Possible by Dell, in collaboration with theCUBE, your leader in enterprise and emerging tech coverage. Right back. >> In our data-driven world, protecting data has never been more critical. To guard against everything from cyber incidents to unplanned outages, you need a cyber resilient, multi-cloud data protection strategy. >> It's not a matter of if you're gonna get hacked, it's a matter of when. And I wanna know that I can recover and continue to recover each day. >> It is important to have a cybersecurity and a cyber resiliency plan in place, because the threat of cyber attack is imminent. >> PowerProtect Data Manager from Dell Technologies helps deliver the data protection and security confidence you would expect from a trusted partner and market leader. >> We chose PowerProtect Data Manager because we've been a strategic partner with Dell Technologies for roughly 20 years now. Our partnership with Dell Technologies has provided us with the ability to scale and grow as we've transitioned from 10 billion in assets to 20 billion. >> With PowerProtect Data Manager, you can enjoy exceptional ease of use to increase your efficiency and reduce costs. >> I installed it by myself, learned it by myself. It's very intuitive. >> Restoring a machine with PowerProtect Data Manager is fast. We can fully manage PowerProtect through vCenter. We can recover a whole machine in seconds. >> Data Manager offers innovations such as transparent snapshots to simplify virtual machine backups, and it goes beyond backup and restore to provide valuable insights into protected data, workloads, and VMs. 
>> In our previous environment, it would take anywhere from three to six hours at night to do a single backup of each VM. Now we're backing up hourly, and it takes two to three seconds with the transparent snapshots. >> With PowerProtect Data Manager, you get the peace of mind knowing that your data is safe and available whenever you need it. >> Data is extremely important. We can't afford to lose any data. We need things to just work. >> Start your journey to modern data protection with Dell PowerProtect Data Manager. Visit dell.com/powerprotectdatamanager. >> We put forth the premise in our introduction that the worlds of data protection and cybersecurity must be more integrated. We said that data recovery strategies have to be built into security practices and procedures, and by default this should include modern hardware and software. Now, in addition to reviewing some of the challenges that customers face, which have been pretty well documented, we heard about new products that Dell Technologies is bringing to the marketplace that specifically address these customer concerns. There were three that we talked about today. First, the PowerProtect Data Manager appliance, which is an integrated system taking advantage of Dell's history in data protection, but adding new capabilities. And I want to come back to that in a moment. Second is Dell's PowerProtect Cyber Recovery for Google Cloud Platform. This rounds out the big three public cloud providers for Dell, which joins AWS and Azure support. >> Now finally, Dell has made its target backup appliances available in Apex. You might recall, earlier this year we saw the introduction from Dell of Apex Backup Services, and then in May at Dell Technologies World, we heard about the introduction of Apex Cyber Recovery Services. And today Dell is making its most popular backup appliances available in Apex. Now I wanna come back to the PowerProtect Data Manager appliance, because it's a new integrated appliance. 
And I asked Dell, off camera, really, what is so special about these new systems and what's really different from the competition? Because, look, everyone offers some kind of integrated appliance. So I heard a number of items. Dell talked about simplicity and efficiency and containers and Kubernetes. So I kind of kept pushing and got to what I think is the heart of the matter in two really important areas. One is simplicity. Dell claims that customers can deploy the system in half the time relative to the competition. So we're talking minutes to deploy, and of course that's gonna lead to much simpler management. And the second real difference I heard was backup and restore performance for VMware workloads. In particular, Dell has developed transparent snapshot capabilities to fundamentally change the way VMs are protected, which leads to faster backups and restores with less impact on virtual infrastructure. Dell believes this new development is unique in the market, and claims that in its benchmarks the new appliance was able to back up 500 virtual machines in 47% less time compared to a leading competitor. Now, this is based on Dell benchmarks, so hopefully these are things that you can explore in more detail with Dell to see if and how they apply to your business. So if you want more information, go to the data protection page at dell.com; you can find that at dell.com/dataprotection. And all the content here and other videos are available on demand at thecube.net. Check out our series on the blueprint for trusted infrastructure; it's related and has some additional information. And go to siliconangle.com for all the news and analysis related to these and other announcements. This is Dave Vellante. Thanks for watching The Future of Multi-Cloud Protection, made possible by Dell in collaboration with theCUBE, your leader in enterprise and emerging tech coverage.
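Two of the performance figures quoted in this segment lend themselves to a quick back-of-envelope check: the customer's move from nightly three-to-six-hour backups to hourly two-to-three-second snapshots, and Dell's claim of backing up 500 VMs in 47% less time. A rough arithmetic sketch follows; these are the transcript's numbers, not independent measurements.

```python
# Back-of-envelope arithmetic using the figures quoted above.
old_nightly_s = (3 * 3600, 6 * 3600)   # one backup per day, 3-6 hours each
new_hourly_s = (2, 3)                  # 24 snapshots per day, 2-3 seconds each

old_daily = [t * 1 for t in old_nightly_s]    # seconds spent backing up per day, before
new_daily = [t * 24 for t in new_hourly_s]    # seconds spent backing up per day, after
reduction = [o / n for o, n in zip(old_daily, new_daily)]
print(f"total daily backup time shrinks {reduction[0]:.0f}x to {reduction[1]:.0f}x")

# "47% less time" for the same 500-VM job implies roughly a 1.9x throughput gain.
speedup = 1 / (1 - 0.47)
print(f"implied backup throughput gain: {speedup:.2f}x")
```

Even granting that these are vendor- and customer-supplied numbers, the sketch shows why the protection frequency can go from nightly to hourly while total backup time still drops by two orders of magnitude.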
Introduction The Future of Multicloud Data Protection is Here 11-14
>> Prior to the pandemic, organizations were largely optimized for efficiency as the best path to bottom-line profits. Many CIOs tell theCUBE privately that they were caught off guard by the degree to which their businesses required greater resiliency, beyond their somewhat cumbersome disaster recovery processes. And the lack of that business resilience has actually cost firms, because they were unable to respond to changing market forces. And certainly we've seen this dynamic with supply chain challenges, and there's little doubt we're also seeing it in the area of cybersecurity generally, and data recovery specifically. Over the past 30-plus months, the rapid adoption of cloud to support remote workers and build in business resilience had the unintended consequence of expanding attack vectors, which brought an escalation of risk from cybercrime. While security in the public clouds is certainly world class, the result of multi-cloud has brought with it multiple shared responsibility models, multiple ways of implementing security policies across clouds and on-prem, 
and, at the end of the day, more, not less, complexity. But there's a positive side to this story. The good news is that public policy, industry collaboration, and technology innovation are moving fast to accelerate data protection and cybersecurity strategies, with a focus on modernizing infrastructure, securing the digital supply chain, and, very importantly, simplifying the integration of data protection and cybersecurity. Today there's heightened awareness that the world of data protection is not only an adjacency to, but is becoming a fundamental component of, cybersecurity strategies. In particular, in order to build more resilience into a business, data protection people, technologies, and processes must be more tightly coordinated with security operations. Hello, and welcome to The Future of Multi-Cloud Data Protection Made Possible by Dell, in collaboration with theCUBE. My name is Dave Vellante, and I'll be your host today. In this segment, we welcome into theCUBE two senior executives from Dell who will share details on new technology announcements that directly address these challenges. >> Jeff Boudreau is the president and general manager of Dell's Infrastructure Solutions Group, ISG, and he's gonna share his perspectives on the market and the challenges he's hearing from customers. And we're gonna ask Jeff to double click on the messages that Dell is putting into the marketplace and give us his detailed point of view on what it means for customers. Now, Jeff is gonna be joined by Travis Vigil. Travis is the Senior Vice President of product management for ISG at Dell Technologies, and he's gonna give us details on the products that are being announced today and go into the hard news. Now, we're also gonna challenge our guests to explain why Dell's approach is unique and different in the marketplace. Thanks for being with us. Let's get right into it.
Tim Yocum, Influx Data | Evolving InfluxDB into the Smart Data Platform
(soft electronic music) >> Okay, we're back with Tim Yocum, who is the Director of Engineering at InfluxData. Tim, welcome, good to see you. >> Good to see you, thanks for having me. >> You're really welcome. Listen, we've been covering opensource software on theCUBE for more than a decade, and we've kind of watched the innovation from the big data ecosystem: the cloud being built out on opensource, mobile, social platforms, key databases, and of course, InfluxDB. And InfluxData has been a big consumer and contributor of opensource software. So my question to you is, where have you seen the biggest bang for the buck from opensource software? >> So yeah, you know, Influx really, we thrive at the intersection of commercial services and opensource software, so OSS keeps us on the cutting edge. We benefit from OSS in delivering our own service, from our core storage engine technologies to web services, templating engines. Our team stays lean and focused because we build on proven tools. We really build on the shoulders of giants. And like you've mentioned, even better, we contribute a lot back to the projects that we use, as well as our own product, InfluxDB. >> But I got to ask you, Tim, because one of the challenges that we've seen, in particular, you saw this in the heyday of Hadoop, is that the innovations come so fast and furious, and as a software company, you got to place bets, you got to commit people, and sometimes those bets can be risky and not pay off. So how have you managed this challenge? >> Oh, it moves fast, yeah. That's a benefit, though, because the community moves so quickly that today's hot technology can be tomorrow's dinosaur. And what we tend to do is we fail fast and fail often; we try a lot of things. You know, you look at Kubernetes, for example. That ecosystem is driven by thousands of intelligent developers, engineers, builders. They're adding value every day, so we have to really keep up with that. 
And as the stack changes, we try different technologies, we try different methods. And at the end of the day, we come up with a better platform as a result of just the constant change in the environment. It is a challenge for us, but it's something that we just do every day. >> So we have a survey partner down in New York City called Enterprise Technology Research, ETR, and they do these quarterly surveys of about 1500 CIOs, IT practitioners, and they really have a good pulse on what's happening with spending. And the data shows that containers generally, but specifically Kubernetes, is one of the areas that's been off the charts, seeing the most significant adoption and velocity, particularly along with cloud. But really, Kubernetes is just, you know, still up and to the right consistently, even with the macro headwinds and all of the other stuff that we're sick of talking about. So what do you do with Kubernetes in the platform? >> Yeah, it's really central to our ability to run the product. When we first started out, we were just on AWS, and the way we were running was a little bit like containers junior. Now we're running Kubernetes everywhere: at AWS, Azure, Google Cloud. It allows us to have a consistent experience across three different cloud providers, and we can manage that in code. So our developers can focus on delivering services, not trying to learn the intricacies of Amazon, Azure, and Google and figure out how to deliver services on those three clouds with all of their differences. >> Just a followup on that: I presume it sounds like there's a PaaS layer there to allow you guys to have a consistent experience across clouds and out to the edge, wherever. Is that correct? >> Yeah, so we've basically built, more or less, what's now called platform engineering; that's the new hot phrase. 
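Tim's point, that one Kubernetes deployment model hides the differences between AWS, Azure, and Google Cloud, can be illustrated with a small sketch. This is not InfluxData's actual tooling; the service name, image, and labels below are made up purely to show the same manifest being stamped out for each provider.

```python
# Minimal sketch: one deployment spec, generated identically per cloud provider.
# The dict shape mirrors a Kubernetes apps/v1 Deployment; values are illustrative.
def build_manifest(service: str, image: str, replicas: int, provider: str) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": service,
            "labels": {"app": service, "cloud": provider},
        },
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": service}},
            "template": {
                "metadata": {"labels": {"app": service}},
                "spec": {"containers": [{"name": service, "image": image}]},
            },
        },
    }

manifests = {p: build_manifest("query-api", "registry/query-api:1.4.2", 3, p)
             for p in ("aws", "azure", "gcp")}

# Apart from the provider label, every cloud gets an identical spec:
images = {m["spec"]["template"]["spec"]["containers"][0]["image"]
          for m in manifests.values()}
print(images)
```

That single code path is what lets a developer learn "one way of deploying their application," with the provider differences pushed down into the platform layer rather than into each service team.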
Kubernetes has made a lot of things easy for us because we've built a platform that our developers can lean on, and they only have to learn one way of deploying their application, managing their application. And so that just gets all of the underlying infrastructure out of the way and lets them focus on delivering Influx cloud. >> And I know I'm taking a little bit of a tangent, but is that, I'll call it a PaaS layer, if I can use that term, are there specific attributes to InfluxDB, or is it kind of just generally off-the-shelf PaaS? Is there any purpose-built capability there that is value-add, or is it pretty much generic? >> So we really look at things through a build-versus-buy lens. Some things we want to leverage cloud provider services for, for instance Postgres databases for metadata, perhaps. Get that off of our plate, let someone else run that. We're going to deploy a platform that our engineers can deliver on, that has consistency, that is all generated from code, that we can manage as an SRE group, as an ops team, with very few people, really, and we can stamp out clusters across multiple regions in no time. >> So sometimes you build, sometimes you buy. How do you make those decisions, and what does that mean for the platform and for customers? >> Yeah, so what we're doing is, like everybody else, we're looking for trade-offs that make sense. We really want to protect our customers' data, so we look for services that support our own software with the most uptime, reliability, and durability we can get. Some things are just going to be easier to have a cloud provider take care of on our behalf. We make that transparent for our own team and, of course, for our customers; you don't even see that. But we don't want to try to reinvent the wheel, like I had mentioned with a SQL data source for metadata, perhaps.
Let's build on top of what these three large cloud providers have already perfected, and we can then focus on our platform engineering, and we can help our developers then focus on the InfluxData software, the Influx cloud software. >> So take it to the customer level. What does it mean for them? What's the value that they're going to get out of all these innovations that we've been talking about today, and what can they expect in the future? >> So first of all, people who use the OSS product are really going to be at home on our cloud platform. You can run it on your desktop machine, on a single server, what have you, but then you want to scale up. We have some 270 terabytes of data across over four billion series keys that people have stored, so there's a proven ability to scale. Now, in terms of the open source software and how we've developed the platform, you're getting a highly available, high cardinality time-series platform. We manage it, and really, as I had mentioned earlier, we can keep up with the state of the art. We keep reinventing, we keep deploying things in realtime. We deploy to our platform every day, repeatedly, all the time. And it's that continuous deployment that allows us to continue testing things in flight, rolling out changes, new features, better ways of doing deployments, safer ways of doing deployments. All of that happens behind the scenes, and like we had mentioned earlier, Kubernetes allows us to get that done. We couldn't do it without having that platform as a base layer for us to then put our software on. So we iterate quickly. When you're on the Influx cloud platform, you really are able to take advantage of new features immediately. We roll things out every day, and as those things go into production, you have the ability to use them. And so in the end, we want you to focus on getting actual insights from your data instead of running infrastructure; you know, let us do that for you. >> That makes sense.
Are the innovations that we're talking about in the evolution of InfluxDB, do you see that as sort of a natural evolution for existing customers? I'm sure the answer is both, but is it opening up new territory for customers? Can you add some color to that? >> Yeah, it really is. It's a little bit of both. Any engineer will say, "Well, it depends." So cloud-native technologies are really the hot thing. IoT, industrial IoT especially. People want to just shove tons of data out there and be able to do queries immediately, and they don't want to manage infrastructure. What we've started to see are people that use the cloud service as their datastore backbone and then use edge computing with our OSS product to ingest data from, say, multiple production lines, down-sample that data, and send the rest of that data off to Influx cloud where the heavy processing takes place. So really, us being in all the different clouds and iterating on that, and being in all sorts of different regions, allows people to really get out of the business of trying to manage that big data, have us take care of that. And, of course, as we change the platform, end users benefit from that immediately. >> And so obviously you've taken away a lot of the heavy lifting for the infrastructure. Would you say the same things about security, especially as you go out to IoT at the edge? How should we be thinking about the value that you bring from a security perspective? >> We take security super seriously. It's built into our DNA. We do a lot of work to ensure that our platform is secure, that the data that we store is kept private. It's, of course, always a concern; you see in the news all the time companies being compromised. That's something that you can have an entire team working on, which we do, to make sure that the data that you have, whether it's in transit, whether it's at rest, is always kept secure, is only viewable by you.
You look at things like software bill of materials: if you're running this yourself, you have to go vet all sorts of different pieces of software, and we do that, you know, as we use new tools. That's just part of our jobs, to make sure that the platform that we're running has fully vetted software. And you know, with open source especially, that's a lot of work, and so it's definitely new territory. Supply chain attacks are definitely happening at a higher clip than they used to, but that is really just part of a day in the life for folks like us that are building platforms. >> And that's key, especially when you start getting into, you know, what we talk about with IoT and the operations technologies, the engineers running that infrastructure. You know, historically, as you know, Tim, they would air gap everything; that's how they kept it safe. But that's not feasible anymore. Everything's-- >> Can't do that. >> connected now, right? And so you've got to have a partner that can, again, take away that heavy lifting so you can focus your R&D on some of the other activities. All right, give us the last word and the key takeaways from your perspective. >> Well, you know, from my perspective, I see it as a two-lane approach, with Influx, with any time-series data. You've got a lot of stuff that you're going to run on-prem. What you had mentioned, air gapping? Sure, there's plenty of need for that. But at the end of the day, people that don't want to run big datacenters, people that want to entrust their data to a company that's got a full platform set up for them that they can build on, send that data over to the cloud. The cloud is not going away. I think a more hybrid approach is where the future lives, and that's what we're prepared for. >> Tim, really appreciate you coming on the program. Great stuff, good to see you. >> Thanks very much, appreciate it. >> Okay, in a moment, I'll be back to wrap up today's session. You're watching theCUBE.
(soft electronic music)
Anais Dotis Georgiou, InfluxData | Evolving InfluxDB into the Smart Data Platform
>>Okay, we're back. I'm Dave Valante with The Cube, and you're watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData. Anais Dotis Georgiou is here. She's a developer advocate for InfluxData, and we're gonna dig into the rationale and value contribution behind several open source technologies that InfluxDB is leveraging to increase the granularity of time series analysis and bring the world of data into realtime analytics. Anais, welcome to the program. Thanks for coming on. >>Hi, thank you so much. It's a pleasure to be here. >>Oh, you're very welcome. Okay, so IOx is being touted as this next gen open source core for InfluxDB. And my understanding is that it leverages in-memory, of course, for speed. It's a columnar store, so it gives you compression efficiency, it's gonna give you faster query speeds, and it's gonna store files in object storage. So you've got a very cost effective approach. Are these the salient points on the platform? I know there are probably dozens of other features, but what are the high level value points that people should understand? >>Sure, that's a great question. So some of the main requirements that IOx is trying to achieve, and some of the most impressive ones to me: the first one is that it aims to have no limits on cardinality and also to allow you to write any kind of event data that you want, whether that's a tag or a field. It also wants to deliver best-in-class performance on analytics queries, in addition to our already well served metrics queries. We also wanna have operator control over memory usage, so you should be able to define how much memory is used for buffering, caching, and query processing. Some other really important parts are the ability to have bulk data export and import, super useful.
Also, broader ecosystem compatibility: where possible we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like SQL, Python, and maybe even pandas in the future. >>Okay, so a lot there. Now we talked to Brian about how you're using Rust, which is not a new programming language, and of course we had some drama around Rust during the pandemic with the Mozilla layoffs, but the formation of the Rust Foundation really addressed any of those concerns. You've got big guns like Amazon and Google and Microsoft throwing their collective weight behind it. Adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with Rust, but why Rust as an alternative to, say, C++ for example? >>Sure, that's a great question. So Rust was chosen because of its exceptional performance and reliability. While Rust is syntactically similar to C++ and has similar performance, it also compiles to native code like C++. But unlike C++, it also has much better memory safety. Memory safety is protection against bugs or security vulnerabilities that lead to excessive memory usage or memory leaks, and Rust achieves this memory safety due to its innovative type system. Additionally, it doesn't allow for dangling pointers, and dangling pointers are the main classes of errors that lead to exploitable security vulnerabilities in languages like C++. So Rust helps meet that requirement of having no limits on cardinality, for example, because we're also using the Rust implementation of Apache Arrow and this control over memory. Rust's packaging system, called crates.io, offers everything that you need out of the box to have features like async and await to fix race conditions, to protect against buffer overflows, and to ensure thread-safe async caching structures as well.
So essentially it just has all the fine grain control you need to take advantage of memory and all your resources as well as possible, so that you can handle those really, really high cardinality use cases. >>Yeah, and the more I learn about the new engine and the platform, IOx, et cetera, you see things like, you know, in the old days, and even today, you do a lot of garbage collection in these systems, and there's an inverse impact relative to performance. So it looks like the community is really modernizing the platform. But I wanna talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets. We know that, but please explain: what is Arrow, and what does it bring to InfluxDB? >>Sure, yeah. So Arrow is a framework for defining in-memory columnar data, and so much of the efficiency and performance of IOx comes from taking advantage of columnar data structures. And I will, if you don't mind, take a moment to illustrate why columnar data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. And in our table we have those two temperature values as well as maybe a measurement value, a timestamp value, maybe some other tag values that describe what room and what house, et cetera, we're getting this data from. And so you can picture this table where we have two rows with the two temperature values for both our room and the stove. Well, usually our room temperature is regulated, so those values don't change very often. >>So when you have column-oriented storage, essentially you take each column and group its values together.
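The room-and-stove table she describes can be sketched in plain Python to show the two layouts side by side. This is a toy illustration of the idea only, not InfluxDB IOx's actual in-memory format, and the field names are made up for the example:

```python
# The same four readings stored two ways (toy layouts for illustration only).
rows = [  # row-oriented: one record per reading
    {"time": 0, "room": "kitchen", "sensor": "room",  "temp_c": 21.0},
    {"time": 1, "room": "kitchen", "sensor": "stove", "temp_c": 180.0},
    {"time": 2, "room": "kitchen", "sensor": "room",  "temp_c": 21.0},
    {"time": 3, "room": "kitchen", "sensor": "stove", "temp_c": 181.0},
]

# Column-oriented: each column's values are grouped together contiguously.
columns = {
    "time":   [r["time"] for r in rows],
    "room":   [r["room"] for r in rows],
    "sensor": [r["sensor"] for r in rows],
    "temp_c": [r["temp_c"] for r in rows],
}

# A min/max over temperature touches just one list in the columnar layout...
print(min(columns["temp_c"]), max(columns["temp_c"]))   # 21.0 181.0
# ...but must visit every record and pluck the field out in the row layout.
print(min(r["temp_c"] for r in rows))                   # 21.0
```

Both scans return the same answer; the difference is how much unrelated data (tags, timestamps) the scan has to step over, which is the trade-off discussed in the surrounding exchange.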
And so if that's the case and you're just taking temperature values from the room, and a lot of those temperature values are the same, then you might be able to imagine how equal values will neighbor each other in the storage format. This provides a really perfect opportunity for cheap compression, and then this cheap compression enables high cardinality use cases. It also enables faster scan rates. So if you wanna find, say, the min and max value of the temperature in the room across a thousand different points, you only have to get those thousand points from that one column in order to answer that question, and you have those immediately available to you. But let's contrast this with a row-oriented storage solution instead so that we can better understand the benefits of column-oriented storage. >>So if you had row-oriented storage, you'd first have to look at every field, like the temperature in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is, and every timestamp. You'd then have to pluck out that one temperature value that you want at that one timestamp, and do that for every single row. So you're scanning across a ton more data, and that's why row-oriented doesn't provide the same efficiency as columnar. And Apache Arrow is an in-memory columnar data framework, so that's where a lot of the advantages come from. >>Okay. So you've basically described a traditional database, a row approach, but I've seen a lot of traditional databases say, okay, now we can handle columnar format, versus what you're talking about, which is really kind of native. Is the former not as effective because it's largely a bolt-on? Can you elucidate on that front?
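The cheap compression described above comes from runs of equal neighboring values. A toy run-length encoder makes it concrete; this is an illustration of the principle only, not InfluxDB's or Parquet's actual codec:

```python
def run_length_encode(column):
    """Collapse runs of equal neighboring values into (value, count) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))              # start a new run
    return runs

# A regulated room temperature barely changes across a thousand points...
room_temp = [21.0] * 600 + [21.5] * 400
encoded = run_length_encode(room_temp)
print(encoded)   # [(21.0, 600), (21.5, 400)] -- 1000 values, only 2 runs

# ...and min/max can be answered from the run values alone.
print(min(v for v, _ in encoded), max(v for v, _ in encoded))   # 21.0 21.5
```

When neighboring values are mostly equal, as in a regulated-temperature column, a thousand raw points collapse into a handful of runs, which is exactly why grouping a column's values together pays off.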
>>Yeah, it's not as effective because you have more expensive compression and because you can't scan across the values as quickly. And so those are pretty much the main reasons why row-oriented storage isn't as efficient as column-oriented storage. >>Yeah. Got it. So let's talk about Arrow DataFusion. What is DataFusion? I know it's written in Rust, but what does it bring to the table here? >>Sure. So it's an extensible query execution framework, and it uses Arrow as its in-memory format. The way that it helps InfluxDB IOx is that, okay, it's great if you can write an unlimited amount of cardinality into InfluxDB, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So DataFusion helps enable the query process and transformation of that data. It also has a pandas API so that you could take advantage of pandas data frames as well and all of the machine learning tools associated with pandas. >>Okay. You're also leveraging Parquet in the platform, of course. We heard a lot about Parquet in the middle of the last decade as a storage format to improve on Hadoop column stores. What are you doing with Parquet, and why is it important? >>Sure. So Parquet is the column-oriented durable file format. It's important because it'll enable bulk import and bulk export. It has compatibility with Python and pandas, so it supports a broader ecosystem. Parquet files also take very little disk space, and they're faster to scan because, again, they're column oriented. In particular, I think Parquet files are like 16 times cheaper than CSV files, just as kind of a point of reference. And so that's essentially a lot of the benefits of Parquet. >>Got it. Very popular. So what exactly is InfluxData focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >>Sure.
So InfluxDB first has contributed a lot of different things to the Apache ecosystem. For example, they contributed an implementation of Apache Arrow in Go, and that will support querying with Flux. Also, there have been quite a few contributions to DataFusion for things like memory optimization and support for additional SQL features, like support for timestamp arithmetic, support for EXISTS clauses, and support for memory control. So yeah, Influx has contributed a lot to the Apache ecosystem and continues to do so. And I think kind of the idea here is that if you can improve these upstream projects, then the long term strategy is that the more you contribute and build those up, the more you will perpetuate that cycle of improvement, and the more we will invest in our own project as well. So it's just that kind of symbiotic relationship and appreciation of the open source community. >>Yeah. Got it. You got that virtuous cycle going; people call it the flywheel. Give us your last thoughts, and kind of summarize, you know, what the big takeaways are from your perspective. >>So I think the big takeaway is that InfluxData is doing a lot of really exciting things with InfluxDB IOx. And I really encourage, if you are interested in learning more about the technologies that Influx is leveraging to produce IOx, the challenges associated with it, and all of the hard questions, and you just wanna learn more, then I would encourage you to go to the monthly tech talks and community office hours; they are on every second Wednesday of the month at 8:30 AM Pacific time. There's also a community forum and a community Slack channel. Look for the influxdb_iox channel specifically to learn more about how to join those office hours and those monthly tech talks, as well as to ask any questions you have about IOx, what to expect, and what you'd like to learn more about.
I, as a developer advocate, wanna answer your questions. So if there's a particular technology or stack that you wanna dive deeper into and want more explanation about how InfluxDB leverages it to build IOx, I will be really excited to produce content on that topic for you. >>Yeah, that's awesome. You guys have a really rich community: collaborate with your peers, solve problems, and you guys are super responsive, so we really appreciate that. All right, thank you so much, Anais, for explaining all this open source stuff to the audience and why it's important to the future of data. >>Thank you. I really appreciate it. >>All right, you're very welcome. Okay, stay right there, and in a moment I'll be back with Tim Yocum. He's the director of engineering for InfluxData, and we're gonna talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't wanna miss this.
Brian Gilmore, Influx Data | Evolving InfluxDB into the Smart Data Platform
>>This past May, The Cube, in collaboration with InfluxData, shared with you the latest innovations in time series databases. We talked at length about why a purpose-built time series database was, for many use cases, a superior alternative to general purpose databases trying to do the same thing. Now, you may remember that time series data is any data that's stamped in time, and if it's stamped, it can be analyzed historically. And when we introduced the concept to the community, we talked about how, in theory, those time slices could be taken, you know, every hour, every minute, every second, down to the millisecond, and how the world was moving toward realtime or near realtime data analysis to support physical infrastructure like sensors and other devices and IoT equipment. Time series databases have had to evolve to efficiently support realtime data in emerging IoT and other use cases. >>And to do that, new architectural innovations have to be brought to bear. As is often the case, open source software is the linchpin to those innovations. Hello, and welcome to Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and produced by The Cube. My name is Dave Valante, and I'll be your host today. Now, in this program, we're going to dig pretty deep into what's happening with time series data generally, and specifically how InfluxDB is evolving to support new workloads and demands and data, specifically around data analytics use cases in real time. Now, first we're gonna hear from Brian Gilmore, who is the director of IoT and emerging technologies at InfluxData. And we're gonna talk about the continued evolution of InfluxDB and the new capabilities enabled by open source generally and specific tools. And in this program, you're gonna hear a lot about things like Rust, the implementation of Apache Arrow, the use of Parquet, and tooling such as DataFusion, which is powering a new engine for InfluxDB.
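The time-slice idea in that intro, windows of an hour, a minute, a second, down to the millisecond, can be sketched in a few lines that group timestamped points into fixed windows and average each one. This is a toy illustration of windowed downsampling, not InfluxDB's query engine:

```python
from collections import defaultdict

def downsample(points, window_s):
    """Average (epoch_seconds, value) points into fixed windows of window_s seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)   # window start this point falls in
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Four readings over two minutes, rolled up into 60-second slices.
readings = [(0, 10.0), (30, 12.0), (60, 20.0), (90, 22.0)]
print(downsample(readings, 60))   # {0: 11.0, 60: 21.0}

# A finer window keeps more granularity from the same data.
print(downsample(readings, 30))   # {0: 10.0, 30: 12.0, 60: 20.0, 90: 22.0}
```

Shrinking the window is exactly the move from historical roll-ups toward the near-realtime analysis the program goes on to discuss: the same stream of points, sliced ever more finely.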
>>Now, these innovations evolve the idea of time series analysis by dramatically increasing the granularity of time series data, compressing the historical time slices, if you will, from, for example, minutes down to milliseconds, and at the same time enabling real time analytics with an architecture that can process data much faster and much more efficiently. Now, after Brian, we're gonna hear from Anais Dotis Georgiou, who is a developer advocate at InfluxData. And we're gonna get into the why of these open source capabilities and how they contribute to the evolution of the InfluxDB platform. And then we're gonna close the program with Tim Yocum. He's the director of engineering at InfluxData, and he's gonna explain how the InfluxDB community actually evolved the data engine in mid-flight and which decisions went into the innovations that are coming to the market. Thank you for being here. We hope you enjoy the program. Let's get started. Okay, we're kicking things off with Brian Gilmore. He's the director of IoT and emerging technology at InfluxData. Brian, welcome to the program. Thanks for coming on. >>Thanks Dave. Great to be here. I appreciate the time. >>Hey, explain why InfluxDB, you know, needs a new engine. Was there something wrong with the current engine? What's going on there? >>No, no, not at all. I mean, I think it's, for us, it's been about staying ahead of the market. I think, you know, if we think about what our customers are coming to us with now, you know, related to requests like SQL query support, things like that, we have to figure out a way to execute those for them in a way that will scale long term. And then we also wanna make sure we're innovating, we're sort of staying ahead of the market as well and anticipating those future needs. So, you know, this is really a transparent change for our customers.
I mean, I think we'll be adding new capabilities over time that sort of leverage this new engine, but you know, initially the customers who are using us are gonna see just great improvements in performance, you know, especially those that are working at the top end of the workload scale, you know, the massive data volumes and things like that. >>Yeah, and we're gonna get into that today, and the architecture and the like, but what was the catalyst for the enhancements? I mean, when and how did this all come about? >>Well, I mean, like three years ago we were primarily on premises, right? I mean, I think we had our open source, we had an enterprise product, you know, and sort of shifting that technology, especially the open source code base, to a service basis where we were hosting it through, you know, multiple cloud providers, that was a long journey, I guess. You know, phase one was, you know, we wanted to host enterprise for our customers, so we sort of created a service where we just managed and ran our enterprise product for them. You know, phase two of this cloud effort was to optimize for like multi-tenant, multi-cloud, to be able to host it in a truly SaaS manner where we could use, you know, some type of customer activity or consumption as the pricing vector. And that was sort of the birth of the first real InfluxDB cloud, you know, which has been really successful.
We've seen, I think, like 60,000 people sign up, and we've got tons and tons of both enterprises as well as new companies, developers, and of course a lot of home hobbyists and enthusiasts who are using us on a daily basis, you know. And having that sort of big pool of very diverse customers to chat with as they're using the product, as they're giving us feedback, et cetera, has, you know, pointed us in a really good direction in terms of making sure we're continuously improving that, and then also making these big leaps as we're doing with this new engine. >>Right. So you've called it a transparent change for customers, so I'm presuming it's non-disruptive, but I really wanna understand how much of a pivot this is, and what does it take to make that shift from, you know, time series specialist to real time analytics and being able to support both? >>Yeah, I mean, it's much more of an evolution, I think, than a shift or a pivot. You know, time series data is always gonna be fundamental and sort of the basis of the solutions that we offer our customers, and then also the ones that they're building on the sort of raw APIs of our platform themselves. You know, the time series market is one that we've worked diligently to lead. I mean, I think when it comes to metrics, especially like sensor data and app and infrastructure metrics, if we're being honest though, I think our user base is well aware that the way we were architected was much more towards those sort of backwards looking, historical type analytics, which are key for troubleshooting and making sure you don't, you know, run into the same problem twice.
But, you know, we had to ask ourselves like, what can we do to like better handle those queries from a performance and a, and a, you know, a time to response on the queries, and can we get that to the point where the results sets are coming back so quickly from the time of query that we can like limit that window down to minutes and then seconds. >>And now with this new engine, we're really starting to talk about a query window that could be like returning results in, in, you know, milliseconds of time since it hit the, the, the ingest queue. And that's, that's really getting to the point where as your data is available, you can use it and you can query it, you can visualize it, and you can do all those sort of magical things with it, you know? And I think getting all of that to a place where we're saying like, yes to the customer on, you know, all of the, the real time queries, the, the multiple language query support, but, you know, it was hard, but we're now at a spot where we can start introducing that to, you know, a a limited number of customers, strategic customers and strategic availability zones to start. But you know, everybody over time. >>So you're basically going from what happened to in, you can still do that obviously, but to what's happening now in the moment? >>Yeah, yeah. I mean, if you think about time, it's always sort of past, right? I mean, like in the moment right now, whether you're talking about like a millisecond ago or a minute ago, you know, that's, that's pretty much right now, I think for most people, especially in these use cases where you have other sort of components of latency induced by the, by the underlying data collection, the architecture, the infrastructure, the, you know, the, the devices and you know, the sort of highly distributed nature of all of this. So yeah, I mean, getting, getting a customer or a user to be able to use the data as soon as it is available is what we're after here. 
>>I always thought, you know, real, I always thought of real time as before you lose the customer, but now in this context, maybe it's before the machine blows up. >>Yeah, it's, it's, I mean it is operationally or operational real time is different, you know, and that's one of the things that really triggered us to know that we were, we were heading in the right direction, is just how many sort of operational customers we have. You know, everything from like aerospace and defense. We've got companies monitoring satellites, we've got tons of industrial users, users using us as a processes storing on the plant floor, you know, and, and if we can satisfy their sort of demands for like real time historical perspective, that's awesome. I think what we're gonna do here is we're gonna start to like edge into the real time that they're used to in terms of, you know, the millisecond response times that they expect of their control systems. Certainly not their, their historians and databases. >>I, is this available, these innovations to influx DB cloud customers only who can access this capability? >>Yeah. I mean, commercially and today, yes. You know, I think we want to emphasize that's a, for now our goal is to get our latest and greatest and our best to everybody over time. Of course. You know, one of the things we had to do here was like we double down on sort of our, our commitment to open source and availability. So like anybody today can take a look at the, the libraries in on our GitHub and, you know, can ex inspect it and even can try to, you know, implement or execute some of it themselves in their own infrastructure. You know, we are, we're committed to bringing our sort of latest and greatest to our cloud customers first for a couple of reasons. Number one, you know, there are big workloads and they have high expectations of us. 
I think number two, it also gives us the opportunity to monitor a little bit more closely how it's working, how they're using it, like how the system itself is performing. >>And so just, you know, being careful, maybe a little cautious in terms of, of, of how big we go with this right away. Just sort of both limits, you know, the risk of, of, you know, any issues that can come with new software rollouts. We haven't seen anything so far, but also it does give us the opportunity to have like meaningful conversations with a small group of users who are using the products, but once we get through that and they give us two thumbs up on it, it'll be like, open the gates and let everybody in. It's gonna be exciting time for the whole ecosystem. >>Yeah, that makes a lot of sense. And you can do some experimentation and, you know, using the cloud resources. Let's dig into some of the architectural and technical innovations that are gonna help deliver on this vision. What, what should we know there? >>Well, I mean, I think foundationally we built the, the new core on Rust. You know, this is a new very sort of popular systems language, you know, it's extremely efficient, but it's also built for speed and memory safety, which goes back to that us being able to like deliver it in a way that is, you know, something we can inspect very closely, but then also rely on the fact that it's going to behave well. And if it does find error conditions, I mean, we, we've loved working with Go and, you know, a lot of our libraries will continue to, to be sort of implemented in Go, but you know, when it came to this particular new engine, you know, that power performance and stability rust was critical. On top of that, like, we've also integrated Apache Arrow and Apache Parque for persistence. 
I think for anybody who's really familiar with the nuts and bolts of our backend and our TSI and our, our time series merged Trees, this is a big break from that, you know, arrow on the sort of in MI side and then Par K in the on disk side. >>It, it allows us to, to present, you know, a unified set of APIs for those really fast real time inquiries that we talked about, as well as for very large, you know, historical sort of bulk data archives in that PARQUE format, which is also cool because there's an entire ecosystem sort of popping up around Parque in terms of the machine learning community, you know, and getting that all to work, we had to glue it together with aero flight. That's sort of what we're using as our, our RPC component. You know, it handles the orchestration and the, the transportation of the Coer data. Now we're moving to like a true Coer database model for this, this version of the engine, you know, and it removes a lot of overhead for us in terms of having to manage all that serialization, the deserialization, and, you know, to that again, like blurring that line between real time and historical data. It's, you know, it's, it's highly optimized for both streaming micro batch and then batches, but true streaming as well. >>Yeah. Again, I mean, it's funny you mentioned Rust. It is, it's been around for a long time, but it's popularity is, is, you know, really starting to hit that steep part of the S-curve. And, and we're gonna dig into to more of that, but give us any, is there anything else that we should know about Bryan? Give us the last word? >>Well, I mean, I think first I'd like everybody sort of watching just to like, take a look at what we're offering in terms of early access in beta programs. I mean, if, if, if you wanna participate or if you wanna work sort of in terms of early access with the, with the new engine, please reach out to the team. 
I'm sure you know, there's a lot of communications going out and, you know, it'll be highly featured on our, our website, you know, but reach out to the team, believe it or not, like we have a lot more going on than just the new engine. And so there are also other programs, things we're, we're offering to customers in terms of the user interface, data collection and things like that. And, you know, if you're a customer of ours and you have a sales team, a commercial team that you work with, you can reach out to them and see what you can get access to because we can flip a lot of stuff on, especially in cloud through feature flags. >>But if there's something new that you wanna try out, we'd just love to hear from you. And then, you know, our goal would be that as we give you access to all of these new cool features that, you know, you would give us continuous feedback on these products and services, not only like what you need today, but then what you'll need tomorrow to, to sort of build the next versions of your business. Because, you know, the whole database, the ecosystem as it expands out into to, you know, this vertically oriented stack of cloud services and enterprise databases and edge databases, you know, it's gonna be what we all make it together, not just, you know, those of us who were employed by Influx db. And then finally, I would just say please, like watch in ice in Tim's sessions, Like these are two of our best and brightest. They're totally brilliant, completely pragmatic, and they are most of all customer obsessed, which is amazing. And there's no better takes, like honestly on the, the sort of technical details of this, then there's, especially when it comes to like the value that these investments will, will bring to our customers and our communities. So encourage you to, to, you know, pay more attention to them than you did to me, for sure. >>Brian Gilmore, great stuff. Really appreciate your time. Thank you. >>Yeah, thanks Dave. 
It was awesome. Look forward to it. >>Yeah, me too. Looking forward to see how the, the community actually applies these new innovations and goes, goes beyond just the historical into the real time, really hot area. As Brian said in a moment, I'll be right back with Anna East Dos Georgio to dig into the critical aspects of key open source components of the Influx DB engine, including Rust, Arrow, Parque, data fusion. Keep it right there. You don't want to miss this.
Evolving InfluxDB into the Smart Data Platform
>>This past May, theCUBE, in collaboration with InfluxData, shared with you the latest innovations in time series databases. We talked at length about why a purpose-built time series database was, for many use cases, a superior alternative to general-purpose databases trying to do the same thing. Now, you may remember that time series data is any data that's stamped in time, and if it's stamped, it can be analyzed historically. When we introduced the concept to the community, we talked about how, in theory, those time slices could be taken every hour, every minute, every second, down to the millisecond, and how the world was moving toward real-time or near-real-time data analysis to support physical infrastructure like sensors, other devices, and IoT equipment. Time series databases have had to evolve to efficiently support real-time data in emerging use cases, in IoT and beyond. >>And to do that, new architectural innovations have to be brought to bear. As is often the case, open source software is the linchpin to those innovations. Hello, and welcome to Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and produced by theCUBE. My name is Dave Vellante, and I'll be your host today. In this program we're going to dig pretty deep into what's happening with time series data generally, and specifically how InfluxDB is evolving to support new workloads and demands, particularly real-time data analytics use cases. First, we're gonna hear from Brian Gilmore, who is the director of IoT and emerging technologies at InfluxData. We're gonna talk about the continued evolution of InfluxDB and the new capabilities enabled by open source generally, and by specific tools. You're gonna hear a lot about things like Rust, the implementation of Apache Arrow, the use of Parquet, and tooling such as DataFusion, which power a new engine for InfluxDB.
Now, these innovations evolve the idea of time series analysis by dramatically increasing the granularity of time series data, compressing the historical time slices, if you will, from, for example, minutes down to milliseconds, and at the same time enabling real-time analytics with an architecture that can process data much faster and more efficiently. After Brian, we're gonna hear from Anais Dotis-Georgiou, a developer advocate at InfluxData. We're gonna get into the why of these open source capabilities and how they contribute to the evolution of the InfluxDB platform. Then we'll close the program with Tim Yokum, the director of engineering at InfluxData, who's gonna explain how the InfluxDB community actually evolved the data engine in mid-flight, and which decisions went into the innovations that are coming to market. Thank you for being here. We hope you enjoy the program. Let's get started. Okay, we're kicking things off with Brian Gilmore. He's the director of IoT and emerging technology at InfluxData. Brian, welcome to the program. Thanks for coming on. >>Thanks, Dave. Great to be here. I appreciate the time. >>Hey, explain why InfluxDB needs a new engine. Was there something wrong with the current engine? What's going on there? >>No, not at all. For us, it's been about staying ahead of the market. If we think about what our customers are coming to us with now, requests like SQL query support and things like that, we have to figure out a way to execute those for them in a way that will scale long term. And we also wanna make sure we're innovating and anticipating those future needs. So this is really a transparent change for our customers.
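The "time slices" idea in the intro can be made concrete with a few lines of code. This is an illustrative, stdlib-only Python sketch, not InfluxDB's implementation: it buckets timestamped readings into fixed windows and averages each bucket, and shrinking the window is exactly what moves analysis from hourly roll-ups toward near real time.

```python
from collections import defaultdict

def downsample(points, window_ms):
    """Bucket (timestamp_ms, value) points into fixed windows and average each.

    Shrinking window_ms narrows the "time slice": from hours, to seconds,
    toward the millisecond granularity discussed in the program.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_ms].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Hypothetical sensor readings, stamped in milliseconds.
readings = [(0, 10.0), (400, 14.0), (1100, 20.0), (1900, 22.0), (2500, 30.0)]
print(downsample(readings, window_ms=1000))
# {0: 12.0, 1000: 21.0, 2000: 30.0}
```

A real time series engine does this over streams and at scale, but the trade-off is the same: smaller windows mean fresher answers and more work per unit time.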
I think we'll be adding new capabilities over time that leverage this new engine, but initially the customers who are using us are gonna see just great improvements in performance, especially those working at the top end of the workload scale, with massive data volumes and things like that. >>Yeah, and we're gonna get into that today, the architecture and the like, but what was the catalyst for the enhancements? When and how did this all come about? >>Well, like three years ago we were primarily on premises. We had our open source, we had an enterprise product, and shifting that technology, especially the open source code base, to a service basis, where we were hosting it through multiple cloud providers, that was a long journey. Phase one was, we wanted to host enterprise for our customers, so we created a service where we just managed and ran our enterprise product for them. Phase two of this cloud effort was to optimize for multi-tenant, multi-cloud, to be able to host it in a truly SaaS manner, where we could use some type of customer activity or consumption as the pricing vector. And that was the birth of the real first InfluxDB Cloud, which has been really successful. >>We've seen, I think, like 60,000 people sign up, and we've got tons of both enterprises and new companies, developers, and of course a lot of home hobbyists and enthusiasts using it on a daily basis. Having that big pool of very diverse customers to chat with as they use the product and give us feedback has pointed us in a really good direction, in terms of making sure we're continuously improving and also making these big leaps, as we're doing with this new engine. >>Right. So you've called it a transparent change for customers, so I'm presuming it's non-disruptive, but I really wanna understand how much of a pivot this is. What does it take to make that shift from time series specialist to real-time analytics, and to be able to support both? >>Yeah, it's much more of an evolution, I think, than a shift or a pivot. Time series data is always gonna be fundamental, the basis of the solutions we offer our customers and of the ones they build on the raw APIs of our platform themselves. The time series market is one we've worked diligently to lead, especially when it comes to metrics: sensor data, and app and infrastructure metrics. If we're being honest, though, I think our user base is well aware that the way we were architected was much more toward backwards-looking, historical-type analytics, which are key for troubleshooting and making sure you don't run into the same problem twice. >>But we had to ask ourselves: what can we do to better handle those queries, from a performance and time-to-response standpoint? Can we get to the point where result sets come back so quickly from the time of query that we can limit that window down to minutes, and then seconds? And now, with this new engine, we're really starting to talk about a query window that could be returning results in milliseconds from the time the data hit the ingest queue. That's getting to the point where, as your data is available, you can use it: you can query it, you can visualize it, and you can do all those magical things with it. Getting all of that to a place where we're saying yes to the customer on all of the real-time queries and the multiple-language query support was hard, but we're now at a spot where we can start introducing it to a limited number of customers, strategic customers and strategic availability zones to start, but everybody over time. >>So you're basically going from what happened, and you can still do that obviously, to what's happening now, in the moment? >>Yeah. If you think about time, it's always sort of past, right? In the moment right now, whether you're talking about a millisecond ago or a minute ago, that's pretty much right now for most people, especially in these use cases where you have other components of latency induced by the underlying data collection, the architecture, the infrastructure, the devices, and the highly distributed nature of all of this. So getting a customer or a user to be able to use the data as soon as it is available is what we're after here. >>I always thought of real time as before you lose the customer, but now, in this context, maybe it's before the machine blows up. >>Yeah, operational real time is different, and that's one of the things that told us we were heading in the right direction: just how many operational customers we have. Everything from aerospace and defense, we've got companies monitoring satellites, we've got tons of industrial users using us as a process historian on the plant floor. If we can satisfy their demands for a real-time historical perspective, that's awesome. I think what we're gonna do here is start to edge into the real time they're used to, in terms of the millisecond response times they expect of their control systems, certainly not of their historians and databases. >>Are these innovations available to InfluxDB Cloud customers only? Who can access this capability? >>Commercially and today, yes, but I want to emphasize that's for now; our goal is to get our latest and greatest and our best to everybody over time, of course. One of the things we had to do here was double down on our commitment to open source and availability. So anybody today can take a look at the libraries on our GitHub, inspect them, and even try to implement or execute some of it themselves in their own infrastructure. We are committed to bringing our latest and greatest to our cloud customers first, for a couple of reasons. Number one, they're big workloads and they have high expectations of us. Number two, it also gives us the opportunity to monitor a little more closely how it's working, how they're using it, and how the system itself is performing. >>So we're just being careful, maybe a little cautious, in terms of how big we go with this right away. That limits the risk of any issues that can come with new software rollouts, and we haven't seen anything so far, and it also gives us the opportunity to have meaningful conversations with a small group of users who are using the products. Once we get through that and they give us two thumbs up, it'll be open the gates and let everybody in. It's gonna be an exciting time for the whole ecosystem. >>Yeah, that makes a lot of sense, and you can do some experimentation using the cloud resources. Let's dig into some of the architectural and technical innovations that are gonna help deliver on this vision. What should we know there? >>Well, foundationally, we built the new core on Rust. This is a newer, very popular systems language. It's extremely efficient, but it's also built for speed and memory safety, which goes back to us being able to deliver it in a way that we can inspect very closely, but also rely on it behaving well and handling error conditions if it finds them. We've loved working with Go, and a lot of our libraries will continue to be implemented in Go, but when it came to this particular new engine, for that power, performance, and stability, Rust was critical. On top of that, we've also integrated Apache Arrow and Apache Parquet for persistence. For anybody who's really familiar with the nuts and bolts of our backend, our TSI and our Time-Structured Merge Trees, this is a big break from that: Arrow on the in-memory side, and Parquet on the on-disk side. >>It allows us to present a unified set of APIs, both for those really fast real-time queries we talked about and for very large historical bulk data archives in that Parquet format, which is also cool because there's an entire ecosystem popping up around Parquet, in the machine learning community for instance. To get that all to work, we had to glue it together with Arrow Flight. That's what we're using as our RPC component. It handles the orchestration and the transportation of the columnar data. We're moving to a true columnar database model for this version of the engine, and it removes a lot of overhead for us in terms of having to manage all that serialization and deserialization. To that point again, blurring the line between real-time and historical data, it's highly optimized for micro-batches and batches, but for true streaming as well. >>Yeah. Again, it's funny you mention Rust. It's been around for a while, but its popularity is really starting to hit the steep part of the S-curve. We're gonna dig into more of that, but is there anything else we should know, Brian? Give us the last word. >>Well, first, I'd like everybody watching to take a look at what we're offering in terms of early access and beta programs. If you wanna participate, or if you wanna work in terms of early access with the new engine, please reach out to the team.
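The row-versus-columnar distinction behind the Arrow/Parquet discussion above can be sketched in plain Python. This is a toy illustration of the general idea, not Arrow's or the new engine's actual code: storing each field as its own array means an aggregate touches only the one column it needs, which is also why columnar layouts compress well (adjacent values share a type).

```python
# Row-oriented layout: one record per point; scanning a single field still
# drags every other field of every row through memory.
rows = [
    {"time": 0, "temp": 20.0, "host": "a"},
    {"time": 1, "temp": 21.0, "host": "a"},
    {"time": 2, "temp": 22.0, "host": "b"},
]

# Column-oriented layout: one array per field; an aggregate reads only the
# column it needs, and same-typed adjacent values compress well on disk.
columns = {
    "time": [0, 1, 2],
    "temp": [20.0, 21.0, 22.0],
    "host": ["a", "a", "b"],
}

def mean_temp_rows(rows):
    # Must visit every row dict, even though only "temp" matters.
    return sum(r["temp"] for r in rows) / len(rows)

def mean_temp_columns(cols):
    temps = cols["temp"]  # one contiguous, single-type array
    return sum(temps) / len(temps)

print(mean_temp_rows(rows), mean_temp_columns(columns))  # 21.0 21.0
```

Arrow applies this layout in memory and Parquet on disk, which is what lets one engine serve both fast point-in-time queries and bulk historical scans.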
I'm sure you know, there's a lot of communications going out and you know, it'll be highly featured on our, our website, you know, but reach out to the team, believe it or not, like we have a lot more going on than just the new engine. And so there are also other programs, things we're, we're offering to customers in terms of the user interface, data collection and things like that. And, you know, if you're a customer of ours and you have a sales team, a commercial team that you work with, you can reach out to them and see what you can get access to because we can flip a lot of stuff on, especially in cloud through feature flags. >>But if there's something new that you wanna try out, we'd just love to hear from you. And then, you know, our goal would be that as we give you access to all of these new cool features that, you know, you would give us continuous feedback on these products and services, not only like what you need today, but then what you'll need tomorrow to, to sort of build the next versions of your business. Because you know, the whole database, the ecosystem as it expands out into to, you know, this vertically oriented stack of cloud services and enterprise databases and edge databases, you know, it's gonna be what we all make it together, not just, you know, those of us who were employed by Influx db. And then finally I would just say please, like watch in ICE in Tim's sessions, like these are two of our best and brightest, They're totally brilliant, completely pragmatic, and they are most of all customer obsessed, which is amazing. And there's no better takes, like honestly on the, the sort of technical details of this, then there's, especially when it comes to like the value that these investments will, will bring to our customers and our communities. So encourage you to, to, you know, pay more attention to them than you did to me, for sure. >>Brian Gilmore, great stuff. Really appreciate your time. Thank you. >>Yeah, thanks Dave. It was awesome. 
Look forward to it. >>Yeah, me too. Looking forward to see how the, the community actually applies these new innovations and goes, goes beyond just the historical into the real time really hot area. As Brian said in a moment, I'll be right back with Anna East dos Georgio to dig into the critical aspects of key open source components of the Influx DB engine, including Rust, Arrow, Parque, data fusion. Keep it right there. You don't wanna miss this >>Time series Data is everywhere. The number of sensors, systems and applications generating time series data increases every day. All these data sources producing so much data can cause analysis paralysis. Influx DB is an entire platform designed with everything you need to quickly build applications that generate value from time series data influx. DB Cloud is a serverless solution, which means you don't need to buy or manage your own servers. There's no need to worry about provisioning because you only pay for what you use. Influx DB Cloud is fully managed so you get the newest features and enhancements as they're added to the platform's code base. It also means you can spend time building solutions and delivering value to your users instead of wasting time and effort managing something else. Influx TVB Cloud offers a range of security features to protect your data, multiple layers of redundancy ensure you don't lose any data access controls ensure that only the people who should see your data can see it. >>And encryption protects your data at rest and in transit between any of our regions or cloud providers. InfluxDB uses a single API across the entire platform suite so you can build on open source, deploy to the cloud and then then easily query data in the cloud at the edge or on prem using the same scripts. And InfluxDB is schemaless automatically adjusting to changes in the shape of your data without requiring changes in your application. Logic. InfluxDB Cloud is production ready from day one. 
All it needs is your data and your imagination. Get started today at influxdata.com/cloud. >>Okay, we're back. I'm Dave Vellante with theCUBE, and you're watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData. Anais Dotis-Georgiou is here. She's a developer advocate for InfluxData, and we're gonna dig into the rationale and value contribution behind several open source technologies that InfluxDB is leveraging to increase the granularity of time series analysis and bring the world of data into real-time analytics. Anais, welcome to the program. Thanks for coming on. >>Hi, thank you so much. It's a pleasure to be here. >>Oh, you're very welcome. Okay, so IOx is being touted as this next-gen open source core for InfluxDB. And my understanding is that it leverages in-memory, of course, for speed. It's a columnar store, so it gives you compression efficiency, it's gonna give you faster query speeds, and you store files in object storage, so you've got a very cost-effective approach. Are these the salient points on the platform? I know there are probably dozens of other features, but what are the high-level value points that people should understand? >>Sure, that's a great question. So some of the main requirements that IOx is trying to achieve — and some of the most impressive ones to me — the first is that it aims to have no limits on cardinality, and also to allow you to write any kind of event data that you want, whether that's a tag or a field. It also wants to deliver best-in-class performance on analytics queries, in addition to our already well-served metrics queries. We also wanna have operator control over memory usage, so you should be able to define how much memory is used for buffering, caching, and query processing. Another really important part is the ability to have bulk data export and import, which is super useful.
Also, broader ecosystem compatibility: where possible we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like SQL, Python, and maybe even pandas in the future. >>Okay, so a lot there. Now, we talked to Brian about how you're using Rust, which is not a new programming language — and of course we had some drama around Rust during the pandemic with the Mozilla layoffs, but the formation of the Rust Foundation really addressed any of those concerns. You've got big guns like Amazon and Google and Microsoft throwing their collective weight behind it. The adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with Rust. But why Rust, as an alternative to, say, C++ for example? >>Sure, that's a great question. So Rust was chosen because of its exceptional performance and reliability. While Rust is syntactically similar to C++ and has similar performance — it also compiles to native code like C++ — unlike C++, it has much better memory safety. Memory safety is protection against bugs or security vulnerabilities that lead to excessive memory usage or memory leaks, and Rust achieves this memory safety through its innovative type system. Additionally, it doesn't allow for dangling pointers, and dangling pointers are among the main classes of errors that lead to exploitable security vulnerabilities in languages like C++. So Rust helps meet that requirement of having no limits on cardinality, for example, because we're also using the Rust implementation of Apache Arrow and this control over memory. And Rust's packaging system, crates.io, offers everything that you need out of the box to have features like async and await to fix race conditions, protection against buffer overflows, and thread-safe async caching structures as well.
So essentially it just gives you all the fine-grained control you need to take advantage of memory and all your resources as well as possible, so that you can handle those really, really high cardinality use cases. >>Yeah, and the more I learn about the new engine and the platform, IOx, et cetera — you see things like, in the old days, and even today, you do a lot of garbage collection in these systems, and there's an inverse impact relative to performance. So it looks like the community is really modernizing the platform. But I wanna talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets — we know that — but please explain: what is Arrow, and what does it bring to InfluxDB? >>Sure, yeah. So Arrow is a framework for defining in-memory columnar data, and so much of the efficiency and performance of IOx comes from taking advantage of columnar data structures. And I will, if you don't mind, take a moment to illustrate why columnar data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. And in our table we have those two temperature values, as well as maybe a measurement value, a timestamp value, and maybe some other tag values that describe what room and what house, et cetera, we're getting this data from. And so you can picture this table where we have two rows with the two temperature values, for both our room and the stove. Well, usually our room temperature is regulated, so those values don't change very often. >>So when you have column-oriented storage, essentially you take each column and group it together.
And so if that's the case, and you're just taking temperature values from the room, and a lot of those temperature values are the same, then you might be able to imagine how equal values will then neighbor each other in the storage format, and this provides a really perfect opportunity for cheap compression. And then this cheap compression enables high cardinality use cases. It also enables faster scan rates. So if you wanna find, say, the min and max value of the temperature in the room across a thousand different points, you only have to get those thousand points in order to answer that question, and you have those immediately available to you. But let's contrast this with a row-oriented storage solution instead, so that we can understand better the benefits of column-oriented storage.

>>So if you had row-oriented storage, you'd first have to look at every field, like the temperature in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is, and every timestamp. You'd then have to pluck out the one temperature value that you want at the one timestamp, and do that for every single row. So you're scanning across a ton more data, and that's why row-oriented doesn't provide the same efficiency as columnar — and Apache Arrow is an in-memory columnar data format framework. So that's where a lot of the advantages come from. >>Okay. So you basically described like a traditional database, a row approach. But I've seen a lot of traditional databases say, okay, now we can handle columnar format, versus what you're talking about, which is really kind of native. Is it not as effective? Is the format not as effective because it's largely a bolt-on? Can you elucidate on that front?
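The row-versus-column contrast described here can be sketched in a few lines of Python. This is an illustrative toy only — the temperature data is invented, and a real engine like IOx/Arrow uses far more sophisticated encodings than this:

```python
# A tiny table: 1,000 readings of a regulated room temperature plus a timestamp.
rows = [{"ts": 1_000 + i, "room_temp": 21.5} for i in range(1000)]

# Row-oriented: answering "min/max room_temp?" means touching every row dict.
row_min = min(r["room_temp"] for r in rows)
row_max = max(r["room_temp"] for r in rows)

# Column-oriented: each column is grouped into its own contiguous array...
columns = {
    "ts": [r["ts"] for r in rows],
    "room_temp": [r["room_temp"] for r in rows],
}
# ...so the same question scans one list and ignores every other column.
col_min = min(columns["room_temp"])
col_max = max(columns["room_temp"])
assert (row_min, row_max) == (col_min, col_max)

# Equal values now neighbor each other, so run-length encoding is nearly free.
def rle(values):
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

encoded = rle(columns["room_temp"])
print(encoded)  # [[21.5, 1000]] -> 1,000 readings stored as a single pair
```

Both layouts answer the query identically; the columnar layout just scans less data and compresses the steady room-temperature column down to one value/count pair.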
>>Yeah, it's not as effective, because you have more expensive compression and because you can't scan across the values as quickly. And so those are pretty much the main reasons why row-oriented storage isn't as efficient as column-oriented storage. Yeah. >>Got it. So let's talk about Arrow DataFusion. What is DataFusion? I know it's written in Rust, but what does it bring to the table here? >>Sure. So it's an extensible query execution framework, and it uses Arrow as its in-memory format. The way that it helps InfluxDB IOx is that, okay, it's great if you can write an unlimited amount of cardinality into InfluxDB, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So DataFusion helps enable the query processing and transformation of that data. It also has a pandas API, so that you can take advantage of pandas DataFrames as well, and all of the machine learning tools associated with pandas. >>Okay. You're also leveraging Parquet in the platform, because we heard a lot about Parquet in the middle of the last decade as a storage format to improve on Hadoop column stores. What are you doing with Parquet, and why is it important? >>Sure. So Parquet is a column-oriented durable file format. It's important because it'll enable bulk import and bulk export, and it has compatibility with Python and pandas, so it supports a broader ecosystem. Parquet files also take very little disk space, and they're faster to scan because, again, they're column-oriented. In particular, I think Parquet files are something like 16 times cheaper than CSV files, just as a point of reference. And so that's essentially a lot of the benefits of Parquet. >>Got it. Very popular. So, Anais, what exactly is InfluxData focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >>Sure.
So InfluxData has contributed a lot of different things to the Apache ecosystem. For example, they contributed an implementation of Apache Arrow in Go, and that will support querying with Flux. Also, there have been quite a few contributions to DataFusion, for things like memory optimization and additional SQL features — support for timestamp arithmetic, support for EXISTS clauses, and support for memory control. So yeah, Influx has contributed a lot to the Apache ecosystem and continues to do so. And I think the idea here is that if you can improve these upstream projects — the long-term strategy is that the more you contribute and build those up, the more you perpetuate that cycle of improvement, and the more we invest in our own project as well. So it's that kind of symbiotic relationship and appreciation of the open source community. >>Yeah. Got it. You got that virtuous cycle going — what people call the flywheel. Give us your last thoughts and kind of summarize what the big takeaways are from your perspective. >>So I think the big takeaway is that InfluxData is doing a lot of really exciting things with InfluxDB IOx. And if you are interested in learning more about the technologies that Influx is leveraging to produce IOx, the challenges associated with it, and all of the hard work behind it, and you just wanna learn more, then I would encourage you to go to the monthly tech talks and community office hours; they are on every second Wednesday of the month at 8:30 AM Pacific time. There are also community forums and a community Slack channel — look for the influxdb_iox channel specifically — to learn more about how to join those office hours and those monthly tech talks, as well as to ask any questions you have about IOx, what to expect, and what you'd like to learn more about. As a developer advocate, I wanna answer your questions.
So if there's a particular technology or stack that you wanna dive deeper into, and you want more explanation about how InfluxDB leverages it to build IOx, I will be really excited to produce content on that topic for you. >>Yeah, that's awesome. You guys have a really rich community — collaborate with your peers, solve problems — and you guys are super responsive, so we really appreciate that. All right, thank you so much, Anais, for explaining all this open source stuff to the audience and why it's important to the future of data. >>Thank you. I really appreciate it. >>All right, you're very welcome. Okay, stay right there, and in a moment I'll be back with Tim Yoakum. He's the director of engineering for InfluxData, and we're gonna talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't wanna miss this. >>I'm really glad that we went with InfluxDB Cloud for our hosting, because it has saved us a ton of time. It's helped us move faster, it's saved us money, and InfluxDB also has good support. My name's Alex Nauda. I am CTO at Nobl9. Nobl9 is a platform to measure and manage service level objectives, which is a great way of measuring the reliability of your systems. You can essentially think of an SLO — the product we're providing to our customers — as a bunch of time series, so we need a way to store that data and the corresponding time series that are related to those. The main reason that we settled on InfluxDB as we were shopping around is that InfluxDB has a very flexible query language, and as a general-purpose time series database, it basically had the set of features we were looking for. >>As our platform has grown, we've found InfluxDB Cloud to be a really scalable solution. We can quickly iterate on new features and functionality because Influx Cloud is entirely managed; it has probably saved us at least a full additional person on our team.
We also have the option of running InfluxDB Enterprise, which gives us the ability to host off the cloud or in a private cloud if that's preferred by a customer. InfluxData has been really flexible in adapting to the hosting requirements that we have. They listened to the challenges we were facing and they helped us solve them. As we've continued to grow, I'm really happy we have InfluxData by our side. >>Okay, we're back with Tim Yoakum, who is the director of engineering at InfluxData. Tim, welcome. Good to see you. >>Good to see you. Thanks for having me. >>You're really welcome. Listen, we've been covering open source software on theCUBE for more than a decade, and we've watched the innovation from the big data ecosystem. The cloud has been built out on open source — mobile, social platforms, key databases — and of course InfluxDB, and InfluxData has been a big consumer of and contributor to open source software. So my question to you is: where have you seen the biggest bang for the buck from open source software? >>So yeah, you know, Influx really thrives at the intersection of commercial services and open source software. OSS keeps us on the cutting edge. We benefit from OSS in delivering our own service, from our core storage engine technologies to web services and templating engines. Our team stays lean and focused because we build on proven tools — we really build on the shoulders of giants — and, like you mentioned, even better, we contribute a lot back to the projects that we use, as well as to our own product, InfluxDB. >>You know, but I gotta ask you, Tim, because one of the challenges that we've seen — in particular, you saw this in the heyday of Hadoop — is that the innovations come so fast and furious, and as a software company you gotta place bets, you gotta commit people, and sometimes those bets can be risky and not pay off. How have you managed this challenge? >>Oh, it moves fast.
Yeah, but that's a benefit, though, because the community moves so quickly that today's hot technology can be tomorrow's dinosaur. And what we tend to do is fail fast and fail often. We try a lot of things. You know, you look at Kubernetes, for example — that ecosystem is driven by thousands of intelligent developers, engineers, builders; they're adding value every day. So we have to really keep up with that. And as the stack changes, we try different technologies, we try different methods, and at the end of the day we come up with a better platform as a result of just the constant change in the environment. It is a challenge for us, but it's something that we just do every day. >>So we have a survey partner down in New York City called Enterprise Technology Research, ETR, and they do these quarterly surveys of about 1,500 CIOs and IT practitioners, and they really have a good pulse on what's happening with spending. And the data shows that containers generally, but specifically Kubernetes, is one of the areas that has been off the charts and has seen the most significant adoption and velocity, particularly along with cloud. But really, Kubernetes is still just up and to the right, consistently, even with the macro headwinds and all of the stuff that we're sick of talking about. So what are you doing with Kubernetes in the platform? >>Yeah, it's really central to our ability to run the product. When we first started out, we were just on AWS, and the way we were running was a little bit like containers junior. Now we're running Kubernetes everywhere: at AWS, Azure, and Google Cloud.
It allows us to have a consistent experience across three different cloud providers, and we can manage that in code, so our developers can focus on delivering services, not trying to learn the intricacies of Amazon, Azure, and Google and figure out how to deliver services on those three clouds with all of their differences. >>Just to follow up on that — it sounds like there's a PaaS layer there to allow you guys to have a consistent experience across clouds and out to the edge, wherever. Is that correct? >>Yeah, so we've basically built, more or less, platform engineering — this is the new hot phrase. Kubernetes has made a lot of things easy for us, because we've built a platform that our developers can lean on, and they only have to learn one way of deploying and managing their application. And so that just gets all of the underlying infrastructure out of the way and lets them focus on delivering Influx Cloud. >>Yeah, and I know I'm taking a little bit of a tangent, but is that — I'll call it a PaaS layer, if I can use that term — are there specific attributes to InfluxDB, or is it kind of just generally off-the-shelf PaaS? Is there any purpose-built capability there that is value-add, or is it pretty much generic? >>So we really look at things through a build-versus-buy lens. Some things we want to leverage cloud provider services for — for instance, Postgres databases for metadata; perhaps we'll get that off of our plate and let someone else run that. We're going to deploy a platform that our engineers can deliver on, that has consistency, that is all generated from code, and that we as an SRE group, as an ops team, can manage with very few people, really — and we can stamp out clusters across multiple regions in no time.
>>So sometimes you build, sometimes you buy. How do you make those decisions, and what does that mean for the platform and for customers? >>Yeah, so what we're doing is what everybody else does: we're looking for trade-offs that make sense. We really want to protect our customers' data, so we look for services that support our own software with the most uptime, reliability, and durability we can get. Some things are just going to be easier to have a cloud provider take care of on our behalf. We make that transparent for our own team — and of course, for customers, you don't even see that — but we don't want to try to reinvent the wheel. Like I mentioned with SQL data stores for metadata: perhaps let's build on top of what these three large cloud providers have already perfected, and we can then focus on our platform engineering, and our developers can focus on the InfluxData software, the Influx Cloud software. >>So take it to the customer level: what does it mean for them? What's the value that they're gonna get out of all these innovations that we've been talking about today, and what can they expect in the future? >>So first of all, people who use the OSS product are really gonna be at home on our cloud platform. You can run it on your desktop machine, on a single server, what have you, but then you want to scale up. We have some 270 terabytes of data across over 4 billion series keys that people have stored, so there's a proven ability to scale. Now, in terms of the open source software and how we've developed the platform: you're getting a highly available, high cardinality time series platform. We manage it, and, really, as I mentioned earlier, we can keep up with the state of the art. We keep reinventing, we keep deploying things in real time. We deploy to our platform every day, repeatedly, all the time.
And it's that continuous deployment that allows us to keep testing things in flight and rolling things out: changes, new features, better ways of doing deployments, safer ways of doing deployments. >>All of that happens behind the scenes. And like we mentioned earlier, Kubernetes allows us to get that done. We couldn't do it without having that platform as a base layer for us to then put our software on. So we iterate quickly. When you're on the Influx Cloud platform, you really are able to take advantage of new features immediately. We roll things out every day, and as those things go into production, you have the ability to use them. And so in the end, we want you to focus on getting actual insights from your data instead of running infrastructure — let us do that for you. >>And that makes sense. But are the innovations that we're talking about in the evolution of InfluxDB a natural evolution for existing customers? I'm sure the answer is both, but is it opening up new territory for customers? Can you add some color to that? >>Yeah, it really is a little bit of both — any engineer will say, well, it depends. So cloud native technologies are really the hot thing. IoT — industrial IoT especially — people want to just shove tons of data out there and be able to do queries immediately, and they don't wanna manage infrastructure. What we've started to see are people that use the cloud service as their data store backbone, and then use edge computing with our OSS product to ingest data from, say, multiple production lines and downsample that data, then send the rest of that data off to Influx Cloud, where the heavy processing takes place.
So really, us being in all the different clouds, and iterating on that, and being in all sorts of different regions allows people to really get out of the business of trying to manage that big data — have us take care of that. And of course, as we change the platform, end users benefit from that immediately. >>And so, obviously, you're taking away a lot of the heavy lifting for the infrastructure. Would you say the same thing about security, especially as you go out to IoT and the edge? How should we be thinking about the value that you bring from a security perspective? >>Yeah, we take security super seriously. It's built into our DNA. We do a lot of work to ensure that our platform is secure and that the data we store is kept private. It's of course always a concern — you see in the news all the time companies being compromised. That's something that you can have an entire team working on, which we do, to make sure that the data that you have, whether it's in transit or at rest, is always kept secure and is only viewable by you. You know, you look at things like a software bill of materials: if you're running this yourself, you have to go vet all sorts of different pieces of software, and we do that as we use new tools. That's just part of our jobs — making sure that the platform we're running has fully vetted software — and with open source especially, that's a lot of work. And so it's definitely new territory. Supply chain attacks are definitely happening at a higher clip than they used to, but that is really just part of a day in the life for folks like us that are building platforms. >>Yeah, and that's key. I mean, especially when you start getting into IoT and the operational technologies — the engineers running that infrastructure, you know, historically, as you know, Tim, they would air gap everything.
That's how they kept it safe, but that's not feasible anymore. Everything's connected now, right? And so you've gotta have a partner that, again, takes away that heavy lifting, so you can focus your R&D on some of the other activities. Give us the last word and the key takeaways from your perspective. >>Well, from my perspective, I see it as a two-lane approach with Influx, with any time series data. You've got a lot of stuff that you're gonna run on-prem — what you had mentioned, air gapping; sure, there's plenty of need for that — but at the end of the day, people that don't want to run big data centers, people that want to trust their data to a company that's got a full platform set up for them that they can build on, send that data over to the cloud. The cloud is not going away. I think a more hybrid approach is where the future lives, and that's what we're prepared for. >>Tim, really appreciate you coming on the program. Great stuff. Good to see you. >>Thanks very much. Appreciate it. >>Okay, in a moment I'll be back to wrap up today's session. You're watching theCUBE. >>Are you looking for some help getting started with InfluxDB, Telegraf, or Flux? Check out InfluxDB University, where you can find our entire catalog of free training that will help you make the most of your time series data. Get started for free at influxdbu.com. We'll see you in class. >>Okay, so we heard today from three experts on time series and data how the InfluxDB platform is evolving to support new ways of analyzing large data sets very efficiently and effectively in real time. And we learned that key open source components like Apache Arrow, the Rust programming environment, DataFusion, and Parquet are being leveraged to support real-time data analytics at scale.
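One of those components, Parquet, came up earlier for being dramatically smaller on disk than CSV. A rough, hedged illustration of why — using only the Python standard library, and not Parquet's actual encoding, just the column-grouping idea from Anais's segment; the data is invented:

```python
import zlib

# 10,000 readings: an increasing timestamp and a mostly constant temperature.
ts = list(range(10_000))
temp = [21.5] * 10_000

# Row-oriented text (CSV-like): values from different columns are interleaved.
row_bytes = "\n".join(f"{t},{v}" for t, v in zip(ts, temp)).encode()

# Column-oriented: each column's values are grouped together before compression,
# so the repetitive temperature column becomes one long, highly compressible run.
col_bytes = ("\n".join(map(str, ts)) + "\n" + "\n".join(map(str, temp))).encode()

row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)

# A general-purpose compressor typically does much better once the long runs
# of identical values sit next to each other.
print(len(row_bytes), len(row_compressed), len(col_compressed))
```

The exact ratios depend on the data and the compressor; Parquet adds dictionary and run-length encodings on top of this grouping, which is where savings like the "16x versus CSV" figure mentioned above come from.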
We also learned about the contributions and importance of open source software, and how the InfluxDB community is evolving the platform with minimal disruption to support new workloads, new use cases, and the future of real-time data analytics. Now, remember, these sessions are all available on demand — you can go to thecube.net to find those. Don't forget to check out siliconangle.com for all the news related to things enterprise and emerging tech. And you should also check out influxdata.com. There you can learn about the company's products; you'll find developer resources like free courses; and you can join the developer community and work with your peers to learn and solve problems. And there are plenty of other resources around use cases and customer stories on the website. This is Dave Vellante. Thank you for watching Evolving InfluxDB into the Smart Data Platform, made possible by InfluxData and brought to you by theCUBE, your leader in enterprise and emerging tech coverage.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Brian Gilmore | PERSON | 0.99+ |
David Brown | PERSON | 0.99+ |
Tim Yoakum | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Brian | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Tim Yokum | PERSON | 0.99+ |
Stu | PERSON | 0.99+ |
Herain Oberoi | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Dave Valante | PERSON | 0.99+ |
Kamile Taouk | PERSON | 0.99+ |
John Fourier | PERSON | 0.99+ |
Rinesh Patel | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Santana Dasgupta | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Canada | LOCATION | 0.99+ |
BMW | ORGANIZATION | 0.99+ |
Cisco | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
ICE | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Jack Berkowitz | PERSON | 0.99+ |
Australia | LOCATION | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
Telco | ORGANIZATION | 0.99+ |
Venkat | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
Camille | PERSON | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Venkat Krishnamachari | PERSON | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
Don Tapscott | PERSON | 0.99+ |
thousands | QUANTITY | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Intercontinental Exchange | ORGANIZATION | 0.99+ |
Children's Cancer Institute | ORGANIZATION | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
telco | ORGANIZATION | 0.99+ |
Sabrina Yan | PERSON | 0.99+ |
Tim | PERSON | 0.99+ |
Sabrina | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
MontyCloud | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Leo | PERSON | 0.99+ |
COVID-19 | OTHER | 0.99+ |
Santa Ana | LOCATION | 0.99+ |
UK | LOCATION | 0.99+ |
Tushar | PERSON | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
Valente | PERSON | 0.99+ |
JL Valente | PERSON | 0.99+ |
1,000 | QUANTITY | 0.99+ |
Laura Sellers, Collibra | Data Citizens 22
>> Welcome to theCUBE's Virtual Coverage of Data Citizens 2022. My name is Dave Vellante and I'm here with Laura Sellers, who is the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >> Thank you. Nice to be here. >> Yeah, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now, when I think about it historically, fast access to the right data at the right time in a form that's really easily consumable has been kind of challenging, especially for business users. Can you explain to our audience why this matters so much and what's actually different today in the data ecosystem to make this a reality? >> Yeah, definitely. So I think what we really need, and what I hear from customers every single day, is a new approach to data management in our product teams. What inspired me to come to Collibra, a little bit over a year ago, was really the fact that they're very focused on bringing trusted data to more users across more sources for more use cases. And so as we look at what we're announcing with these innovations of ease of use and scale, it's really about making teams more productive in getting started with, and in managing, data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers the performance, scale and security that our users and teams need and demand. So as we look at, oh, go ahead. >> I was going to say, you know, when I look back at, like, the last 10 years, it was all about getting the technology to work and it was just so complicated. But, but please carry on. I'd love to hear more about this. >> Yeah, I really, you know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now.
So what we're announcing from our ease of use side of the world is, first, our data marketplace. This is the ability for all users to discover and access data quickly and easily, shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. Then there are two more areas on the ease of use side of the world. One is our world of usage analytics. And one of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create, and also helping with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform, understanding what's working, who's accessing it, what's not. And then finally we're also introducing what's called Workflow Designer. And we love our workflows at Collibra; it's a big differentiator to be able to automate business processes. The Designer is really about a way for more people to be able to create those workflows, collaborate on those workflows, as well as for people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use to make it easier for all users to find data. >> Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy, metaphor or analogy, I always get those confused. Let's go with analogy. Why is it so important to data consumers? >> I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like your Amazon, if you will. And so having a consumer grade experience where users can quickly go in and find the data, trust that data, understand where the data's coming from and then be able to quickly access it, is the idea of being able to shop for it.
Just making it as simple as possible and really speeding the time to value for any of the business analysts, data analysts out there. >> Yeah, I think you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data and of course that's awesome. I love that. But of course then you have to have self-service infrastructure and you have to have governance. And those are really challenging. And I think so many organizations they're facing adoption challenges. You know, when it comes to enabling teams generally, especially domain experts to adopt new data technologies you know, like the tech comes fast and furious. You got all these open source projects and you get really confusing. Of course it risks security, governance and all that good stuff. You got all this jargon. So where do you see, you know, the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >> You're, you're dead on. There's so much technology and there's so much to stay on top of, which is part of the friction, right? Is just being able to stay ahead of and understand all the technologies that are coming. You also look at it as there's so many more sources of data and people are migrating data to the cloud and they're migrating to new sources. Where the friction comes is really that ability to understand where the data came from, where it's moving to and then also to be able to put the access controls on top of it. So people are only getting access to the data that they should be getting access to. So one of the other things we're announcing with, with all of the innovations that are coming is what we're doing around performance and scale. So with all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. 
It's something that Collibra has been working on for the past year and a half, but we're launching the ability to have data quality in the cloud. So it's currently an on-premise offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. So this is, again, about one of those challenges: making sure that the data you have is high quality as you move forward. And so really, again, we're just reducing friction. You already have Snowflake stood up; it's not another machine for you to manage, it's just push-down capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. And this is that ability for users to be able to ingest metadata, understand where the PII data is, and then set policies up on top of it. So very quickly be able to set policies and have them enforced at the data level. So anybody in the organization is only getting access to the data they should have access to. >> This topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back office function, you know, and really confined only to highly regulated industries like financial services, healthcare and government. You look back over a decade ago, you didn't have this worry about personal information; GDPR and, you know, the California Consumer Privacy Act have all become so much more important. The cloud has really changed things in terms of performance and scale. And of course, partnering with Snowflake, it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on, and of course attracting them as an investor as well was very strong validation.
What can you tell us about the nature of the relationship with Snowflake? I'm specifically interested in sort of joint engineering and product innovation efforts, you know, beyond the standard go-to-market stuff. >> Definitely. So you mentioned they were a strategic investor in Collibra about a year ago. A little less than that, I guess. We've been working with them though for over a year, really tightly with their product and engineering teams, to make sure that Collibra is adding real value. Pieces of our unified platform are touching all pieces of Snowflake. And when I say that, what I mean is we're first, you know, able to ingest data with Snowflake, which has always existed. We're able to profile and classify that data. We're announcing with Collibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly, as far as time to value, with our policies for all business users to be able to create. We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage of where the data came from and how it was transformed within Snowflake, as well as the data quality push-down, as I mentioned. Data quality, you brought it up; it is a big industry push, and you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality. So this push-down capability for Snowflake really is, again, a big ease of use push for us at Collibra: that ability to push it into Snowflake, take advantage of the data source and the engine that already lives there, and make sure you have the right quality.
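The push-down pattern Laura describes, where a quality rule is compiled into SQL so the aggregation runs inside the engine that already holds the data and only the resulting metric travels back, can be sketched roughly as below. This is a hypothetical illustration, not Collibra's or Snowflake's actual API: the rule shape and function names are invented, and SQLite stands in for the warehouse.

```python
import sqlite3

def null_rate_sql(table: str, column: str) -> str:
    # The whole aggregation is expressed as SQL so it executes inside
    # the warehouse; only one number comes back to the quality engine.
    return (
        f"SELECT CAST(SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS REAL)"
        f" / COUNT(*) FROM {table}"
    )

def check_null_rate(conn, table, column, threshold):
    rate = conn.execute(null_rate_sql(table, column)).fetchone()[0]
    return {"rule": f"{table}.{column} null rate <= {threshold}",
            "value": rate,
            "passed": rate <= threshold}

# SQLite standing in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (3, "c@x.com"), (4, "d@x.com")])

result = check_null_rate(conn, "customers", "email", threshold=0.3)
print(result)  # one NULL out of four rows: rate 0.25, so the check passes
```

The point of the pattern is the division of labor: the engine that stores the data does the scan, while the governance layer keeps only the rule definitions and the pass/fail history.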
>> I mean, the nice thing about Snowflake is that if you play in the Snowflake sandbox, you can get sort of a, you know, high degree of confidence that the data sharing can be done in a safe way. Bringing, you know, Collibra into the story allows me to have that data quality and that governance that I need. You know, we've said many times on theCUBE that one of the notable differences in cloud this decade versus last decade, I mean, there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. It's a key factor for innovating, accelerating product delivery, filling gaps in the hyperscale offerings. Because you've got more mature stack capabilities and, you know, that creates this flywheel momentum, as we often say. But, so my question is, how do you work with the hyperscalers? Like whether it's AWS or Google or whomever, and what do you see as your role and what's the Collibra sweet spot? >> Yeah, definitely. So, you know, one of the things I mentioned early on is the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So similar to what you've seen with our strategic moves around Snowflake, we're really covering the broad ecosystem of what Collibra can do on top of that data source. We're extending that to the world of Google as well and the world of Dataplex. We also have great partners in SIs. Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's, as they're really important to help people with their whole data strategy and driving that data-driven culture, with Collibra being the core of it.
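Stepping back to Collibra Protect for a moment: the enforcement model described earlier, classify columns, then make sure each caller only sees the data they should, reduces to a small policy check per value. Everything in this sketch is a made-up stand-in (the tag names, the 'pii_reader' role), not the real Protect or Snowflake masking API.

```python
# Hypothetical output of a classification step: column name -> category tag.
COLUMN_TAGS = {"name": "general", "email": "pii", "ssn": "pii"}

def apply_policies(row, column_tags, user_roles):
    # Mask any PII-tagged column unless the caller holds the
    # (hypothetical) 'pii_reader' role; enforcement is at the data level.
    can_see_pii = "pii_reader" in user_roles
    return {
        col: val if column_tags.get(col) != "pii" or can_see_pii else "****"
        for col, val in row.items()
    }

record = {"name": "Pat", "email": "pat@example.com", "ssn": "123-45-6789"}
analyst_view = apply_policies(record, COLUMN_TAGS, user_roles={"analyst"})
steward_view = apply_policies(record, COLUMN_TAGS, user_roles={"analyst", "pii_reader"})
print(analyst_view)  # {'name': 'Pat', 'email': '****', 'ssn': '****'}
```

In a real warehouse this check would live next to the data (for instance as a masking policy the query engine applies), so every access path gets the same answer; the sketch only shows the decision logic.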
>> Hi Laura, we're going to, we're going to end it there but I wonder if you could kind of put a bow on, you know, this year, the event your, your perspectives. So just give us your closing thoughts. >> Yeah, definitely. So I, I want to say this is one of the biggest releases Collibra's ever had. Definitely the biggest one since I've been with the company a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and, and companies everywhere. And so it's all about everybody being able to easily find, understand and trust and get access to that data going forward. >> Well congratulations on all the progress. It was great to have you on theCUBE. First time, I believe. And really appreciate you, you taking the time with us. >> Yes, thank you, for your time. >> You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on theCUBE your leader in enterprise and emerging tech coverage.
SUMMARY :
Dave Vellante interviews Laura Sellers, Chief Product Officer at Collibra, host of Data Citizens 2022. She frames the release around ease of use and scale: a data marketplace that lets users shop for data the way they would on Amazon, a new homepage to help users find data faster, usage analytics to track adoption and support data literacy, and Workflow Designer for building and collaborating on automated business processes. On the performance-and-scale side, Collibra brings its data quality offering to the cloud with push-down into Snowflake, and introduces Collibra Protect for classifying PII and enforcing access policies at the data level. The conversation also covers the deepening Snowflake partnership, including Snowflake Lineage 2.0, and the broader ecosystem of Google Dataplex and SI partners such as Infosys.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Laura | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Laura Sellers | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
California Consumer Privacy Act | TITLE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Infosys | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Dataplex | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
first | QUANTITY | 0.98+ |
Data Citizens | ORGANIZATION | 0.97+ |
this year | DATE | 0.97+ |
this week | DATE | 0.95+ |
Levi's | ORGANIZATION | 0.94+ |
Snowflake | TITLE | 0.94+ |
past year and a half | DATE | 0.94+ |
First time | QUANTITY | 0.94+ |
Gartner | ORGANIZATION | 0.93+ |
last decade | DATE | 0.93+ |
two more areas | QUANTITY | 0.91+ |
today | DATE | 0.91+ |
GCP | ORGANIZATION | 0.86+ |
up to $15 million dollars | QUANTITY | 0.86+ |
a year ago | DATE | 0.85+ |
first thing | QUANTITY | 0.83+ |
Data Citizens 22 | ORGANIZATION | 0.83+ |
about a year ago | DATE | 0.83+ |
over a decade ago | DATE | 0.82+ |
Collibra Protect | ORGANIZATION | 0.82+ |
over a year | QUANTITY | 0.81+ |
theCUBE | ORGANIZATION | 0.81+ |
Snowflake | EVENT | 0.8+ |
Snowf | TITLE | 0.79+ |
Data Citizens 2022 | EVENT | 0.76+ |
over | DATE | 0.72+ |
last 10 years | DATE | 0.7+ |
Data | EVENT | 0.67+ |
Snowflake Lineage 2.0 | TITLE | 0.64+ |
Protect | COMMERCIAL_ITEM | 0.63+ |
decade | DATE | 0.62+ |
single day | QUANTITY | 0.62+ |
Data Citizens 2022 | TITLE | 0.53+ |
Citizens | ORGANIZATION | 0.52+ |
Stijn Christiaens, Collibra, Data Citizens 22
(Inspiring rock music) >> Hey everyone, I'm Lisa Martin covering Data Citizens 22 brought to you by Collibra. This next conversation is going to focus on the importance of data culture. One of our Cube alumni is back; Stan Christians is Collibra's co-founder and its Chief Data Citizen. Stan, it's great to have you back on theCUBE. >> Hey Lisa, nice to be here. >> So we're going to be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, you know, it's so much more than technology innovation; it also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >> Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations you have a lot of people, most of the employees in an organization, who are somehow going to be a data citizen, right? So you need to make sure that these people are aware of it, you need to make sure that these people have the skills and competencies to do with data what is necessary, and that's on all levels, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss we need to make this decision, your boss is also open to and able to interpret, you know, the data presented in the dashboard to actually make that decision and take that action. Right? And once you have that "why" across the organization, that's when you have a good data culture. That's a continuous effort for most organizations, because they're always moving somehow, they're hiring new people.
And it has to be a continuous effort, because we've seen that, on the one hand, organizations continue to be challenged with controlling their data sources and where all the data is flowing, right? Which in itself creates a lot of risk. But also, on the other side of the equation, you have the benefits. You know, you might look at regulatory drivers, like we have to do this, right? But it's, it's much better right now to consider the competitive drivers, for example. And we did an IDC study earlier this year, quite interesting, I can recommend anyone to read it, and one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is that the ones who are higher in maturity, so the organizations that really look at data as an asset, look at data as a product and actively try to be better at it, have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this, you know, data culture for everyone, waking them up as data citizens. I'm doing this for competitive reasons. I'm doing this for regulatory reasons. You're trying to bring both of those together. And the ones that get data intelligence right are just going to be more successful and more competitive. That's our view and that's what we're seeing out there in the market. >> Absolutely. We know that just generally, Stan, right, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, we know that, in theory, more competitive and more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally.
>> Of course, of course it's difficult for an organization to adapt, but it's also necessary. As you just said, imagine that, you know, you're a modern day organization, phones, laptops, what have you, and you're not using those IT assets, right? Or, you know, you're delivering them throughout the organization but not enabling your colleagues to actually do something with that asset. The same thing is true with data today, right? If you're not properly using the data asset and your competitors are, they're going to get more advantage. So as to how you get this done, or how you establish this culture, there's a few angles to look at, I would say. So one angle is obviously the leadership angle, whereby whoever is the boss of data in the organization, you typically have multiple bosses there, like a Chief Data Officer, sometimes there's multiple, but they may have a different title, right? So I'm just going to summarize it as a data leader for a second. So whoever that is, they need to make sure that there's a clear vision, a clear strategy for data. And that strategy needs to include the monetization aspect. How are you going to get value from data? >> Lisa: Yes. >> Now, that's one part, because then you can clearly see the example of your leadership in the organization, and also the business value, and that's important because those people, their job, in essence, really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that to go right: it's not enough to just have that leadership out there, but you also have to get the hearts and minds of the data champions across the organization. You really have to win them over.
And if you have those two combined, and obviously good technology, such as a data intelligence platform like ePlus, to, you know, connect those people and have them execute on their responsibilities, then you have the pieces in place to really start upgrading that culture inch by inch, if you will. >> Yes, I like that. The recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how, before we went live, we were talking about Collibra drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra and what, maybe, some of the specific projects are that Collibra's data office is working on. >> Yes. And it is indeed Data Citizens. There are a ton of speakers here, very excited. You know, we have Barb from MIT speaking about data monetization. We have DJ Patil at the last minute on the agenda, so really exciting agenda, can't wait to get back out there. But essentially you're right. So over the years at Collibra, we've been doing this now since 2008, so a good 15 years, and I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we. And myself, you know, when you start a company, we were four people in a garage, if you will, so everybody's wearing all sorts of hats at that time. But over the years I've run pre-sales at Collibra, I've run post sales, partnerships, product, et cetera, and as our company got a little bit biggish, we're now 1,200 people in the company, something like that, I believe, systems and processes become a lot more important, right? So we said, you know, Collibra isn't the size of our customers yet, but we're getting there in terms of organization, structure, process, systems, et cetera.
So we said it's really time for us to put our money where our mouth is and to set up our own data office, which is what we were seeing all of our customers doing, what we're seeing organizations worldwide doing, and what Gartner was predicting as well. They said, okay, organizations have an HR unit, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. >> Lisa: Hm. >> So we said, okay, let's try to set an example with Collibra. Let's set up our own data office in such a way that other people can take away from it, right? So we set up a data strategy, we started building data products, took care of the data infrastructure, that sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our own products and our own practices, right? And from that use, learn how we can make the product better, learn how we can make the practice better, and share that learning with all of the market, of course. And on Monday mornings we sometimes refer to that as eating our own dog food; on Friday evenings we refer to it as drinking our own champagne. >> Lisa: I like it. >> So we, we had a (both chuckle) We had the drive to do this, you know, there's a clear business reason, so we included that in the data strategy, and that's a little bit of our origin. Now how, how do we organize this? We have three pillars, and by no means is this a template that everyone should follow. This is just the organization that works at our company, but it can serve as an inspiration. So we have pillars. One is data science, the data product builders, if you will, or the people who help the business build data products. We have the data engineers, who help keep the lights on for that data platform, to make sure that the products, the data products, can run, the data can flow and, you know, the quality can be checked.
And then we have a data intelligence or data governance pillar, where we have those data governance, data intelligence stakeholders who help the business as a sort of data partners to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is, well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? And from those use cases, we then just started to build a roadmap and started execution on use case after use case. And a few important ones there are very simple; we see them with all our customers as well. People love talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the people in legal and privacy, so they have their process registry and they can see how the data flows. So that's a popular starting place, and that turns into a marketplace, so that if new analysts and data citizens join Collibra, they immediately have a place to go to look at what data is out there for me as an analyst or data scientist or whatever, to do my job, right? So they can immediately get access to the data. And another one that we did is around trusted business reporting. We're seeing that, since 2008, you know, self-service BI allowed everyone to make beautiful dashboards with, you know, pie charts. I always, my pet peeve is the pie charts, because I love pie and you shouldn't always be using pie charts, but essentially there's become a proliferation of those reports. And now executives don't really know, okay, should I trust this report or that report? They're reporting on the same thing but the numbers seem different, right? So that's why we have trusted business reporting.
So we know if the reports, the dashboard, a data product essentially, is built, we know that all the right steps are being followed, and that whoever is consuming that can be quite confident in the result. >> Lisa: Right, and that confidence is absolutely key. >> Exactly. Yes. >> Absolutely. Talk a little bit about some of the the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >> KPIs and measuring is a big topic in the chief data officer profession I would say, and again, it always varies, with respect to your organization, but there's a few that we use that might be of interest to you. So remember you have those three pillars, right? And we have metrics across those pillars. So, for example, a pillar on the data engineering side is going to be more related to that uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, and especially if you're in the cloud and if consumption's a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around the data signs and the products. Are people using them? Are they getting value from it? Can we calculate that value in a monetary perspective, right? >> Lisa: Yes. >> So that we can, to the rest of the business, continue to say, "We're tracking all those numbers and those numbers indicate that value is generated" and how much value estimated in that region. And then you have some data intelligence, data governance metrics, which is, for example you have a number of domains in a data mesh [Indistinct] People talk about being the owner a data domain for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? 
How well are these owners organized, executing on their responsibilities? How many tickets are open? Closed? How many data products are built according to process? And so on and so forth. So these are a set of examples of KPIs. There's a lot more, but hopefully those can already inspire the audience. >> Absolutely. So we've, we've talked about the rise of chief data offices; it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see, in terms of the maturation of data offices over the next decade? >> So we've seen, indeed, the role sort of grow up. I think in 2010 there may have been, like, 10 chief data officers or something; Gartner has exact numbers on them. But then it grew, you know, into the 400s; they were mostly in financial services, but then it expanded to all industries, and the number is estimated to be about 20,000 right now. >> Wow. >> And they evolved in a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven; then offensive data strategy; then support for the digital program; and now all about data products, right? So as a data leader, you now need all those competences and need to include them in your strategy. How is that going to evolve for the next couple of years? I wish I had one of those crystal balls, right? But essentially, I think for the next couple of years there's going to be a lot of people, you know, still moving along those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer. So you'll see, over the years, that's going to evolve toward more digital and more data products. So for the next three, five years, my prediction is it's all going to be about data products, because it's an immediate link between the data and the dollar, essentially. >> Right. >> So that's going to be important, and quite likely some new things will be added on which nobody can predict yet. But we'll see those pop up in a few years. I think there's going to be a continued challenge for the chief data officer role to become a real executive role, as opposed to, you know, somebody who claims that they're executive but then they're not, right? So the real reporting level into the board, into the CEO for example, will continue to be a challenging point. But the ones who do get that done will be the ones that are successful, and the ones who get that done will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that very clear to all the data citizens in the organization, right? >> Right, really creating that value chain. >> In that sense they'll need to have both, you know, technical audiences and non-technical audiences aligned, of course, and they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you are waking up data citizens across the organization and you make everyone in the organization think about data as an asset. >> Absolutely, because there's so much value that can be extracted if organizations really strategically build that data office and democratize access across all those data citizens. Stan, this is an exciting arena. We're definitely going to keep our eyes on this. Sounds like a lot of evolution and maturation coming, from the data office perspective and from the data citizen perspective. And as the data show, in that IDC study you mentioned, and Gartner as well, organizations have so much more likelihood of being successful and being competitive. So we're going to watch this space. Stan, thank you so much for joining me on theCUBE at Data Citizens 22. We appreciate it. >> Thanks for having me over.
>> From Data Citizens 22, I'm Lisa Martin, and you're watching theCUBE, the leader in live tech coverage. (inspiring rock music) >> Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra. Remember, all these videos are available on demand at theCUBE.net. And don't forget to check out siliconangle.com for all the news, and wikibon.com for our weekly Breaking Analysis series, where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research. If you want more information on the products announced at Data Citizens, go to Collibra.com. There are tons of resources there; you'll find analyst reports and product demos. It's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. We'll see you soon. (inspiring rock music continues)