Ramesh Prabagaran, Prosimo.io | Defining the Network Supercloud


 

(upbeat music) >> Hello, and welcome to Supercloud2. I'm John Furrier, host of theCUBE here. We're exploring all the new Supercloud trends around multiple clouds, hyper scale gaps in their systems, new innovations, new applications, new companies, new products, new brands emerging from this big inflection point. Got a great guest who's going to unpack it with me today, Ramesh Prabagaran, who's the co-founder and CEO of Prosimo, CUBE alumni. Ramesh, legend in the industry, you've been around. You've seen many cycles. Welcome to Supercloud2. >> Thank you. You're being too kind. >> Well, you know, you guys have been a technical, great technical founding team, multiple ventures, multiple times around the track as they say, but now we're seeing something completely different. This is our second event, kind of we're doing to start the ball rolling around unpacking this idea of Supercloud which evolved from a riff with me and Dave to now a working group paper, multiple definitions. People are saying they're Supercloud. CloudFlare says this is their version. Someone says there over there. Fitzi over there in the blog is always, you know, challenging us on our definitions, but it's, the consensus is though something's happening. >> Ramesh: Absolutely. >> And what's your take on this kind of big inflection point? >> Absolutely, so if you just look at kind of this in layers right, so you have hyper scalers that are innovating really quickly on underlying capabilities, and then you have enterprises adopting these technologies, right, there is a layer in the middle that I would say is largely missing, right? And one that addresses the gaps introduced by these new capabilities, by the hyper scalers. At the same time, one that actually spans, let's say multiple regions, multiple clouds and so forth. So that to me is kind of the Supercloud layer of sorts. 
One that helps enterprises adopt the underlying hyper scaler capabilities a lot faster, and at the same time brings a certain level of consistency and homogeneity also. >> What do you think the big driver of Supercloud is? Is it the industry growing up or is it the demand for new kinds of capabilities or both? Or just evolution? What's your take? >> I would say largely it depends on kind of who the entity is that you're talking about, right? And so I would say both. So if you look at one cohort here, it's adoption, right? If I have a externally facing digital presence, for example, then I'm going to scale that up and get to as many subscribers and users no matter what, right? And at that time it's a different set of problems. If you're looking at kind of traditional enterprise inward that are bringing apps into the cloud and so forth, it's a different set of care abouts, right? So both are, I would say, equally important problems to solve for. >> Well, one reality that we're definitely tracking, and it's not really a debate anymore, is hybrid. >> Ramesh: Yep >> Hybrid happened. It happened faster than most people thought. But, you know, we were talking about this in 2015 when it first got kicked around, but now you see hybrid in the cloud, on premises and the edge. This kind of forms that distributed computing paradigm that we've always been predicting. And so if that continues to play out the way it is, you're now going to have a completely distributed, connected internet and sets of systems, intra and external within companies. So again, the world is connected 100%. Everything's changing, right? >> And that introduces. >> It wasn't your grandfather's networking anymore or storage. The game is still the same, but the play, the components are acting differently. What's your take on this? >> Absolutely. No, absolutely. That's a very key important point, and it's one that we always ask our customers right at the front end, right? 
Because your starting assumptions matter. If you have workloads of workloads in the cloud and data center is something that you want to connect into, then you'll make decisions kind of keeping cloud in the center and then kind of bolt on technologies for what that means to extend it to the data center. If your center of gravity is in the data center, and then cloud is let's say 10% right now, but you see that growing, then what choices do you have? Right, do you want to bring your data center technologies into the cloud because you want that consistency in operations? Or do you want to start off fresh, right? So this is a really key, important question, and one that many of our customers are actually grappling with, right? They have this notion that going cloud native is the right approach, but at the same time that means I have a bifurcation in kind of how do I operate my data center versus my cloud, right? Two different operating models, and slowly it'll shift over to one. But you're going to have to deal with dual reality for a while. >> I was talking to an old friend of mine, CIO, very experienced CIO. Big time company, large deployment, a lot of IT. I said, so what's the big trend? Everyone's telling me IT's going away. He goes no, not really. IT's not going away for me. It's going everywhere in the company. >> Ramesh: Exactly. >> So I need to scale my IT-like capabilities everywhere and then make it invisible. >> Ramesh: Correct. >> Which is essentially code words for saying it's going to be completely cloud native everywhere. This is what is happening. Do you agree? >> Absolutely right, and so if you look at what do enterprises care about? The reason to go to the cloud is to get speed of operations, and it's apps, apps, apps, right? Do you ever have a conversation on networking and infrastructure first? No, that kind of gets brought into the conversation because you want to deal with users, applications and services, right? 
And so the end goal is essentially how do users communicate with apps and get the right experience, security and whatnot, and how do apps talk to each other and make sure that you get all of the connectivity and security requirements? Underneath the covers, what does this mean for infrastructure, networking, security and whatnot? It's actually going to be someone else's job, right? And you shouldn't have to think too much about it. So this whole notion of kind of making that transparent is real actually, right? But at the same time, us and all the guys that we talk to on the customer side, that's their job, right? Like we have to work towards making that transparent. Some are going to be in the form of capability, some are going to be driven by data, but that's really where the two worlds are going to come together. >> Lots of debates going on. We just heard from Bob Muglia here on Supercloud2. He said Supercloud's a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. So the question that's being debated is is Supercloud a platform or an architecture in your view? >> Okay, that's a tough one actually. I'm going to side on the side on kind of the platform side right, and the reason for that is architectural choices are things that you make ahead of time. And you, once you're in, there really isn't a fork in the road, right? Platforms continue to evolve. You can iterate, innovate and so on and so forth. And so I'm thinking Supercloud is more of a platform because you do have a choice. Hey, am I going AWS, Azure, GCP. You make that choice. What is my center of gravity? You make that choice. That's kind of an architectural decision, right? Once you make that, then how do I make things work consistently across like two or three clouds? That's a platform choice. >> So who's responsible for the architecture as the platform, the vendor serving the platform or is the platform vendor agnostic? 
>> You know, this is where you have to kind of peel the onion in layers, right? If you talk about applications, you can't go to a developer team or an app team and say I want you to operate on Google or AWS. They're like I'll pick the cloud that I want, right? Now who are we talking to? The infrastructure guys and the networking guys, right? They want to make sure that it's not bifurcated. It's like, hey, I want to make sure whatever I build for AWS I can equally use that on Azure. I can equally use that on GCP. So if you're talking to more of the application centric teams who really want infrastructure to be transparent, they'll say, okay, I want to make this choice of whether this is AWS, Azure, GCP, and stick to that. And if you come kind of down the layers of the stack into infrastructure, they are thinking a little more holistically, a little more Supercloud, a little more multicloud, and that. >> That's a good point. So that brings up the deployment question. >> Ramesh: Exactly! >> I want to ask you the next question, okay, what is the preferred deployment in your opinion for a Supercloud narrative? Is it single instance, spread it around everywhere? What's the, do you have a single global instance or do you have everything synchronized? >> So I would say first layer of that Supercloud really kind of fix the holes that have been introduced as a result of kind of adopting the hyper scaler technologies, right? So each, the hyper scalers have been really good at innovating and providing really massive scale elastic capabilities, right? But once you start to build capabilities on top of that to help serve the application, there's a few holes start to show up. So first job of Supercloud really is to plug those holes, right? Second is can I get to an operating model, so that I can replicate this not just in a single region, but across multiple regions, same cloud, and then across multiple clouds, right? 
And so both of those need to be solved for in order to be (cross talking). >> So is that multiple instantiations of the stack or? >> Yeah, so this again depends on kind of the capability, right? So if you take a more solution view, and so I can speak for kind of networking security combined right? There you always take a solution view. You don't ever look at, you know, what does this mean for a single instance in a single region. You take a macro view, and then you then break it down into what does this mean for region, what does it mean for instance, what does this mean for AZs? And so on and so forth. So you kind of have to go top to bottom. >> Okay, welcome you down into the trap now. Okay, synchronizing the data, latency, these are all questions. So what does the network Supercloud look like to you? Because networking is big here. >> Ramesh: Yes, absolutely. >> This is what you guys do. >> Exactly, yeah. So the different set of problems as you go up the stack, right? So if you have hundreds of workloads in a single region, the set of problems you're dealing with there are kind of app native connectivity, how do I go from kind of east/west, all of those fun things, right? Which are usually bound in terms of latency. You don't have those challenges as much, but can you build your entire enterprise application architecture in one region? No, you're going to have to create multiple instances, right? So my data lake is invariably going to be in one place. My business logic is going to be spread across a few places. What does that bring in? I need to go across regions. Am I going to put those two regions right next to each other? No, I'm not going to, right? I'm going to have places in Europe. I'm going to have APAC, and I'm going to have a North American presence, and I need to bring all these things together. So this is where, back to your point, latency really matters, right? 
Because I need to be able to find out not just best path but also how do I reduce the millisecond, microseconds that my application cares about, which brings in a layer of optimization and then so on and so on and so forth. So this is what we call kind of to borrow the Prosimo language full stack networking, right? Because I'm not just dealing with how do I go from one region to another because that's laws of physics. I can only control so much. But there are a few elements up the application stack in software that you can tweak to actually bring these things closer and closer. >> And on that point, you're seeing security being talked a lot more at the network layer. So how do you secure the Supercloud at the network layer? What's that look like? >> Yeah, we've been grappling with essentially is security kind of foundational, and then is the network on top. And then we had an alternative viewpoint which is kind of network and then security on top. And the answer is actually it's neither, right? It's almost like a meshed up sandwich of sorts. So you need to have networking security work really well together, right? Case in point, I mean we were talking to a customer yesterday. He said, hey, I have my data lake in one region that needs to talk to an analytics service in a completely different region of a different cloud. These two things just need to be able to talk to each other, which means I need to bring elements of networking. I need to bring elements of security, secure access, app segmentation, all of those things. Very simple, I have an analytics service that needs to contact a data lake. That's what he starts with, but then before you know it, it actually brings up a whole stack underneath, so that's. >> VMware calls that cloud chaos. >> Ramesh: Yes, exactly. >> And then that's the halfway point between cloud smart. Cloud first, cloud chaos, cloud smart, and the next thing, you can skip that whole step. But again, again, it's pick your strategy right? 
Again, this comes back down to your earlier point. I want to ask you from a customer standpoint, you got the hyper scalers doing very, very well. >> Ramesh: Yep, absolutely. >> And I love what Amazon's doing. I think Microsoft again though they had a little bit of downgrade are catching up fast, and they have their installed base. So you got the land of the installed bases. >> Correct. >> First and greater, better cloud. Install base getting better, almost as good, almost as good is a gift, but close. Now you have them specializing. Silicon, special silicon. So there's gaps for other services. >> Ramesh: Correct. >> And Amazon Web Services, Adam Selipsky's an open book saying, hey, we want our ecosystem to pick up these gaps and build on them. Go ahead, go to town. >> So this is where I think choices are tough, right? Because if you had one choice, you would work with it, and you would work around it, right? Now I have five different choices. Now what do I do? Our viewpoint is there are a bunch of things that say AWS does really, really well. Use that as a foundational layer, right? Like don't reinvent the wheel on those things. Transit gateways, global accelerators and whatnot, they exist for a reason. Billions of dollars have gone into building those things. Use that foundational layer, right? But what you want to build on top of that is actually driven by the application. The requirements of a lambda application that's serverless, it's very different than a packaged application that's responding for transactions, right? Like it's just completely very, very different. And so bring in the right set of capabilities required for those set of applications, and then you go based on that. This is also where I think whether something is a regional construct versus an overall global construct really, really matters, right? 
Because if you start with the assumption that everything is going to be built regionally, then it's someone else's job to make sure that all of these things are connected. But if you start with kind of the global purview, then the rest of them start to (cross talking). >> What are some of the things that the enterprises might want that are gaps that are going to be filled by startups like you guys and the ecosystem because we're seeing the ecosystem form into two big camps. >> Ramesh: Yep. >> ISVs, which is an old school definition of independent software vendor, aka someone who writes software. >> Ramesh: Exactly. >> SaaS app. >> Ramesh: Correct. >> And then ecosystem software players that were once ISVs now have people building on top of them. >> Ramesh: Correct. >> They're building on top of the cloud. So you have that new hyper scale effect going on. >> Ramesh: Exactly. >> You got ISVs, which is software developers, software vendors. >> Ramesh: Correct. >> And ecosystems. >> Yep. >> What's that impact of that? Cause it's a new dynamic. >> Exactly, so if you take kind of enterprises, want to make sure that their apps in the data center migrate to the cloud, new apps are developed the right way in the cloud, right? So that's kind of table stakes. So now what choices do they have? They listen to AWS and say, okay, I have all these cloud native services. I want to be able to instantiate all that. Now comes the interesting choice that they have to make. Do I go hire a whole bunch of people and do it myself or do I go there on the platform route, right? Because I made an architectural choice. Now I have to decide whether I want to do this myself or the platform choice. DIY works great for some, but you don't know what you're getting into, and it's people involved, right? People, process, all those fun things involved, right? So we show up there and say, you don't know what you don't know, right? Like because that's the nature of it. 
Why don't you invest in a platform like what we provide, and then you actually build on top of it. We will, it's our job to make sure that we keep up with the innovation happening underneath the covers. And at the same time, this is not a closed ended system. You can actually build on top of our platform, right? And so that actually gives you a good mix. Now the care abouts are interesting. Some apps care about experience. Some apps care about latency. Some apps are extremely chatty and extremely data intensive, but nobody wants to pay for it, right? And so it's an interesting Jenga that you have to play between experience versus security versus cost, right? And that makes kind of head of infrastructure and cloud platform teams' life really, really, really interesting. >> And this is why I love your background, and Stu Miniman, when he was with theCUBE, and now he's at Red Hat, we used to riff about the network and how network folks are now, those concepts are now up the top of the stack because the cloud is one big network effect. >> Ramesh: Exactly, correct. >> It's a computer. >> Yep, absolutely. No, and case in point, right, like say we're in let's say in San Jose here or Palo Alto here, and let's say my application is sitting in London, right? The cloud gives you different express lanes. I can go down to my closest pop location provided by AWS and then I can go ride that all the way up to London. It's going to give me better performance, low latency, but I'm going to have to incur some costs associated with it. Or I can go over the wild internet all the way from Palo Alto up to kind of the ingress point into London and then go access, but I'm spending time on the wild internet, which means all kinds of fun things happen, right? But I'm not paying much, but my experience is not going to be so great. So, and there are various shades of gray in the middle, right? So how do you pick what? It all kind of is driven by the applications. 
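The trade-off described above (a paid express lane over the cloud provider's backbone versus the cheaper public internet, with the application deciding) can be illustrated with a small sketch. This is only an assumption-laden illustration, not Prosimo's or AWS's method: the `Path` type, the `pick_path` function, and every latency and price figure below are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    latency_ms: float   # hypothetical round-trip estimate, not a measurement
    cost_per_gb: float  # hypothetical transfer price, not a real tariff


def pick_path(paths, max_latency_ms):
    """Return the cheapest path that meets the latency budget.

    If no path meets the budget, fall back to the fastest path available.
    """
    eligible = [p for p in paths if p.latency_ms <= max_latency_ms]
    if eligible:
        return min(eligible, key=lambda p: p.cost_per_gb)
    return min(paths, key=lambda p: p.latency_ms)


# Two illustrative options for reaching an application hosted in London.
PATHS = [
    Path("cloud-backbone", latency_ms=140.0, cost_per_gb=0.05),
    Path("public-internet", latency_ms=190.0, cost_per_gb=0.01),
]

# A latency-sensitive app has a tight budget; a bulk transfer does not.
latency_sensitive = pick_path(PATHS, max_latency_ms=150.0)
bulk_transfer = pick_path(PATHS, max_latency_ms=500.0)
print(latency_sensitive.name, bulk_transfer.name)
```

The choice here is driven entirely by the application's stated latency budget, which mirrors the point that the application determines which lane is worth paying for. A real system would also weigh security and reliability, which this sketch leaves out.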
>> Well, we certainly want you back for Supercloud3, our next version of this virtual/live event here in our Palo Alto studios. Really appreciate you coming on. >> Absolutely. >> While you're here, give a quick plug for the company. Next minute, we can take a minute to talk about the success of the company. >> Ramesh: Absolutely. >> I know you got a fresh financing this past year. Plenty of money in the bank, going to ride this new wave, Supercloud wave. Give us a quick plug. >> Absolutely, yeah. So three years going on to four this calendar year. So it's an interesting time for the company. We have proven our technology and product, and our initial customers are quite happy with it. Now comes essentially more of those and scale and so forth. That's kind of the interesting phase that we are in. Also heartened to see quite a few of kind of really large and dominant players in the market, partners, channels and so forth, invest in us to take this to the next set of customers. I would say there's been a dramatic shift in the conversation with our customers. The first couple of years or so of the company, we are about three years old right now, was really about us educating them. This is what you need. This is what you need. Now actually it's a lot of just pull, right? We've seen a good indication, as much as I hate RFIs, a good indication is the number of RFIs that show up at our door saying we want you to participate in this because we want to understand more, right? And so I think we are at an interesting point of that shift. >> RFIs always like do all this work and hope for the best. Pray for a deal. You know, you guys are on the right side of history. If a customer asks with respect to Supercloud, multicloud, is that your focus? Is that the direction you guys are going into? >> Yeah, so I would say we are kind of both, right? Supercloud and multicloud because we, our customers are hybrid, multiple clouds, all of the above, right? 
Our main pitch and kind of value back to the customers is go embrace cloud native because that's the right approach, right? It doesn't make sense to go reinvent the wheel on that one, but then make a really good choice about whether you want to do this yourself or invest in a platform to make your life easy. Because we have seen this story play out with many, many enterprises, right? They pick the right technologies. They do a simple POC overnight, and they say, yeah, I can make this work for two apps, right? And then they say, yes, I can make this work for 100. You go down a certain path. You hit a wall. You hit a wall, and it's a hard wall. It's like, no, there isn't a thing that you can go around it. >> A lot of dead bodies laying around. >> Ramesh: Exactly. >> Dead wall. >> And then they have to unravel around that, and then they come talk to us, and they say, okay, now what? Like help me, help me through this journey. So I would say to the extent that you can do this diligence ahead of time, do that, and then pick the right platform. >> You've got to have the talent. And you got to be geared up. You got to know what you're getting into. >> Ramesh: Exactly. >> You got to have the staff to do this. >> And cloud talent and skillset in particular, I mean there's lots available but it's in pockets right? And if you look at kind of Web3 companies, they've gone and kind of amassed all those guys, right? So enterprises are not left with the cream of the crop. >> John: They might be coming back. >> Exactly, exactly, so. >> With this downturn. Ramesh, great to see you and thanks for contributing to Supercloud2, and again, love your team. Very technical team, and you're on the right side of history on this one. Congratulations. >> Ramesh: No, and thank you, thank you very much. >> Okay, this is Supercloud2. I'm John Furrier with Dave Vellante. We'll be back right after this short break. (upbeat music)

Published Date : Feb 17 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Ramesh | PERSON | 0.99+
Europe | LOCATION | 0.99+
Dave Vellante | PERSON | 0.99+
Ramesh Prabagaran | PERSON | 0.99+
Bob Muglia | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
2015 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
London | LOCATION | 0.99+
San Jose | LOCATION | 0.99+
John | PERSON | 0.99+
10% | QUANTITY | 0.99+
Dave | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Adam Selipsky | PERSON | 0.99+
two | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
Stu Miniman | PERSON | 0.99+
100% | QUANTITY | 0.99+
100 | QUANTITY | 0.99+
two apps | QUANTITY | 0.99+
yesterday | DATE | 0.99+
both | QUANTITY | 0.99+
Palo Alto | LOCATION | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
Palo Alta | LOCATION | 0.99+
Second | QUANTITY | 0.99+
two regions | QUANTITY | 0.99+
APAC | ORGANIZATION | 0.99+
First | QUANTITY | 0.99+
one choice | QUANTITY | 0.99+
second event | QUANTITY | 0.99+
two things | QUANTITY | 0.99+
three years | QUANTITY | 0.99+
Prosimo | ORGANIZATION | 0.99+
Billions of dollars | QUANTITY | 0.99+
Red Hat | ORGANIZATION | 0.99+
one region | QUANTITY | 0.98+
multicloud | ORGANIZATION | 0.98+
five different choices | QUANTITY | 0.98+
hundreds | QUANTITY | 0.98+
each | QUANTITY | 0.98+
first layer | QUANTITY | 0.98+
first | QUANTITY | 0.97+
two worlds | QUANTITY | 0.97+
Supercloud | ORGANIZATION | 0.97+
one | QUANTITY | 0.97+
single instance | QUANTITY | 0.97+
Supercloud2 | ORGANIZATION | 0.97+
two big camps | QUANTITY | 0.97+
one reality | QUANTITY | 0.96+
three companies | QUANTITY | 0.96+
today | DATE | 0.96+
SaaS | TITLE | 0.95+
CloudFlare | ORGANIZATION | 0.95+
first couple of years | QUANTITY | 0.95+
CUBE | ORGANIZATION | 0.94+
first job | QUANTITY | 0.94+
Supercloud wave | EVENT | 0.94+
Azure | ORGANIZATION | 0.94+
three clouds | QUANTITY | 0.93+

AWS Startup Showcase S3E1


 

(soft music) >> Hello everyone, welcome to this Cube conversation here from the studios of theCube in Palo Alto, California. John Furrier, your host. We're featuring a startup, Astronomer, astronomer.io is the url. Check it out. And we're going to have a great conversation around one of the most important topics hitting the industry, and that is the future of machine learning and AI and the data that powers it underneath it. There's a lot of things that need to get done, and we're excited to have some of the co-founders of Astronomer here. Viraj Parekh, who is co-founder and Paola Peraza Calderon, another co-founder, both with Astronomer. Thanks for coming on. First of all, how many co-founders do you guys have? >> You know, I think the answer's around six or seven. I forget the exact, but there's really been a lot of people around the table, who've worked very hard to get this company to the point that it's at. And we have a long way to go, right? But there's been a lot of people involved that have been absolutely necessary for the path we've been on so far. >> Thanks for that, Viraj, appreciate that. The first question I want to get out on the table, and then we'll get into some of the details, is take a minute to explain what you guys are doing. How did you guys get here? Obviously, multiple co-founders sounds like a great project. The timing couldn't have been better. ChatGPT has essentially done so much public relations for the AI industry. Kind of highlight this shift that's happening. It's real. We've been chronicling it. Take a minute to explain what you guys do. >> Yeah, sure. We can get started. So yeah, when Astronomer, when Viraj and I joined Astronomer in 2017, we really wanted to build a business around data and we were using an open source project called Apache Airflow, that we were just using sort of as customers ourselves. 
And over time, we realized that there was actually a market for companies who use Apache Airflow, which is a data pipeline management tool, which we'll get into. And that running Airflow is actually quite challenging and that there's a big opportunity for us to create a set of commercial products and opportunity to grow that open source community and actually build a company around that. So the crux of what we do is help companies run data pipelines with Apache Airflow. And certainly we've grown in our ambitions beyond that, but that's sort of the crux of what we do for folks. >> You know, data orchestration, data management has always been a big item, you know, in the old classic data infrastructure. But with AI you're seeing a lot more emphasis on scale, tuning, training. You know, data orchestration is the center of the value proposition when you're looking at coordinating resources, it's one of the most important things. Could you guys explain what data orchestration entails? What does it mean? Take us through the definition of what data orchestration entails. >> Yeah, for sure. I can take this one and Viraj feel free to jump in. So if you google data orchestration, you know, here's what you're going to get. You're going to get something that says, data orchestration is the automated process for taking siloed data from numerous data storage points, organizing it, and making it accessible and prepared for data analysis. And you say, okay, but what does that actually mean, right? And so let's give sort of an example. So let's say you're a business and you have sort of the following basic asks of your data team, right? Hey, give me a dashboard in Sigma, for example, for the number of customers or monthly active users and then make sure that that gets updated on an hourly basis. And then number two, a consistent list of active customers that I have in HubSpot so that I can send them a monthly product newsletter, right? 
Two very basic asks for all sorts of companies and organizations. And when that data team, which has data engineers, data scientists, ML engineers, data analysts get that request, they're looking at an ecosystem of data sources that can help them get there, right? And that includes application databases, for example, that actually have end product user behavior and third party APIs from tools that the company uses that also has different attributes and qualities of those customers or users. And that data team needs to use tools like Fivetran, to ingest data, a data warehouse like Snowflake or Databricks to actually store that data and do analysis on top of it, a tool like DBT to do transformations and make sure that that data is standardized in the way that it needs to be, a tool like Hightouch for reverse ETL. I mean, we could go on and on. There's so many partners of ours in this industry that are doing really, really exciting and critical things for those data movements. And the whole point here is that, you know, data teams have this plethora of tooling that they use to both ingest the right data and come up with the right interfaces to transform and interact with that data. And data orchestration in our view is really the heartbeat of all of those processes, right? And tangibly the unit of data orchestration, you know, is a data pipeline, a set of tasks or jobs that each do something with data over time and eventually run that on a schedule to make sure that those things are happening continuously as time moves on. And, you know, the company advances. And so, you know, for us, we're building a business around Apache Airflow, which is a workflow management tool that allows you to author, run and monitor data pipelines. And so when we talk about data orchestration, we talk about sort of two things. One is that crux of data pipelines that, like I said, connect that large ecosystem of data tooling in your company. 
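The notion of a data pipeline described above, a set of tasks that each do something with data and run in a dependency order, can be sketched in plain Python. This is a minimal illustration only and is not Apache Airflow's API: the task names, the `DEPENDS_ON` mapping, the `run_pipeline` helper, and the sample records are all hypothetical, and scheduling (for example, hourly runs) is left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ingest(state):
    # Stand-in for pulling rows from an application database or third-party API.
    state["raw"] = [
        {"user": "a", "active": True},
        {"user": "b", "active": False},
    ]


def transform(state):
    # Standardize the raw rows into the shape downstream tasks expect.
    state["active_users"] = [r["user"] for r in state["raw"] if r["active"]]


def publish_dashboard(state):
    # Stand-in for refreshing a dashboard metric.
    state["dashboard_count"] = len(state["active_users"])


def sync_newsletter_list(state):
    # Stand-in for pushing the active-customer list to a marketing tool.
    state["newsletter"] = sorted(state["active_users"])


TASKS = {
    "ingest": ingest,
    "transform": transform,
    "publish_dashboard": publish_dashboard,
    "sync_newsletter_list": sync_newsletter_list,
}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "ingest": set(),
    "transform": {"ingest"},
    "publish_dashboard": {"transform"},
    "sync_newsletter_list": {"transform"},
}


def run_pipeline():
    """Run every task once, in an order that respects the dependencies."""
    state = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        TASKS[name](state)
    return order, state


order, state = run_pipeline()
print(order)
print(state["dashboard_count"], state["newsletter"])
```

An orchestrator such as Airflow adds what this sketch omits: schedules, retries, monitoring, and connections to the external tools named in the conversation.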
But number two, it's not just that data pipeline that needs to run every day, right? And Viraj will probably touch on this as we talk more about Astronomer and our value prop on top of Airflow. But then it's all the things that you need to actually run data in production and make sure that it's trustworthy, right? So it's actually not just that you're running things on a schedule, but it's also things like CI/CD tooling, right? Secure secrets management, user permissions, monitoring, data lineage, documentation, things that enable other personas in your data team to actually use those tools. So long-winded way of saying that it's the heartbeat, we think, of the data ecosystem and certainly goes beyond scheduling, but again, data pipelines are really at the center of it. >> You know, one of the things that jumped out, Viraj, if you can get into this, I'd like to hear more about how you guys look at all those little tools that are out there. You mentioned a variety of things. You know, if you look at the data infrastructure, it's not just one stack. You've got an analytic stack, you've got a realtime stack, you've got a data lake stack, you got an AI stack potentially. I mean you have these stacks now emerging in the data world that are >> Yeah. >> fundamental, but were once served by either a full package, old school software, and then a bunch of point solutions. You mentioned Fivetran there, I would say in the analytics stack. Then you got, you know, S3, they're on the data lake stack. So all these things are kind of munged together. >> Yeah. >> How do you guys fit into that world? You make it easier or like, what's the deal? >> Great question, right?
And you know, I think that one of the biggest things we've found in working with customers over, you know, the last however many years, is that like if a data team is using a bunch of tools to get what they need done and the number of tools they're using is growing exponentially and they're kind of roping things together here and there, that's actually a sign of a productive team, not a bad thing, right? It's because that team is moving fast. They have needs that are very specific to them and they're trying to make something that's exactly tailored to their business. So a lot of times what we find is that customers have like some sort of base layer, right? That's kind of like, you know, it might be they're running most of the things in AWS, right? And then on top of that, they'll be using some of the things AWS offers, you know, things like SageMaker, Redshift, whatever. But they also might need things that their Cloud can't provide, you know, something like Fivetran or Hightouch or anything of those other tools and where data orchestration really shines, right? And something that we've had the pleasure of helping our customers build, is how do you take all those requirements, all those different tools and whip them together into something that fulfills a business need, right? Something that makes it so that somebody can read a dashboard and trust the number that it says or somebody can make sure that the right emails go out to their customers. And Airflow serves as this amazing kind of glue between that data stack, right? It's to make it so that for any use case, be it ELT pipelines or machine learning or whatever, you need different things to do them and Airflow helps tie them together in a way that's really specific for a individual business's needs. >> Take a step back and share the journey of what your guys went through as a company startup. 
So you mentioned Apache open source, you know, we were just, I was just having an interview with the VC, we were talking about foundational models. You got a lot of proprietary and open source development going on. It's almost the iPhone, Android moment in this whole generative space and foundational side. This is kind of important, the open source piece of it. Can you share how you guys started? And I can imagine your customers probably have their hair on fire and are probably building stuff on their own. How do you guys, are you guys helping them? Take us through, 'cuz you guys are on the front end of a big, big wave and that is to make sense of the chaos, reining it in. Take us through your journey and why this is important. >> Yeah Paola, I can take a crack at this and then I'll kind of hand it over to you to fill in whatever I miss in details. But you know, like Paola is saying, the heart of our company is open source because we started using Airflow as an end user and started to say like, "Hey wait a second, more and more people need this." Airflow, for background, started at Airbnb and they were actually using that as the foundation for their whole data stack. Kind of how they made it so that they could give you recommendations and predictions and all of the processes that need to be or needed to be orchestrated. Airbnb created Airflow, gave it away to the public and then, you know, fast forward a couple years and you know, we're building a company around it and we're really excited about that. >> That's a beautiful thing. That's exactly why open source is so great. >> Yeah, yeah. And for us it's really been about like watching the community and our customers take these problems, find solutions to those problems, build standardized solutions, and then building on top of that, right?
So we're reaching a point where a lot of our earlier customers who started just using Airflow to get the base of their BI stack down and their reporting and their ELT infrastructure, you know, they've solved that problem and now they're moving onto things like doing machine learning with their data, right? Because now that they've built that foundation, all the connective tissue for their data arriving on time and being orchestrated correctly is happening, they can build the layer on top of that. And it's just been really, really exciting kind of watching what customers do once they're empowered to pick all the tools that they need, tie them together in the way they need to, and really deliver real value to their business. >> Can you share some of the use cases of these customers? Because I think that's where you're starting to see the innovation. What are some of the companies that you're working with, what are they doing? >> Raj, I'll let you take that one too. (all laughing) >> Yeah. (all laughing) So you know, a lot of it is, it goes across the gamut, right? Because it doesn't matter what you are, what you're doing with data, it needs to be orchestrated. So there's a lot of customers using us for their ETL and ELT reporting, right? Just getting data from all the disparate sources into one place and then building on top of that, be it building dashboards, answering questions for the business, building other data products and so on and so forth. From there, these use cases evolve a lot. You do see folks doing things like fraud detection because Airflow's orchestrating how transactions go, how transactions get analyzed. They do things like analyzing marketing spend to see where your highest ROI is. And then, you know, you kind of can't not talk about all of the machine learning that goes on, right? Where customers are taking data about their own customers, kind of analyzing and aggregating that at scale and trying to automate decision making processes.
So it goes from your most basic, what we call like data plumbing, right? Just to make sure data's moving as needed. All the way to your more exciting and sexy use cases around like automated decision making and machine learning. >> And I'd say, I mean, I'd say that's one of the things that I think gets me most excited about our future is how critical Airflow is to all of those processes, you know? And I think, you know, when you know a tool is valuable is when something goes wrong and one of those critical processes doesn't work. And we know that our system is so mission critical to answering basic, you know, questions about your business and the growth of your company for so many organizations that we work with. So it's, I think one of the things that gets Viraj and I, and the rest of our company up every single morning, is knowing how important the work that we do is for all of those use cases across industries, across company sizes. And it's really quite energizing. >> It was such a big focus this year at AWS re:Invent, the role of data. And I think one of the things that's exciting about OpenAI and all the movement towards large language models, is that you can integrate data into these models, right? From outside, right? So you're starting to see the integration easier to deal with, still a lot of plumbing issues. So a lot of things happening. So I have to ask you guys, what is the state of the data orchestration area? Is it ready for disruption? Has it already been disrupted? Would you categorize it as a new first inning kind of opportunity or what's the state of the data orchestration area right now? Both, you know, technically and from a business model standpoint, how would you guys describe that state of the market? >> Yeah, I mean I think, I think in a lot of ways we're, in some ways I think we're category creating. You know, schedulers have been around for a long time.
I recently did a presentation sort of on the evolution of going from, you know, something like cron, which I think was built in like the 1970s out of Carnegie Mellon. And you know, that's a long time ago. That's 50 years ago. So it's sort of like the basic need to schedule and do something with your data on a schedule is not a new concept. But to our point earlier, I think everything that you need around your ecosystem, first of all, the number of data tools and developer tooling that has come out in the industry has, you know, grown some 5X over the last 10 years. And so obviously as that ecosystem grows and grows and grows and grows, the need for orchestration only increases. And I think, you know, as Astronomer, I think we, and there's, we work with so many different types of companies, companies that have been around for 50 years and companies that got started, you know, not even 12 months ago. And so I think for us, it's trying to always category create and adjust sort of what we sell and the value that we can provide for companies all across that journey. There are folks who are just getting started with orchestration and then there's folks who have such advanced use cases 'cuz they're hitting sort of a ceiling and only want to go up from there. And so I think we as a company, care about both ends of that spectrum and certainly want to build and continue building products for companies of all sorts, regardless of where they are on the maturity curve of data orchestration. >> That's a really good point Paola. And I think the other thing to really take into account is it's the companies themselves, but also individuals who have to do their jobs. You know, if you rewind the clock like five or 10 years ago, data engineers would be the ones responsible for orchestrating data through their org. But when we look at our customers today, it's not just data engineers anymore.
There's data analysts who sit a lot closer to the business and the data scientists who want to automate things around their models. So this idea that orchestration is this new category is spot on, is right on the money. And what we're finding is it's spreading, the need for it is spreading to all parts of the data team, naturally, where Airflow has emerged as an open source standard and we're hoping to take things to the next level. >> That's awesome. You know, we've been saying that the data market's kind of like the SRE with servers, right? You're going to need one person to deal with a lot of data and that's data engineering and then you're going to have the practitioners, the democratization. Clearly that's coming in what you're seeing. So I got to ask, how do you guys fit in from a value proposition standpoint? What's the pitch that you have to customers or is it more inbound coming into you guys? Are you guys doing a lot of outreach, customer engagements? I'm sure they're getting a lot of great requirements from customers. What's the current value proposition? How do you guys engage? >> Yeah, I mean we've, there's so many, there's so many. Sorry Raj, you can jump in. >> It's okay. So there's so many companies using Airflow, right? So our, the baseline is that the open source project that is Airflow that was, that came out of Airbnb, you know, over five years ago at this point, has grown exponentially in users and continues to grow. And so the folks that we sell to primarily are folks who are already committed to using Apache Airflow, need data orchestration in the organization and just want to do it better, want to do it more efficiently, want to do it without managing that infrastructure. And so our baseline proposition is for those organizations. Now to Raj's point, obviously I think our ambitions go beyond that, both in terms of the personas that we address and going beyond that data engineer, but really it's for, to start at the baseline.
You know, as we continue to grow our company, it's really making sure that we're adding value to folks using Airflow and help them do so in a better way, in a larger way and a more efficient way. And that's really the crux of who we sell to. And so to answer your question on, we actually, we get a lot of inbound because there are so many >> A built-in audience. >> in the world that use it, that those are the folks who we talk to and come to our website and chat with us and get value from our content. I mean the power of the open source community is really just so, so big. And I think that's also one of the things that makes this job fun, so. >> And you guys are in a great position, Viraj, you can comment, to get your reaction. There's been a big successful business model to starting a company around these big projects for a lot of reasons. One is open source is continuing to be great, but there's also supply chain challenges in there. There's also, you know, we want to continue more innovation and more code and keeping it free and flowing. And then there's the commercialization of product-izing it, operationalizing it. This is a huge new dynamic. I mean, in the past, you know, five or so years, 10 years, it's been happening all on CNCF from other areas like Apache, Linux Foundation, they're all implementing this. This is a huge opportunity for entrepreneurs to do this. >> Yeah, yeah. Open source is always going to be core to what we do because, you know, we wouldn't exist without the open source community around us. They are huge in numbers. Oftentimes they're nameless people who are working on making something better in a way that everybody benefits from it. But open source is really hard, especially if you're a company whose core competency is running a business, right?
Maybe you're running an e-commerce business or maybe you're running, I don't know, some sort of like any sort of business, especially if you're a company running a business, you don't really want to spend your time figuring out how to run open source software. You just want to use it, you want to use the best of it, you want to use the community around it. You want to take, you want to be able to google something and get answers for it. You want the benefits of open source. You don't want to have, you don't have the time or the resources to invest in becoming an expert in open source, right? And I think that dynamic is really what's given companies like us an ability to kind of form businesses around that, in the sense that we'll make it so people get the best of both worlds. You'll get this vast open ecosystem that you can build on top of, you can benefit from, that you can learn from, but you won't have to spend your time doing undifferentiated heavy lifting. You can do things that are just specific to your business. >> It's always been great to see that business model evolve. We used to debate 10 years ago, can there be another Red Hat? And we said, not really the same, but there'll be a lot of little ones that'll grow up to be big soon. Great stuff. Final question, can you guys share the history of the company, the milestones of Astronomer's journey in data orchestration? >> Yeah, we could. So yeah, I mean, I think, so Raj and I have obviously been at Astronomer along with our other founding team and leadership folks, for over five years now. And it's been such an incredible journey of learning, of hiring really amazing people. Solving again, mission critical problems for so many types of organizations. You know, we've had some funding that has allowed us to invest in the team that we have and in the software that we have. And that's been really phenomenal.
And so that investment, I think, keeps us confident even despite these sort of macroeconomic conditions that we're finding ourselves in. And so honestly, the milestones for us are focusing on our product, focusing on our customers over the next year, focusing on that market for us, that we know can get value out of what we do. And making developers' lives better and growing the open source community, you know, and making sure that everything that we're doing makes it easier for folks to get started, to contribute to the project and to feel a part of the community that we're cultivating here. >> You guys raised a little bit of money. How much have you guys raised? >> I forget what the total is, but it's in the ballpark of 200, over $200 million. So it feels good >> A little bit of capital. Got a little bit of cash to work with there. Great success. I know it's a Series C financing, you guys been down, so you're up and running. What's next? What are you guys looking to do? What's the big horizon look like for you? And from a vision standpoint, more hiring, more product, what are some of the key things you're looking at doing? >> Yeah, it's really a little of all of the above, right? Like, kind of one of the best and worst things about working at earlier stage startups is there's always so much to do and you often have to just kind of figure out a way to get everything done, but really invest in our product over the next, at least the next, over the course of our company lifetime. And there's a lot of ways we're wanting to just make it more accessible to users, easier to get started with, easier to use, all kind of on all areas there. And really, we really want to do more for the community, right? Like I was saying, we wouldn't be anything without the large open source community around us.
And we want to figure out ways to give back more in more creative ways, in more code driven ways and more kind of events and everything else that we can do to keep those folks galvanized and just keeping them happy using Airflow. >> Paola, any final words as we close out? >> No, I mean, I'm super excited. You know, I think we'll keep growing the team this year. We've got a couple of offices in the US which we're excited about, and a fully global team that will only continue to grow. So Viraj and I are both here in New York and we're excited to be engaging with our coworkers in person. Finally, after years of not doing so, we've got a bustling office in San Francisco as well. So growing those teams and continuing to hire all over the world and really focusing on our product and the open source community is where our heads are at this year, so. >> Congratulations. >> Excited. 200 million in funding plus good runway. Put that money in the bank, squirrel it away. You know, it's good to kind of get some good interest on it, but still grow. Congratulations on all the work you guys do. We appreciate you and the open source community does and good luck with the venture. Continue to be successful and we'll see you at the Startup Showcase. >> Thank you. >> Yeah, thanks so much, John. Appreciate it. >> It's theCube conversation, featuring astronomer.io, that's the website. Astronomer is doing well. Multiple rounds of funding, over 200 million in funding. Open source continues to lead the way in innovation. Great business model. Good solution for the next gen, Cloud, scale, data operations, data stacks that are emerging. I'm John Furrier, your host. Thanks for watching. (soft music)
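
As a rough illustration of the data pipeline idea described in the conversation above (a set of tasks that each do something with data, run in dependency order), here is a minimal sketch in plain Python. It deliberately does not use Airflow's API. The task names, the toy data, and the run_pipeline helper are all hypothetical, loosely modeled on the two asks from the interview (a dashboard count and a newsletter list), and scheduling is left out: a scheduler would simply call run_pipeline on an interval.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, upstream):
    """Run each task once, after all of its upstream tasks have finished.

    tasks:    maps a task name to a function taking a dict of upstream results
    upstream: maps a task name to the set of task names it depends on
    Returns the order in which tasks ran and each task's result.
    """
    # static_order() yields every task after the tasks it depends on.
    order = list(TopologicalSorter(upstream).static_order())
    results = {}
    for name in order:
        inputs = {dep: results[dep] for dep in upstream.get(name, ())}
        results[name] = tasks[name](inputs)
    return order, results


# Hypothetical tasks and toy data (assumptions, not from the interview).
TASKS = {
    # Stand-in for an ingestion tool pulling rows from an application database.
    "ingest": lambda _: [
        {"user": "a", "active": True},
        {"user": "b", "active": False},
        {"user": "c", "active": True},
    ],
    # Stand-in for a transformation step: keep only active users.
    "transform": lambda up: [r["user"] for r in up["ingest"] if r["active"]],
    # The number a dashboard would display.
    "dashboard": lambda up: len(up["transform"]),
    # The list a reverse-ETL step would sync to a CRM.
    "newsletter_list": lambda up: sorted(up["transform"]),
}
UPSTREAM = {
    "ingest": set(),
    "transform": {"ingest"},
    "dashboard": {"transform"},
    "newsletter_list": {"transform"},
}

if __name__ == "__main__":
    order, results = run_pipeline(TASKS, UPSTREAM)
    print(order)
    print(results["dashboard"], results["newsletter_list"])
```

The two downstream tasks share one transform step, which is the point made in the interview: the pipeline, not any single tool, is the unit that has to run reliably.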

Published Date : Feb 8 2023



Ignite22 Analysis | Palo Alto Networks Ignite22


 

>>The Cube presents Ignite 22, brought to you by Palo Alto Networks. >>Welcome back everyone. We're so glad that you're still with us. It's the Cube Live at the MGM Grand. This is our second day of coverage of Palo Alto Networks Ignite. This is takeaways from Ignite 22. Lisa Martin here with two really smart guys, Dave Vellante. Dave, we're joined by one of our Cube alumni, a friend, a friend of the, we say friend of the Cube. >>Yeah, FOTC. A friend of the Cube. >>Zeus Kerravala joined us. Guys, it's great to have you here. It's been an exciting show. A lot of cybersecurity is one of my favorite topics to talk about. But I'd love to get some of the big takeaways from both of you. Dave, we'll start with you. >>A breathing room from two weeks ago. Yeah, that was, that was really pleasant. You know, I mean, I know, Zeus, you sat in the analyst program, interested in what your takeaways were from there. But, you know, coming into this, we wrote a piece, Palo Alto's Gold Standard, what they need to do to keep that status. And we hear a lot about consolidation. That's their big theme now, which is timely, right? Cause people wanna save money, they wanna do more with less. But I'm really interested in hearing Zeus's thoughts on how that's playing in the market. How customers, how easy is it to just say, oh, hey, I'm gonna consolidate. I wanna get into that a little bit with you, how well the strategy's working. We're gonna get into some of the M&A activity and really bring your perspectives to the table. >>Well, it's not easy. I mean, people have been calling for the consolidation of security for decades, and they're the first company that's actually made it happen. Right? And I think this is what we're seeing here is the culmination of this long term strategy, this company trying to build more of a platform. And they, you know, they came out as a firewall vendor. And I think it's safe to say they're more than firewall today.
That's only about two thirds of their revenue now. So down from 80% a few years ago. And when I think of what Palo Alto has become, they're really a data company. Now, if you look at, you know, Unit 42 and Cortex, the Cortex Data Lake, they've done an excellent job of taking telemetry from their products and from the acquisitions they have, right? And bringing that together into one big data lake. >>And then they're able to use that to do faster threat notification, forensics, things like that. And so I think the old model of security of create signatures for known threats, it's safe to say it never really worked and it wasn't ever gonna work. You had too many day zero exploits and things. The only way to fight security today is with AI and ML based analytics. And they have, they're the gold standard. I think the one thing about your post that I would add is the gold standard from a data standpoint, and that's given them this competitive advantage to go out and become a platform for security. Which, like I said, people have tried to do that for years. And they're the first one that's actually done it well. >>We've heard this from some of the startups, like Lacework will say, oh, we treat security as a data problem. Of course they're a startup, Palo Alto's got, you know, whatever, 10, 15 years of history. But one of the things I wanted to explore with you coming into this was the notion of can you be best of breed and develop a suite? And we've been hearing a consistent answer to that question, which is, and do you need to, and the answer is, well, best of breed in security requires that full spectrum, that full view. So here's my question to you. So, okay, let's take SD-WAN, relatively new for these guys, right? Yeah. Okay. And >>And one of the few products where they're not top two, top three in, right? Exactly. >>Yeah. So that's why I want to take that. Yeah. Because in bakeoffs, they're gonna lose on a head-to-head best of breed.
And so the customer's gonna say, Hey, you know, I love your consolidation play, your SD-WAN's just okay, how about a little discount on that? And you know, these guys are premium priced. Yes. So, you know, are they in essentially through their pricing strategies, sort of creating that stuff, fighting that, is that friction for them where they've got, you know, the customer says, all right, well forget it, we're gonna go stovepipe with the SD-WAN, we'll consolidate some of the stuff. Are you seeing that? >>Yeah, I still think the sales model is that way. And I think that's something they need to work on changing. If they get into a situation where they have to get down into a feature battle of my SD-WAN versus your SD-WAN, my firewall versus your firewall, frankly they've already lost, you know, because their value prop is the suite and is the platform. And I was talking to the CISO here that told me, he realizes now that you don't need best of breed everywhere to have best in class threat protection. In fact, best of breed everywhere leads to suboptimal threat protection. Cuz you have all these data sets that are in silos, right? And so from a data scientist standpoint, right, good data leads to good insights. Well, partial data leads to fragmented insights and that's what the best of breed approach gives you. And so I was talking with Palo about this, can they have this vision of being best of breed and platform? I don't really think you can maintain best of breed everywhere across a portfolio this big, but you don't need to. >>That was my second point of my >>Question. That's the point. >>Yeah. And so, because you know, we've talked about this, that suites always win in the long run, >>Suites >>win. Yeah. But here's the thing, I wonder, to your point about, you know, the customer, you know, understanding that this resonates with them.
I, my guess is a lot of customers, you know, at that mid-level and the fat middle are like still sort of wed, you know, hugging that tool. So there's work to be done here, but I think they got it right. Because if they devolve, to your point, if they devolve down to that speeds and feeds, eh, what's the point of that? Where's their value? >>You do not wanna get into a knife fight. And I think for them, a big challenge now is convincing customers that the suite approach does work. And they have to be able to do that in actual customer examples. And so, you know, I interviewed a bunch of customers here and the ones that have bought into XDR and XSOAR and even are looking at their SIEM have told me that the, so think of SOC operations, the old way heavily manually oriented, right? You have multiple panes of glass and you know, and then you've got, so there's a lot of people work before you bring the tools in, right? If done correctly with AI and ML, the machines would do all the heavy lifting and then you'd bring people in at the end to clean up the little bits that were missed, right? >>And so you moved from something that was very people heavy to something that's machine heavy and machines can work a lot faster than people. And so the ones that I've talked to that have done that have said, look, our engineers have moved on to a lot of different things. They're doing penetration testing, they're, you know, helping us with strategy and they're not fighting that daily fight of looking through log files. And the only proof point you need, Dave, is look at every big breach that we've had over the last five years. There's some SIEM vendor up there that says, we caught it. Yeah. >>Yeah. We had the data. >>Yeah. But the security team missed it. Well they missed it because nobody can look at that much data manually.
And so I think their approach of relying heavily on machines to fight the fight is actually the right way. >>Is that a differentiator for them versus, we were talking before we went live that you and I first did our very first segment back in 2017 at Fortinet. Is that, where do the two stand in your >>Yeah, it's funny cuz if you talk to the two vendors, they don't really see each other in a lot of accounts because Fortinet's more small market, mid-market. It's the same strategy to some degree, where Fortinet relies heavily on in-house development and Palo Alto relies heavily on acquisition. Yeah. And so I think from a consistency of feature set, you know, Fortinet has an advantage there because it's all run off their silicon. Where Palo's able to innovate very quickly. It requires a lot of work, right? To bring the front end and back ends together. But they're serving different markets. So >>Do you see that as a differentiator? The integration strategy that Palo Alto has as a differentiator? We talk to so many companies who have a strong M&A strategy and execution arm. But the challenge is always integrating the technology so that the customer, you know, ultimately it's the customer. >>I actually think they're underrated as an acquirer. In fact, Dave wrote a post on SiliconANGLE prior to Accelerate and, you put it on Twitter and you asked people to rank 'em as an acquirer and they were in the middle of the pack, >>Right? It was, it was. So it was Oracle, VMware, EMC, IBM, Cisco, ServiceNow, and Palo Alto. Yeah. Oracle got very high marks. It was like 8.5 out of, you know, 10. Yeah. VMware I think was 6.5. Nice. Era was high. EMC, big range. IBM five to seven. Cisco was three to eight. Yeah. Yeah, right. ServiceNow was a seven. And then, yeah, Palo Alto was like a five. Which I think was unfair. >>Well, and I think it depends on how you look at it.
And so I think with a lot of the acquisitions Palo Alto's made, they've done a good job of integrating the backend data and they've almost ignored the front end. And so when you buy some of the products, it's a little clunky today. You know, if you work with Prisma Cloud, it could be a little bit cleaner. And even with, you know, the SD-WAN, it took 'em a long time to bring CloudGenix in and stuff. But I think the approach is right. I don't necessarily believe you should integrate the front end until you've integrated the back end. >>That's >>The hard part, right? Because ultimately what you're gonna get, you're gonna get two panes of glass and one pane of glass and it might look pretty all mushed together, but ultimately you're not solving the bigger problem, right, of being able to create that big data lake to fight security. And so I think, you know, the approach they've taken is the right one. I think from a user standpoint, maybe it doesn't show up as neatly because you don't see the frontend integration, but the way they're doing it is the right way to do it. And I'm glad they're doing it that way versus caving to the pressures of what, you know, the industry might want >>Showed up in the performance of the company. I mean, this company was basically gonna double revenues to 7 billion from 2020 to >>2023. Think about that. >>That's unbelievable, right? I mean, and then they wanna double again. Yeah. You know, so, well >>Nikesh was quoted as saying they wanna be the first cyber company that's a hundred billion dollars. He didn't give a timeline. Market cap. >>Right. >>Market cap, right. What I wanna get is both of your opinions on what you saw and heard and felt this week. What do you think the likelihood is? And do you have any projections on, you know, how many years it's gonna take for them to get there? >>Well, >>Well I think so if they're gonna get that big, right?
And we were talking about this pre-show, any company that's becoming a big company does it through ecosystem >>Bingo. >>Right? And when you look around the show floor, it's not that impressive. And if there's an area they need to focus on, it's building that ecosystem. And it's not with other security vendors, it's with application vendors and it's with the cloud companies and stuff. And they've got some relationships there, but they need to do more. I actually challenged 'em on that in one of the analyst sessions. They said, look, we've got 800 Cortex partners. Well, where are they? Right? Why isn't there a Cortex stand here with a bunch of the small companies here? So I do think that is an area they need to focus on. If they are gonna get to that market cap number, they will do so through ecosystem. Because every company that's achieved that has done it through ecosystem. >>A hundred percent agree. And you know, if you look at CrowdStrike's ecosystem, it's pretty similar. Yeah. You know, it's not much different from this, but I went back and just looked at some, you know, peak valuations during the pandemic and shortly thereafter. CrowdStrike was 70 billion. You know, that's roughly their peak. Palo Alto was 56, Fortinet was 59. They've actually diverged. Right. And now Palo Alto has taken the top mantle, you know, today its market cap's 52. So it's held 93% of its peak value. Everybody else is tanking. Even Okta was 45 billion. It's been crushed, as you well know. But, so Palo Alto wasn't always, you know, the number one in terms of market cap. But I guess my point is, look, if CrowdStrike could get to 70 billion during, yeah, during the frenzy, I think it's gonna take, to answer your question, I think it's gonna be five years. Okay. Before they get back there. I think this market's gonna be tough for a while from a valuation standpoint.
I think generally tech is gonna kind of go up and down and sideways for a good year and a half, maybe even two years, could be even longer. And then I think there's gonna be some next wave of productivity innovation that hits. And then you're almost always gonna exceed the previous highs. It's gonna take a while. Yeah, >>Yeah, yeah. But I think their ability to disrupt the SIEM market actually is something I believe they're gonna do. I've been calling for the death of the SIEM for a long time, and I know some people at Palo Alto are very cautious about saying that cuz the Splunks and the, you know, they're their partners. But I think, you know, it's what I said before, the tools are catching them, but it's not in a way that's useful for the IT pro, and I don't think the SIEM vendors have that ecosystem of insight across network, cloud, endpoint. Right. Which is what you need in order to make a SIEM useful. >>A CISO at an ETR roundtable said, if it weren't for my regulators, I would chuck my SIEM. >>Yes. >>But that's the only reason that this person was keeping it. So, >>Yeah. And I think the fact that most of those companies have moved from a perpetual model to a recurring revenue model actually helps unseat them. Typically when you pour a bunch of money into something, you remember the old Computer Associates days, nobody ever took it out cuz of the sunk dollars you spent to do it. But now that you're paying an annual recurring fee, it actually makes it easier to take out. So >>Yeah, it's an ebb and flow, right? Yeah. Because the maintenance costs were, you know, relatively low. Maybe it was 20% of the total. And then, you know, once every five years you had to do a refresh and you were still locked into the sort of maintenance, and so yeah, I think you're right. The switching costs with SaaS, you know, in theory anyway, should be less >>Yeah. As long as you can migrate the data over.
And I think they've got a pretty good handle on that. So, >>Yeah. So guys, I wanna get your perspective, as there's a whole bunch of announcements here. We've only been here for a couple days, not a big conference, as you can see from behind us. Zeus, in your opinion, what was Palo Alto's main message at this event, and what do you think about it? And then same question for you. >>Yeah, I think their message largely wrapped around disruption, right? And in Nikesh's keynote they already talked about that, right? Where they disrupted the firewall market by creating a NextGen firewall. In fact, if you look at all the new services they added to their firewall, you could almost say it's a NextGen NextGen firewall. But I do think the work they've done in the area of cloud and Cortex actually is pretty impressive. And I think the SOC is ripe for disruption because for the most part, most SOCs still, you know, run off legacy playbooks. They run off legacy, you know, forensic models and things, and they don't work. It's why we have so many breaches today. The dirty little secret that nobody ever wants to talk about is the bad guys are using machine learning, right? And so if you're using a signature-based model, all they do is tweak their model a little bit and it bypasses them. So I think the only way to fight the bad guys today is you gotta fight fire with fire. And I think that's the path they've headed >>Down, and the bad guys are hiding in plain sight, you know? >>Yeah, yeah. Well, it's not hard to do now with a lot of those legacy tools.
So >>I think for me, you know, the stat that we threw out earlier, I think yesterday at our keynote analysis, was, you know, the ETR data shows that in that last survey around 35% of the respondents said we're actively consolidating vendors, redundant vendors. Today that number's up to 44%. Yeah. It's by far the number one cost optimization technique. That's what these guys are pitching. And I think it's gonna resonate with people, and I think to your point, they're integrating at the backend, their peeps are technical, right? I mean, they can deal with that complexity. Yeah. And so they don't need eye candy. Eventually they want to have that cuz it'll allow 'em to have deeper market penetration and make people more productive. But you know, that consolidation message came through loud and clear. >>Yeah. The big change in this industry too is all the new startups are all cloud native, right? They're all built on Amazon or Google or whatever. Yeah. And when you're cloud native and you buy a cloud native, integration is fast. It's not like having to integrate this big monolithic software stack anymore. Right. So I think their pace of integration will only accelerate from here because everything's now cloud native. >>If a customer comes to you, or when a customer comes to you and says, Zeus, help us with this cyber transformation we have, our board isn't necessarily with our executives in terms of execution of a security strategy, how do you advise them where Palo Alto is concerned? >>Yeah. You know, a lot of this is just fighting legacy mindset. And I was talking with some CISOs here from state and local governments and things and, you know, they can't get more budget. They're fighting the tide. But what they did find is through the use of automation technology, they're able to bring their people costs way down. Right.
And then be able to use that budget to invest in a lot of new projects. And so with that, you have to start with your biggest pain points, apply automation where you can, and then be able to use that budget to reinvest back in your security strategy. And it's good for the IT pros too, the security pros. My advice to IT pros is if you're doing things today that aren't resume building, stop doing them. Right? Find a way to automate them out of your job. And so if you're patching systems and you're looking through log files, there's no reason machines can't do that. And you go do something a lot more interesting. >>So true. It's like storage guys 10 years ago, provisioning LUNs. Yes. It's like, stop doing that. Yeah. You're gonna be outta a job. And so, last question I have is, who do you see as the big competitors, the horses on the track question, right? So obviously Cisco kind of has led for a while and, you know, big portfolio company, CrowdStrike coming at it from endpoint. You know, who do you see as the real players going for that? You know, right now the market's three to 4%. The leader has three, 4% of the market. You know, they're all going for 10, 15, maybe 20% of the market. Who are the likely candidates? >>Yeah, I don't know if CrowdStrike really has the breadth of portfolio to compete long term though. I think they've had a nice run, but we might start to see the fall of 'em. I think Microsoft is gonna be formidable. They've laid down the gauntlet, right? They are a security vendor, right? We were at re:Invent, and AWS is the platform for security vendors. Yes. Middle, somewhere in the middle. But Microsoft, make no mistake, they're in security. They've got some good products. I think a lot of 'em are kind of good enough and they tie it to the licensing, and I'm not sure that works in security, but they've certainly got the ear of a lot of IT pros. >>It might work in SMB. >>Yeah. Yeah.
It might. And I do like Zscaler. I know these guys pooh-pooh the proxy model, but they've done about as much with proxies as you can. And I think it's a battle of, I love the Nir, you know, proxies are dead, and Jay's model, you know, Jay over at Zscaler throws 'em back at 'em. So it's good to see that kind of fight going on between the two. >>Oh, it's great. Well, and again, Zscaler's coming at it from their cloud security angle. CrowdStrike's coming at it from endpoint. I do think CrowdStrike has an opportunity to build out the portfolio through M&A and maybe ecosystem. And then obviously, you know, Palo Alto's getting it done. How about Cisco? >>Yeah. Cisco's interesting. And I think if Cisco can make the network matter in security, and it should, right? We're talking about how you need a lot of forensics to fight security today. Well, they're gonna see things long before anybody else because they have all that network data. If they can tie network to security, I mean they could really have that business take off. But we've been saying that about Cisco for 20 years. >>But big install base though. Yeah. It's hard for a company, any company, to just say, okay, hey, Cisco customer, sweep the floor and come with us. That's >>A tough thing. They have a lot of good piece parts, right? And like Duo's a good product and Umbrella's a good product. They've not done a good job. >>They're the opposite of these guys. >>They've not done a good job of the backend integration. That's where Cisco needs to focus. And I do think Jeetu Patel there fixed the WebEx group and I think he's now, in fact when you talk to him, he's doing very little on WebEx, that group's running itself, and he's more focused on security. So I think we could see a resurgence there.
But you know, from a revenue perspective, it's a little misleading cuz they have this big legacy base that's in decline while they're moving to cloud and stuff. So, but there's a lot of work there in trying to tie to network. >>Right. Lots of fuel for conversation. We're gonna have to carry this on, on siliconangle.com, guys. Yes. And Wikibon. Zeus, thank you so much for joining Dave and me, giving us your insights as to this event. Where are you gonna be next? Are you gonna be on vacation? >>There's nothing more fun than being on theCUBE, so, right. What's outside of that though? Yeah, you know, Christmas coming up, I gotta go see family and do the obligatory, although for me that's a lot of travel, so I guess >>More planes. Yeah. >>Hopefully not in Vegas. >>Not in Vegas. >>Awesome. Nothing against Vegas. Yeah, no, >>We love it. We >>Love it. Although I will say my year started off with CES. Yeah. And it's finishing up with Palo Alto here. The bookends. Yeah, exactly. In Vegas bookends. >>Well, thanks so much for joining us. Thank you, Dave. Always a pleasure to host a show with you and hear your insights. Reading your Breaking Analysis always kicks off my prep for a show, and it's always great to see predictions come true. So thank you for being my co-host. All right. For Dave Vellante and Zeus Kerravala, I'm Lisa Martin. You've been watching theCUBE, the leader in live, emerging and enterprise tech coverage. Thanks for watching.

Published Date : Dec 15 2022

SUMMARY :

It's the Cube Live at A friend of the Cube Guys, it's great to have you here. You know, I mean, I know was, yes, you sat in the analyst program, interested in what your takeaways were And they, you know, they, they came out as a firewall vendor. And so I think the old model of security of create Palo Alto's got, you know, whatever, 10, 15 years of, of, of history. And one of the few products are not top two, top three in, right? And so the customer's gonna say, Hey, you know, I love your, your consolidation play, And I think that's something they need to work on changing. That's the point. win in the long run, my guess is a lot of customers, you know, at that mid-level and the fat middle are like still sort And so, you know, I I interviewed a bunch of customers here and the ones that have bought into XDR And the only proof point you need, Dave, is look at every big breach that we've had over the last And so the, I I think their approach of relying heavily on Is that a differentiator for them versus, we were talking before we went live that you and I first hit our very first segment back And so I think from a consistently you know, ultimately it's the customer. Silicon Angle prior to Accelerate and he, he on, you put it on Twitter and you asked people to you know, 10. And even with, you know, the SD wan that took 'em a long time to bring you know, the approach they've taken is the right one. I mean, this company was basically gonna double revenues to 7 billion Think about that at that, that I mean, and then and they wanna double again. What did, what did Nikesh was quoted as saying they wanna be the first cyber company that's a hundred billion dollars. And and do you have any projections on how, you know, how many years it's gonna take for them to get And that when you look around the show floor, it's not that impressive. And you know, if you look at CrowdStrike's ecosystem, it's pretty similar. 
But I, I think the, you know, it's what I said before, the, the tools are catching I would chuck my sim. But that's the only reason that, that this person was keeping it. you remember the old computer associate days, nobody ever took it out cuz the sunk dollars you spent to do it. And then, you know, once every five years you had to do a refresh and you were still And I think they've got a pretty good handle on that. Palo Alto's main message and and what do you think about it main message at this event? So I, I think the only way to fight the the bad guys today is with you gotta fight Well it's, it's not hard to do now with a lot of those legacy tools. I think, I think for me, you know, the stat that we threw out earlier, I think yesterday at our keynote analysis was, And when your cloud native and you buy a cloud native If a customer comes to you or when a customer comes to you and says, Zs help us with this cyber transformation And you go do something a lot more interesting. of service has led for a while and you know, big portfolio company, CrowdStrike coming at it from end point. I don't know if CrowdStrike really has the breadth of portfolio to compete long term though. I love the, the, the near, you know, proxies are dead and Jay's model, And then obviously, you know, Palo Alto's getting it done. And I, I think if Cisco can hey Cisco customer sweep the floor and come with us. And like duo's a good product and umbrella's a good product. And I do think g G two Patel there fixed the WebEx group and I think he's now, Thank you so much for joining Dave and me giving us your insights as to this event. you know, Christmas coming up, I gotta go see family and do the obligatory, although for me that's a lot of travel, Yeah. Yeah, no, We love it. And it's finishing up with Palo Alto here. Always a pleasure to host a show with you and hear your insights.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
Dave | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Dave Valante | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
20% | QUANTITY | 0.99+
Fort Net | ORGANIZATION | 0.99+
2017 | DATE | 0.99+
93% | QUANTITY | 0.99+
Palo | ORGANIZATION | 0.99+
20 years | QUANTITY | 0.99+
Carla | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
Vegas | LOCATION | 0.99+
three | QUANTITY | 0.99+
7 billion | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
70 billion | QUANTITY | 0.99+
2020 | DATE | 0.99+
80% | QUANTITY | 0.99+
44% | QUANTITY | 0.99+
Palo Alto Networks | ORGANIZATION | 0.99+
45 billion | QUANTITY | 0.99+
52 | QUANTITY | 0.99+
second point | QUANTITY | 0.99+
10 | QUANTITY | 0.99+
59 | QUANTITY | 0.99+
yesterday | DATE | 0.99+
VMware | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
two vendors | QUANTITY | 0.99+
Palo Alto | ORGANIZATION | 0.99+
Karala | PERSON | 0.99+
CrowdStrike | ORGANIZATION | 0.99+
ibm | ORGANIZATION | 0.99+
15 | QUANTITY | 0.99+
Jay | PERSON | 0.99+
8.5 | QUANTITY | 0.99+
Palo Altos | ORGANIZATION | 0.99+
Dave Valante Enz | PERSON | 0.99+
two panes | QUANTITY | 0.99+
two years | QUANTITY | 0.99+
Three | QUANTITY | 0.99+
56 | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Christmas | EVENT | 0.99+
ServiceNow | ORGANIZATION | 0.99+
second day | QUANTITY | 0.99+
one | QUANTITY | 0.99+
2023 | DATE | 0.99+
35 | QUANTITY | 0.99+
two | QUANTITY | 0.99+
Reinvent | ORGANIZATION | 0.98+
The Cube | TITLE | 0.98+
One | QUANTITY | 0.98+
first | QUANTITY | 0.98+
WebEx | ORGANIZATION | 0.98+
first segment | QUANTITY | 0.98+
Palo Alto | LOCATION | 0.98+
emc | ORGANIZATION | 0.98+
two weeks ago | DATE | 0.98+
4% | QUANTITY | 0.98+

Takeaways from Ignite22 | Palo Alto Networks Ignite22

>>theCUBE presents Ignite22, brought to you by Palo Alto Networks. >>Welcome back everyone. We're so glad that you're still with us. It's theCUBE live at the MGM Grand. This is our second day of coverage of Palo Alto Networks Ignite. This is takeaways from Ignite22. Lisa Martin here with two really smart guys. Dave Vellante. Dave, we're joined by one of our CUBE alumni, a friend of the, we say friend of theCUBE. >>Yeah, FOTC. A friend of theCUBE. >>Zeus Kerravala joins us. Guys, it's great to have you here. It's been an exciting show. Cybersecurity is one of my favorite topics to talk about. But I'd love to get some of the big takeaways from both of you. Dave, we'll start with >>You. Breathing room from two weeks ago. Yeah, that was really pleasant. You know, I mean, I know, Zeus, you sat in the analyst program. Interested in what your takeaways were from there. But, you know, coming into this, we wrote a piece, Palo Alto's Gold Standard, what they need to do to keep that status. And we hear a lot about consolidation. That's their big theme now, which is timely, right? Cause people wanna save money, they wanna do more with less. But I'm really interested in hearing Zeus's thoughts on how that's playing in the market. How easy is it for customers to just say, oh, hey, I'm gonna consolidate? I wanna get into that a little bit with you, how well the strategy's working. We're gonna get into some of the M&A activity and really bring your perspectives to the table. Well, >>It's not easy. I mean, people have been calling for the consolidation of security for decades, and they're the first company that's actually made it happen. Right? And I think what we're seeing here is the culmination of this long-term strategy, this company trying to build more of a platform. And, you know, they came out as a firewall vendor. And I think it's safe to say they're more than firewall today.
That's only about two thirds of their revenue now. So down from 80% a few years ago. And when I think of what Palo Alto has become, they're really a data company now. If you look at, you know, Unit 42 and Cortex, the Cortex Data Lake, they've done an excellent job of taking telemetry from their products and from the acquisitions they have, right? And bringing that together into one big data lake. >>And then they're able to use that to do faster threat notification, forensics, things like that. And so I think the old model of security, of create signatures for known threats, it's safe to say it never really worked and it wasn't ever gonna work. You had too many zero-day exploits and things. The only way to fight security today is with AI and ML-based analytics. And they're the gold standard. I think the one thing about your post that I would add, they're the gold standard from a data standpoint. And that's given them this competitive advantage to go out and become a platform for security. Which, like I said, people have tried to do for years. And they're the first one that's actually done it well. >>We've heard this from some of the startups, like Lacework will say, oh, we treat security as a data problem. Of course, they're a startup. Palo Alto's got, you know, whatever, 10, 15 years of history. But one of the things I wanted to explore with you coming into this was the notion of, can you be best of breed and develop a suite? And we've been hearing a consistent answer to that question, which is, and do you need to? And the answer is, well, best of breed in security requires that full spectrum, that full view. So here's my question to you. So, okay, let's take SD-WAN, relatively new for these guys, right? Yeah. Okay. And >>And one of the few products they're not top two, top three in, right? >>Exactly. Yeah. So that's why I want to take that. Yeah. Because in bakeoffs, they're gonna lose on a head-to-head best of breed.
And so the customer's gonna say, hey, you know, I love your consolidation play, your SD-WAN's just okay. How about a little discount on that? And you know, these guys are premium priced. Yes. So, you know, are they, essentially through their pricing strategies, sort of creating that stuff, fighting that? Is that friction for them, where, you know, the customer says, all right, well forget it, we're gonna go stovepipe with the SD-WAN, we'll consolidate some of the stuff? Are you seeing that? >>Yeah, I still think the sales model is that way. And I think that's something they need to work on changing. If they get into a situation where they have to get down into a feature battle of my SD-WAN versus your SD-WAN, my firewall versus your firewall, frankly they've already lost, you know, because their value prop is the suite and is the platform. And I was talking with a CISO here that told me he realizes now that you don't need best of breed everywhere to have best in class threat protection. In fact, best of breed everywhere leads to suboptimal threat protection. Cuz you have all these data sets that are in silos, right? And so from a data scientist standpoint, right, good data leads to good insights. Well, partial data leads to fragmented insights, and that's what the best of breed approach gives you. And so I was talking with Palo about this, can they have this vision of being best of breed and platform? I don't really think you can maintain best of breed everywhere across a portfolio this big, but you don't need to. >>That was the second point of my question. That's the point I'm saying. Yeah. And so, cuz, you know, we've talked about this, that suites always win in the long run, >>Suites win. >>Yeah. But here's the thing, I wonder, to your point about, you know, the customer understanding that this resonates with them.
My guess is a lot of customers, you know, at that mid-level and the fat middle are still sort of wed to, you know, hugging that tool. So there's work to be done here, but I think they got it right, because if they devolve, to your point, down to that speeds and feeds, eh, what's the point of that? Where's their >>Value? You do not wanna get into a knife fight. And I think for them a big challenge now is convincing customers that the suite approach does work. And they have to be able to do that in actual customer examples. And so, you know, I interviewed a bunch of customers here, and the ones that have bought into XDR and XSOAR and even are looking at their SIEM have told me that, so think of SOC operations: the old way was heavily manually oriented, right? You have multiple panes of glass and, you know, there's a lot of people work before you bring the tools in, right? If done correctly with AI and ML, the machines would do all the heavy lifting and then you'd bring people in at the end to clean up the little bits that were missed, right? >>And so you moved from something that was very people heavy to something that's machine heavy, and machines can work a lot faster than people. And so the ones that I've talked to that have done that have said, look, our engineers have moved on to a lot of different things. They're doing penetration testing, they're, you know, helping us with strategy, and they're not fighting that daily fight of looking through log files. And the only proof point you need, Dave, is look at every big breach that we've had over the last five years. There's some SIEM vendor up there that says, we caught it. Yeah. >>Yeah. We had the data. >>Yeah. But the security team missed it. Well, they missed it because nobody can look at that much data manually.
And so I think their approach of relying heavily on machines to fight the fight is actually the right way. >>Is that a differentiator for them versus, we were talking before we went live that you and I did our very first segment back in 2017 at Fortinet. Where do the two stand in your >>Yeah, it's funny cuz if you talk to the two vendors, they don't really see each other in a lot of accounts because Fortinet's more small market, mid-market. It's the same strategy to some degree, where Fortinet relies heavily on in-house development and Palo Alto relies heavily on acquisition. Yeah. And so I think from a consistency of feature set, you know, Fortinet has an advantage there because it's all run off their silicon, where Palo's able to innovate very quickly. It requires a lot of work, right, to bring the front end and back ends together. But they're serving different markets. So >>Do you see that as a differentiator? The integration strategy that Palo Alto has as a differentiator? We talk to so many companies who have a strong M&A strategy and execution arm. But the challenge is always integrating the technology so that the customer, you know, ultimately it's the customer. >>I actually think they're underrated as an acquirer. In fact, Dave wrote a post on SiliconANGLE prior to Accelerate, and you put it on Twitter and you asked people to rank 'em as an acquirer, and they were in the middle of the pack, >>Right? It was. So it was Oracle, VMware, EMC, IBM, Cisco, ServiceNow, and Palo Alto. Yeah. Oracle got very high marks. It was like 8.5 out of, you know, 10. Yeah. VMware I think was 6.5. Nicira was high. EMC, big range. IBM five to seven. Cisco was three to eight. Yeah. Yeah, right. ServiceNow was a seven. And then, yeah, Palo Alto was like a five, which I think was unfair. Well, >>And I think it depends on how you look at it.
And so I think with a lot of the acquisitions Palo Alto's made, they've done a good job of integrating the backend data and they've almost ignored the front end. And so when you buy some of the products, it's a little clunky today. You know, if you work with Prisma Cloud, it could be a little bit cleaner. And even with, you know, the SD-WAN, it took 'em a long time to bring CloudGenix in and stuff. But I think the approach is right. I don't necessarily believe you should integrate the front end until you've integrated the back end. >>That's >>The hard part, right? Because ultimately what you're gonna get, you're gonna get two panes of glass and one pane of glass and it might look pretty all mushed together, but ultimately you're not solving the bigger problem, right, of being able to create that big data lake to fight security. And so I think, you know, the approach they've taken is the right one. I think from a user standpoint, maybe it doesn't show up as neatly because you don't see the frontend integration, but the way they're doing it is the right way to do it. And I'm glad they're doing it that way versus caving to the pressures of what, you know, the industry might want or >>Showed up in the performance of the company. I mean, this company was basically gonna double revenues to 7 billion from 2020 to >>2023. Think about that. >>I mean that's unbelievable, right? I mean, and then they wanna double again. Yeah. You know, so, well >>Nikesh was quoted as saying they wanna be the first cyber company that's a hundred billion dollars. He didn't give a timeline. Market >>Cap. Right. >>Market cap, right. What I wanna get is both of your opinions on what you saw and heard and felt this week. What do you think the likelihood is? And do you have any projections on, you know, how many years it's gonna take for them to get there? >>Well, >>Well I think so if they're gonna get that big, right?
And we were talking about this pre-show, any company that's becoming a big company does it through ecosystem. >> Bingo. >> Right? And when you look around the show floor, it's not that impressive. No. And if there's an area they need to focus on, it's building that ecosystem. And it's not with other security vendors, it's with application vendors and it's with the cloud companies and stuff. And they've got some relationships there, but they need to do more. I actually challenged 'em on that in one of the analyst sessions. They said, look, we've got 800 Cortex partners. Well, where are they? Right? Why isn't there a Cortex stand here with a bunch of the small companies here? So I do think that that is an area they need to focus on. If they are gonna get to that market cap number, they will do so through ecosystem. Because every company that's achieved that has done it through ecosystem. >> A hundred percent agree. And you know, if you look at CrowdStrike's ecosystem, it's, I mean, pretty similar. Yeah. You know, it's not much different from this. But I went back and just looked at some, you know, peak valuations during the pandemic and shortly thereafter. CrowdStrike was 70 billion. You know, that's roughly their peak. Palo Alto was 56, Fortinet was 59. They've actually diverged. Right. And now Palo Alto has taken the top mantle. You know, today its market cap's 52. So it's held 93% of its peak value. Everybody else is tanking. Even Okta was 45 billion. It's been crushed, as you well know. So Palo Alto wasn't always, you know, the number one in terms of market cap. But I guess my point is, look, if CrowdStrike got to 70 billion during, yeah, during the frenzy, I think it's gonna take, to answer your question, I think it's gonna be five years. Okay. Before they get back there. I think this market's gonna be tough for a while from a valuation standpoint.
I think generally tech is gonna kind of go up and down and sideways for a good year and a half, maybe even two years, could be even longer. And then I think there's gonna be some next wave of productivity innovation that hits. And then you're almost always gonna exceed the previous highs. It's gonna take a while. Yeah. >> Yeah, yeah. But I think their ability to disrupt the SIEM market actually is something that I believe they're gonna do. I've been calling for the death of the SIEM for a long time, and I know some people at Palo Alto are very cautious about saying that 'cause the Splunks and the, you know, they're their partners. But I think, you know, it's what I said before, the tools are catching them, but it's not in a way that's useful for the IT pro, and I don't think the SIEM vendors have that ecosystem of insight across network, cloud, endpoint. Right. Which is what you need in order to make a SIEM useful. >> A CISO at an ETR roundtable said, if it weren't for my regulators, I would chuck my SIEM. >> Yes. >> But that's the only reason that this person was keeping it. No. >> Yeah. And I think the fact that most of those companies have moved to a recurring revenue model actually helps unseat them. Typically when you pour a bunch of money into something, you remember the old Computer Associates days, nobody ever took it out 'cause of the sunk dollars you spent to do it. But now that you're paying an annual recurring fee, it actually makes it easier to take out. So >> Yeah, it's just an ebb and flow, right? Yeah. Because the maintenance costs were, you know, relatively low. Maybe it was 20% of the total. And then, you know, once every five years you had to do a refresh and you were still locked into the sort of maintenance, and so yeah, I think you're right. The switching costs with SaaS, you know, in theory anyway, should be less. >> Yeah. As long as you can migrate the data over.
And I think they've got a pretty good handle on that. So, >> Yeah. So guys, I wanna get your perspective, as there's been a whole bunch of announcements here. We've only been here for a couple days, not a big conference, as you can see from behind us. Zeus, what in your opinion was Palo Alto's main message at this event, and what do you think about it? And then same question for you. >> Yeah, I think their message largely wrapped around disruption, right? And the keynote already talked about that, right? And where they disrupted the firewall market by creating a NextGen firewall. In fact, if you look at all the new services they added to their firewall, you could almost say it's a NextGen NextGen firewall. But I do think the work they've done in the area of cloud and Cortex actually is pretty impressive. And I think the SOC is ripe for disruption because, for the most part, most SOCs still, you know, run off legacy playbooks. They run off legacy, you know, forensic models and things, and they don't work. It's why we have so many breaches today. The dirty little secret that nobody ever wants to talk about is the bad guys are using machine learning, right? And so if you're using a signature-based model, all they gotta do is tweak their model a little bit and it bypasses them. So I think the only way to fight the bad guys today is, you're gonna fight fire with fire. And I think that's the path they've headed >> Down. Yeah. The bad guys are hiding in plain sight, you know? Yeah, >> Yeah. Well, it's not hard to do now with a lot of those legacy tools.
So >> I think for me, you know, the stat that we threw out earlier, I think yesterday at our keynote analysis, was, you know, the ETR data shows that in the last survey around 35% of the respondents said we are actively consolidating, sorry, 44%, sorry, 35% said we're actively consolidating vendors, redundant vendors. Today that number's up to 44%. Yeah. It's by far the number one cost optimization technique. That's what these guys are pitching. And I think it's gonna resonate with people, and I think to your point, they're integrating at the backend, their peeps are technical, right? I mean, they can deal with that complexity. Yeah. And so they don't need eye candy. Eventually they want to have that 'cause it'll allow 'em to have deeper market penetration and make people more productive. But you know, that consolidation message came through loud and clear. >> Yeah. The big change in this industry too is all the new startups are all cloud native, right? They're all built on Amazon or Google or whatever. Yeah. And when you're cloud native and you buy a cloud native, integration is fast. It's not like having to integrate this big monolithic software stack anymore. Right. So I think their pace of integration will only accelerate from here because everything's now cloud native. >> If a customer comes to you, or when a customer comes to you and says, Zeus, help us with this cyber transformation we have, our board isn't necessarily aligned with our executives in terms of execution of a security strategy. How do you advise them where Palo Alto is concerned? >> Yeah. You know, a lot of this is just fighting legacy mindset. And I was talking with some CISOs here from state and local governments and things and, you know, they can't get more budget. They're fighting the tide. But what they did find is through the use of automation technology, they're able to bring their people costs way down. Right.
And then be able to use that budget to invest in a lot of new projects. And so with that, you have to start with your biggest pain points, apply automation where you can, and then be able to use that budget to reinvest back in your security strategy. And it's good for the IT pros too, the security pros. My advice to the IT pros is, if you're doing things today that aren't resume building, stop doing them. Right. Find a way to automate the mundane in your job. And so if you're patching systems and you're looking through log files, there's no reason machines can't do that. And you go do something a lot more interesting. >> So true. It's like storage guys 10 years ago, provisioning LUNs. Yes. It's like, stop doing that. Yeah. You're gonna be out of a job. So, last question I have is, who do you see as the big competitors, the horses on the track question, right? So obviously Cisco has kind of led for a while and, you know, big portfolio company, CrowdStrike coming at it from endpoint. You know, who do you see as the real players going for that? You know, right now the market's three to 4%. The leader has three, 4% of the market. You know, they're all going for 10, 15, maybe 20% of the market. Who are the likely candidates? Yeah, >> I don't know if CrowdStrike really has the breadth of portfolio to compete long term though. I think they've had a nice run, but I, we might start to see the follow 'em. I think Microsoft is gonna be formidable. They've laid down the gauntlet, right? They are a security vendor, right? We were at re:Invent and AWS is the platform for security vendors. Yes. Middle, somewhere in the middle. But Microsoft, make no mistake, they're in security. They've got some good products. I think a lot of 'em are kind of good enough and they tie it to the licensing, and I'm not sure that works in security, but they've certainly got the ear of a lot of IT pros. >> It might work in SMB. >> Yeah, yeah.
It might. And I do like Zscaler. I know these guys pooh-pooh the proxy model, but they've done about as much with proxies as you can. And I think it's a battle of, I love the Nir, you know, "proxies are dead" and Jay's model, you know, Jay over at Zscaler, throwing it back at 'em. So it's good to see that kind of fight going on between the >> Two. Oh, it's great. Well, and again, Zscaler's coming at it from their cloud security angle. CrowdStrike's coming at it from endpoint. I do think CrowdStrike has an opportunity to build out the portfolio through M&A and maybe ecosystem. And then obviously, you know, Palo Alto's getting it done. How about Cisco? >> Yeah, Cisco's interesting. And I think if Cisco can make the network matter in security, and it should, right? We're talking about how you need a lot of forensics to fight security today. Well, they're gonna see things long before anybody else because they have all that network data. If they can tie network to security, I mean they could really have that business take off. But we've been saying that about Cisco for 20 years. >> But big installed base though. Yeah. It's hard for a company, any company, to say, okay, hey Cisco customer, sweep the floor and come with us. That's >> A tough thing. They have a lot of good piece parts, right? And like Duo's a good product and Umbrella's a good product. They've not done a good job. >> They're the opposite of these guys. >> They've not done a good job of the backend integration, and that's where Cisco needs to focus. And I do think Jeetu Patel there fixed the Webex group, and I think he's now, in fact when you talk to him, he's doing very little on Webex, that group's running itself, and he's more focused on security. So I think we could see a resurgence there.
But you know, from a revenue perspective, it's a little misleading 'cause they have this big legacy base that's in decline while they're moving to cloud and stuff. So, but there's a lot of Rick there trying to tie to network. >> Lots of fuel for conversation. We're gonna have to carry this on, on SiliconANGLE.com, guys. Yes. And Wikibon. Zeus, thank you so much for joining Dave and me, giving us your insights as to this event. Where are you gonna be next? Are you gonna be on >> Vacation? There's nothing more fun than being on theCUBE. So what's outside of that though? Yeah, you know, Christmas coming up, I gotta go see family and be the obligatory, although for me that's a lot of travel, so I guess >> More planes. Yeah. >> Hopefully not in Vegas. >> Not in Vegas. >> Awesome. Nothing against Vegas. Yeah, no, >> We love it. We love >> It. Although I will say my year started off with CES. Yeah. And it's finishing up with Palo Alto here. The bookends. Yeah, exactly. In Vegas bookends. >> Well, thanks so much for joining us. Thank you, Dave. Always a pleasure to host a show with you and hear your insights. Reading your Breaking Analysis always kicks off my prep for a show. And it's always great to see predictions come true. So thank you for being my co-host. All right. For Dave Vellante and Zeus Kerravala, I'm Lisa Martin. You've been watching theCUBE, the leader in live, emerging and enterprise tech coverage. Thanks for watching.

Published Date : Dec 15 2022


ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Dave Valante | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
20% | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
Fort Net | ORGANIZATION | 0.99+
2017 | DATE | 0.99+
Amazon | ORGANIZATION | 0.99+
20 years | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
Vegas | LOCATION | 0.99+
Carla | PERSON | 0.99+
70 billion | QUANTITY | 0.99+
80% | QUANTITY | 0.99+
IBM | ORGANIZATION | 0.99+
10 | QUANTITY | 0.99+
93% | QUANTITY | 0.99+
Palo Alto | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
2020 | DATE | 0.99+
Palo Alto Networks | ORGANIZATION | 0.99+
Jay | PERSON | 0.99+
45 billion | QUANTITY | 0.99+
7 billion | QUANTITY | 0.99+
Dave Valante Enz | PERSON | 0.99+
yesterday | DATE | 0.99+
Karala | PERSON | 0.99+
Palo | ORGANIZATION | 0.99+
44% | QUANTITY | 0.99+
ibm | ORGANIZATION | 0.99+
two vendors | QUANTITY | 0.99+
35 | QUANTITY | 0.99+
Palo Alto Networks | ORGANIZATION | 0.99+
Palo Alto | ORGANIZATION | 0.99+
two panes | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Christmas | EVENT | 0.99+
VMware | ORGANIZATION | 0.99+
8.5 | QUANTITY | 0.99+
both | QUANTITY | 0.99+
two years | QUANTITY | 0.99+
CrowdStrike | ORGANIZATION | 0.99+
56 | QUANTITY | 0.99+
one | QUANTITY | 0.99+
15 | QUANTITY | 0.99+
second day | QUANTITY | 0.99+
first | QUANTITY | 0.99+
Reinvent | ORGANIZATION | 0.99+
Lacework | ORGANIZATION | 0.99+
ServiceNow | ORGANIZATION | 0.99+
second point | QUANTITY | 0.99+
59 | QUANTITY | 0.99+
emc | ORGANIZATION | 0.99+
4% | QUANTITY | 0.98+
One | QUANTITY | 0.98+
two | QUANTITY | 0.98+
today | DATE | 0.98+
Ignite22 | ORGANIZATION | 0.98+
two weeks ago | DATE | 0.98+
Naira | ORGANIZATION | 0.98+
The Cube | TITLE | 0.98+
2023 | DATE | 0.98+
Rick | PERSON | 0.98+

Holger Mueller, Constellation Research | AWS re:Invent 2022


 

(upbeat music) >> Hey, everyone, welcome back to Las Vegas, "theCube" is on our fourth day of covering AWS re:Invent, live from the Venetian Expo Center. This week has been amazing. We've created a ton of content, as you know, 'cause you've been watching. But, there's been north of 55,000 people here, hundreds of thousands online. We've had amazing conversations across the AWS ecosystem. Lisa Martin, Paul Gillin. Paul, what's your, kind of, take on day four of the conference? It's still highly packed. >> Oh, there's lots of people here. (laughs) >> Yep. Unusual for the final day of a conference. I think Werner Vogels, if I'm pronouncing it right, kicked things off today when he talked about asymmetry and how the world is, you know, asymmetric. We build symmetric software, because it's convenient to do so, but asymmetric software actually scales and evolves much better. And I think that that was a conversation starter for a lot of what people are talking about here today, which is how the cloud changes the way we think about building software. >> Absolutely does. >> Our next guest, Holger Mueller, that's one of his key areas of focus. And Holger, welcome, thanks for joining us on "theCube". >> Thanks for having me. >> What did you take away from the keynote this morning? >> Well, how do you feel on the final day of the marathon, right? We're like 23, 24 miles. Hit the wall yesterday, right? >> We are going strong, Holger. And, of course, >> Yeah. >> you guys, we can either talk about business transformation with cloud or the World Cup. >> Or we can do both. >> The World Cup, hands down. World Cup. (Lisa laughs) Germany's out, I'm unbiased now. They just got eliminated. >> Spain is out now. >> What will the U.S. do against Netherlands tomorrow? >> They're going to win. What's your forecast? U.S. will win? >> They're going to win 2 to 1. >> What do you say, 2:1? >> I'm optimistic, but realistic. >> 3? >> I think Netherlands. >> Netherlands will win? >> 2 to nothing.
>> Okay, I'll vote for the U.S. >> Okay, okay >> 3:1 for the U.S. >> Be optimistic. >> Root for the U.S. >> Okay, I like that. >> Hope for the best wherever you work. >> Tomorrow you'll see how much soccer experts we are. >> If your prediction was right. (laughs) >> (laughs) Ja, ja. Or yours was right, right, so. Cool, no, but the event, I think the event is great to have 50,000 people. Biggest event of the year again, right? Not yet the 70,000 we had in 2019. But it's great to have the energy. I've never seen the show floor going all the way down like this, right? >> I haven't either. >> I've never seen that. I think it's a record. Often vendors get the space here and they have the keynote area, and the entertainment area, >> Yeah. >> and the food area, and then there's an exposition, right? This is packed. >> It's packed. >> Maybe it'll pay off. >> You don't see the big empty booths that you often see. >> Oh no. >> Exactly, exactly. You know, the white spaces and so on. >> No. >> Right. >> Which is a good thing. >> There's lots of energy, which is great. And today's, of course, the developer day, like you said before, right? Now, Vogels is a rock star in the developer community, right. Revered visionary on what has been built, right? And he's becoming a little professorial is my feeling, right. He had these moments before too, when it was justifying how AWS moved off the Oracle database, about the importance of data warehouses and structures and why DynamoDB is better and so on. But, he had a large part of this too, and this is coming right across the keynotes, right? Adam Selipsky talking about Antarctica, right? Scott against Amundsen and what went wrong. He didn't tell us, by the way, which often the tech winners forget. Scott banked on technology. He had motorized sleds, which failed after three miles. So, that's not the story to tell: the technology let everything down. Everybody went back to ponies and horses and dogs.
>> Maybe goes back to these asynchronous behavior. >> Yeah. >> The way of nature. >> And, yesterday, Swami talking about the bridges, right? The root bridges, right? >> Right. >> So, how could Werner pick up with his video at the beginning. >> Yeah. >> And then talk about space and other things? So I think it's important to educate about event-based architecture, right? And we see this massive transformation. Modern software has to be event based, right? Because, that's how things work and we didn't think like this before. I see this massive transformation in my other research area in other platforms about the HR space, where payrolls are being rebuilt completely. And payroll used to be one of the three peaks of ERP, right? You would size your ERP machine before the cloud to financial close, to run the payroll, and to do an MRP manufacturing run if you're manufacturing. God forbid you run those three at the same time. Your machine wouldn't be able to do that, right? So it was like start the engine, start the boosters, we are running payroll. And now the modern payroll designs like you see from ADP or from Ceridian, they're taking every payroll relevant event. You check in time wise, right? You go overtime, you take a day of vacation and right away they trigger and run the payroll, so it's up to date for you, up to date for you, which, in this economy, is super important, because we have more gig workers, we have more contractors, we have employees who are leaving suddenly, right? The great resignation, which is happening. So, from that perspective, it's the modern way of building software. So it's great to see Werner showing that. The dirty little secrets though is that is more efficient software for the cloud platform vendor too. Takes less resources, gets less committed things, so it's a much more scalable architecture. You can move the events, you can work asynchronously much better. And the biggest showcase, right? 
What's the biggest transactional showcase for an eventually consistent asynchronous transactional application? I know it's a mouthful, but we at Amazon, AWS, Amazon, right? You buy something on Amazon they tell you it's going to come tomorrow. >> Yep. >> They don't know it's going to come tomorrow by that time, because it's not transactionally consistent, right? We're just making every ERP vendor, who lives in transactional work, having nightmares of course, (Lisa laughs) but for them it's like, yes we have the delivery to promise, a promise to do that, right? But they come back to you and say, "Sorry, we couldn't make it, delivery didn't work and so on. It's going to be a new date. We are out of the product.", right? So these kind of event base asynchronous things are more and more what's going to scale around the world. It's going to be efficient for everybody, it's going to be better customer experience, better employee experience, ultimately better user experience, it's going to be better for the enterprise to build, but we have to learn to build it. So big announcement was to build our environment to build better eventful applications from today. >> Talk about... This is the first re:Invent... Well, actually, I'm sorry, it's the second re:Invent under Adam Selipsky. >> Right. Adam Selipsky, yep. >> But his first year. >> Right >> We're hearing a lot of momentum. What's your takeaway with what he delivered with the direction Amazon is going, their vision? >> Ja, I think compared to the Jassy times, right, we didn't see the hockey stick slide, right? With a number of innovations and releases. That was done in 2019 too, right? So I think it's a more pedestrian pace, which, ultimately, is good for everybody, because it means that when software vendors go slower, they do less width, but more depth. >> Yeah. >> And depth is what customers need. So Amazon's building more on the depth side, which is good news. 
I also think, and that's not official, right, but Adam Selipsky came from Tableau, right? >> Yeah. So he is a BI analytics guy. So it's no surprise we have three data lake offerings, right? Security data lake, we have a healthcare data lake and we have a supply chain data lake, right? Where all, again, the epigonos mentioned them I was like, "Oh, my god, Amazon's coming to supply chain.", but it's actually data lakes, which is an interesting part. But, I think it's not a surprise that someone who comes heavily out of the analytics BI world, it's off ringside, if I was pitching internally to him maybe I'd do something which he's familiar with, and I think that's what we see in the major announcement of his keynote on Tuesday. >> I mean, speaking of analytics, one of the big announcements early on was Amazon is trying to bridge the gap between Aurora. >> Yep. >> And Redshift. >> Right. >> And setting up for continuous pipelines, continuous integration. >> Right. >> Seems to be a trend that is common to all database players. I mean, Oracle is doing the same thing. SAP is doing the same thing. MariaDB. Do you see the distinction between transactional and analytical databases going away? >> It's coming together, right? Certainly coming together, from that perspective, but there's a fundamentally different starting point, right? And with the big idea part, right? The universal database, which does everything for you in one system, versus the suite of specialized databases, right? Oracle is, with the classic Oracle database, in the universal database camp. On the other side you have Amazon, which built a database. This is one of the first few Amazon re:Invents. It's my 10th where there was no new database announced. Right? >> No. >> So it was always add another one specially- >> I think they have enough. >> It's a great approach. They have enough, right? So it's a great approach to build something quick, which Amazon is all about.
It's not so great when customers want to leverage things. And, ultimately, which I think with Selipsky, AWS is waking up to the enterprise saying, "I have all these different databases and what is in them matters to me." >> Yeah. >> "So how can I get this better?" So no surprise between the two most popular databases, Aurora and RDS. They're bringing together the data with some out of the box parts. I think it's kind of, like, silly when Swami's saying, "Hey, no ETL.". (chuckles) Right? >> Yeah. >> There shouldn't be an ETL from the same vendor, right? There should be data pipes from that perspective anyway. So it looks like, on the overall value proposition database side, AWS is moving closer to the universal database on the Oracle side, right? Because, if you lift, of course, the universal database, under the hood, you see, well, there's a different database there, different part there, you do something there, you have to configure stuff, which is also the case, but it's one part of it, right, so. >> With that shift, talk about the value that's going to be in it for customers regardless of industry. >> Well, the value for customers is great, because when software vendors, or platform vendors, go in depth, you get more functionality, you get more maturity, you get easier ways of setting up the whole thing. You get ways of maintaining things. And you, ultimately, get lower TCO to build them, which is super important for enterprise. Because, here, this is the developer cloud, right? Developers love AWS. Developers are scarce, expensive. Might not want to work for you, right? So developer velocity, getting more done with the same amount of developers, fewer developers getting more done, is super crucial, super important. So this is all good news for enterprises banking on AWS and then providing them more efficiency, more automation, out of the box. >> Some of your customer conversations this week, talk to us about some of the feedback.
What's the common denominator amongst customers right now? >> Customers are excited. First of all, like, first event, again in person, large, right? >> Yeah. >> People can travel, people meet each other, meet in person. They have a good handle around the complexity, which used to be a huge challenge in the past, because people say, "Do I do this?" I know so many CXOs saying, "Yeah, I want to build, say, something in IoT with AWS. The first reference built it like this, the next reference built it completely different. The third one built it completely different again. So now I'm doubting if my team has the skills to build things successfully, because will they be smart enough, like your teams, because there's no repetitiveness and that repetitiveness is going to be very important for AWS to come up with some higher packaging and version numbers.", right? But customers like that message. They like that things are working better together. They're not missing the big announcement, right? One of the traditional things of AWS would be, and they made it even proud, as a system, Jassy was saying, "If we look at the IT spend and we see something which is, like, high margin for us and not served well and we announced something there, right?" So QuickSight, WorkSpaces, were all cases where AWS went after traditional IT spend and had an offering. We didn't have this in 2019, we didn't have them in 2020, last year, and we don't have it now. So something is changing on the AWS side. It's a little bit too early to figure out what, but they're not chewing off as many big things as they used to in the past. >> Right. >> Yep. >> Did you get the sense that... Keith Townsend, from "The CTO Advisor", was on earlier. >> Yep. >> And he said he's been to many re:Invents, as you have, and he said that he got the sense that this is Amazon's chance to do a victory lap, as he called it. That this is a way for Amazon to reinforce its leadership in cloud. >> Ja.
>> And really, kind of, establish that nobody can come close to them, nobody can compete with them. >> You don't think that- >> I don't think that's at all... I mean, love Keith, he's a great guy, but I don't think that's the mindset at all, right? So, I mean, Jassy was always saying, "It's still the morning of the day in the cloud.", right? They're far away from being done. They're obsessed over being right. They do more work with the analysts. We think we got something right. And I like the passion, from that perspective. So I think Amazon's far from being complacent and the area, which is the biggest bit, right, the biggest. The only thing where Amazon truly has floundered, always floundered, is the AI space, right? So, 2018, Werner Vogels was doing more technical stuff that "Oh, this is all about linear regression.", right? And Amazon didn't start to put algorithms on silicon, right? And they have a three four trail and they didn't announce anything new here, behind Google who's been doing this for much, much longer than TPU platform, so. >> But they have now. >> They're keen aware. >> Yep. >> They now have three, or they own two of their own hardware platforms for AI. >> Right. >> They support the Intel platform. They seem to be catching up in that area. >> It's very hard to catch up on hardware, right? Because, there's release cycles, right? And just the volume that, just talking about the largest models that we have right now, to do with the language models, and Google is just doing a side note of saying, "Oh, we supported 50 less or 30 less, not little spoken languages, which I've never even heard of, because they're under banked and under supported and here's the language model, right? And I think it's all about little bit the organizational DNA of a company. I'm a strong believer in that. And, you have to remember AWS comes from the retail side, right? >> Yeah. >> Their roll out of data centers follows their retail strategy. Open secret, right? 
But, the same thing as the scale of the AI is very very different than if you take a look over at Google where it makes sense of the internet, right? The scale right away >> Right. >> is a solution, which is a good solution for some of the DNA of AWS. Also, Microsoft Azure is good. There has no chance to even get off the ship of that at Google, right? And these leaders with Google and it's not getting smaller, right? We didn't hear anything. I mean so much focused on data. Why do they focus so much on data? Because, data is the first step for AI. If AWS was doing a victory lap, data would've been done. They would own data, right? They would have a competitor to BigQuery Omni from the Google side to get data from the different clouds. There's crickets on that topic, right? So I think they know that they're catching up on the AI side, but it's really, really hard. It's not like in software where you can acquire someone. They could acquire Nvidia. >> Not at Core Donovan. >> Might play a game, but that's not a good idea, right? So you can't, there's no shortcuts on the hardware side. As much as I'm a software guy and love software and don't like hardware, it's always a pain, right? There's no shortcuts there and there's nothing, which I think, AWS has a new Trainium instance, of course, certainly, but they're not catching up. The distance is the same, yep. >> One of the things is funny, one of our guests, I think it was Tuesday, it was, it was right after Adam's keynote. >> Sure. >> Said that Adam Selipsky stood up on stage and talked about data for 52 minutes. >> Yeah. Right. >> It was timed, 52 minutes. >> Right. >> Huge emphasis on that. One of the things that Adam said to John Furrier when they were able to sit down >> Yeah >> a week or so ago at an event preview, was that CIOs and CEOs are not coming to Adam to talk about technology. They want to talk about transformation. They want to talk about business transformation. >> Sure, yes, yes.
>> Talk to me in our last couple of minutes about what CEOs and CIOs are coming to you saying, "Holger, help us figure this out. We have to transform the business." >> Right. So we advise, I'm going to quote our friends at Gartner, once the type A company. So we'll use technology aggressively, right? So take everything in the audience with a grain of salt, followers are the laggards, and so on. So for them, it's really the cusp of doing AI, right? Getting that data together. It has to be in the cloud. We live in the era of infinite computing. The cloud makes computing infinite, both from a storage, from a compute perspective, from an AI perspective, and then define new business models and create new best practices on top of that. Because, in the past, everything was fine out on premise, right? We talked about the (indistinct) size. Now in the cloud, it's just the business model to say, "Do I want to have a little more AI? Do I want to run a little more? Will it give me the insight in the business?". So, that's the transformation that is happening, really. So, bringing your data together, this live conversation data, but not for bringing the data together. There's often the big win for the business for the first time to see the data. AWS is banking on that. The supply chain product, as an example. So many disparate systems, bring them together. Big win for the business. But, the win for the business, ultimately, is when you change the paradigm from the user showing up to do something, to software doing stuff for us, right? >> Right. >> We have too much in this operator paradigm. If the user doesn't show up, doesn't find the click, doesn't find where to go, nothing happens. It can't be done in the 21st century, right? Software has to look over your shoulder. >> Good point. >> Understand one for you, autonomous self-driving systems. That's what CXOs, who're future looking, will be talked to come to AWS and all the other cloud vendors.
>> Got it, last question for you. We're making a sizzle reel on Instagram. >> Yeah. >> If you had, like, a phrase, like, or a 30 second pitch that would describe re:Invent 2022 in the direction the company's going. What would that elevator pitch say? >> 30 second pitch? >> Yeah. >> All right, just timing. AWS is doing well. It's providing more depth, less breadth. Making things work together. It's catching up in some areas, has some interesting offerings, like the healthcare offering, the security data lake offering, which might change some things in the industry. It's staying the course and it's going strong. >> Ah, beautifully said, Holger. Thank you so much for joining Paul and me. >> Might have been too short. I don't know. (laughs) >> About 10 seconds left over. >> It was perfect, absolutely perfect. >> Thanks for having me. >> Perfect sizzle reel. >> Appreciate it. >> We appreciate your insights, what you're seeing this week, and the direction the company is going. We can't wait to see what happens in the next year. And, yeah. >> Thanks for having me. >> And of course, we've been on so many times. We know we're going to have you back. (laughs) >> Looking forward to it, thank you. >> All right, for Holger Mueller and Paul Gillan, I'm Lisa Martin. You're watching "theCube", the leader in live enterprise and emerging tech coverage. (upbeat music)

Published Date : Dec 1 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Paul | PERSON | 0.99+
Holger | PERSON | 0.99+
Adam | PERSON | 0.99+
Scott | PERSON | 0.99+
Adam Selipsky | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Jassy | PERSON | 0.99+
Keith | PERSON | 0.99+
Gartner | ORGANIZATION | 0.99+
Paul Gillan | PERSON | 0.99+
23 | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
2019 | DATE | 0.99+
Tuesday | DATE | 0.99+
2020 | DATE | 0.99+
Las Vegas | LOCATION | 0.99+
Last year | DATE | 0.99+
Google | ORGANIZATION | 0.99+
Holger Mueller | PERSON | 0.99+
Keith Townsend | PERSON | 0.99+
Werner Vogels | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Werner | PERSON | 0.99+
21st century | DATE | 0.99+
52 minutes | QUANTITY | 0.99+
three | QUANTITY | 0.99+
yesterday | DATE | 0.99+
2018 | DATE | 0.99+
Holger Mueller | PERSON | 0.99+
10th | QUANTITY | 0.99+
first | QUANTITY | 0.99+
Tomorrow | DATE | 0.99+
Netherlands | ORGANIZATION | 0.99+
U.S. | ORGANIZATION | 0.99+
50 | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
Lisa | PERSON | 0.99+
first time | QUANTITY | 0.99+
50,000 people | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
Antarctica | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
third one | QUANTITY | 0.99+
2 | QUANTITY | 0.99+

Mark Terenzoni, AWS | AWS re:Invent 2022


 

(upbeat music) >> Hello, everyone and welcome back to fabulous Las Vegas, Nevada, where we are here on the show floor at AWS re:Invent. We are theCUBE. I am Savannah Peterson, joined with John Furrier. John, afternoon, day two, we are in full swing. >> Yes. >> What's got you most excited? >> Just got lunch, got the food kicking in. No, we don't get coffee. (Savannah laughing) >> Way to bring the hype there, John. >> No, there's so many people here just in Amazon. We're back to 2019 levels of crowd. The interest levels are high. Next gen, cloud security, big part of the keynote. This next segment, I am super excited about. CUBE Alumni, going back to 2013, 10 years ago he was on theCUBE. Now, 10 years later we're at re:Invent, looking forward to this guest and it's about security, great topic. >> I don't want to delay us anymore, please welcome Mark. Mark, thank you so much for being here with us. Massive day for you and the team. I know you oversee three different units at Amazon, Inspector, Detective, and the most recently announced, Security Lake. Tell us about Amazon Security Lake. >> Well, thanks Savannah. Thanks John for having me. Well, Security Lake has been in the works for a little bit of time and it got announced today at the keynote as you heard from Adam. We're super excited because there's a couple components that are really unique and valuable to our customers within Security Lake. First and foremost, the foundation of Security Lake is an open source project we call OCSF, Open Cybersecurity Schema Framework. And what that allows us to do is work with the vendor community at large in the security space and develop a language where we can all communicate around security data. And that's the language that we put into Security Data Lake. We have 60 vendors participating in developing that language and partnering within Security Lake.
But it's a communal lake where customers can bring all of their security data in one place, whether it's generated in AWS, they're on-prem, or SaaS offerings or other clouds, all in one location in a language that allows analytics to take advantage of that analytics and give better outcomes for our customers. >> So Adam Selipsky's big keynote, he spent all the bulk of his time on data and security. Obviously they go well together, we've talked about this in the past on theCUBE. Data is part of security, but this security's a little bit different in the sense that the global footprint of AWS makes it uniquely positioned to manage some security threats, EKS protection, a very interesting announcement, runtime layer, but looking inside and outside the containers, probably gives extra telemetry on some of those supply chain vulnerabilities. This is actually a very nuanced point. You got GuardDuty kind of taking its role. What does it mean for customers 'cause there's a lot of things in this announcement that he didn't have time to go into detail. Unpack all the specifics around what the security announcement means for customers. >> Yeah, so we announced four items in Adam's keynote today within my team. So I'll start with GuardDuty for EKS runtime. It's complementing our existing capabilities for EKS support. So today Inspector does vulnerability assessment on EKS or container images in general. GuardDuty does detections of EKS workloads based on log data. Detective does investigation and analysis based on that log data as well. With the announcement today, we go inside the container workloads. We have more telemetry, more fine-grained telemetry and ultimately we can provide better detections for our customers to analyze risks within their container workload. So we're super excited about that one. Additionally, we announced Inspector for Lambda. So Inspector, we released last year at re:Invent and we focused mostly on EKS container workloads and EC2 workloads.
Single click automatically assess your environment, start generating assessments around vulnerabilities. We've added Lambda to that capability for our customers. The third announcement we made was Macie sampling. So Macie has been around for a while in delivering a lot of value for customers providing information around their sensitive data within S3 buckets. What we found is many customers want to go and characterize all of the data in their buckets, but some just want to know is there any sensitive data in my bucket? And the sampling feature allows the customer to find out their sensitive data in the bucket, but we don't have to go through and do all of the analysis to tell you exactly what's in there. >> Unstructured and structured data. Any data? >> Correct, yeah. >> And the fourth? >> The fourth, Security Data Lake? (John and Savannah laughing) Yes. >> Okay, ocean theme. Data lake. >> Very complementary to all of our services, but the unique value in the data lake is that we put the information in the customer's control. It's in their S3 bucket, they get to decide who gets access to it. We've heard from customers over the years that really have two options around gathering large scale data for security analysis. One is we roll our own and we're security engineers, we're not data engineers. It's really hard for them to build these distributed systems at scale. The second one is we can pick a vendor or a partner, but we're locked in and it's in their schema and their format and we're there for a long period of time. With Security Data Lake, they get the best of both worlds. We run the infrastructure at scale for them, put the data in their control and they get to decide what use case, what partner, what tool gives them the most value on top of their data. >> Is that always a good thing to give the customers too much control? 'Cause you know the old expression, you give 'em a knife they play with and they can cut themselves, I mean.
But no, seriously, 'cause what's the provisions around that? Because control is a big part of the governance, how do you manage the security? How does the customer worry about, if I have too much control, someone makes a mistake? >> Well, what we're finding out today is that many customers have realized that some of their data has been replicated seven times, 10 times, not necessarily maliciously, but because they have multiple vendors that utilize that data to give them different use cases and outcomes. It becomes costly and unwieldy to figure out where all that data is. So by centralizing it, the control is really around who has access to the data. Now, ultimately customers want to make those decisions and we've made it simple to aggregate this data in a single place. They can develop a home region if they want, where all the data flows into one region, they can distribute it globally. >> They're in charge. >> They're in charge. But the controls are mostly in the hands of the data governance person in the company, not the security analyst. >> So I'm really curious, you mentioned there's 60 AWS partner companies that have collaborated on the Security Lake. Can you tell us a little bit about the process? How long does it take? Are people self-selecting to contribute to these projects? Are you cherry picking? What does that look like? >> It's a great question. There's three levels of collaboration. One is around the open source project that we announced at Black Hat earlier this year called OCSF. And that collaboration is we've asked the vendor community to work with us to build a schema that is universally acceptable to security practitioners, not vendor specific and we've asked. >> Savannah: I'm sorry to interrupt you, but is this a first of its kind? >> There's multiple schemas out there developed by multiple parties. They've been around for multiple years, but they've been built by a single vendor. >> Yeah, that's what I'm drilling in on a little bit.
It sounds like the first time we've had this level of collaboration. >> There's been collaborations around them, but in a handful of companies. We've really gone to a broad set of collaborators to really get it right. And they're focused around areas of expertise that they have knowledge in. So the EDR vendors, they're focused around the schema around EDR. The firewall vendors are focused around that area. Certainly the cloud vendors are in their scope. So that's level one of collaboration and that gets us the level playing field and the language in which we'll communicate. >> Savannah: Which is so important. >> Super foundational. Then the second area is around producers and subscribers. So many companies generate valuable security data from the tools that they run. And we call those producers the publishers and they publish the data into Security Lake within that OCSF format. Some of them are in the form of findings, many of them in the form of raw telemetry. Then the second one is in the subscriber side and those are usually analytic vendors, SIEM vendors, XDR vendors that take advantage of the logs in one place and generate analytic driven outcomes on top of that, use cases, if you will, that highlight security risks or issues for customers. >> Savannah: Yeah, cool. >> What's the big customer focus when you start looking at Security Lakes? How do you see that playing out? You said there's a collaboration, love the open source vibe on that piece, what data goes in there? What's sharing? 'Cause a big part of the keynote I heard today was, I heard clean rooms, I've got my antenna up. I'd love to hear that. That means there's an implied sharing aspect. The security industry's been sharing data for a while. What kind of data's in that lake? Give us an example, take us through. >> Well, there's a number of sources within AWS, as customers run their workloads in AWS. We've identified somewhere around 25 sources that will be natively single click into Amazon Security Lake.
We were announcing nine of them. They're traditional network logs, VPC flow, CloudTrail logs, firewall logs, findings that are generated across AWS, EKS audit logs, RDS data logs. So anything that customers run workloads on will be available in data lake. But that's not limited to AWS. Customers run their environments hybridly, they have SaaS applications, they use other clouds in some instances. So it's open to bring all that data in. Customers can vector it all into this one single location if they decide, we make it pretty simple for them to do that. Again, in the same format where outcomes can be generated quickly and easily. >> Can you use the data lake off on premise or it has to be in an S3 in Amazon Cloud? >> Today it's in S3 in Amazon. If we hear customers looking to do something different, as you guys know, we tend to focus on our customers and what they want us to do, but they've been pretty happy about what we've decided to do in this first iteration. >> So we got a story on SiliconANGLE. Obviously the ingestion is a big part of it. The reporters are jumping in, but the 50 third-party sources is a pretty big number. Is that coming from the OCSF or is that just in general? Who's involved? >> Yeah, OCSF is the big part of that and we have a list of probably 50 more that want to join in part of this. >> The other big names are there, Cisco, CrowdStrike, Palo Alto Networks, all the big dogs are in there. >> All big partners of AWS, anyway, so it was an easy conversation and in most cases when we started having the conversation, they were like, "Wow, this has really been needed for a long time." And given our breadth of partners and where we sit from our customers perspective in the center of their cloud journey that they've looked at us and said, "You guys, we applaud you for driving this." >> So Mark, take us through the conversations you're having with the customers at re:Inforce. We saw a lot of meetings happening. It was great to be back face to face.
You guys have been doing a lot of customer conversation, security Data Lake came out of that. What was the driving force behind it? What were some of the key concerns? What were the challenges and what's now the opportunity that's different? >> We heard from our customers in general. One, it's too hard for us to get all the data we need in a single place, whether through AWS, the industry in general, it's just too hard. We don't have those resources to data wrangle that data. We don't know how to pick schema. There's multiple ones out there. Tell us how we would do that. So these three challenges came out front and center for every customer. And mostly what they said is our resources are limited and we want to focus those resources on security outcomes and we have security engines. We don't want to focus them on data wrangling and large scale distributed systems. Can you help us solve that problem? And it came out loud and clear from almost every customer conversation we had. And that's where we took the challenge. We said, "Okay, let's build this data layer." And then on top of that we have services like Detective and Guard Duty, we'll take advantage of it as well. But we also have a myriad of ISV third parties that will also sit on top of that data and render out. >> What's interesting, I want to get your reaction. I know we don't have much time left, but I want to get your thoughts. When I see Security Data Lake, which is awesome by the way, love the focus, love how you guys put that together. It makes me realize the big thing in re:Invent this year is this idea of specialized solutions. You got instances for this and that, use cases that require certain kind of performance. You got the data pillars that Adam laid out. Are we going to start seeing more specialized data lakes? I mean, we have a video data lake. Is there going to be a FinTech data lake? Is there going to be, I mean, you got the Great Lakes kind of going on here, what is going on with these lakes? 
I mean, is that a trend that Amazon sees or customers are aligning to? >> Yeah, we have a couple lakes already. We have a healthcare lake and a financial lake and now we have a security lake. Foundationally we have Lake Formation, which is the tool that anyone can build a lake. And most of our lakes run on top of Lake Formation, but specialize. And the specialization is in the data aggregation, normalization, enrichment, that is unique for those use cases. And I think you'll see more and more. >> John: So that's a feature, not a bug. >> It's a feature, it's a big feature. The customers have asked for it. >> So they want to roll their own specialized, purpose-built data thing, lake? They can do it. >> And customers don't want to combine healthcare information with security information. They have different use cases and segmentation of the information that they care about. So I think you'll see more. Now, I also think that you'll see where there are adjacencies that those lakes will expand into other use cases in some cases too. >> And that's where the right tools comes in, as he was talking about this zero-ETL feature. >> It'd be like an 80/20 rule. So if 80% of the data is shared for different use cases, you can see how those lakes would expand to fulfill multiple use cases. >> All right, you think he's ready for the challenge? Look, we were on the same page. >> Okay, we have a new challenge, go ahead. >> So think of it as an Instagram Reel, sort of your hot take, your thought leadership moment, the clip we're going to come back to and reference your brilliance 10 years down the road. I mean, you've been a CUBE veteran, now CUBE alumni for almost 10 years, in just a few weeks it'll be that. What do you think is, and I suspect, I think I might know your answer to this, so feel free to be robust in this. But what do you think is the biggest story, key takeaway from the show this year? >> We're democratizing security data within Security Data Lake for sure.
>> Well said, you are our shortest answer so far on theCUBE and I absolutely love and respect that. Mark, it has been a pleasure chatting with you and congratulations, again, on the huge announcement. This is such an exciting day for you all. >> Thank you Savannah, thank you John, pleasure to be here. >> John: Thank you, great to have you. >> We look forward to 10 more years of having you. >> Well, maybe we don't have to wait 10 years. (laughs) >> Well, more years, in another time. >> I have a feeling it'll be a lot of security content this year. >> Yeah, pretty hot theme >> Very hot theme. >> Pretty odd theme for us. >> Of course, re:Inforce will be there this year again, coming up 2023. >> All the res. >> Yep, all the res. >> Love that. >> We look forward to see you there. >> All right, thanks, Mark. >> Speaking of res, you're the reason we are here. Thank you all for tuning in to today's live coverage from AWS re:Invent. We are in Las Vegas, Nevada with John Furrier. My name is Savannah Peterson. We are theCUBE and we are the leading source for high tech coverage. (upbeat music)

Published Date : Nov 29 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Savannah | PERSON | 0.99+
Mark Terenzoni | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Savannah Peterson | PERSON | 0.99+
Mark | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
10 times | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
80% | QUANTITY | 0.99+
CrowdStrike | ORGANIZATION | 0.99+
Adam | PERSON | 0.99+
2019 | DATE | 0.99+
10 years | QUANTITY | 0.99+
2023 | DATE | 0.99+
last year | DATE | 0.99+
seven times | QUANTITY | 0.99+
60 vendors | QUANTITY | 0.99+
2013 | DATE | 0.99+
Peloton Networks | ORGANIZATION | 0.99+
Macy | ORGANIZATION | 0.99+
three challenges | QUANTITY | 0.99+
CUBE | ORGANIZATION | 0.99+
Today | DATE | 0.99+
10 years later | DATE | 0.99+
Las Vegas, Nevada | LOCATION | 0.99+
today | DATE | 0.99+
10 more years | QUANTITY | 0.99+
80 | QUANTITY | 0.99+
One | QUANTITY | 0.99+
first iteration | QUANTITY | 0.98+
10 years ago | DATE | 0.98+
60 | QUANTITY | 0.98+
two options | QUANTITY | 0.98+
First | QUANTITY | 0.98+
third announcement | QUANTITY | 0.98+
first | QUANTITY | 0.98+
fourth | QUANTITY | 0.98+
one region | QUANTITY | 0.98+
Las Vegas, Nevada | LOCATION | 0.98+
this year | DATE | 0.98+
Data Lake | ORGANIZATION | 0.97+
both worlds | QUANTITY | 0.97+
20 rule | QUANTITY | 0.97+
Great Lakes | LOCATION | 0.97+
single place | QUANTITY | 0.96+
Security Lake | ORGANIZATION | 0.96+
S3 | TITLE | 0.96+
one place | QUANTITY | 0.96+
one location | QUANTITY | 0.96+
Instagram | ORGANIZATION | 0.96+
EKS | ORGANIZATION | 0.95+

The Truth About MySQL HeatWave


 

>>When Oracle acquired MySQL via the Sun acquisition, nobody really thought the company would put much effort into the platform, preferring to put all the wood behind its leading Oracle Database arrow, pun intended. But two years ago, Oracle surprised many folks by announcing MySQL HeatWave, a new database as a service with a massively parallel hybrid columnar in-memory architecture that brings together transactional and analytic data in a single platform. Welcome to our latest database power panel on theCUBE. My name is Dave Vellante, and today we're gonna discuss Oracle's MySQL HeatWave with a who's who of cloud database industry analysts. Holger Mueller is with Constellation Research. Marc Staimer is the Dragon Slayer and Wikibon contributor. And Ron Westfall is with Futurum Research. Gentlemen, welcome back to theCUBE. Always a pleasure to have you on. Thanks for having us. Great to be here. >>So we've had a number of deep dive interviews on theCUBE with Nipun Agarwal. You guys know him? He's a senior vice president of MySQL HeatWave Development at Oracle. I think you just saw him at Oracle CloudWorld and he's come on to describe this is gonna, I'll call it a shock and awe feature additions to HeatWave. You know, the company's clearly putting R&D into the platform and I think at CloudWorld we saw like the fifth major release since 2020 when they first announced MySQL HeatWave. So just listing a few, they brought in analytics, machine learning, they got autopilot for machine learning, which is automation, onto the basic OLTP functionality of the database. And it's been interesting to watch Oracle's converged database strategy. We've contrasted that amongst ourselves. Love to get your thoughts on Amazon's get the right tool for the right job approach. >>Are they gonna have to change that? You know, Amazon's got the specialized databases, it's just, you know, the both companies are doing well.
It just shows there are a lot of ways to skin a cat 'cause you see some traction in the market in both approaches. So today we're gonna focus on the latest HeatWave announcements and we're gonna talk about multi-cloud with a native MySQL HeatWave implementation, which is available on AWS, MySQL HeatWave for Azure via the Oracle Microsoft interconnect. This kind of cool hybrid action that they got going. Sometimes we call it supercloud. And then we're gonna dive into MySQL HeatWave Lakehouse, which allows users to process and query data across MySQL databases, HeatWave databases, as well as object stores. So, and then we've got, HeatWave has been announced on AWS and Azure, they're available now, and Lakehouse I believe is in beta and I think it's coming out the second half of next year. So again, all of our guests are fresh off of Oracle CloudWorld in Las Vegas. So they got the latest scoop. Guys, I'm done talking. Let's get into it. Marc, maybe you could start us off, what's your opinion of MySQL HeatWave's competitive position? When you think about what AWS is doing, you know, Google is, you know, we heard Google Cloud Next recently, we heard about all their data innovations. You got, obviously Azure's got a big portfolio, Snowflake's doing well in the market. What's your take? >>Well, first let's look at it from the point of view that AWS is the market leader in cloud and cloud services. They own somewhere between 30 to 50% depending on who you read of the market. And then you have Azure as number two and after that it falls off. There's GCP, Google Cloud Platform, which is further way down the list and then Oracle and IBM and Alibaba. So when you look at AWS and Azure and say, hey, these are the market leaders in the cloud, then you start looking at it and saying, if I am going to provide a service that competes with the service they have, if I can make it available in their cloud, it means that I can be more competitive.
And if I'm compelling, and compelling means at least twice the performance or functionality or both at half the price, I should be able to gain market share. >>And that's what Oracle's done. They've taken a superior product in MySQL HeatWave, which is faster, lower cost, does more for a lot less at the end of the day, and they make it available to the users of those clouds. You avoid this little thing called egress fees, you avoid the issue of having to migrate from one cloud to another, and suddenly you have a very compelling offer. So I look at what Oracle's doing with MySQL and it feels like, I'm gonna use a war term, a flanking maneuver on their competition. They're offering a better service on their platforms. >>All right, so thank you for that. Holger, we've seen this sort of cadence, I sort of referenced it up front a little bit: they sat on MySQL for a decade, then all of a sudden we see this rush of announcements. Why did it take so long? And more importantly, is Oracle developing the right features that cloud database customers are looking for, in your view? >>Yeah, great question. But first of all, in your intro you said they added analytics, right? Analytics is kind of like a marketing buzzword. Reports can be analytics, right? The interesting thing which they did, the first thing, is they crossed the chasm between OLTP and OLAP, right? In the same database, right? So a major engineering feat, very much what customers want, and it's all about creating value for customers, which I think is part of why they go into multi-cloud and why they add these capabilities. And certainly with the AI capabilities, it's kind of like getting it into an autonomous field, a self-driving field. Now with the lakehouse capabilities and meeting customers where they are, like Marc has talked about, the egress costs in the cloud.
So that's a significant advantage, creating value for customers, and that's what matters at the end of the day. >>And I believe strongly that long term it's gonna be the ones who create better value for customers who will get more of their money. From that perspective, why did it take them so long? I think it's a great question. I think largely, you mentioned the gentleman, Nipun, it's largely down to who leads a product. I used to build products too, so maybe I'm fooling myself a little here, but that made the difference in my view, right? So since he's been in charge, he's been building things faster than the rest of the competition in the MySQL space, which in hindsight we thought was in a hot and smoking innovation phase. It was kind of a little self-complacent when it comes to the traditional borders of where people think things are separated, between OLTP and OLAP, or as an example, JSON support, right? Structured documents versus unstructured documents or databases, and all of that has been collapsed and brought together to build a more powerful database for customers. >>So I mean, it's certainly, you know, when Oracle talks about the competitors, I always say, if Oracle talks about you, it knows you're doing well. So they talk a lot about AWS, talk a little bit about Snowflake, you know, sort of Google, they have partnerships with Azure, so I'm presuming that MySQL HeatWave was really in response to what they were seeing from those big competitors. But then you had MariaDB coming out, you know, the day that Oracle acquired Sun, and launching and going after the MySQL base. So I'm interested, and we'll talk about this later, in what you guys think about AWS and Google and Azure and Snowflake and how they're gonna respond.
But before I do that, Ron, I want to ask you, you can get, you know, pretty technical and you've probably seen the benchmarks. >>I know you have. Oracle makes a big deal out of it, publishes its benchmarks, makes them transparent on GitHub. Larry Ellison talked about this in his keynote at CloudWorld. What do the benchmarks show in general? I mean, when you're new to the market, you gotta have a story like Marc was saying, you gotta be 2x, you know, the performance at half the cost, or you're not gonna get any market share. And you know, oftentimes companies don't publish benchmarks when they're leading. They do it when they need to gain share. So what do you make of the benchmarks? Were there any results that were surprising to you? Have they, you know, been challenged by the competitors? Is it just a bunch of kind of desperate benchmarketing to make some noise in the market, or, you know, are they real? What's your view? >>Well, from my perspective, I think they have validity. And to your point, I believe that when it comes to competitor responses, that has not really happened. Nobody has pulled down the information that's on GitHub and said, oh, here are our price-performance results, and they counter Oracle's. In fact, I think part of the reason why that hasn't happened is that there's a risk: if Oracle's coming out and saying, hey, we can deliver 17 times better query performance using our capabilities versus, say, Snowflake when it comes to, you know, the lakehouse platform, and Snowflake turns around and says it's actually only 15 times better query performance, that's not exactly an effective maneuver. And so I think this is really to Oracle's credit, and I think it's refreshing because these differentiators are significant. We're not talking, you know, like 1.2% differences.
We're talking 17-fold differences, we're talking six-fold differences, depending on, you know, where the spotlight is being shined and so forth. >>And so I think this is actually something that is too good to believe at first blush. If I'm a cloud database decision maker, I really have to prioritize this. I really would, you know, pay a lot more attention to this. And that's why I posed the question to Oracle and others: okay, if these differentiators are so significant, why isn't the needle moving a bit more? And it's for, you know, some of the usual reasons. One is really deep discounting coming from, you know, the other players. That's really kind of, you know, marketing 101, this is something you need to do when there's a real competitive threat, to keep, you know, a customer in your own customer base. Plus there is the usual fear and uncertainty about moving from one platform to another. But I think, you know, the traction, the momentum is shifting in Oracle's favor. I think we saw that in the Q1 results, for example, where Oracle cloud grew 44% and generated, you know, 4.8 billion in revenue, if I recall correctly. And so all these are demonstrating that Oracle is making, I think, many of the right moves. Publishing these figures for anybody to look at from their own perspective is something that is, I think, good for the market, and I think it's just gonna continue to pay dividends for Oracle down the horizon as, you know, competition intensifies. So if I were in, >>Dave, can I interject something on what Ron just said there? >>Yeah, please go ahead. >>A couple things here. One, discounting, which is a common practice when you have a real threat, as Ron pointed out, isn't going to help much in this situation, simply because you can't discount to the point where you improve your performance, and the performance is a huge differentiator.
You may be able to get your price down, but the problem that most of them have is they don't have an integrated product service. They don't have an integrated OLTP, OLAP, ML, and data lake. Even if you cut out two of them, they don't have any of them integrated. They have multiple services that require separate integration, and that can't be overcome with discounting. And you have to pay for each one of these. And oh, by the way, as you grow, the discounts go away. So it's a minor but important detail. >>So that's a TCO question, Marc, right? And I know you look at this a lot. If I had that kind of price-performance advantage, I would be pounding TCO, especially if I need two separate databases to do the job that one can do. The TCO numbers are gonna be off the chart, or maybe down the chart, which you want. Have you looked at this, and how does it compare with, you know, the big cloud guys, for example? >>I've looked at it in depth. In fact, I'm working on another TCO in this arena, but you can find it on Wikibon, in which I compared TCO for MySQL HeatWave versus Aurora plus Redshift plus ML plus Glue. I've compared it against GCP's services, Azure services, Snowflake with other services. And there's just no comparison. The TCO differences are huge. More importantly, the TCO per performance is huge. We're talking in some cases multiple orders of magnitude, but at least an order of magnitude difference. So discounting isn't gonna help you much at the end of the day. It's only going to lower your cost a little, but it doesn't improve the automation, it doesn't improve the performance, it doesn't improve the time to insight, it doesn't improve all those things that you want out of a database or multiple databases, because you >>Can't discount yourself to a higher value proposition. >>So what about, I wonder, Holger, if you could chime in on the developer angle. You followed that market.
How do these innovations from HeatWave, I think you used the term developer velocity. I've heard you use that before. Yeah, I mean, look, Oracle owns Java, okay, so it's, you know, the most popular, you know, programming language in the world, blah, blah, blah. But does it have the minds and hearts of developers, and where does HeatWave fit into that equation? >>I think HeatWave is quickly gaining mindshare on the developer side, right? It's not the traditional NoSQL database which grew up. There's a traditional mistrust of Oracle among developers over what happens to open source when it gets acquired, like in the case of Oracle with Java and with MySQL, right? But we know it's not a good competitive strategy to bank on Oracle screwing up, because it hasn't worked, not on Java, nor on MySQL, right? And for developers, once you get to know a technology product and you can do more, it becomes kind of like a Swiss Army knife and you can build more use cases, you can build more powerful applications. That's super, super important because you don't have to get certified in multiple databases. You are fast at getting things done, you achieve higher developer velocity, and the managers are happy because they don't have to license more things, send you to more trainings, have more risk of something not being delivered, right? >>So it's really the, we see the suite versus best-of-breed play happening here, which in general was happening before already with Oracle's flagship database versus those of Amazon, as an example, right?
And now the interesting thing is, every step of the way Oracle was always a one-database company, there can be only one, and they're now generally talking about HeatWave and being a two-database company with different market spaces but the same value proposition of integrating more things very, very quickly to have a universal database, as I call it, they call it the converged database, for all the needs of an enterprise to run certain application use cases. And that's what's attractive to developers. >>It's ironic, isn't it? I mean, you know, the rumor was that TK, Thomas Kurian, left Oracle 'cause he wanted to put Oracle database on other clouds and other places. And maybe that was the rift. I'm sure there were other things, but Oracle clearly is now trying to expand its TAM, Ron, with HeatWave into AWS, into Azure. How do you think Oracle's gonna do? You were at CloudWorld. What was the sentiment from customers and the independent analysts? Is this just Oracle trying to screw with the competition, create a little diversion? Or is this, you know, serious business for Oracle? What do you think? >>No, I think it has legs. I think it's definitely, again, a tribute to Oracle's overall ability to differentiate not only MySQL HeatWave but its overall portfolio. And I think the fact that they do have the alliance with Azure in place is definitely demonstrating their commitment to meeting the multi-cloud needs of their customers, as well as what we pointed to in terms of the fact that they're now offering, you know, MySQL capabilities within AWS natively and that it can now outperform AWS's own offering. And I think this is all demonstrating that Oracle is, you know, not letting up, they're not resting on their laurels. Clearly we are living in a multi-cloud world, so why not just make it easier for customers to be able to use cloud databases according to their own specific needs?
And I think, you know, to Holger's point, I think that definitely aligns with being able to bring on more application developers to leverage these capabilities. >>I think one important announcement that's related to all this was the JSON relational duality capabilities, where now it's a lot easier for application developers to use a language that they're very familiar with, JSON, and not have to worry about going into relational databases to store their JSON application coding. So this is, I think, an example of the innovation that's enhancing the overall Oracle portfolio, and certainly all the work with machine learning is definitely paying dividends as well. And as a result, I see Oracle continuing to make these inroads that we pointed to. But I agree with Marc, you know, the short-term discounting is just a stall tactic. This is not denying the fact that Oracle is able to not only deliver price-performance differentiators that are dramatic, but also meet a wide range of needs for customers out there that aren't just limited to price-performance considerations. >>Being able to support multi-cloud according to customer needs. Being able to reach out to the application developer community and address a very specific challenge that has plagued them for many years now. So bringing it all together, yeah, I see this as just enabling Oracle to ring true with customers. The customers that were there, even though not all of them are going to be saying the same things, were basically all giving positive feedback. And likewise, I think the analyst community is seeing this. It's always refreshing to be able to talk to customers directly, and at Oracle CloudWorld there was a litany of them, and so this is just a difference maker, as well as being able to talk to strategic partners. The NVIDIA partnership, I think, is also a testament to Oracle's ongoing ability to, you know, make the ecosystem more user friendly for the customers out there.
>>Yeah, it's interesting, when you get these all-in-one tools, you know, the Swiss Army knife, you expect that it's not able to be best of breed. That's the kind of surprising thing that I'm hearing about HeatWave. I want to talk about Lakehouse, because when I think of lakehouse, I think Databricks, and to my knowledge Databricks hasn't been in the sights of Oracle yet. Maybe they're next, but Oracle claims that MySQL HeatWave Lakehouse is a breakthrough in terms of capacity and performance. Marc, what are your thoughts on that? Can you double-click on Lakehouse, Oracle's claims for things like query performance and data loading? What does it mean for the market? Is Oracle really leading in the lakehouse competitive landscape? What are your thoughts? >>Well, the name of the game is, what are the problems you're solving for the customer? More importantly, are those problems urgent or important? If they're urgent, customers wanna solve 'em now. If they're important, they might get around to them. So you look at what they're doing with Lakehouse, or previous to that machine learning, or previous to that automation, or previous to that OLAP with OLTP, and they're merging all this capability together. If you look at Snowflake or Databricks, they're tackling one problem. You look at MySQL HeatWave, they're tackling multiple problems. So when you say, yeah, their queries are much better against the lakehouse in combination with other analytics, in combination with OLTP, and the fact that there are no ETLs. So you're getting all this done in real time. So it's doing the query across everything in real time. >>You're solving multiple user and developer problems, you're increasing their ability to get insight faster, you're having shorter response times. So yeah, they really are solving urgent problems for customers. And by putting it where the customer lives, this is the brilliance of actually being multi-cloud.
And I know I'm backing up here a second, but by making it work in AWS and Azure, where people already live, where they already have applications, what they're saying is, we're bringing it to you. You don't have to come to us to get these benefits, this value. Overall, I think it's a brilliant strategy. I give Nipun Agarwal huge, huge kudos for what he's doing there. So yes, what they're doing with the lakehouse is going to put Databricks and Snowflake and everyone else, for that matter, on notice. >>Well, those are the guys, Holger, you and I have talked about this, those are the guys that are doing sort of the best of breed. You know, they're really focused and they, you know, tend to do well, at least out of the gate. Now you've got Oracle's converged philosophy, obviously with Oracle database. We've seen that now it's kicking in gear with HeatWave, you know, this whole thing of suites versus best of breed. I mean, long term, you know, customers tend to migrate towards suites, but the new shiny toy tends to get the growth. How do you think this is gonna play out in cloud database? >>Well, it's the never-ending story, right? In software, right, suite versus best of breed, and so far in the long run suites have always won, right? And sometimes they struggle, because the inherent problem of suites is you build something larger, it has more complexity, and that means your cycles to get everything working together, to integrate it, test it, roll it out, certify it, whatever it is, take you longer, right? And that's not the case here. The fascinating part of the effort around MySQL HeatWave is that the team is out-executing the previous best-of-breed players while bringing something together. Now, if they can maintain that pace, that remains to be seen.
But the strategy, like what Marc was saying, bring the software to the data, is of course interesting and unique, and totally un-Oracle-ish in the past, right? >>Yeah. >>It had to be in your database on OCI. And that's an interesting part. The interesting thing on the lakehouse side is, right, there are three key benefits of a lakehouse. The first one is better reporting and analytics, bringing more rich information together. Like, make the case for SiliconANGLE, right? We want to see engagements for this video, we want to know what's happening. That's a mixed transactional, video, media use case, right? A typical lakehouse use case. The next one is to build more rich applications, transactional applications which have video and these elements in there, which are the engaging ones. And the third one, and that's where I'm a little critical and concerned, is that it's really the base platform for artificial intelligence, right? To run deep learning, to run things automatically, because they have all the data in one place and can query it in one way. >>And that's where Oracle, I know that Ron talked about NVIDIA for a moment, but that's where Oracle doesn't have the strongest story. Nonetheless, the two other main use cases of the lakehouse are very strong. My only concern is that 450 terabytes sounds low. It's an arbitrary limitation. Yeah, it sounds big for a start, and it's the first version, they can make that bigger. You don't want your lakehouse to be limited to terabyte sizes or even any petabyte size, because you want to have the certainty that you can put everything in there that you think might be relevant, without knowing what questions to ask, and then query those questions. >>Yeah. And you know, in the early days of no schema on write, it just became a mess. But now technology has evolved to allow us to actually get more value out of that data. Data lake, data swamp, it's, you know, now much more logical.
But I want to come back in a moment to how you think the competitors are gonna respond. Are they gonna have to sort of do more of a converged approach? AWS in particular? But before I do, Ron, I want to ask you a question about Autopilot, because I heard Larry Ellison's keynote and he was talking about how, you know, most security issues are human errors, and with autonomy and autonomous database and things like Autopilot, we take care of that. It's like autonomous vehicles, they're gonna be safer. And I went, well, maybe, maybe someday. So Oracle really tries to emphasize this. Every time you see an announcement from Oracle, they talk about new, you know, autonomous capabilities. How legit is it? Do people care? What about, you know, what's new for HeatWave Lakehouse? How much of a differentiator, Ron, do you really think Autopilot is in this cloud database space? >>Yeah, I think it will definitely enhance the overall proposition. I don't think people are gonna buy, you know, Lakehouse exclusively because of Autopilot capabilities, but when they look at the overall picture, I think it will be an added capability bonus to Oracle's benefit. And yeah, I think it's kind of one of these age-old questions: how much do you automate, and what is the balance to strike? And I think we all understand with the autonomous car analogy that there are limitations to being able to use that. However, I think it's a tool that basically every organization out there needs to at least have or at least evaluate, because it goes to the point that it helps with ease of use, it helps make automation more balanced in terms of, you know, being able to test: all right, let's automate this process and see if it works well, then we can go on and switch on Autopilot for other processes.
>>And then, you know, that allows, for example, the specialists to spend more time on business use cases versus, you know, manual maintenance of the cloud database and so forth. So I think that actually is a legitimate value proposition. I think it's just gonna be a case-by-case basis. Some organizations are gonna be more aggressive with putting automation throughout their processes, throughout their organization. Others are gonna be more cautious. But it's gonna be, again, something that will help the overall Oracle proposition. And it's something that I think will be used with caution by many organizations, but other organizations are gonna say, hey, great, this is something that is really answering a real problem. And that is just easing the use of these databases, but also being able to better handle the automation capabilities and benefits that come with it without having, you know, a major screwup happen in the process of transitioning to more automated capabilities. >>Now, I didn't attend CloudWorld, it's just too many red-eyes, you know, recently, so I passed. But one of the things I like to do at those events is talk to customers, you know, in the spirit of the truth. You know, you'd have the hallway track and talk to customers and they say, hey, you know, here's the good, the bad, and the ugly. So did you guys talk to any MySQL HeatWave customers at CloudWorld? And what did you learn? I don't know, Marc, did you have any luck in having some private conversations? >>Yeah, I had quite a few private conversations. The one thing before I get to that, I want to disagree with one point Ron made. I do believe there are customers out there buying the HeatWave service, the MySQL HeatWave service, because of Autopilot.
Because Autopilot is really revolutionary in many ways for the MySQL developer, in that it auto-provisions, it auto-parallel-loads, it auto-data-places, it does auto shape prediction. It can tell you what machine learning models are gonna give you your best results. And candidly, I've yet to meet a DBA who didn't wanna give up pedantic tasks that are a pain in the kahoo, which they'd rather not do, as long as it was done right for them. So yes, I do think people are buying it because of Autopilot, and that's based on some of the conversations I had with customers at Oracle CloudWorld. >>In fact, it was like, yeah, that's great, yeah, we get fantastic performance, but this really makes my life easier. And I've yet to meet a DBA who didn't want to make their life easier. And it does. So yeah, I've talked to a few of them. They were excited. I asked them if they ran into any bugs, were there any difficulties in moving to it? And the answer was no in both cases. It's interesting to note, MySQL is the most popular database on the planet. Well, some will argue that it's neck and neck with SQL Server, but if you add in MariaDB and Percona, which are forks of MySQL, then yeah, by far and away it's the most popular. And as a result of that, everybody for the most part has typically a MySQL database somewhere in their organization. So this is a brilliant situation for anybody going after MySQL, but especially for HeatWave. And the customers I talked to love it. I didn't find anybody complaining about it. >>And what about the migration? We talked about TCO earlier. Does your TCO analysis include the migration cost, or do you kind of conveniently leave that out, or what? >>Well, when you look at migration costs, there are different kinds of migration costs. By the way, the worst job in the data center is the data migration manager. Forget it, no other job is as bad as that one. You get no attaboys for doing it right.
And then when you screw up, oh boy. So in real terms, anything that can limit data migration is a good thing. And when you look at data lake, that limits data migration. So if you're already a MySQL user, this is pure MySQL as far as you're concerned. It's just a simple transition from one to the other. You may wanna make sure nothing broke and all your tables are correct and your schemas are okay, but it's all the same. So it's a simple migration. So it's pretty much a non-event, right? When you migrate data from an OLTP to an OLAP, that's an ETL and that's gonna take time. >>But you don't have to do that with MySQL HeatWave. So that's gone. When you start talking about machine learning, again, you may have an ETL, you may not, depending on the circumstances, but again, with MySQL HeatWave, you don't, and you don't have duplicate storage, you don't have to copy it from one storage container to another to be able to be used in a different database, which by the way, ultimately adds much more cost than just the other service. So yeah, I looked at the migration, and again, the users I talked to said it was a non-event. It was literally moving from one physical machine to another. If they had a new version of MySQL running on something else and just wanted to migrate it over, or just hook it up, or just connect it to the data, it worked just fine. >>Okay, so it sounds like you guys feel, and we've certainly heard this, my colleague David Floyer, the semi-retired David Floyer, was always very high on HeatWave. So I think it's got some real legitimacy here, coming from a standing start. But I wanna talk about the competition, how they're likely to respond. I mean, if you're AWS and HeatWave is now in your cloud, there are some good aspects of that. The database guys might not like that, but the infrastructure guys probably love it.
Hey, more ways to sell, you know, EC2 and Graviton, but the database guys in AWS are gonna respond. They're gonna say, hey, we've got Redshift, we've got AQUA. What are your thoughts on not only how that's gonna resonate with customers, but I'm interested in what you guys think. I never say never about AWS, you know, are they gonna try to build, in your view, a converged OLAP and OLTP database? You know, Snowflake is taking an ecosystem approach. They've added transactional capabilities to the portfolio, so they're not standing still. What do you guys see in the competitive landscape in that regard going forward? Maybe Holger, you could start us off, and anybody else who wants to can chime in. >>Happy to. You mentioned Snowflake last, we'll start there. I think Snowflake is imitating that strategy, right? Building out from the original data warehouse in the cloud to really a proposition to have other data available there, because AI is relevant for everybody. Ultimately people keep data in the cloud for running AI. So you see the same suite kind of level strategy. It's gonna be a little harder because of the original positioning. How much would people know that you're doing other stuff? And just as a former developer and manager of developers, I just don't see the speed at the moment happening at Snowflake to become really competitive with Oracle. On the flip side, putting my Oracle hat on for a moment, back to you, Marc and Ron, right? What could Oracle still add? Because the big, big things, right, the traditional chasms in the database world, they have built everything, right? >>So I really scratched my head and gave Nipun a hard time at CloudWorld, saying, like, what could you be building? He was very conservative: let's get the Lakehouse thing done, it's gonna ship next year, right? And AWS is really hard, because AWS's value proposition is these small innovation teams, right?
They build two-pizza teams, which can be fed by two pizzas, not large teams, right? And you need large teams to build these suites with lots of functionality, to make sure they work together. They're consistent, they have the same UX on the administration side, they can be consumed the same way, they have the same API registry, I can't even stop listing where the synergy comes into play with a suite. So it's gonna be really, really hard for them to change that. But AWS is super pragmatic. They always pride themselves that they listen to customers. If they learn from customers that suite is a proposition, I would not be surprised if AWS tries to bring things more closely together. >>Yeah. Well, how about, can we talk about multi-cloud? Again, Oracle is very Oracle-on-Oracle, as you said before, but let's look forward, you know, half a year or a year. What do you think about Oracle's moves in multi-cloud in terms of what kind of penetration they're gonna have in the marketplace? You saw a lot of presentations at CloudWorld. You know, we've looked pretty closely at the Microsoft Azure deal. I think that's really interesting. I've called it a little bit of early days of a supercloud. What impact do you think this is gonna have on the marketplace? And think about it both within Oracle's customer base, I have no doubt they'll do great there, but what about beyond its existing install base? What do you guys think? >>Ron, do you wanna jump on that? Go ahead. Go ahead, Ron. >>No, no, no, that's an excellent point. I think it aligns with what we've been talking about in terms of Lakehouse. I think Lakehouse will enable Oracle to pull more customers, more MySQL customers, onto the Oracle platforms. And I think we're seeing all the signs pointing toward Oracle being able to make more inroads into the overall market.
And that includes garnering customers from the leaders, in other words, because they are, you know, coming in as an innovator, an alternative to the AWS proposition, the Google Cloud proposition. They have less to lose, and as a result they can really drive the multi-cloud messaging to resonate with not only their existing customers, but also, to the question Dave's posing, actually garner customers onto their platform. And that includes naturally MySQL but also OCI and so forth. So that's how I'm seeing this playing out. I think, you know, again, Oracle's reporting is indicating that, and I think what we saw at Oracle CloudWorld is definitely validating the idea that Oracle can make more waves in the overall market in this regard. >>You know, I've floated this idea of Supercloud. It's kind of tongue in cheek, but I think there is some merit to it in terms of building on top of hyperscale infrastructure and abstracting some of that complexity. And one of the things that I'm most interested in is industry clouds and Oracle's acquisition of Cerner. I was struck by Larry Ellison's keynote. It was like, I don't know, an hour and a half, and an hour and 15 minutes was focused on healthcare transformation. Well, >>So vertical, >>Right? And so, yeah, Oracle's got some industry chops, and then you think about what they're building with not only OCI, but then you've got MySQL, which you can now run in dedicated regions. You've got ADB on Exadata Cloud@Customer, and you can put that on-prem in your data center. And you look at what the other hyperscalers are doing. I say other hyperscalers; I've always said Oracle's not really a hyperscaler, but they've got a cloud, so they're in the game. But you can't get, you know, BigQuery on-prem. You look at Outposts, it's very limited in terms of the database support, and again, that will evolve.
But now Oracle has announced Alloy, so you can white-label their cloud. So I'm interested in what you guys think about these moves, especially the industry cloud. We see, you know, Walmart is doing sort of their own cloud. You've got Goldman Sachs doing a cloud. What do you guys think about that, and what role does Oracle play? Any thoughts? >>Yeah, let me jump on that for a moment. Now, especially with MySQL, by making that available in multiple clouds, what they're doing follows the philosophy they've had in the past with Cloud@Customer: taking the application and the data and putting it where the customer lives. If it's on premise, it's on premise. If it's in the cloud, it's in the cloud. By making MySQL HeatWave essentially plug-compatible with any other MySQL as far as your database is concerned, and then giving you that integration with OLAP and ML and data lake and everything else, what you've got is a compelling offering. You're making it easier for the customer to use. So the way I look at the difference between MySQL and the Oracle Database, MySQL is going to capture more market share for them. >>You're not gonna find a lot of new users for the Oracle Database. Yeah, there are always gonna be new users, don't get me wrong, but it's not gonna be huge growth. Whereas MySQL HeatWave is probably gonna be a major growth engine for Oracle going forward. Not just in their own cloud, but in AWS and in Azure and on premise over time; eventually it'll get there. It's not there now, but it will. They're doing the right thing on that basis. When you talk about multicloud, they're taking the services and making them available where the customer wants them, not forcing them to go where you want them, if that makes sense. And as far as where they're going in the future, I think they're gonna take a page out of what they've done with the Oracle Database.
They'll add things like JSON and XML and time series and spatial. Over time they'll make it a complete converged database like they did with the Oracle Database. The difference being the Oracle Database will scale bigger and will have more transactions and be somewhat faster. And MySQL will be for anyone who's not on the Oracle Database. They're not stupid, that's for sure. >>They've done JSON already, right? But I give you that they could add graph and time series, right, to HeatWave. Right, right. Yeah, that's absolutely right. That's >>A sort of a logical move, right? >>Right. But let's not kid ourselves, right? I mean, it has worked in Oracle's favor, right? 10x, 20x the amount of R&D that is in the MySQL space has been poured into trying to snatch workloads away from Oracle, starting with IBM 30 years ago, Microsoft 20 years ago, and it didn't work, right? Database applications are extremely sticky. When they run, you don't want to touch them and migrate them, right? So that doesn't mean that HeatWave is not an attractive offering, but it will be net new things, right? And what works in MySQL HeatWave's favor a little bit is that it's not the massive enterprise applications, where, like, you might be only running 30% on Oracle, but the connections and the interfaces into that are like 70, 80% of your enterprise. >>You take it out and it's like the spaghetti ball, where you say, ah, no, I really don't want to do all that. Right? You don't have that massive part with the MySQL HeatWave kind of databases, which are smaller and more tactical in comparison. But still, I don't see them taking so much share. They will be growing because of an attractive value proposition. Quickly, on the multi-cloud, right? I think it's not really multi-cloud if you give people the chance to run your offering on different clouds, right? You can run it there.
The multi-cloud advantage is when the uber offering comes out, which allows you to do things across those installations, right? I can migrate data, I can query data across clouds, something like Google has done with BigQuery Omni. I can run predictive models or even make AI models in different places and distribute them, right? And Oracle is paving the road for that by being available on these clouds. But the multi-cloud capability of a database which knows it's running on different clouds, that is still yet to be built there. >>Yeah. And >>That the problem with >>That, that's the Supercloud concept that I floated, and I've always said Snowflake, with a single global instance, is sort of, you know, headed in that direction and maybe has a lead. What's the issue with that, Mark? >>Yeah, the problem with that version of multi-cloud is that clouds charge egress fees. As long as they charge egress fees to move data between clouds, it's gonna make it very difficult to do a real multi-cloud implementation. Even Snowflake, which runs multi-cloud, has to pass on the egress fees to their customers when data moves between clouds. And that's really expensive. I mean, there is one customer I talked to who is beta testing for them, MySQL HeatWave on AWS. The only reason they didn't want to do that until it was running on AWS is the egress fees were so great to move it to OCI that they couldn't afford it. Yeah. Egress fees are the big issue but, >>But Mark, the point might be you might wanna run a remote query and only get the result set back, right, which is much tinier, which has been the answer before for low latency between the clouds, a problem which we sometimes still have but mostly don't have. Right? And I think in general, with egress fees coming down based on Oracle's general egress fee move, it's very hard to justify those, right? But it's not about moving data as a multi-cloud high-value use case.
It's about doing intelligent things with that data, right? Putting it into other places, replicating it, and, I'm saying the same thing you said before, running remote queries on that, analyzing it, running AI on it, running AI models on that. That's the interesting thing. Cross-administered in the same way. Taking things out, making sure compliance happens. Making sure when Ron says, I don't want to be American anymore, I want to be in the European cloud, that it gets migrated, right? So those are the interesting high-value use cases, which are really, really hard for enterprises to program by hand with developers, and they would love to have them out of the box, and that's the innovation yet to come; we have yet to see it. But the first step to get there is that your software runs in multiple clouds, and that's what Oracle's doing so well with MySQL. >>Guys. Amazing. >>Go ahead. Yeah. >>Yeah. >>For example, >>Amazing amount of data knowledge and brain power in this market. Guys, I really want to thank you for coming on to theCUBE. Ron, Holger, Mark, always a pleasure to have you on. Really appreciate your time. >>Well, with all the last names, we're very happy for a Romanic last name as moderator. Thanks, Dave, for moderating us. All right, >>We'll see you guys around. Safe travels to all, and thank you for watching this power panel, The Truth About MySQL HeatWave, on theCUBE, your leader in enterprise and emerging tech coverage.

Published Date : Nov 1 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Mark | PERSON | 0.99+
Ron Holger | PERSON | 0.99+
Ron | PERSON | 0.99+
Mark Stammer | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Ron Westfall | PERSON | 0.99+
Ryan | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Walmart | ORGANIZATION | 0.99+
Larry Ellison | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Alibaba | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Holgar Mueller | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Constellation Research | ORGANIZATION | 0.99+
Goldman Sachs | ORGANIZATION | 0.99+
17 times | QUANTITY | 0.99+
two | QUANTITY | 0.99+
David Foyer | PERSON | 0.99+
44% | QUANTITY | 0.99+
1.2% | QUANTITY | 0.99+
4.8 billion | QUANTITY | 0.99+
Jason | PERSON | 0.99+
Uber | ORGANIZATION | 0.99+
Fu Chim Research | ORGANIZATION | 0.99+
Dave Ante | PERSON | 0.99+

Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst


 

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay. We're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORCFile, and Avro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle. More recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses.
So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect, and what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage. Your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge.
>>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this and, you know, obviously her vision is that data mesh is open source, open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to sort of today: you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it reminds me, again, of going back 10 or 12 years ago when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control.
And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today? But they're customers nonetheless. I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting. It reminds me of when, you know, I see the gas price, I drive up, and then I say, oh, that's the cash price. With a credit card, I gotta pay 20 cents more, but okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down, cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you wanna use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model.
I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape where a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it, cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: the, let's call them data warehouse systems or the proprietary systems, are they gonna deliver a greater ROI sooner? And is that the allure that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you.
You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake thing is always a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world. So, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors. So I think my advice to people who are starting here or trying to build teams to capitalize on data assets is begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting.
Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle: the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys.
Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage.

Published Date : Aug 22 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jess Borgman | PERSON | 0.99+
Richard | PERSON | 0.99+
20 cents | QUANTITY | 0.99+
six | QUANTITY | 0.99+
Justin | PERSON | 0.99+
Richard Jarvis | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Kurt Monash | PERSON | 0.99+
20% | QUANTITY | 0.99+
Jess | PERSON | 0.99+
pythons | TITLE | 0.99+
seven years | QUANTITY | 0.99+
Today | DATE | 0.99+
Javas | TITLE | 0.99+
Teradata | ORGANIZATION | 0.99+
VMware | ORGANIZATION | 0.98+
millions | QUANTITY | 0.98+
EVAs | ORGANIZATION | 0.98+
JAK | PERSON | 0.98+
Starburst | ORGANIZATION | 0.98+
both | QUANTITY | 0.97+
10 | DATE | 0.97+
12 years ago | DATE | 0.97+
Starbust | TITLE | 0.96+
today | DATE | 0.95+
Apache iceberg | ORGANIZATION | 0.94+
Google | ORGANIZATION | 0.93+
12 years | QUANTITY | 0.92+
single point | QUANTITY | 0.92+
two worlds | QUANTITY | 0.92+
10 | QUANTITY | 0.91+
Hudu | LOCATION | 0.91+
Unix | TITLE | 0.9+
one thing | QUANTITY | 0.87+
trillions of records | QUANTITY | 0.83+
first data lake | QUANTITY | 0.82+
Starburst | TITLE | 0.8+
PJI | ORGANIZATION | 0.79+
years ago | DATE | 0.76+
IE | TITLE | 0.75+
Lie 2 | TITLE | 0.72+
many years ago | DATE | 0.72+
over a couple times | QUANTITY | 0.7+
TCO | ORGANIZATION | 0.7+
Parque | ORGANIZATION | 0.67+
Number two | QUANTITY | 0.64+
Kubernetes | ORGANIZATION | 0.59+
a decade | QUANTITY | 0.58+
plus years | DATE | 0.57+
Azure | TITLE | 0.57+
S3 | TITLE | 0.55+
Delta | TITLE | 0.54+
20 | QUANTITY | 0.49+
last | DATE | 0.48+
Mohan | PERSON | 0.44+
ORC | ORGANIZATION | 0.27+

Starburst The Data Lies FULL V2b


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, the best minds of my generation are thinking about how to get people to click on ads. And that sucks. Let's face it, more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to The Data Doesn't Lie, or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies, big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth
Inevitable will the data warehouse ever have featured parody with the data lake or vice versa is the so-called modern data stack, simply centralization in the cloud, AKA the old guards model in new cloud close. How can organizations rethink their data architectures and regimes to realize the true promises of data can and will and open ecosystem deliver on these promises in our lifetimes, we're spanning much of the Western world today. Richard is in the UK. Teresa is on the west coast and Justin is in Massachusetts with me. I'm in the cube studios about 30 miles outside of Boston folks. Welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think Justin? >>Yeah, definitely a lie. My first startup was a company called hit adapt, which was an early SQL engine for hit that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what, what are your thoughts? I mean, there, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What, what is your experience show? 
>>Yeah, I mean, I think I would echo Justin's experience, really: that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right, external partners: we're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with your constituencies? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private. It's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost-effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, they don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either those that are source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those is useful for a different part of the business, and it's about making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles, back to those. You've got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and you know, up until recently, there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost eCommerce-like experience where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so we're really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open-source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lake, really around Hadoop, that probably was true: that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago, where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to keep being at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and you know, obviously her vision is that the data mesh is open source, with open-source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open-source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're gonna support open-source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open-source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA. And you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me kind of, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to get the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me, you know, I see the teaser gas price, I drive up, and then I say, oh, that's the cash price. Credit card, I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down.
Cause I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it. Cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call it data warehouse systems or the proprietary systems, are they gonna deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back off very tight SLAs to them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake, you know, thing is a lot of fun for analysts like me. You've got Databricks coming at it, Richard, you mentioned you have a lot of rock star data engineers, Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world. So, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open-source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open, license-free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And in the world of Oracle, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in Oracle: the license cost is by far the biggest component in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage, or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine, new bottle, when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst. Richard Jarvis is the CTO of EMIS Health, and Teresa Tung is the cloud first technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie: it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You know, there's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist. Just, yes, a little bit more elastic now, because the cloud offers that. >>So Teresa, let me go to you, cuz you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the modern, I guess, the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model, de-risked experimentation, you know, what we talked about, the ability to scale up, scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah. What it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment, that the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, you know, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, but also, you know, to external folks as well.
So you've architected that capability today? >>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that that's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic. They don't really have that context. All right. Let's end, Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, optionality provides the ability to reduce costs, to store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... Or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management. Today, your teams are constantly moving and copying data, which requires time, management, and in some cases double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their cluster's operation and query execution so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, 'cause your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 22 2022



Starburst The Data Lies FULL V1


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said the best minds of my generation are thinking about how to get people to click on ads, and that sucks. Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... Or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is Cloud First technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth
Inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing, with, you know, your constituencies, that data model? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like, people who understand what it means to use this data, particularly as the data that we use at EMIS is very private. It's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost-effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy but still very useful systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that will be really meaningful for that end user, right? Each of those is useful for a different part of the business, and it's about making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles, back to those. You got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially, you know, doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file and Avro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses. So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this and, you know, obviously her vision is that the data mesh is open source, with open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me kind of, again, of going back 10 or 12 years ago when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're not going to get the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminded me of when, you know, I see the gas price, I drive up and then I say, oh, that's the cash price. With a credit card I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down.
Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape, where a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it. Cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is, do the, let's call it data warehouse systems or the proprietary systems, are they gonna deliver a greater ROI sooner? And is that the allure that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, some things before that strike me. You know, the Databricks Snowflake, you know, thing is a lot of fun for analysts like me. You've got Databricks coming at it, and Richard, you mentioned you have a lot of rockstar data engineers, Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world. So, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So I think my advice to people who are starting here or trying to build teams to capitalize on data assets is: begin with open, license-free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And in the world of Oracle, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not at Oracle. It's the license cost that is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine, new bottle when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst, Richard Jarvis, the CTO of EMIS Health, and Theresa Tung, the cloud first technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie, in that it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So lemme come back to you, Justin. But okay, there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You know, there's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, and its competitors. So that's different. Is it not? Is that not at least an aspect of modern, dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist, just, yes, a little bit more elastic now because the cloud offers that. >>So Theresa, let me go to you, cuz you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack, needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure, it's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model, de-risked experimentation, you know, what we talked about, the ability to scale up, scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're in the early days. So what are the critical aspects? If you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack, what does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how >>It's, yeah, what it should be. >>Yeah. Yeah. Well, I think, you know, Theresa mentioned this in a previous segment, that the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So Theresa, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay. I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Theresa? >>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem, and then yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, you know, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data or having the data not presented in the way that the user is expecting means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally. Yeah. But also, you know, to external folks as well.
So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that that's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data or to remove certain rows from it or to interpret what a date actually means. Is it the start of the contract or the start of the treatment or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic. They don't really have that context. All right. Let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Theresa and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie, or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time, management and, in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their cluster's operation and query execution so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7.
To unlock the value in all of your data, you need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 20 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Richard | PERSON | 0.99+
Dave Lanta | PERSON | 0.99+
Jess Borgman | PERSON | 0.99+
Justin | PERSON | 0.99+
Theresa | PERSON | 0.99+
Justin Borgman | PERSON | 0.99+
Teresa | PERSON | 0.99+
Jeff Ocker | PERSON | 0.99+
Richard Jarvis | PERSON | 0.99+
Dave Valante | PERSON | 0.99+
Justin Boardman | PERSON | 0.99+
six | QUANTITY | 0.99+
Dani | PERSON | 0.99+
Massachusetts | LOCATION | 0.99+
20 cents | QUANTITY | 0.99+
Teradata | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Jamma | PERSON | 0.99+
UK | LOCATION | 0.99+
FINRA | ORGANIZATION | 0.99+
40 years | QUANTITY | 0.99+
Kurt Monash | PERSON | 0.99+
20% | QUANTITY | 0.99+
two | QUANTITY | 0.99+
five | QUANTITY | 0.99+
Jess | PERSON | 0.99+
2011 | DATE | 0.99+
Starburst | ORGANIZATION | 0.99+
10 | QUANTITY | 0.99+
Accenture | ORGANIZATION | 0.99+
seven years | QUANTITY | 0.99+
thousands | QUANTITY | 0.99+
pythons | TITLE | 0.99+
Boston | LOCATION | 0.99+
GDPR | TITLE | 0.99+
Today | DATE | 0.99+
two models | QUANTITY | 0.99+
Zolando Comcast | ORGANIZATION | 0.99+
Gemma | PERSON | 0.99+
Starbust | ORGANIZATION | 0.99+
JPMC | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Javas | TITLE | 0.99+
today | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
millions | QUANTITY | 0.99+
first lie | QUANTITY | 0.99+
10 | DATE | 0.99+
12 years | QUANTITY | 0.99+
one place | QUANTITY | 0.99+
Tomorrow | DATE | 0.99+

Breaking Analysis Further defining Supercloud W/ tech leaders VMware, Snowflake, Databricks & others


 

From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante.

>> Dave Vellante: At our inaugural Supercloud 22 event, we further refined the concept of a supercloud, iterating on the definition, the salient attributes, and some examples of what is and what is not a supercloud. Welcome to this week's Wikibon CUBE Insights, powered by ETR. Snowflake has always been what we feel is one of the strongest examples of a supercloud, and in this Breaking Analysis from our studios in Palo Alto, we unpack our interview with Benoit Dageville, co-founder and president of products at Snowflake, and we test our supercloud definition on the company's Data Cloud platform. We're really looking forward to your feedback.

First, let's examine how we define supercloud. Very importantly, one of the goals of Supercloud 22 was to get the community's input on the definition and iterate on previous work. Supercloud is an emerging computing architecture that comprises a set of services which are abstracted from the underlying primitives of hyperscale clouds. We're talking about services such as compute, storage, networking, security, and other native tooling like machine learning and developer tools, to create a global system that spans more than one cloud. Supercloud, as shown on this slide, has five essential properties, x number of deployment models, and y number of service models. We're looking for community input on x and y, and on the first point as well, so please weigh in and contribute.

Now, we've identified these five essential elements of a supercloud. Let's talk about these. First, the supercloud has to run its services on more than one cloud, leveraging the cloud-native tools offered by each of the cloud providers. The builder of the supercloud platform is responsible for optimizing the underlying primitives of each cloud and optimizing for the specific needs, be it cost or performance or latency or governance, data
sharing, security, etc. But those primitives must be abstracted such that a common experience is delivered across the clouds for both users and developers. The supercloud has a metadata intelligence layer that can maximize efficiency for the specific purpose of the supercloud, i.e., the purpose that the supercloud is intended for, and it does so in a federated model. And it includes what we call a super PaaS. This is a prerequisite: a purpose-built component that enables ecosystem partners to customize and monetize incremental services while at the same time ensuring that the common experiences exist across clouds.

Now, in terms of deployment models, we'd really like to get more feedback on this piece, but here's where we are so far, based on the feedback we got at Supercloud 22. We see three deployment models. The first is one where a control plane may run on one cloud but supports data plane interactions with more than one other cloud. The second model instantiates the supercloud services on each individual cloud and within regions, and can support interactions across more than one cloud with a unified interface connecting those instantiations, those instances, to create a common experience. And the third model superimposes its services as a layer (or, in the case of Snowflake, they call it a mesh) on top of the cloud providers' region or regions, with a single global instantiation of those services which spans multiple cloud providers. This is our understanding, from the conversation with Benoit Dageville, as to how Snowflake approaches its solutions. For now we're going to park the service models; we need more time to flesh that out, and we'll propose something shortly for you to comment on.

Now, we peppered Benoit Dageville at Supercloud 22 to test how the Snowflake Data Cloud aligns to our concepts and our definition. Let me also say that Snowflake doesn't use the term supercloud. They really want to respect, and they
don't want to denigrate, the importance of their hyperscale partners. Nor do we. But we do think the hyperscalers, today anyway, are not building what we call superclouds; people who are building superclouds are building on top of hyperscale clouds. That is a prerequisite.

So here are the questions that we tested with Snowflake. First question: how does Snowflake architect its Data Cloud, and what is its deployment model? Listen to Dageville talk about how Snowflake has architected a single system. Play the clip.

>> Benoit Dageville: There are several ways to do this, superclouds as you name them. The way we picked is to create one single system, and that's very important. There are several ways, right? You can instantiate your solution in every region of a cloud, and potentially that region could be AWS, that region could be GCP, so you are indeed a multi-cloud solution. But at Snowflake we did it differently. We are really creating cloud regions which are superposed on top of the cloud providers' infrastructure regions. So we are building our regions, but where it's very different is that each region of Snowflake is not one instantiation of our service. Our service is global by nature. We can move data from one region to the other. When you land in Snowflake, you land into one region, but you can grow from there, and you can exist in multiple clouds at the same time. And that's very important, right? It's not different instantiations of a system; it is one single instantiation which covers many cloud regions and many cloud providers.

>> Dave Vellante: Snowflake chose the most advanced level of our three deployment models, as Dageville talked about, presumably so it could maintain maximum control and ensure that common experience, like the iPhone model. Next, we probed about the technical enablers of the Data Cloud. Listen to Dageville talk about Snowgrid. He uses the term
mesh, and this can get confusing with Zhamak Dehghani's data mesh concept, but listen to Benoit's explanation.

>> Benoit Dageville: Well, as I said, first we start by building Snowflake regions. We have today thirty regions that span the world, so it's a worldwide system with many regions. But all these regions are connected together. They are meshed together with our technology; we name it Snowgrid. And that makes it hard, because an Azure region can talk to an AWS region or GCP regions, and as a user of our cloud, you don't really see these regional differences, that regions are potentially in different clouds. When you use Snowflake, your presence as an organization can be in several regions, several clouds if you want, both geographic and cloud provider.

>> So I can share data irrespective of the cloud, and I'm in the Snowflake Data Cloud. Is that correct? I can do that today?

>> Benoit Dageville: Exactly, and that's very critical, right? What we wanted is to remove data silos. When you instantiate a system in one single region and that system is locked in that region, you cannot communicate with other parts of the world; you are locking the data in one region. And we didn't want to do that. We wanted data to be distributed the way the customer wants it to be distributed across the world, and potentially sharing data at world scale.

>> Dave Vellante: Now, maybe there are many ways to skin the other cat, meaning perhaps if a platform does instantiate in multiple places, there are ways to share data. But this is how Snowflake chose to approach the problem. Next question: how do you deal with latency in this big global system? This is really important to us, because while Snowflake has some really smart people working as engineers and the like, we don't think they've solved for the speed-of-light problem, with the best people working on it, as we often joke. Listen to Benoit Dageville's comments on this
topic.

>> Benoit Dageville: So yes and no. It's very expensive to do that, because generally, if you want to join data which is in different regions and different clouds, it's going to be very expensive, because you need to move data every time you join it. So the way we do it is that you replicate the subset of data that you want to access in one region from other regions. So you can create this data mesh, but data is replicated to make it very cheap and very performant too.

>> And Snowgrid, does that have the metadata intelligence? Can you describe that a little bit?

>> Benoit Dageville: Yes. Snowgrid is both a way to exchange metadata, so each region of Snowflake knows about all the other regions of Snowflake. Every time we create a new region, the metadata is distributed over our Data Cloud. Not only does each region know all the regions, but it knows every organization that exists in our cloud, where this organization is, and where data can be replicated by this organization. And then, of course, it's also used as a way to exchange data, right? So you can exchange data at petabyte scale. I was just receiving an email from one of our customers who moved more than four petabytes of data cross-region, cross-cloud-provider, in a few days. It's a lot of data, so it takes some time to move, but they were able to do that online, completely online, and switch over to the other region, which is failover, which is very important also.

>> Dave Vellante: So "yes and no" probably means, typically, no. It sounds like Snowflake is selectively pulling small amounts of data and replicating it where necessary. But you also heard him talk about the metadata layer, which is one of the essential aspects of supercloud. Okay, next we dug into security. It's one of the most important issues, and we think one of the hardest parts related to
deploying supercloud. We've talked about how the cloud has become the first line of defense for the CISO, but now with multi-cloud you have multiple first lines of defense, and that means multiple shared responsibility models, multiple tool sets from different cloud providers, and an expanded threat surface. So listen to Benoit's explanation here. Please play the clip.

>> Benoit Dageville: This is a great question. Security has always been the most important aspect of Snowflake since day one, right? This is the question that every customer of ours has: how can you guarantee the security of my data? And so we secure data really tightly in region. We have several layers of security. It starts by encrypting all data at rest, and that's very important. A lot of customers are not doing that, right? You hear of these attacks, for example, on cloud, where someone left their buckets open, and then you can access the data because it's not encrypted. So we are encrypting everything at rest, we are encrypting everything in transit, so a region is very secure. Now, from one region you never access data from another region in Snowflake; that's also why we replicate data. The replication of that data across regions, or the metadata for that matter, is really highly secure. Snowgrid ensures that everything is encrypted. We have multiple encryption keys, and they're stored in hardware security modules. So we built Snowgrid such that it's secure and it allows very secure movement of data.

>> Dave Vellante: So when we heard this explanation, we immediately went to the lowest-common-denominator question. Meaning, when you think about how AWS, for instance, deals with data in motion or data at rest, it might be different from how another cloud provider deals with it. So how does Snowflake deal with differences, for example, in the AWS maturity model for various cloud capabilities? Let's say they've
got a faster Nitro or Graviton. How does Snowflake deal with that? Do they have to slow everything else down, like, imagine, a caravan cruising across the desert so every truck can keep up? Let's listen.

>> Benoit Dageville: It's a great question. I mean, of course our software is abstracting all the cloud providers' infrastructure, so that when you run in one region, let's say AWS or Azure, it doesn't make any difference as far as the applications are concerned. And this abstraction, of course, is a lot of work. I mean really, really a lot of work, because it needs to be secure, it needs to be performant in every cloud, and it has to expose APIs which are uniform. And cloud providers, even though they have potentially the same concept, let's say blob storage, the APIs are completely different. The way these systems are secured is completely different. The errors that you can get and the retry mechanism are very different from one cloud to the other. Performance is also different. We discovered that when we were starting to port our software, and we had to completely rethink how to leverage blob storage in that cloud versus that cloud, just because of performance too. So we had, for example, to stripe data. All this work is work that you don't need to do as an application, because our vision really is that applications which are running in our Data Cloud can be abstracted from all these differences. And we provide all the services, all the workloads that these applications need, whether it's transactional access to data, analytical access to data, managing logs, managing metrics. All of this is abstracted too, such that they are not tied to one particular service of one cloud, and distributing this application across many regions, many clouds, is very seamless.

>> Dave Vellante: So from that answer, we know that Snowflake takes
care of everything, but we really don't understand the performance implications in that specific case. We feel pretty certain, though, that the promises Snowflake makes around governance and security within their data sharing construct will be kept. Now, another criterion that we've proposed for supercloud is a super PaaS layer to create a common developer experience and an enabler for ecosystem partners to monetize. Please play the clip. Let's listen.

>> Benoit Dageville: We built it custom, because, as you said, what exists in one cloud might not exist in another cloud provider, right? So we have to build all these components that modern applications need, and that goes to machine learning, as I say, transactional and analytical systems, and the entire thing, such that they can run in isolation, basically.

>> And the objective is that the developer experience will be identical across those clouds?

>> Benoit Dageville: Yes, right. The developer doesn't need to worry about the cloud provider. And actually, in our system (we didn't talk about it), the marketplace that we have, which allows actually to deliver...

>> We're getting there.

>> Benoit Dageville: Yeah, okay.

>> Dave Vellante: Now, we're not going to go deep into ecosystem today; we've talked about Snowflake's strengths in this regard. But Snowflake pretty much ticked all the boxes on our supercloud attributes and definition. We asked Benoit Dageville to confirm that this is all shipping and available today, and he also gave us a glimpse of the future. Play the clip.

>> Benoit Dageville: And we are still developing it. The transactional piece, Unistore as we call it, was announced at the last Summit, so it is still in the works, but that's the vision, right? And that's important, because we talk about the infrastructure, right? You mentioned a lot about storage and compute, but it's not only that. When you think about applications, they need to use a transactional database, they need
to use an analytical system, they need to use machine learning. So you need to provide also all these services which are consistent across all the cloud providers.

>> Dave Vellante: So you can hear Dageville talking about expanding beyond taking advantage of the core infrastructure, storage and networking, et cetera, and bringing intelligence to the data through machine learning and AI. Of course there's more to come, and there better be at this company's valuation, despite the recent sharp pullback in a tightening Fed environment.

Okay, so I know it's cliche, but everyone's comparing Snowflake and Databricks. Databricks has been pretty vocal about its open source posture compared to Snowflake's, and it just so happens that we had Ali Ghodsi on at Supercloud 22 as well. He wasn't in studio; he had to do remote because, I guess, he's presenting at an investor conference this week, so we had to bring him in remotely. Now, I didn't get to do this interview, John Furrier did, but I listened to it and captured this clip about how Databricks sees supercloud and the importance of open source. Take a listen to Ghodsi.

>> Ali Ghodsi: Yeah, I mean, let me start by saying we're big fans of open source. We think that open source is a force in software that's going to continue for decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. We saw that it could do that with the most advanced technology: Windows, a proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the data lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, data lake, machine learning, real-time stack in open source, and we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud, and with it comes a really important protocol called Delta Sharing that
enables you, in an open way, actually for the first time ever, to share large data sets between organizations. But it uses an open protocol, so the great thing about that is you don't need to be a Databricks customer. You don't even need to like Databricks. You just need to use this open source project and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently: just one copy of the data, so you don't have to copy it if you're within the same cloud.

>> Dave Vellante: So the implication of Ali Ghodsi's comments is that Databricks, with Delta Sharing, as John implied, is playing a long game. Now, I don't know enough about the Databricks architecture to comment in detail; I've got to do more research there. So I reached out to my two analyst friends, Tony Baer and Sanjeev Mohan, to see what they thought, because they cover these companies pretty closely.

Here's what Tony Baer said, quote: I've viewed the divergent lakehouse strategies of Databricks and Snowflake in the context of their roots. Prior to Delta Lake, Databricks' prime focus was the compute, not the storage layer, and more specifically, they were a compute engine, not a database. Snowflake approached from the opposite end of the pool, as they originally fit the mold of the classic database company rather than a specific compute engine per se. The lakehouse pushes both companies outside of their original comfort zones: Databricks to storage, Snowflake to compute engine. So it makes perfect sense for Databricks to embrace the open source narrative at the storage layer and for Snowflake to continue its walled garden approach. But in the long run, their strategies are already overlapping. Databricks is not a 100% open source company. Its practitioner experience has always been proprietary, and now so is its SQL query engine. Likewise, Snowflake has had to open up with the support of Iceberg for open data lake format. The question really becomes how serious Snowflake will be in making Iceberg a first-class citizen in its
environment, that is, not necessarily officially branding a lakehouse, but effectively is. And likewise, can Databricks deliver the service levels associated with walled gardens through a more brute-force approach that relies heavily on the query engine? At the end of the day, those are the key requirements that will matter to Databricks and Snowflake customers. End quote. That was some deep thought by Tony. Thank you for that.

Sanjeev Mohan added the following, quote: Open source is a slippery slope. People buy mobile phones based on open source Android, but it's not fully open. Similarly, Databricks' Delta Lake was not originally fully open source, and even today its Photon execution engine is not. We are always going to live in a hybrid world. Snowflake and Databricks will support whatever model works best for them and their customers. The big question is, do customers care as deeply about which vendor has a higher degree of openness as we technology people do? I believe customers' evaluation criteria is far more nuanced than just to decipher each vendor's open source claims. End quote.

Okay, so I had to ask Dageville about their so-called walled garden approach and what their strategy is with Apache Iceberg. Here's what he said.

>> Benoit Dageville: Iceberg is very important. So just to give some context, Iceberg is an open table format, right, which was first developed by Netflix, and Netflix put it in open source in the Apache community. So we embrace that open source standard because it's widely used by many companies, and also many companies have really invested a lot of effort in building big data Hadoop solutions or data lake solutions, and they want to use Snowflake. And they couldn't really use Snowflake, because all their data were in open formats. So we are embracing Iceberg to help these companies move to the cloud. But why have we been reluctant with direct access to data? Direct access to data is a
little bit of a problem for us, and the reason is, when you have direct access to data, now you have direct access to storage. Now you have to understand, for example, the specificity of one cloud versus the other. So as soon as you start to have direct access to data, you lose your cloud-agnostic layer. You don't access data with an API. When you have direct access to data, it's very hard to secure data, because you need to grant direct access to tools which are not protected, and you see a lot of hacking of data because of that. So direct access to data is not serving our customers well, and that's why we have been reluctant to do that, because it's not cloud-agnostic. You have to code that, you need a lot of intelligence, versus API access. So we want open APIs. That's, I guess, the way we embrace openness: by open API versus you accessing data directly.

>> Dave Vellante: Here's my take. Snowflake is hedging its bets, because enough people care about open source that they have to have some open data format options, and it's good optics. And you heard Benoit Dageville talk about the risks of directly accessing the data and the complexities it brings. Now, is that maybe a little FUD against Databricks? Maybe. But the same can be said for Ali's comments, maybe FUDing the proprietariness of Snowflake. But as both analysts pointed out, open is a spectrum. Hey, I remember Unix used to equal open systems.

Okay, let's end with some ETR spending data, and why not compare Snowflake and Databricks spending profiles? This is an XY graph with Net Score, or spending momentum, on the y-axis and pervasiveness, or overlap in the data set, on the x-axis. This is data from the January survey, when Snowflake was holding above 80 percent Net Score, off the charts. Databricks was also very strong, in the upper 60s. Now let's fast forward to this next chart and show you the July ETR survey data, and
you can see Snowflake has come back down to earth. Now remember, anything above a 40 Net Score is highly elevated, so both companies are doing well, but Snowflake is well off its highs, and Databricks has come down somewhat as well. Databricks is inching to the right. Snowflake rocketed to the right post its IPO, and as we know, Databricks wasn't able to get to IPO during the COVID bubble. Ali Ghodsi is at the Morgan Stanley CEO conference this week. They've got plenty of cash to withstand a long-term recession, I'm told, and they've started the message that they're at a billion dollars in annualized revenue. I'm not sure exactly what that means. I've seen some numbers on their gross margins; I'm not sure what that means. I've seen some numbers on their net revenue retention. Again, I'll reserve judgment until we see an S-1, but it's clear both of these companies have momentum and they're out competing in the market. The market, as always, will be the ultimate arbiter. Different philosophies, perhaps. Is it like Democrats and Republicans? Well, it could be, but they're both going after solving data problems. Both companies are trying to help customers get more value out of their data, and both companies are highly valued, so they have to perform for their investors. To paraphrase Ralph Nader, the similarities may be greater than the differences.

Okay, that's it for today. Thanks to the team from Palo Alto for this awesome supercloud studio build. Alex Myerson and Ken Schiffman are on production in the Palo Alto studios today. Kristen Martin and Cheryl Knight get the word out to our community. Rob Hof is our editor-in-chief over at SiliconANGLE. Thanks to all. Please check out etr.ai for all the survey data. Remember, these episodes are all available as podcasts wherever you listen; just search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com, and you can email me at david.vellante@siliconangle.com or DM me @dvellante or comment on my LinkedIn posts. And please, as I
say, ETR has got some of the best survey data in the business. We track it every quarter and are really excited to be partners with them. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.

Published Date : Aug 14 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
netflix | ORGANIZATION | 0.99+
john furrier | PERSON | 0.99+
palo alto | ORGANIZATION | 0.99+
tony bear | PERSON | 0.99+
boston | LOCATION | 0.99+
sanji mohan | PERSON | 0.99+
ken shiffman | PERSON | 0.99+
both | QUANTITY | 0.99+
today | DATE | 0.99+
ellie gotzi | PERSON | 0.99+
VMware | ORGANIZATION | 0.99+
Snowflake | ORGANIZATION | 0.99+
siliconangle.com | OTHER | 0.99+
more than four petabytes | QUANTITY | 0.99+
first point | QUANTITY | 0.99+
kristin martin | PERSON | 0.99+
both companies | QUANTITY | 0.99+
first question | QUANTITY | 0.99+
rob hoff | PERSON | 0.99+
more than one | QUANTITY | 0.99+
second model | QUANTITY | 0.98+
alex myerson | PERSON | 0.98+
third model | QUANTITY | 0.98+
one region | QUANTITY | 0.98+
one copy | QUANTITY | 0.98+
one region | QUANTITY | 0.98+
five essential elements | QUANTITY | 0.98+
android | TITLE | 0.98+
100 | QUANTITY | 0.98+
first line | QUANTITY | 0.98+
Databricks | ORGANIZATION | 0.98+
sheryl | PERSON | 0.98+
more than one cloud | QUANTITY | 0.98+
first | QUANTITY | 0.98+
iphone | COMMERCIAL_ITEM | 0.98+
super cloud 22 | EVENT | 0.98+
each cloud | QUANTITY | 0.98+
each | QUANTITY | 0.97+
sanjay mohan | PERSON | 0.97+
john | PERSON | 0.97+
republicans | ORGANIZATION | 0.97+
this week | DATE | 0.97+
hundreds of years | QUANTITY | 0.97+
siliconangle | ORGANIZATION | 0.97+
each week | QUANTITY | 0.97+
data lake house | ORGANIZATION | 0.97+
one single region | QUANTITY | 0.97+
january | DATE | 0.97+
dave vellante | PERSON | 0.96+
each region | QUANTITY | 0.96+
one | QUANTITY | 0.96+
dave vellante | PERSON | 0.96+
tony | PERSON | 0.96+
above 80 percent | QUANTITY | 0.95+
more than one cloud | QUANTITY | 0.95+
more than one cloud | QUANTITY | 0.95+
data lake | ORGANIZATION | 0.95+
five essential properties | QUANTITY | 0.95+
democrats | ORGANIZATION | 0.95+
first time | QUANTITY | 0.95+
july | DATE | 0.94+
linux | TITLE | 0.94+
etr | ORGANIZATION | 0.94+
devellante | ORGANIZATION | 0.93+
dodgeville | ORGANIZATION | 0.93+
each vendor | QUANTITY | 0.93+
super cloud 22 | ORGANIZATION | 0.93+
delta lake | ORGANIZATION | 0.92+
three deployment models | QUANTITY | 0.92+
first lines | QUANTITY | 0.92+
dejaville | LOCATION | 0.92+
day one | QUANTITY | 0.92+

Starburst Panel Q2


 

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're going to get into lie number two, and that is this: an open-source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lake, really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. More recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses.
So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen the technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to do or want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge.
>>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this and, you know, obviously her vision is that data mesh is open source, open source tooling, and it's not proprietary. You know, you're not going to buy a data mesh. You're going to build it with open source tooling, and vendors like you are going to support it. But come back to sort of today: you can get to market with a proprietary solution faster. I'm going to make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're going to support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and, you know, every new open source thing that comes along; they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it reminds me, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, because that's where they have control.
And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting. I'm reminded of when, you know, I see the gas price, the TSR gas price. I drive up and then I say, oh, that's the cash price. Credit card, I've got to pay 20 cents more, but okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're going to get better performance if you keep it in our closed system? Are you saying that long term that's going to come back and bite you, because you're going to end up... You mentioned Oracle, you mentioned Teradata. So by implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down because I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model.
I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape, where a proprietary approach kind of closes you off and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it, because I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and, you know, they'll admit, yeah, you know, [unclear] on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or proprietary systems, deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data-mesh-type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you.
You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake, you know, thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what, I think it's actually harder to play in data engineering. So, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors. So I think my advice to people who are starting here or trying to build teams to capitalize on data assets is: begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting.
Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle. There the license cost is by far the biggest component in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, and lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage, or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're going to get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you want to be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we want to play: to be a single point of access for our customers, provide the right level of fine-grained access control so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys.
Right after this quick break, we're going to be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or whether it's the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage.

Published Date : Aug 2 2022


theCUBE Insights with Industry Analysts | Snowflake Summit 2022


 

>>Okay, we're back at Caesars Forum at Snowflake Summit 2022. This is theCUBE's continuous coverage, day two of wall-to-wall coverage. We're so excited to have the analyst panel here, some of my colleagues that we've done a number of these with. You've probably seen some of the power panels that we've done. Dave Menninger is here. He's the senior vice president and research director at Ventana Research. To his left is Tony Baer, principal at dbInsight, and in the co-host seat, Sanjeev Mohan of SanjMo. Guys, thanks so much for coming on. I'm glad we can. Thank you. You're very welcome. I wasn't able to attend the analyst session because I've been doing this all day, every day. But let me start with you, Dave. What have you seen that's kind of interested you? Pluses, minuses, concerns? >>Well, how about if I focus on what I think is valuable to the customers of Snowflake? Our research shows that the majority of organisations, the majority of people, do not have access to analytics. And so a couple of things they've announced, I think, help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organisations to embed analytics into different business processes. And so I think that will be really beneficial to try and get analytics into more people's hands. And I also think that native applications as part of the marketplace is another way to get applications into people's hands, rather than just analytical tools. Because most people in the organisation are not analysts. They're doing some line-of-business function. They're HR managers, they're marketing people, they're salespeople, they're finance people, right? They're not sitting there mucking around in the data. They're doing a job and they need analytics in that job. >>So, Tony, thank you. I've heard a lot of data mesh talk this week. It's kind of funny. >>Can't seem to get away from it. >>You can't.
It seems to be gathering momentum. But what have you seen that's been interesting? >>What I have noticed, unfortunately, you know, is that because the rooms are too small, you just can't get into the data mesh sessions, so there's a lot of interest in it. I don't think there's very much understanding of it, but I think there's the idea that you can put all the data in one place, which, you know, to me, Snowflake seems to be, in a way; it sounds almost like the enterprise data warehouse, you know, cloud native edition, bring it all into one place again. I think, for these folks, this might be kind of like a linchpin for that. I think there are several other things that actually have made a bigger impression on me at this event. One is, we watched their move with Unistore. And it's kind of interesting coming, you know, from MongoDB last week. And I see these two companies seem to be converging towards the same place at different speeds. I think Snowflake is going to get there faster than Mongo for a number of different reasons, but I see a number of common threads here. I mean, one is that Mongo was a company that's always been oriented towards developers. They need to, you know, start cultivating data people. >>These guys are going the other way. >>Exactly. Bingo. And the thing is, I think where they're converging is the idea of operational analytics and trying to serve all constituencies. The other thing, which is also in terms of serving, you know, multiple constituencies, is how Snowflake has laid out Snowpark, and what I'm finding is there's an interesting dichotomy. On one hand, you have this very ingrained integration of Anaconda, which I think is pretty ingenious.
On the other hand, you speak to, let's say, the DataRobot folks, and they say, you know something, our folks want to work in data science. We want to work in our environment and use Snowflake in the background. So I see some interesting sort of cross-cutting trends. >>So, Sanjeev, I mean, Frank Slootman will talk about how there are definitely benefits to going into the walled garden. Yeah, I don't think we dispute that, but we see them making moves and adding more and more open source capabilities like Apache Iceberg. Is that a move to sort of counteract the narrative that Databricks has put out there? Is that customer driven? What's your take on that? >>Primarily I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started. But it's hugely beneficial to the customers, to the users, because now, if you have large amounts of data in Parquet files, you can leave it on S3. But then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So, for example, you get the, you know, the micro-partitioning. You get the metadata. So in a single query, you can join; you can do a select from a Snowflake table, union, and a select from an Iceberg table; and you can do stored procedures, user-defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it. But Snowflake does. >>Right. Hence the delta. And maybe that closes over time. I want to ask you, as you look around this, I mean, the ecosystem's pretty vibrant. I mean, it reminds me of, like, re:Invent in 2013, you know?
But then I'm struck by the complexity of the last big data era and Hadoop and all the different tools. And is this different, or is it sort of the same wine, new bottle? You guys have any thoughts on that? >>I think it's different and I'll tell you why. I think it's different because it's based around SQL. So back to Tony's point, these vendors are coming at this from different angles, right? You've got data warehouse vendors and you've got data lake vendors and they're all going to meet in the middle. So in your case, you talked about operational and analytical. But the same thing is true with data lake and data warehouse, and Snowflake no longer wants to be known as the data warehouse. They're a data cloud. And our research, again, I like to base everything off of that. >>I love it. >>What our research shows is that two thirds of organisations have SQL skills and one third have big data skills, so, you know, they're going to meet in the middle. But it sure is a lot easier to bring along those people who know SQL already to that midpoint than it is to bring big data people. >>Remember, Amr Awadallah, one of the founders of Cloudera, said one time to John Furrier and me on theCUBE that SQL is the killer app for Hadoop. >>Yeah, the difference with Snowflake is that you don't have to worry about taming the zoo animals. They really have thought out the ease of use, you know? I mean, from the get-go, they thought of two tent poles. One is ease of use, and the other is scale. And that, basically, you know, I think very much differentiates it. I mean, Hadoop had the scale, but it didn't have the ease of use. >>But don't I still need... Like, if I have, you know, governance from this vendor or, you know, data prep from that one, don't I still have to have expertise that's sort of distributed in those worlds, right? I mean, go ahead. Yeah.
>>So the way I see it is Snowflake is adding more and more capabilities right into the database. So, for example, they've gone ahead and added security and privacy, so you can now create policies and even do cell-level masking, dynamic masking. But most organisations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalogue companies, a bunch of companies that are doing dynamic data masking, security and governance, and data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, they're using the native capabilities of Snowflake, they are at a level higher. So if you have a data lake and a cloud data warehouse and you have other, like, relational databases, you can run these cross-platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >>What about the Streamlit acquisition? Did you see anything here that indicated they're making strong progress there? Are you excited about that? Are you sceptical? Go ahead. >>I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are very, very comfortable with Tableau. But you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, I think part of what plays into it, what makes this different from the Hadoop era, is the fact that, with all these capabilities, you know, a lot of vendors are taking it very seriously to make this native. Obviously Snowflake acquired Streamlit, so we can expect that Streamlit's capabilities are going to be native. >>And the other thing, too, about the Hadoop ecosystem is Cloudera had to help fund all those different projects and got really, really spread thin.
I want to ask you guys about this supercloud. We use supercloud as this sort of metaphor for the next wave of cloud. You've got infrastructure: AWS, Azure, Google. It's not multi-cloud, but you've got that infrastructure, and you're building a layer on top of it that hides the underlying complexities of the primitives and the APIs. And you're adding new value, in this case the data cloud, or super data cloud. And what we're seeing now is Snowflake putting forth the notion that they're adding a super PaaS layer. You can now build applications that you can monetise, which to me is kind of exciting. It makes this platform even less discretionary. We've had a lot of talk on Wall Street about discretionary spending, and that's not discretionary if you're monetising it. What do you guys think about that? Is this something that's real? Is it just a figment of my imagination, or do you see a different way of coming at it? Any thoughts on that? >>So, in effect, they're trying to become a data operating system, right? And I think that's wonderful. It's ambitious. I think they'll experience some success with that. As I said, applications are important. That's a great way to deliver information. You can monetise them, so you know there's a good economic model around it. I think they will still struggle, however, with bringing everything together onto one platform. That's always the challenge. Can you become the platform? That's hard to predict. You know, I think this is pretty exciting, right? A lot of energy, a large ecosystem. There is a network effect already. Can they succeed in being the only place where data exists? You know, I think that's going to be a challenge. >>I mean, the fact is, this is a classic best of breed versus the umbrella play. The thing is, this is nothing new. I mean, this is like, you know, the old days with enterprise applications, where basically Oracle and SAP vacuumed up all these.
You know, all these applications in their ecosystem. Whereas with Snowflake, and if you look at the cloud folks, the hyperscalers are still building out their own portfolios as well. You know, some hyperscalers are more partner friendly than others. What Snowflake is saying is that we're going to give all of you folks who basically are competing against the hyperscalers in various areas, like data catalogue and pipelines and all that sort of wonderful stuff, we'll make you basically, you know, all equal citizens. You know, the burden is on you. We will lay out the APIs. We'll allow you to basically, you know, integrate natively with us so you can provide as good an experience. But the onus is on your back. >>Should the ecosystem be concerned, as they were back at re:Invent 2014, that Amazon was going to nibble away at them, or is it different? >>I find what they're doing is different. For example, data sharing. They were the first ones out the door with data sharing at a large scale. And then everybody has jumped in and said, oh, we also do data sharing. All the hyperscalers came in. But now what Snowflake has done is they've taken it to the next level. Now they're saying it's not just data sharing. It's app sharing, and not only app sharing. You can stream the thing; you can build, test, deploy, and then monetise it. Make it discoverable through, you know, through your marketplace. >>You can monetise it. >>Yes. Yeah, so I think what they're doing is they are taking it a step further than what the hyperscalers are doing. And because it's, like they said, becoming like the data operating system, you log in and you have all of these different functionalities. You can do machine learning. Now you can do data quality. You can do data preparation and you can do monetisation. >>Who do you think is Snowflake's biggest competitor? What do you guys think? It's a hard question, isn't it?
Because, you know, we all get the "we separate compute from storage, we have cloud data," and you go, okay, that's nice, >>but there's, like, a crack. >>I think there's uniqueness. >>I mean, put it this way. In the old days, it would have been, you know, the prime household names. I think today it's the hyperscalers, and I mean, again, this comes down to best of breed versus, you know, get it all from one source. So where is your comfort level? So I think their co-opetition is the hyperscalers. >>Okay, so it's not Databricks, because why? They're smaller? >>Well, okay, now within the best-of-breed area, yes, there is competition. The obvious one is Databricks coming in from the data engineering angle, and, you know, Snowflake coming from the data analyst angle. I think another potential competitor, and I think Snowflake basically, you know, admitted as such, potentially is MongoDB. >>Yeah. >>Exactly. So I mean, yes, there are two different levels of sort of... >>On a longer-term collision course. >>Exactly. Exactly. >>Sort of like ServiceNow and Salesforce. >>The thing is, when I say that, a lot of people just laugh. It's like, no, you're kidding. There's no way. I said, excuse me. >>But then you see Mongo last week. They're adding some analytics capabilities, and they've always been about developers, as you say, and >>they trashed SQL. But yet they finally have started to write their first real SQL. >>We have MQL. Well, now we have SQL. So what were those numbers, Dave? >>Two thirds, one third. >>So the hyperscalers: are you going to trust your hyperscalers to do your cross-cloud? I mean, maybe Google, maybe; Microsoft, perhaps; AWS, not there yet. Right? I mean, how important is cross-cloud, multi-cloud, supercloud, whatever you want to call it? What does your data show? >>
Cloud is important, if I remember correctly. Our research shows that three quarters of organisations are operating in the cloud and 52% are operating across more than one cloud. So two thirds of the organisations that are in the cloud are doing multi-cloud, so that's pretty significant. Now, they may be operating across clouds for different reasons. Maybe one application runs in one cloud provider, another application runs in another cloud provider. But I do think organisations want that leverage over the hyperscalers, right? They want to be able to tell the hyperscaler, I'm going to move my workloads over here if you don't give us a better rate. >>I mean, I think, from a database standpoint, you're right. They are competing against some really well-funded players. You look at BigQuery, a very solid platform; Redshift, for all its faults, has really done an amazing job of moving forward. But to David's point, to me, anyway, those hyperscalers aren't going to solve that cross-cloud problem, right? >>Right. No, certainly >>not as quickly. No. >>Or with as much zeal, >>right? Yeah, right, across cloud. But we're going to operate better on our >>Exactly. Yes. >>Yes. Even when we talk about multi-cloud, there are many, many definitions; it can mean anything. So the way Snowflake does multi-cloud and the way MongoDB does are very different. Snowflake says, we run on all the hyperscalers, but you have to replicate your data. What MongoDB is claiming is that one cluster can have nodes in multiple different clouds. That is, you know, quite something. >>Yeah, right. I mean, again, you hit that. We've got to go. But last question: Snowflake, undervalued, overvalued, or just about right? >>In the stock market or with customers? Yeah, well, you know, I'm not sure that's the right question. >>That's the question I'm asking.
You know, >>I'll say the question is undervalued or overvalued for customers, right? That's really what matters. There's a different audience who cares about the investor side. Some of those are watching. But I believe that from the customer's perspective, it's probably valued about right, because >>the reason I ask is because it was so hyped. It had a $100 billion value. It passed ServiceNow's value, which is crazy. Now it's obviously come back quite a bit, below its IPO price. But you guys were at the financial analyst meeting. Scarpelli laid out 2029 projections: signed up for $10 billion, 25 percent free cash flow, 20% operating profit. I mean, they'd better be worth more than they are today if they do >>that. If I see the momentum here this week, I think they are undervalued. But before this week, I probably would have thought they're at the right valuation. >>I would say they're probably more at the right valuation now, because the IPO valuation is just such a false valuation. So hyped. >>Guys, I could go on for another 45 minutes. Thanks so much. David, Tony, Sanjeev, always great to have you on. We'll have you back for sure. >>Thanks for having us. >>All right. Thank you. Keep it right there. We're wrapping up day two on theCUBE at Snowflake Summit 2022. We'll be right back.

Published Date : Jun 16 2022



Breaking Analysis: Technology & Architectural Considerations for Data Mesh


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The introduction and socialization of data mesh has caused practitioners, business technology executives, and technologists to pause and ask some probing questions about the organization of their data teams, their data strategies, future investments, and their current architectural approaches. Some in the technology community have embraced the concept, others have twisted the definition, while still others remain oblivious to the momentum building around data mesh. Here we are in the early days of data mesh adoption. Organizations that have taken the plunge will tell you that aligning stakeholders is a non-trivial effort, but necessary to break through the limitations that monolithic data architectures and highly specialized teams have imposed on frustrated business and domain leaders. However, practical data mesh examples often lie in the eyes of the implementer, and may not strictly adhere to the principles of data mesh. Now, part of the problem is the lack of open technologies and standards that can accelerate adoption and reduce friction, and that's what we're going to talk about today: some of the key technology and architecture questions around data mesh. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we welcome back the founder of data mesh and director of Emerging Technologies at Thoughtworks, Zhamak Dehghani. Hello, Zhamak. Thanks for being here today. >> Hi Dave, thank you for having me back. It's always a delight to connect and have a conversation. Thank you. >> Great, looking forward to it. Okay, so before we get into the technology details, I just want to quickly share some data from our friends at ETR.
You know, despite the importance of data initiatives since the pandemic, CIOs and IT organizations have had to juggle, of course, a few other priorities. This is why in the survey data, cyber and cloud computing are rated as the two most important priorities. Analytics, machine learning, and AI, which are kind of data topics, still make the top of the list, well ahead of many other categories. And look, a sound data architecture and strategy is fundamental to digital transformations, and much of the past two years, as we've often said, has been like a forced march into digital. So while organizations are moving forward, they really have to think hard about the data architecture decisions that they make, because it's going to impact them, Zhamak, for years to come, isn't it? >> Yes, absolutely. I mean, we are slowly moving from reason-based, logical, algorithmic computation to model-based computation and decision making, where we exploit the patterns and signals within the data. So data becomes a very important ingredient, not only of decision making, analytics, and discovering trends, but also of the features and applications that we build for the future. So we can't really ignore it, and as we see, the challenge in getting value from data is no longer necessarily access to computation; it is actually access to trustworthy, reliable data at scale. >> Yeah, and you see these domains coming together with the cloud, and obviously it has to be secure and trusted, and that's why we're here today talking about data mesh. So let's get into it. Zhamak, first, your new book is out, 'Data Mesh: Delivering Data-Driven Value at Scale', just recently published, so congratulations on getting that done, awesome. Now in a recent presentation, you pulled excerpts from the book and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, the four principles of data mesh:
domain-driven ownership, data as a product, self-serve data platform, and federated computational governance. So I want to start with the self-serve platform and some of the data that you shared recently. You say that, "Data mesh serves autonomous domain oriented teams versus existing platforms, which serve a centralized team." Can you elaborate? >> Sure. I mean, the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologists that are building the applications, to really lower the cognitive load for them to be able to work with data. Whether they are building analytics, automated decision making, or intelligent modeling, they need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment, is to empower and enable these teams. Data mesh by definition is a scale-out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So at its core it requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve a similar outcome, right? Lower the cognitive load, give the tools to data practitioners to manage data at scale. But today the centralized data teams' job isn't really directly aligned with one or two or different business units and business outcomes in terms of getting value from data. Their job is to manage the data and make the data available for those cross-functional teams or business units to use. So the platforms they've been given are really tuned to work with this structure of a centralized team. On the surface, it seems: why not? Why can't I use my cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those underlying platforms.
As an example, some cloud providers simply have hard limits on the number of storage accounts that you can have, because they never envisaged that you would have hundreds of lakes. They envisaged one or two, maybe 10 lakes, right? They envisaged really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous independent teams versus a centralized team. >> So just a follow-up if I may; we could be here for a while. But this assumes that you've sorted out the organizational considerations? That you've defined what a data product is and a sub product. And people will say, of course, we use the term monolithic as a pejorative, let's face it. But the data warehouse crowd will say, "Well, that's what data marts did. So we got that covered." But the premise of data mesh, if I understand it, is that whether it's a data mart or a data warehouse, or a data lake or whatever, a Snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology, let the technology serve the organization, is that-- >> That's a perfect way of putting it, exactly. I mean, for a very long time, when we looked at decomposition of complexity, we've looked at decomposition of complexity around technology, right? So we have technology, and that's maybe a good segue to the next item on that list that we looked at. Oh, I need to decompose based on whether I want to have access to raw data and put it on the lake, or whether I want to have access to modeled data and put it on the warehouse. You know, I need to have a team in the middle to move the data around. And then we try to fit the organization into that model. So data mesh really inverts that and, as you said, looks at the organizational structure first, then the boundaries around which your organization and operation can scale. And then at the second layer it looks at the technology and how you decompose it. >> Okay.
So let's go to that next point and talk about how you serve and manage autonomous interoperable data products, where code, data, and policy, you say, are treated as one unit. Whereas your contention is that existing platforms of course have independent management and dashboards for catalogs or storage, et cetera. Maybe we double click on that a bit. >> Yeah. So if you think about that functional, or technical, decomposition of concerns, that's a very valid way of decomposing complexity and concerns, and then building independent solutions to address them. That's what we see in the technology landscape today. We see technologies that take care of the management of data, bringing your data under some sort of control and modeling. You'll see technology that moves that data around and performs various transformations and computations on it. And then you see technology that tries to overlay some level of meaning: metadata, understandability, discovery, and policy, right? So that's where your data processing pipeline technologies, versus data warehouse, storage, and lake technologies, and then governance come into play. And over time, we decompose and we compose, right? Deconstruct and reconstruct this back together. But right now that's where we stand. I think for data mesh really to become a reality, as in independent sources of data and teams can responsibly share data in a way that can be understood right then and there, can impose policies right when the data gets accessed at that source, and in a resilient manner, in a way that changes to the structure or the schema of the data don't cause downstream downtime, we've got to think about this new nucleus, or new unit, of data sharing. And we need to really bring transformation, governance of data, and the data itself back together around these decentralized nodes on the mesh.
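As an illustration of the "one unit" idea described above, the following is a minimal Python sketch, not anything shown in the interview. The `DataProduct` class, its fields, and the region-based rule are all hypothetical names chosen for the example; the only point is that data, transformation code, and policy are packaged together and the policy is checked at the source on every read.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]


@dataclass
class DataProduct:
    """Hypothetical mesh node: data, transformation code, and policy travel together."""
    name: str
    records: list[Record]
    transform: Callable[[Record], Record]   # code that shapes what is shared
    policy: Callable[[str, Record], bool]   # access rule evaluated on every read

    def read(self, consumer: str) -> list[Record]:
        # The policy is enforced at the source, at access time,
        # rather than in a separate governance tool downstream.
        return [self.transform(r) for r in self.records if self.policy(consumer, r)]


def mask_email(record: Record) -> Record:
    # Return a copy so the stored record is left untouched.
    masked = dict(record)
    masked["email"] = "***"
    return masked


def same_region_only(consumer: str, record: Record) -> bool:
    # Illustrative rule: consumer ids are prefixed with a region, e.g. "eu:analytics".
    return consumer.split(":", 1)[0] == record["region"]


orders = DataProduct(
    name="orders",
    records=[
        {"id": 1, "region": "eu", "email": "a@example.com"},
        {"id": 2, "region": "us", "email": "b@example.com"},
    ],
    transform=mask_email,
    policy=same_region_only,
)

print(orders.read("eu:analytics"))
```

In this sketch a consumer never touches `records` directly; swapping the policy or the transformation is a change to the product itself, which is one way to read the "code, data, and policy as one unit" framing.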
So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology, to formulate ourselves around the domains and, again, the data and the logic of the data itself, the meaning of the data itself. >> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational and analytical technologies are constructed. You've got an app dev stack, you've got a data stack. You've made the point many times actually that we've contextualized our operational systems, but not our data systems; they remain separate. Maybe you could elaborate on this point. >> Yes. I think this, again, has a historical background and beginning. For a really long time, applications have dealt with features and the logic of running the business, encapsulating the data and the state that they need to run that feature or run that business function. And then for anything analytically driven, which required access to data across these applications and across the longer dimension of time, around different subjects within the organization, we made a decision that, "Okay, let's leave those applications aside. Let's leave those databases aside. We'll extract the data out and we'll load it, or we'll transform it and put it under the analytical kind of data stack, and then downstream from it we will have analytical data users, the data analysts, the data scientists, and the growing portfolio of users that use that data stack." And that led to this real separation of a dual stack with point-to-point integration. So applications went down the path of transactional databases or even document stores, using APIs for communicating, and then we've gone to lake storage or data warehouse on the other side. And that again reinforces the silo of data versus app, right?
So if we are moving to a world where our missions and ambitions are around making applications more intelligent, making them data driven, these two worlds need to come closer. As in, ML and analytics get embedded into those applications themselves. And data sharing, as a very essential ingredient of that, gets embedded and becomes closer to those applications. So if you are looking at this now cross-functional, app-and-data business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embedding models back into those applications. And that continuum of experience requires well-integrated technologies. I'll give you an example, and in some sense we are somewhat moving in that direction. If we are talking about data sharing or data modeling, applications use one set of APIs, you know, HTTP-compliant GraphQL or REST APIs. And on the other hand, you have proprietary SQL, like connect to my database and run SQL. Those are two very different models of representing and accessing data. So we kind of have to harmonize or integrate those two worlds a bit more closely to achieve those domain-oriented cross-functional teams. >> Yeah. We are going to talk about some of the gaps later, and actually you look at them as opportunities more than barriers. They are barriers, but they're opportunities for more innovation. Let's go on to the fourth one. The next point deals with the roles that the platform serves. Data mesh proposes that domain experts own the data and take responsibility for it end to end, and are served by the technology. We kind of referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper-specialists a lot. I love that term.
And the generalists are kind of passive bystanders waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about it, again, the intention behind data mesh was creating a responsible data sharing model that scales out. And I challenge any organization that has scaled ambitions around data or usage of data and that relies on small pockets of very expensive specialist resources, right? So we have no choice but upskilling and cross-skilling the majority population of our technologists. We often call them generalists, right? That's shorthand for people that can really move from one technology to another. Sometimes we call them paint-drip people, sometimes we call them T-shaped people. But regardless, we need to have the ability to really mobilize our generalists. And we had to do that at Thoughtworks. We serve a lot of clients and, like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists really conveying and translating the knowledge to generalists and bringing them forward. And of course, the platform is a big enabler of that. Like, what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no code or low code, or that we have to throw away good engineering practices. I think good software engineering practices remain. Of course, they get adapted to the world of data to build resilient, sustainable solutions, but specialty, especially around proprietary technology, is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, your point about scale-out in the examples, the practical examples of companies that have implemented data mesh that I've talked to.
I think in all cases, and there's only a handful that I've really gone deep with, it was their Hadoop instances: their clusters wouldn't scale, and they couldn't scale the business around it. So that's really a key point of a common pattern that we've seen. I think in all cases, they went to the data lake model on AWS. And so that maybe has some violation of the principles, but we'll come back to that. But let me go on to the next one. Of course, data mesh leans heavily toward this concept of decentralization, to support domain ownership over centralized approaches. And we certainly see the public cloud players and database companies as key actors here, with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point, where you have decentralized technologies ruling the roost? >> I think if you look at the history of places in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity across different components of technology. And I think right now you are right. The way we get value from data relies on collection, at the end of the day, collection of data. Whether you have a deep machine learning model that you're training, or you have reports to generate, regardless, the model is: bring your data to a place where you can collect it, so that we can use it. And that leads naturally to a set of technologies that try to operate as a full-stack, integrated, proprietary offering with no intention of opening data for sharing. Now, conversely, if you think about the internet itself, the web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on openness and sharing, right? API sharing.
In the API world, we don't say, "I will build a platform to manage your logical applications." Maybe to a degree, but we actually moved away from that. We say, "I'll build a platform that opens around applications to manage your APIs, manage your interfaces." Right? Give you access to the API. So I think the shift needs to... That definition of decentralized there really means composable, open pieces of technology that can play nicely with each other, rather than a full stack that has all the control of your data yet is somewhat decentralized within the boundary of my platform. That's just simply not going to scale if data needs to come from different platforms, different locations, different geographical locations. It needs a rethink. >> Okay, thank you. And then the final point is that data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle, because it's nuanced and I'm kind of a 100-level student of your work. But you have said, for example, that the data teams lack context of the domain, so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, data mesh tries to give autonomy and decision-making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and naturally the data that the domain needs, or the data that the domain shares. So if the intention of the platform is really to give the power to people with the most relevant and timely context, the platform itself, as a shared component, naturally becomes domain agnostic to a large degree. Of course those domains can still... The platform is a (chuckles) fairly overloaded word.
As in, if you think about it as a set of technology that abstracts complexity and allows building the next level of solutions on top, those domains may have their own set of platforms that are very much domain aware. But as a generalized, shareable set of technologies or tools that allows us to share data, that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually becomes domain agnostic. >> Got it. Okay. Makes sense. All right. Let's shift gears here. Talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. So what I'm trying to depict here is that if we imagine a world where data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh will encapsulate internally a fair few pieces. Within the boundary of that node on the mesh is not just the data itself that it's controlling and updating and maintaining. It's of course the computation and the code that's responsible for that data, and then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift the focus away from implementation details, which we can leave for later, what becomes really important is the seam, or the APIs and interfaces, that this node exposes. And I think that's where the work needs to be done and where the standards are missing. And we want the seam and those interfaces to be open, because that allows different organizations with different boundaries of trust to share data. Not only to share data by moving that data to yet another location, but to share the data in a way that distributed workloads, distributed analytics, and distributed machine learning can happen on the data where it is.
So if you follow that line of thinking around decentralization and connection of data versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is how do we share data responsibly and sustainably, right? In a way that is not brittle. If you think about it today, one of the very common ways we share data is: I'll give you a JDBC endpoint, or I'll give you an endpoint to your database of choice. And now as a technology, or as a user actually, you can have access to the schema of the underlying data and then run various SQL queries on it. That's very simple and easy to get started with. That's why SQL is an evergreen standard, or semi-standard, pseudo-standard, that we all use. But it's also very brittle, because we are dependent on an underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think the data sharing APIs of the future really need to think about removing these brittle dependencies, and think about sharing not only the data, but what we call metadata, I suppose: an additional set of characteristics that is always shared along with the data to make the data usage, I suppose, ethical and also friendly for the users. And the other element of that data sharing API is to allow computation to run where the data exists. So if you think about SQL again, as a simple primitive example of computation, when we select and when we filter and when we join, the computation is happening on that data. So maybe there is a next level of articulating distributed computation on data that simply trains models, right? Your language primitives change in a way that allows sophisticated analytical workloads to run on the data more responsibly, with policies and access control enforced.
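The brittleness argument above can be made concrete with a small, hedged Python sketch. Nothing here comes from the interview or from any real product: `OutputPort`, `contract_to_internal`, and the column names are hypothetical. The sketch assumes a sharing interface that publishes a stable contract, ships metadata with the data, and runs selection and filtering inside the node, so a consumer query keeps working when the internal storage schema is renamed.

```python
from dataclasses import dataclass, field


@dataclass
class OutputPort:
    """Hypothetical sharing interface: consumers see contract fields, never storage columns."""
    contract_to_internal: dict[str, str]   # published field name -> internal column name
    rows: list[dict[str, object]]          # data in its internal storage layout
    metadata: dict[str, str] = field(default_factory=dict)  # shared along with the data

    def select(self, fields: list[str], where=lambda row: True) -> list[dict[str, object]]:
        # Reject anything outside the published contract, including raw column names.
        unknown = [f for f in fields if f not in self.contract_to_internal]
        if unknown:
            raise KeyError(f"not in contract: {unknown}")
        result = []
        for raw in self.rows:
            # Translate the internal layout to the published shape.
            published = {name: raw[col] for name, col in self.contract_to_internal.items()}
            # Filtering runs here, where the data lives.
            if where(published):
                result.append({f: published[f] for f in fields})
        return result


# Same contract, two different internal schemas (for example before and after a migration).
v1 = OutputPort(
    {"customer": "cust_nm", "total": "amt"},
    [{"cust_nm": "Ada", "amt": 120}, {"cust_nm": "Bo", "amt": 40}],
    {"owner": "sales-domain", "freshness": "daily"},
)
v2 = OutputPort(
    {"customer": "customer_name", "total": "amount_usd"},
    [{"customer_name": "Ada", "amount_usd": 120}, {"customer_name": "Bo", "amount_usd": 40}],
    {"owner": "sales-domain", "freshness": "daily"},
)


def big_spenders(port: OutputPort) -> list[dict[str, object]]:
    # Consumer code depends only on contract names.
    return port.select(["customer"], where=lambda r: r["total"] > 100)


print(big_spenders(v1), big_spenders(v2))
```

A direct SQL query against `cust_nm` would have broken at the rename; here only the mapping inside the node changes. Whether a real data mesh platform exposes anything shaped like this is an open question, which is the gap in standards the speaker is pointing at.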
So I think that output port that I mentioned is simply about next-generation, responsible data sharing APIs, suitable for decentralized analytical workloads. >> So I'm not trying to bait you here, but I have a follow-up as well. So schema, for all its good, creates constraints. No schema on write, that didn't work, because it was just a free-for-all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake for example, enabling data sharing. But it is within its proprietary environment. Certainly Databricks is doing something, trying to come at it from its angle, bringing some of the best of data warehouse together with data science. Is your contention that those remain sort of proprietary and de facto standards? And that what we need is more open standards? Maybe you could comment. >> Sure. I think there are two points. One is, as you mentioned, open standards that actually make the underlying platform invisible. I mean, my litmus test for a technology provider to say, "I'm data mesh compliant," (laughs) is, "Is your platform invisible?" As in, can I replace it with another and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards that are not really proprietary. The other angle for sharing data across different platforms, so that we don't get stuck with one technology or another, is around APIs. It is around code that is protecting that internal schema. Where we are on the curve of evolution of technology right now, we are exposing the internal structure of the data, which is designed to optimize certain modes of access. We're exposing that to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads.
Hence you will deal with a columnar storage of the file, versus this other API that is optimized for a very different, report type access, relational access, and is optimized around rows. I think that should become irrelevant in the API sharing of the future. Because as a user, I shouldn't care how this data is internally optimized, right? The language primitive that I'm using should be really agnostic to the machine optimization underneath that. And if we did that, perhaps this war between warehouse or lake or the other will become actually irrelevant. So we're optimizing for the best human experience, as opposed to the best machine experience. We still have to do that, but we have to make that invisible. Make that an implementation concern. So that's another angle of what should... If we daydream together about the best experience and a resilient experience in terms of data usage, then these APIs would be agnostic to the internal storage structure. >> Great, thank you for that. We've wet our ankles now on the controversy, so we might as well wade all the way in. I can't let you go without addressing some of this, which you've catalyzed, which, by the way, I see as a sign of progress. So this gentleman, Paul Andrew, is an architect and he gave a presentation, I think, last night. And he teased it as, quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice. And some practices are experimental, some are more baked, and data mesh really avoids, by design, the specificity of vendor or technology. Perhaps you intend to frame your post as a technology or vendor specific implementation. So touché, that was excellent. (Zhamak laughs) Now you don't need me to defend you, but I will anyway.
You spent 14 plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say, some of this tension is of your own making, because you purposefully don't talk about technologies and vendors. Sometimes doing so is instructive for us neophytes. So, why don't you ever, like, use specific examples of technology for frames of reference? >> Yes. My role is to push us to the next level. So, you know, everybody picks their fights, picks their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day to day basis, actually, I work with clients and existing technology, and I think at Thoughtworks we gave a case study talk with a colleague of mine, and I intentionally got him to talk about (indistinct) I want to talk about the technology that we use to implement data mesh. And the reason I haven't really embraced, in my conversations, the specific technology: one is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step. No matter what, we have to be pragmatic, of course, and practical, I suppose, and use the existing vendors that exist, and I wholeheartedly embrace that. But that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices-like architectures with technology that wasn't ready for it. Big web application servers that were designed to run these giant monolithic applications. And now we were trying to run little microservices on them. And the tail was wagging the dog. The environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to the business logic, the business value.
And that's where we are today. The complexity of integrating existing technologies is really overwhelming, capturing a lot of our attention and cost and effort, money and effort, as opposed to really focusing on the data products themselves. So it's just that that's the role I have, but it doesn't mean that, you know, we have to rebuild the world. We've got to do with what we have in this transitional phase until the new generation, I guess, of technologies comes around and reshapes our landscape of tools. >> Well, impressive public discipline. Your point about microservices is interesting, because a lot of those early microservices weren't so micro. And for the naysayers, look, past is not prologue, but Thoughtworks was really early on in the whole concept of microservices. So I'll be very excited to see how this plays out. But now there were some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational. And that's how my colleague Sanjeev Mohan frames data mesh versus data fabric. You know, I'm not sure. I think we've sort of scratched the surface today that data mesh is more. And I still think data fabric is what NetApp defined as software defined storage infrastructure that can serve on-prem and public cloud workloads, back in whatever, 2016. But the point you make in the thread that we're showing you here is that you're warning, and you referenced this earlier, that segregating different modes of access will lead to fragmentation. And we don't want to repeat the mistakes of the past. >> Yes, there are comments around that. Again, going back to that original conversation, at a macro level we've got this tendency to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch and you do stream and we are different." They create these bifurcations in our decisions based on the technology, where I do events and you do tables, right?
So that sort of segregation of modes of access causes accidental complexity that we keep dealing with. Because every time in this tree you create a new branch, you create a new set of tools that then somehow need to be point to point integrated. You create new specialization around that. So we should have the least number of branches, and think really about the continuum of experiences that we need to create, and technologies that simplify that continuum of experience. So one of the things, for example, to give you a past experience: I was really excited about the papers and the work that came out around Apache Beam, and generally flow based programming and stream processing. Because basically they were saying, whether you are doing batch or whether you're doing streaming, it's all one stream. And sometimes the window of time over which you're computing narrows, and sometimes it widens, and at the end of the day, you are just doing the stream processing. So it is those sorts of notions that simplify and create a continuum of experience that, I think, resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support end users, right? The personas of end users. >> Okay. So the last topic I want to hit. This whole discussion, the topic of data mesh, it's highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. And we talked about lake houses, and there's three buckets. And of course, the gentleman from LinkedIn with Azure, Microsoft has a data mesh community. So you're going to have to enlist some serious army of enforcers to adjudicate. And I wrote some of the stuff down. I mean, it's interesting. Monte Carlo has a data mesh calculator. Starburst is leaning in. ChaosSearch sees themselves as an enabler. Oracle and Snowflake both use the term data mesh.
And then of course you've got big practitioners: JPMC, we've talked to Intuit, Zalando, HelloFresh has been on, Netflix has this event based sort of streaming implementation. So my question is, how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's a yes. That's why I practice then. This is why I should practice things. 'Cause I think it's going to be hard. What I'm hopeful for is that the socio-technical... Leveling Data mentioned that this is a socio-technical concern or solution, not just a technology solution. Hopefully that always brings us back to, you know, the reality that vendors try to sell you snake oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problems down the track. So we'll see, time will tell, Dave, and I count on you as one of those members of, (laughs) you know, folks that will continue to share their platform. To go back to the roots, as to why in the first place. I mean, I dedicated a whole part of the book to 'Why?' Because we get, as you said, we get carried away with vendors and technology solutions trying to ride a wave. And in that story, we forget the reason for which we are even making this change and are going to spend all of these resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought and, as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right.
And I want to thank my colleague, Stephanie Chan, who helps research topics for us. Alex Myerson is on production and Kristen Martin, Cheryl Knight and Rob Hoff on editorial. Remember all these episodes are available as podcasts, wherever you listen. And all you got to do is search Breaking Analysis Podcast. Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com, siliconangle.com. You can reach me by email david.vellante@siliconangle.com or DM me @dvellante. Hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)
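The remark in the interview that batch and streaming are "all one stream", differing only in how wide the time window is, can be sketched as follows. This is plain Python written for illustration, not the Apache Beam API; the function name `windowed_sum` and the sample events are assumptions made for this example.

```python
from collections import defaultdict


def windowed_sum(events, window_seconds):
    """Sum event values per fixed-size time window.

    events is an iterable of (timestamp_in_seconds, value) pairs.
    The same function serves both modes: a narrow window behaves like
    streaming aggregation, a very wide window behaves like one batch.
    """
    totals = defaultdict(int)
    for timestamp, value in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start] += value
    return dict(sorted(totals.items()))


events = [(0, 5), (30, 1), (65, 2), (130, 4)]

streaming_view = windowed_sum(events, window_seconds=60)      # one-minute windows
batch_view = windowed_sum(events, window_seconds=86_400)      # one day-wide window
```

With these sample events, `streaming_view` holds one total per minute and `batch_view` holds a single total for the whole day; the processing logic is identical, which is the continuum of experience the speaker is pointing at.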

Published Date : Apr 20 2022


Analyst Predictions 2022: The Future of Data Management


 

[Music] >> In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases, and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special CUBE presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner analyst and principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is a well-known research vice president with IDC. Dave Menninger is senior vice president and research director at Ventana Research. Brad Shimmin is chief analyst, AI platforms, analytics and data management at Omdia. And Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the
format we're going to use. I, as moderator, am going to call on each analyst separately, who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, with data oceans, data lakes, lake houses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and we are not governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more of streaming data, so from the Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, with very detailed lineage, impact analysis, and then even expand into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that, once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we've
already seen that trajectory. Right now, if you look at BI tools, I would say there are 100 users of a BI tool to one of a data catalog, and I see that evening out over a period of time. And at some point, data catalogs will really become, you know, the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool. And that is the journey I see for the data governance products. >> Excellent, thank you. Some comments? Maybe Doug, a lot of things to weigh in on there, maybe you could comment. >> Yeah, Sanjeev, I think you're spot on on a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years. It's like God, motherhood, apple pie. Everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines. These are environmental, social, and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data, it's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best of breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex. The big cloud platform players seem to be upping the ante and starting to address governance.
>> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the 90s, and that didn't happen quite the way we anticipated. And to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do we put this, fade out into this nebulous nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or like data governance and cyber security, and instead we have some tooling that can actually be adaptive to gather metadata to create something I know is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that, you know, data governance, like many other initiatives, did not succeed. Even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come into the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million dollar fine. That was like, you know, pocket change for them, instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated with CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance regulatory requirements.
Data residency is becoming really important, and I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here, but there's some real meat on the bone here. Sanjeev, I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though there's some skepticism there. That's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and speaking, you know, coming off of governance, I mean, wow, you know, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there. But take it away, Tony. >> We'll put it this way. Data mesh, you know, the idea, at least as proposed by Thoughtworks, you know, basically was unleashed a couple years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that basically Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem when we had enterprise data warehouses, it was a problem when we had our Hadoop data clusters, it's even more of a problem now that the data's
out in the cloud, where the data is not only your data lake, is not only S3, it's all over the place, and it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea that, you know, who are the folks that really know best about governance? It's the domain experts. So basically data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold hard reality. Because if you do a Google search, basically the published work, the articles on data mesh, have been largely, you know, pretty uncritical so far, basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this. Brad, you and I met years ago when we were talking about SOA and decentralizing, and all of this was at the application level. Now we're talking about it at the data level, and now we have microservices. So there's this thought of, oh, if we manage apps in cloud native through microservices, why don't we think of data in the same way? My sense this year, and this has been a very active search if you look at Google search trends, is that now companies, you know, enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. The reason why I think that you'll start to see basically the cold hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple years old and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance, we're trying to figure out
like, how can we basically strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, and yet the groups that understand the data. You know, how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full life cycle: from selecting the data, from building the pipelines, from determining your access control, looking at quality, looking at basically whether data is fresh or whether or not it's trending off course. So my prediction is that it will really receive its first harsh scrutiny this year. You are going to see some enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data mesh wash their products. Anybody in the data management space, whether it's basically a pipelining tool, whether it's basically ELT, whether it's a catalog or a federated query tool, they're all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody is going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata that Sanjeev was talking about and the catalogs that he was talking about, which is that there's going to be a renewed focus on metadata, and I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at a data fabric,
because we all, at the end of the day, need to read from the same sheet of music. >> So thank you, Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day, and there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second, sorry about that, data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally, you know, intended and specified, but that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen, they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here, but this notion that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is, when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double. 68 percent of organizations, I'm sorry, 79 percent of organizations that were using
virtualized access expressed satisfaction with their access to the data lake. Only 39 percent expressed satisfaction if they weren't using virtualized access. >> So thank you, Dave. Sanjeev, we've just got about a couple minutes on this topic, but I know you're speaking, or maybe you've spoken already, on a panel with Zhamak Dehghani, who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. >> So my message to Zhamak and to the community is, as opposed to what Dave said, let's not define it. We spent the whole year defining it. There are four principles: domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh, and I'm like, I can't compare the two, because data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So, to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like, how do we handle shared data across domains and govern it? And I think what we are going to see more of in 2022 is operationalization of data mesh. >> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now. You want to talk about graph databases, bring it on. >> Oh yeah, okay, thanks. So I regard graph database as basically the next truly revolutionary database management technology. I'm looking forward, for the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say, but this market will grow by about 600 percent over the next 10 years. Now, 10 years is a long time, but over the next five years we expect to see gradual growth as people start to learn how to use it. The problem isn't that it's used, the
problem is not that it's not useful, it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. A graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them, which add additional informative material that makes it richer. That's called a property graph. Okay, there are two principal use cases for graph databases. There are semantic graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it, and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out, as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough? Okay, so a relational database supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure. There's a foreign key value that relates one table to another, and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change, so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, customer 360, fraud prevention. There's cyber security. There's supply
chain is a big one actually there's explainable ai and this is going to become important too because a lot of people are adopting ai but they want a system after the fact to say how did the ai system come to that conclusion how did it make that recommendation right now we don't have really good ways of tracking that okay machine learning in general um social network i already mentioned that and then we've got oh gosh we've got data governance data compliance risk management we've got recommendation we've got personalization anti-money laundering that's another big one identity and access management network and i.t operations is already becoming a key one where you actually have mapped out your operation you know whatever it is your data center and you can track what's going on as things happen there root cause analysis fraud detection is a huge one a number of major credit card companies use graph databases for fraud detection risk analysis tracking and tracing churn analysis next best action what-if analysis impact analysis entity resolution and i would add one other thing or just a few other things to this list metadata management so sanjeev here you go this is your engine okay because i was in metadata management for quite a while in my past life and one of the things i found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it but graphs can okay graphs can do things like say this term in this context means this but in that context it means that okay things like that and in fact uh logistics management supply chain it also because it handles recursive relationships by recursive relationships i mean objects that own other objects that are of the same type you can do things like bill of materials you know so like parts explosion you can do an hr analysis who reports to whom how many levels up the chain and that kind of thing you
can do that with relational databases but yes it takes a lot of programming in fact you can do almost any of these things with relational databases but the problem is you have to program it it's not supported in the database and whenever you have to program something that means you can't trace it you can't define it you can't publish it in terms of its functionality and it's really really hard to maintain over time so carl thank you i wonder if we could bring brad in i mean brad i'm sitting there wondering okay is this incremental to the market is it disruptive and replaceable what are your thoughts on this space it's already disrupted the market i mean like carl said go to any bank and ask them are you using graph databases to get fraud detection under control and they'll say absolutely that's the only way to solve this problem and it is frankly um and it's the only way to solve a lot of the problems that carl mentioned and that is i think its achilles heel in some ways because you know it's like finding the best way to cross the seven bridges of konigsberg you know it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique uh it still unfortunately kind of stands apart from the rest of the community that's building let's say ai outcomes as the great example here graph databases and ai as carl mentioned are like chocolate and peanut butter but technologically they don't know how to talk to one another they're completely different um and you know you can't just stand up sql and query them you've got to learn um yeah what is that carl sparql or uh sparql yeah thank you uh to actually get to the data in there and if you're gonna scale that data that graph database especially a property graph if you're gonna do something really complex like try to understand uh you know all of the metadata in your organization you might just end
up with you know a graph database winter like we had the ai winter simply because you run out of performance to make the thing happen so i think it's already disrupted but we need to like treat it like a first-class citizen in the data analytics and ai community we need to bring it into the fold we need to equip it with the tools it needs to do the magic it does and to do it not just for specialized use cases but for everything because i'm with carl i think it's absolutely revolutionary so i had also identified the principal achilles heel of the technology which is scaling now when these things get large and complex enough that they spill over what a single server can handle you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down so that's still a problem to be solved sanjeev any quick thoughts on this i mean i think metadata on the word cloud is going to be the largest font uh but what are your thoughts here i want to like step away so people don't you know associate me with only metadata so i want to talk about something a little bit slightly different uh db-engines.com has done an amazing job i think almost everyone knows that they chronicle all the major databases that are in use today in january of 2022 there are 381 databases on its ranked list of databases the largest category is rdbms the second largest category is actually divided into two property graphs and rdf graphs these two together make up the second largest number of databases so talking about achilles heels here this is a problem the problem is that there's so many graph databases to choose from they come in different shapes and forms uh to brad's point there's so many query languages in rdbms it's sql end of the story here we've got cypher we've got gremlin we've got gql and then your proprietary languages so i think there's a
lot of disparity in this space but excellent all excellent points sanjeev i must say and that is a problem the languages need to be sorted and standardized and people need to have a road map as to what they can do with it because as you say you can do so many things and so many of those things are unrelated that you sort of say well what do we use this for i'm reminded of the saying i learned a bunch of years ago when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose all right guys we gotta move on to dave uh menninger uh we've heard about streaming uh your prediction is in that realm so please take it away sure so i like to say that historical databases are to become a thing of the past but i don't mean that they're going to go away that's not my point i mean we need historical databases but streaming data is going to become the default way in which we operate with data so in the next say three to five years i would expect the data platforms and we're using the term data platforms to represent the evolution of databases and data lakes that the data platforms will incorporate these streaming capabilities we're going to process data as it streams into an organization and then it's going to roll off into historical databases so historical databases don't go away but they become a thing of the past they store the data that occurred previously and as data is occurring we're going to be processing it we're going to be analyzing we're going to be acting on it i mean we only ever ended up with historical databases because we were limited by the technology that was available to us data doesn't occur in batches but we processed it in batches because that was the best we could do and it wasn't bad and we've continued to improve and we've improved and we've improved but streaming data today is still the exception it's not the rule right there are projects within organizations that deal
with streaming data but it's not the default way in which we deal with data yet and so that's my prediction is that this is going to change we're going to have um streaming data be the default way in which we deal with data and how you label it what you call it you know maybe these databases and data platforms just evolve to be able to handle it but we're going to deal with data in a different way and our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data you know another third are planning to use streaming technologies so that gets us to about eight out of ten organizations need to use this technology that doesn't mean they have to use it throughout the whole organization but it's pretty widespread in its use today and has continued to grow if you think about the consumerization of i.t we've all been conditioned to expect immediate access to information immediate responsiveness you know we want to know if an uh item is on the shelf at our local retail store and we can go in and pick it up right now you know that's the world we live in and that's spilling over into the enterprise i.t world where we have to provide those same types of capabilities um so that's my prediction historical databases become a thing of the past streaming data becomes the default way in which we operate with data all right thank you david well so what say you uh carl a guy who's followed historical databases for a long time well one thing actually every database is historical because as soon as you put data in it it's now history it no longer reflects the present state of things but even if that history is only a millisecond old it's still history but um i would say i mean i know you're trying to be a little bit provocative in saying this dave because you know as well as i do that people still need to do their taxes they still need to do accounting they still need to run
general ledger programs and things like that that all involves historical data that's not going to go away unless you want to go to jail so you're going to have to deal with that but as far as the leading edge functionality i'm totally with you on that and i'm just you know i'm just kind of wondering um if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way applications work um saying that uh an application should respond instantly as soon as the state of things changes what do you say about that i think that's true i think we do have to think about things differently that's you know it's not the way we designed systems in the past uh we're seeing more and more systems designed that way but again it's not the default and i agree 100 percent with you that we do need historical databases you know that's clear and even some of those historical databases will be used in conjunction with the streaming data right so absolutely i mean you know let's take the data warehouse example where you're using the data warehouse as context and the streaming data as the present you're saying here's a sequence of things that's happening right now have we seen that sequence before and what does that pattern look like in past situations and can we learn from that so tony baer i wonder if you could comment i mean when you think about you know real-time inferencing at the edge for instance which is something that a lot of people talk about um a lot of what we're discussing here in this segment looks like it's got great potential what are your thoughts yeah well i mean i think you nailed it right you know you hit it right on the head there which is that i think a key what i'm seeing is that essentially and basically i'm going to split this one down the middle is i don't see that basically streaming is the default what i see is streaming and basically and transaction databases um and
analytics data you know data warehouses data lakes whatever are converging and what allows us technically to converge is cloud native architecture where you can basically distribute things so you can have a node here that's doing the real-time processing that's also doing it and this is what your leads in we're maybe doing some of that real-time predictive analytics to take a look at well look we're looking at this customer journey what's happening with you know with what the customer is doing right now and this is correlated with what other customers are doing so the thing is that in the cloud you can basically partition this and because of basically you know the speed of the infrastructure um that you can basically bring these together and kind of orchestrate them in a sort of loosely coupled manner the other part is that the use cases are demanding and this is the part that goes back to what dave is saying is that you know when you look at customer 360 when you look at let's say smart you know smart utility grids when you look at any type of operational problem it has a real-time component and it has a historical component and having predictives and so like you know my sense here is that technically we can bring this together through the cloud and i think the use case is that we can apply some real-time sort of you know predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction we have this real time you know input sanjeev did you have a comment yeah i was just going to say that to this point you know we have to think of streaming very differently because in the historical databases we used to bring the data and store the data and then we used to run rules on top uh aggregations and all but in case of streaming the mindset changes because the rules normally the inference all of that is
fixed but the data is constantly changing so it's a completely reverse way of thinking of uh and building applications on top of that so dave menninger there seemed to be some disagreement about the default or now what kind of time frame are you thinking about is this end of decade it becomes the default what would you pin i think around you know between five to ten years i think this becomes the reality um i think you know it'll be more and more common between now and then but it becomes the default and i also want sanjeev at some point maybe in one of our subsequent conversations we need to talk about governing streaming data because that's a whole other set of challenges we've also talked about it rather in two dimensions historical and streaming and there's lots of low latency micro batch sub second that's not quite streaming but in many cases it's fast enough and we're seeing a lot of adoption of near real time not quite real time as uh good enough for most for many applications because nobody's really taking the hardware dimension of this information like how do we that'll just happen carl so near real time maybe before you lose the customer however you define that right okay um let's move on to brad brad you want to talk about automation ai uh the pipeline people feel like hey we can just automate everything what's your prediction yeah uh i'm an ai aficionado so apologies in advance for that but uh you know um i think that um we've been seeing automation at play within ai for some time now and it's helped us do a lot of things for especially for practitioners that are building ai outcomes in the enterprise uh it's helped them to fill skills gaps it's helped them to speed development and it's helped them to actually make ai better uh because it you know in some ways provides some swim lanes and for example with technologies like automl that can auto document and create that sort of transparency that that
we talked about a little bit earlier um but i think there's an interesting kind of convergence happening with this idea of automation um and that is that uh we've had the automation that started happening for practitioners it's trying to move outside of the traditional bounds of things like i'm just trying to get my features i'm just trying to pick the right algorithm i'm just trying to build the right model uh and it's expanding across that full life cycle of building an ai outcome to start at the very beginning of data and to then continue on to the end which is this continuous delivery and continuous uh automation of that outcome to make sure it's right and it hasn't drifted and stuff like that and because of that because it's become kind of powerful we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users and that is to say that okay if i'm in tableau right now i can stand up salesforce einstein discovery and it will automatically create a nice predictive algorithm for me um given the data that i pull in um but what's starting to happen and we're seeing this from the companies that create business software so salesforce oracle sap and others is that they're starting to actually use these same ideas and a lot of deep learning to basically stand up these out of the box flip a switch and you've got an ai outcome at the ready for business users and um i'm very much you know i think that that's the way that it's going to go and what it means is that ai is slowly disappearing uh and i don't think that's a bad thing i think if anything what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing ai into practice and have as many of these solutions in the enterprise as possible you can see like for example sap is going to roll out this quarter this thing called adaptive recommendation services which which
basically is a cold start ai outcome that can work across a whole bunch of different vertical markets and use cases it's just a recommendation engine for whatever you need it to do in the line of business so basically you're an sap user you look up to turn on your software one day and you're a sales professional let's say and suddenly you have a recommendation for customer churn it's going that's great well i don't know i think that's terrifying in some ways i think it is the future that ai is going to disappear like that but i am absolutely terrified of it because um i think that what it really does is it calls attention to a lot of the issues that we already see around ai um specific to this idea of what we like to call at omdia responsible ai which is you know how do you build an ai outcome that is free of bias that is inclusive that is fair that is safe that is secure that is auditable etc etc etc etc that takes a lot of work to do and so if you imagine a customer that's just a salesforce customer let's say and they're turning on einstein discovery within their sales software you need some guidance to make sure that when you flip that switch that the outcome you're going to get is correct and that's going to take some work and so i think we're going to see this let's roll this out and suddenly there's going to be a lot of problems a lot of pushback uh that we're going to see and some of that's going to come from gdpr and others that sanjeev was mentioning earlier a lot of it's going to come from internal csr requirements within companies that are saying hey hey whoa hold up we can't do this all at once let's take the slow route let's make ai automated in a smart way and that's going to take time yeah so a couple predictions there that i heard i mean ai essentially will disappear it becomes invisible maybe if i can restate that and then if i understand it correctly brad you're saying there's a backlash in
the near term people can say oh slow down let's automate what we can those attributes that you talked about are non trivial to achieve is that why you're a bit of a skeptic yeah i think that we don't have any sort of standards that companies can look to and understand and we certainly within these companies especially those that haven't already stood up an internal data science team they don't have the knowledge to understand that when they flip that switch for an automated ai outcome that it's gonna do what they think it's gonna do and so we need some sort of standard methodology and best practices that every company that's going to consume this invisible ai can make use of and one of the things that you know is sort of started that google kicked off a few years back that's picking up some momentum and the companies i just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing you know so like for the sap example we know for example that it's a convolutional neural network with a long short-term memory model that it's using we know that it only works on roman english uh and therefore me as a consumer can say oh well i know that i need to do this internationally so i should not just turn this on today great thank you carl can you add anything any context here yeah we've talked about some of the things brad mentioned here at idc in our future of intelligence group regarding in particular the moral and legal implications of having a fully automated you know ai uh driven system uh because we already know and we've seen that ai systems are biased by the data that they get right so if they get data that pushes them in a certain direction i think there was a story last week about an hr system that was recommending promotions for white people over black people because in the past um you know white people were promoted and more productive than
black people but it had no context as to why which is you know because they were being historically discriminated black people being historically discriminated against but the system doesn't know that so you know you have to be aware of that and i think that at the very least there should be controls when a decision has either a moral or a legal implication when you really need a human judgment it could lay out the options for you but a person actually needs to authorize that action and i also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases and to some extent they always will so we'll always be chasing after them that's absolutely carl yeah i think that what you have to bear in mind as a consumer of ai is that it is a reflection of us and we are a very flawed species uh and so if you look at all the really fantastic magical looking supermodels we see like gpt-3 and 4 that's coming out they're xenophobic and hateful uh because the people the data that's built upon them and the algorithms and the people that build them are us so ai is a reflection of us we need to keep that in mind yeah the ai is biased because humans are biased all right great okay let's move on doug henschen you know a lot of people that said that data lake that term's not going to live on but it appears to have some legs here uh you want to talk about lake house bring it on yes i do my prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering i say offering that doesn't mean it's going to be the dominant thing that organizations have out there but it's going to be the predominant vendor offering in 2022.
now heading into 2021 we already had cloudera databricks microsoft snowflake as proponents in 2021 sap oracle and several of these fabric virtualization mesh vendors joined the bandwagon the promise is that you have one platform that manages your structured unstructured and semi-structured information and it addresses both the bi and analytics needs and the data science needs the real promise there is simplicity and lower cost but i think end users have to answer a few questions the first is does your organization really have a center of data gravity or is the data highly distributed multiple data warehouses multiple data lakes on-premises cloud if it's very distributed and you know you have difficulty consolidating and that's not really a goal for you then maybe that single platform is unrealistic and not likely to add value to you um you know also the fabric and virtualization vendors the mesh idea that's where if you have this highly distributed situation that might be a better path forward the second question if you are looking at one of these lake house offerings you are looking at consolidating simplifying bringing together to a single platform you have to make sure that it meets both the warehouse need and the data lake need so you have vendors like databricks microsoft with azure synapse new really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements can meet the user and query concurrency requirements meet those tight slas and then on the other hand you have the oracle sap snowflake the data warehouse uh folks coming into the data science world and they have to prove that they can manage the unstructured information and meet the needs of the data scientists i'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows and some of these vendors snowflake in particular is
really relying on partners for the data science needs so you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement well thank you doug well tony if those two worlds are going to come together as doug was saying the analytics and the data science world does there need to be some kind of semantic layer in between i don't know weigh in on this topic if you would oh didn't we talk about data fabrics before common metadata layer um actually i'm almost tempted to say let's declare victory and go home in that this has actually been going on for a while i actually agree with uh you know much what doug is saying there which is that i mean i remembered as far back as i think it was like 2014 i was doing a study you know i was still at ovum predecessor to omdia um looking at all these specialized databases that were coming up and seeing that you know there's overlap with the edges but yet there was still going to be a reason at the time that you would have let's say a document database for json you'd have a relational database for you know for transactions and for data warehouse and you had you know and you had basically something at that time that resembles hadoop for what we're considering a data lake fast forward and the thing is what i was saying at the time is that you're seeing basically blur you know sort of blending at the edges that i was saying like about five or six years ago um that's all and the lake house is essentially you know the current manifestation of that idea there is a dichotomy in terms of you know it's the old argument do we centralize this all you know in a single place or do we virtualize and i think it's always going to be a yin and yang there's never going to be a single silver bullet i do see um that there are also going to be questions and these are things that points that doug raised they're you know what your
what do you need you know for your performance there or for your you know price-performance characteristics do you need for instance high concurrency you need the ability to do some very sophisticated joins or is your requirement more to be able to distribute and you know distribute our processing is you know as far as possible to get you know to essentially do a kind of brute force approach all these approaches are valid based on you know based on the use case um i just see that essentially that the lake house is the culmination of it's nothing it's just it's a relatively new term introduced by databricks a couple years ago this is the culmination of basically what's been a long time trend and what we see in the cloud is that as we start seeing data warehouses as a checkbox item say hey we can basically source data in cloud and cloud storage and s3 azure blob store you know whatever um as long as it's in certain formats like you know parquet or csv or something like that you know i see that as becoming kind of you know a check box item so to that extent i think that the lake house depending on how you define it is already reality um and in some cases maybe new terminology but not a whole heck of a lot new under the sun yeah and dave menninger i mean a lot of this thank you tony but a lot of this is going to come down to you know vendor marketing right some people try to co-opt the term we talked about data mesh washing what are your thoughts on this yeah so um i used the term data platform earlier and part of the reason i use that term is that it's more vendor neutral uh we've tried to uh sort of stay out of the vendor uh terminology patenting world right whether the term lake house is what sticks or not the concept is certainly going to stick and we have some data to back it up about a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it so they
consider their data lake house and data warehouse one and the same about a quarter of organizations a little less but about a quarter of organizations feed the data lake from the data warehouse and about a quarter of organizations feed the data warehouse from the data lake so it's pretty obvious that three quarters of organizations need to bring this stuff together right the need is there the need is apparent the technology is going to continue to converge i like to talk about you know you've got data lakes over here at one end and i'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a server and you ignore it right that's not what a data lake is so you've got data lake people over here and you've got database people over here data warehouse people over here database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities so it's obvious that they're going to meet in the middle i mean i think it's like tony says i think we should declare victory and go home and so it's just a follow-up on that so are you saying these the specialized lake and the specialized warehouse do they go away i mean tony data mesh practitioners would say or advocates would say well they could all live as just a node on the mesh but based on what dave just said are we going to see those all morph together well number one as i was saying before there's always going to be this sort of you know kind of you know centrifugal force or this tug of war between do we centralize the data do we virtualize and the fact is i don't think there's ever going to be any single answer i think in terms of data mesh data mesh has nothing to do with how you physically implement the data you could have a data mesh on a basically uh on a data warehouse it's just that you know the difference being is that if we use the same you know physical data
store but everybody's logically you know basically governing it differently you know um a data mesh is basically it's not a technology it's a process it's a governance process um so essentially um you know i basically see that you know as i was saying before that this is basically the culmination of a long time trend we're essentially seeing a lot of blurring but there are going to be cases where for instance if i need let's say like observe i need like high concurrency or something like that there are certain things that i'm not going to be able to efficiently get out of a data lake um and you know we're basically i'm doing a system where i'm just doing really brute forcing very fast file scanning and that type of thing so i think there always will be some delineations but i would agree with dave and with doug that we are seeing basically a confluence of requirements that we need to essentially have basically the element you know the ability of a data lake and a data warehouse these need to come together so i think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity the mesh and the fabric vendors the fabric virtualization vendors they're all on board with the idea of this converged platform and they're saying hey we'll handle all the edge cases of the stuff that isn't in that center of data gravity that is off distributed in a cloud or at a remote location so you can have that single platform for the center of your data and then bring in virtualization mesh what have you for reaching out to the distributed data bingo as they basically said people are happy when they virtualize data i think yes at this point but to dave menninger's point you know they are converging snowflake has introduced support for unstructured data so now we are literally splitting hairs here now what uh databricks is saying is that aha but it's
easier to go from data lake to data warehouse than it is from data warehouse to data lake. So I think we're getting into semantics, but we've already seen these two converge. >> So take something like AWS, who's got, what, 15 data stores: are they going to have 15 converged data stores? That's going to be interesting to watch. All right, guys, I'm going to go down the list and do one word each, and each of you analysts, if you would, just add a very brief sort of course correction for me. So Sanjeev, governance is going to be, maybe it's the dog that wags the tail now. It's coming to the fore with all this ransomware stuff, and we really didn't talk much about security. But what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream, okay. Tony Baer, "mesh washing" is what I wrote down. That's what we're going to see in 2022, a little reality check. You want to add to that? >> The reality check is, I hope that no vendor jumps the shark and calls their offering a data mesh project. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, graph databases. Thank you for sharing some high-growth metrics. I know it's early days, but "magic" is what I took away from that. It's the magic database. >> Yeah, I've said this to people too: I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. It doesn't mean you should. It's definitely the case that if you're managing things that are in a fixed schematic relationship, a relational database is probably a better choice. There are times when the document database is a better choice. It can handle those things, but it may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. >> Thank you.
And Dave Menninger, thank you, by the way, for bringing the data in. I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say think fast, right? That's the world we live in. You've got to think fast. >> Fast, love it. And Brad Shimmin, I love it. On the one hand I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, I've got some real issues; there's a potential backlash there. So give us your bumper sticker. >> Yeah, I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust: trust in the idea of automation and of a transparent, invisible AI across the enterprise, but verify. Verify before you do anything. >> And then Doug Henschen, I think the trend is your friend here on this prediction, with the lake house really becoming dominant. I liked the way you set up that notion of the data warehouse folks coming at it from the analytics perspective, and then you've got the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts; we'll give you the last word. >> Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And on that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay, folks, hey, that's all the
time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions; really appreciate your time. >> Enjoyed it. Thank you. >> Now, in addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante. Be well, and we'll see you next time.

Published Date : Jan 8 2022

ENTITIES

Entity | Category | Confidence
381 databases | QUANTITY | 0.99+
2014 | DATE | 0.99+
2022 | DATE | 0.99+
2021 | DATE | 0.99+
january of 2022 | DATE | 0.99+
100 users | QUANTITY | 0.99+
jamal dagani | PERSON | 0.99+
last week | DATE | 0.99+
dave meninger | PERSON | 0.99+
sanji | PERSON | 0.99+
second question | QUANTITY | 0.99+
15 converged data stores | QUANTITY | 0.99+
dave vellante | PERSON | 0.99+
microsoft | ORGANIZATION | 0.99+
three | QUANTITY | 0.99+
sanjeev | PERSON | 0.99+
2023 | DATE | 0.99+
15 data stores | QUANTITY | 0.99+
siliconangle.com | OTHER | 0.99+
last year | DATE | 0.99+
sanjeev mohan | PERSON | 0.99+
six | QUANTITY | 0.99+
two | QUANTITY | 0.99+
carl | PERSON | 0.99+
tony | PERSON | 0.99+
carl olufsen | PERSON | 0.99+
six years | QUANTITY | 0.99+
david | PERSON | 0.99+
carlos specter | PERSON | 0.98+
both sides | QUANTITY | 0.98+
2010s | DATE | 0.98+
first backlash | QUANTITY | 0.98+
five years | QUANTITY | 0.98+
today | DATE | 0.98+
dave | PERSON | 0.98+
each | QUANTITY | 0.98+
three quarters | QUANTITY | 0.98+
first | QUANTITY | 0.98+
single platform | QUANTITY | 0.98+
lake house | ORGANIZATION | 0.98+
both | QUANTITY | 0.98+
this year | DATE | 0.98+
doug | PERSON | 0.97+
one word | QUANTITY | 0.97+
this year | DATE | 0.97+
wikibon.com | OTHER | 0.97+
one platform | QUANTITY | 0.97+
39 | QUANTITY | 0.97+
about 600 percent | QUANTITY | 0.97+
two analysts | QUANTITY | 0.97+
ten years | QUANTITY | 0.97+
single platform | QUANTITY | 0.96+
five | QUANTITY | 0.96+
one | QUANTITY | 0.96+
three quarters | QUANTITY | 0.96+
california | LOCATION | 0.96+
google | ORGANIZATION | 0.96+
single | QUANTITY | 0.95+

Tom Miller & Ankur Jain, Merkle | AWS re:Invent 2021


 

>>Okay, we're back at AWS re:Invent. You're watching theCUBE's continuous coverage. This is day four. I think it's the first time at re:Invent we've done four days. This is our ninth year covering re:Invent. Tom Miller is here; he's the senior vice president of alliances. And he's joined by Ankur Jain, who's the global cloud practice lead at Merkle. Guys, good to see you. Thanks for coming on. >>Thank you. >>Tom, tell us about Merkle, for those who might not be familiar with you. >>So Merkle is a customer experience management company that is under the Dentsu umbrella. Dentsu is a global media agency. We represent one of the pillars, which is global customer experience management, and they also have media and creative. And what Merkle does is provide the technology to help bring that creative and media together. >>They're a tech company. >>Yes. >>Okay, so there are some big tailwinds, changes, trends going on in the market. Obviously the pandemic, the forced march to digital. There's regulation. What are some of the big waves that you guys are seeing that you're trying to ride? >>So what we're seeing, as a start, is we've got a lot of existing databases with clients that are on-prem, that we manage today within a SQL environment or so forth. And they need to move that to a cloud environment to be more flexible, more agile, and to get more data to be able to follow the customer experience that they want with their clients. They're all realising they need to be in a digital environment, and so that's a big push for us, working with AWS and helping move our clients into those cloud environments. >>And you're relatively new to the AWS world, right? Maybe you can talk about that, Ankur. >>Actually, as a partner we may be new, but Merkle has been working with AWS for over five years as a customer.
So what we did was, last year we formalised the relationship with AWS to be an advanced partner now. So we are part of the restock programme, basically, which is a pool of very select partners. And Merkle comes in with the specialisation of marketing. So as Tom said, we're part of the Dentsu umbrella. Our core focus is on customer experience transformation, and how we do that customer experience transformation is through digital transformation and data transformation. And that's where we see AWS being a very good partner to us, to modernise the solutions that Merkle can take to the market. >>So with your on-prem databases there's probably a lot of diversity and a lot of technical debt, whereas the cloud brings more agility, infinite resources. Do you have a tech stack? Are you more of an integrator, right tool for the right job? Maybe you could describe your... >>I can take that, what Tom just described. So let me give you some perspective on what these databases are. These databases are essentially Merkle helping big brands, 1400 Fortune 500 brands, to organise their marketing ecosystem, especially the martech ecosystem. So these databases house customer touchpoints and customer data from disparate sources, and they basically integrate that data in one central place and then bolt on analytics, data science, artificial intelligence, and machine learning on top of it, helping them with email campaigns, direct mail campaigns, and social campaigns. So that's what these databases are all about, and these databases currently sit on-prem in Merkle's own data centre. And we have a huge opportunity to take those databases and modernise them, and give all these AI/ML and advanced analytics capabilities to our customers by using AWS as the platform to migrate to. >>And you do that as a service? >>We do that as a service. >>Strategically, you're sort of transforming your business to help your customers transform their business, right? Take away.
It's classic. I mean, it's really happening. This theme of, you know, AWS started with taking away the undifferentiated heavy lifting for infrastructure. Now you're seeing NASDAQ, Goldman Sachs, you guys in the media world, essentially building your own clouds, right? That's the strategy. >>Yes. >>Superclouds, we call them. >>Supercloud, yeah. It's about helping our clients understand what it is they're trying to accomplish. And for the most part, they're trying to understand the customer journey: where the customer is, how they're driving that experience with them, and understanding that experience through the journey. Doing that in the cloud makes it tremendously easier and more economical for them. >>I was listening to the Snowflake earnings call from last night, and they were talking about a couple of big verticals, one being media, and they keep talking about direct to consumer, right? You're hearing that a lot of media companies want to interact and build community directly. You don't want to go through a third party anymore if you don't have to. Technology is enabling that. Is that kind of the play here? >>Yes, direct to consumer is a huge play. Companies which were traditionally brick-and-mortar based, or relied on a supply chain of dealers and distributors, are now transforming themselves to be direct to consumer. They want to sell directly to the consumer. Personalisation becomes a big theme, especially in a D2C type of environment, because now those customers are expecting brands to know what their likes are, what their dislikes are, and which products and services they are interested in. So that's all advanced analytics, machine-learning-powered solutions. These are big data problems that all these brands are trying to solve. That's where Merkle is partnering with AWS to bring all those technologies and build those next-generation solutions.
>>So what kind of initiatives are you working on? >>So there are, like, three or four areas where we are working very closely with AWS. Number one, I would say, think about our marketer friends: they have transformations like direct to consumer, omnichannel, e-commerce, these types of capabilities in mind, but they don't know where to start, or what tools and technologies will be part of that ecosystem. That's where Merkle provides consulting services to give them a road map and recommendations on how to structure these large strategic initiatives. That's number one: we are doing it in partnership with AWS, to reach out to our joint customers and help them transform those ecosystems. Number two, as Tom mentioned, is migrations: helping chief data officers, chief technology officers, and chief marketing officers modernise their environments by migrating them to the cloud. Number three, Merkle has a solution called Merkury, which is essentially all about customer identity: how do we identify a customer across multiple channels? We are modernising that solution and making it available on AWS Marketplace for customers to easily use. Number four, I would say, is helping them set up a data foundation. That's through an intelligent marketing data lake, leveraging AWS technologies like Glue and Redshift to modernise their data platforms. And number five is more around clean rooms, which is: bring in your first-party data and join it with Amazon data to see how those customers are behaving when they are making a purchase on amazon.com, which gives insights to these brands to reshape their marketing strategy for those customers. So those are the four or five focus areas. >>So I was going to ask you about the data and the data strategy, like who owns the data? You're kind of alchemists: your clients have first-party data, and you might recommend bringing in other data sources, and you're sort of creating this new cocktail. Who owns the data?
>>Well, ultimately the client owns the data, because that's their customers' data. To your point, we help them enrich that data by bringing in third-party data. So Merkle has a service called DataSource, which is essentially a collection of data that we acquire about customers: their likes, their dislikes, their buying power, their interests. So we monetise all that data, and the idea is to take those data assets and make them available on AWS Data Exchange, so that it becomes very easy for brands to use their first-party data, take this third-party data from Merkle, and then segment their customers much more intelligently. >>And the CMO is your sort of ideal customer profile? >>Yeah, the CMO is our main customer profile, and we'll work with the chief data officer and with the chief technology officer. We bridge both sides: we can go technology and marketing and bring them both together. So you have a CMO who's trying to solve some type of issue, and you have a chief technology officer who wants to improve their infrastructure, and we know how to bring them together into a conversation and help both parties get what they want. >>And I suppose the chief digital officer fits in there too. >>Yeah, he fits in there. >>CDOs, chief digital officers, CMOs: sometimes they're one and the same, other times they're mixed. I've seen CIOs and CDOs together. It's all data. >>It's all data. >>Yeah, some of the roles that come into play, as Tom mentioned and you mentioned: CIOs, CTOs, chief information officer, chief technology officer, chief data officer, more from the IT side. And then we have the CMOs and chief digital officers from the marketing side. So the secret sauce that Merkle brings to the table is that we know the language, what IT speaks and what business speaks.
So when we talk about business initiatives like direct to consumer, omnichannel, and e-commerce, those are more business-driven initiatives. That's where Merkle comes in to help them with our expertise over the last 30 years on how to run these strategic initiatives. And then at the same time, how do we translate those strategic initiatives into IT transformation? Because it does require a lot of IT transformation to happen underneath. That's where AWS also helps us. So we span both sides of the horizon. >>So you've got data, you've got tools, you've got software, you've got expertise, and now you're making that available as a service. >>That's right. >>How far are you into that journey of SaaS-ifying your business? >>Well, the cloud journey started almost, I would say, five to seven years ago at Merkle. >>That's when you began leveraging the cloud? >>That's right. >>And then the light bulb went off. >>The cloud again: we use the cloud in multiple aspects. One is from a general computing perspective, leveraging the fully managed services that AWS offers. So that's one aspect, which is to bring in data from disparate sources, house it, analyse it, and derive intelligence. The second piece on the cloud side is SaaS offerings, software-as-a-service offerings like Adobe, Salesforce, and other CDP platforms. So Merkle covers a huge spectrum when it comes to cloud. >>And you've got a combination: you have a consulting business and also... >>So Merkle has multiple service lines. The consulting business is one of them, where we can help them on how to approach these transformational initiatives and give them blueprints, roadmaps, and strategy. Then we can also help them understand what the customer strategy should be, so that they can market very intelligently to their end customers. Then we have a technology business, which is all about leveraging cloud and advanced analytics.
Then we have the data business, the data assets that I was talking about, that we monetise. We have promotions and loyalty. We have media. So we cover a multiple-services portfolio. >>You mentioned analytics a couple of times. How do you tie that back to the sales function? I would imagine your clients are increasingly asking for analytics so they can manage their dashboards and make sure they're above the line. How is that evolving? >>Yes, so that's a very important line, because data is data, right? You bring in the data, but what matters is what you do with the data, how you ask questions and how you derive intelligence from it, because that's the actionable part. So I'll give you one or two examples of how analytics comes into the picture. Let's imagine a brand which is trying to sell a particular product or service to a set of customers. Now, who those customers are, where they should target this, what the demographics are: that's all done through analytics. And what I gave you is a very simple example; there are many more advanced examples that come into artificial intelligence and machine learning as well. So analytics definitely plays a huge role in how these brands sell and personalise the offerings that they're going to offer to their customers. >>It used to be really pure art, right? >>It's really not anymore. It's all data driven. >>Moneyball. >>Moneyball, yes, exactly. >>Maybe still a little bit of art in there, right? It doesn't hurt to have a little creative flair still, but you've got to go with the data. >>That's where the expertise comes in, right? That's where the experience comes in, and how you take that science and combine it with the art to present it to the end customer. That's exactly it, you know.
It's a combination. >>And we also take the time to educate our clients on how we're doing it, so it's not done in a black box. They can learn and grow themselves, to where they may end up developing their own group to handle it, as opposed to outsourcing to Merkle. >>Teach them how to fish. Last question: where do you see this in two to three years? Where do you want to take it? >>I think the future is cloud, with AWS being the market leader. I think AWS has a huge role to play, and we are very excited to be partners with AWS. I think it's a match made in heaven. AWS sells into IT, where the majority of the sales happen, and our focus is marketing. I think if we can bring both worlds together, that would be a very powerful story for us. >>Good news for AWS. A little of your DNA rubbing off on them would be good. Guys, thanks so much for coming on theCUBE. >>Thank you. >>All right. Thank you for watching, everybody. This is Dave Vellante for theCUBE, day four of AWS re:Invent. We're theCUBE, the global leader in high-tech coverage. We'll be right back.

Published Date : Dec 2 2021

ENTITIES

Entity | Category | Confidence
Tom | PERSON | 0.99+
Tom Miller | PERSON | 0.99+
Merkel | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Dave Volonte | PERSON | 0.99+
Goldman Sachs | ORGANIZATION | 0.99+
NASDAQ | ORGANIZATION | 0.99+
2 | QUANTITY | 0.99+
second piece | QUANTITY | 0.99+
IOS | TITLE | 0.99+
ninth year | QUANTITY | 0.99+
one | QUANTITY | 0.99+
last year | DATE | 0.99+
Amazon | ORGANIZATION | 0.99+
four | QUANTITY | 0.99+
aws | ORGANIZATION | 0.99+
both sides | QUANTITY | 0.99+
both parties | QUANTITY | 0.99+
34 areas | QUANTITY | 0.99+
both | QUANTITY | 0.99+
four days | QUANTITY | 0.99+
one aspect | QUANTITY | 0.99+
last night | DATE | 0.99+
over five years | QUANTITY | 0.99+
Martek | ORGANIZATION | 0.99+
5 | DATE | 0.99+
3 years | QUANTITY | 0.98+
Dentsu | ORGANIZATION | 0.98+
SAS | ORGANIZATION | 0.97+
Day four | QUANTITY | 0.97+
first time | QUANTITY | 0.97+
Ankur Jain, | PERSON | 0.96+
CMOS | ORGANIZATION | 0.96+
Adobe | ORGANIZATION | 0.96+
7 years ago | DATE | 0.96+
pandemic | EVENT | 0.96+
two examples | QUANTITY | 0.95+
first party | QUANTITY | 0.95+
Cube | COMMERCIAL_ITEM | 0.94+
today | DATE | 0.93+
five focus areas | QUANTITY | 0.91+
Jane | PERSON | 0.9+
Number two | QUANTITY | 0.89+
Reinvent | TITLE | 0.89+
last 30 years | DATE | 0.89+
Invent | EVENT | 0.85+
number four | QUANTITY | 0.84+
one central place | QUANTITY | 0.82+
Omni Channel E | ORGANIZATION | 0.81+
Martin | PERSON | 0.78+
number one | QUANTITY | 0.77+
four aws | QUANTITY | 0.76+

Tomer Shiran, Dremio | AWS re:Invent 2021


 

>>Good morning. Welcome back to theCUBE's continuing coverage of AWS re:Invent 2021. I'm Lisa Martin. We have two live sets here, and over a hundred guests on the program this week across our live sets and remote sets, talking about the next decade in cloud innovation. And I'm pleased to be welcoming back one of our CUBE alumni, Tomer Shiran, the founder and CPO of Dremio, to the program. Tomer is going to be talking about why 2022 is the year open data architectures surpass the data warehouse. Tomer, welcome back to theCUBE. >>Thanks for having me. It's great to be here. >>It's great to be here at a live event in person, my goodness, sitting side by side with guests. Before we dig into the data lake house versus the data warehouse, which I want to unpack with you, talk to me about what's going on at Dremio. You guys were on the program earlier this summer, but what are some of the things going on right now in the fall of 2021? >>Yeah, for us it's a big year: a lot of product news, a lot of new products, new innovation, and the company's grown a lot. We're probably three times bigger than we were a year ago. So a lot of new folks on the team and many, many new customers. >>That's good, always new customers, especially during the last 22 months, which have obviously been incredibly challenging. But I want to unpack this, the difference between a data lake and a data lake house (I love the idea of a lake house, by the way). Talk to me about what the differences and similarities are, and how customers are benefiting. >>Sure. Yeah. I think you could think of the lake house as kind of the evolution of the lake, right? So we've had data lakes for a while.
Now, the transition to the cloud made them a lot more powerful, and now a lot of new capabilities coming into the world of data lakes make that whole concept, that whole architecture, much more powerful, to the point that you really are not going to need a data warehouse anymore. And so it gives you the best of both worlds: all the advantages that we had with data lakes, the flexibility to use different processing engines and to have data in your own account in open formats, but also the benefits that you had with warehouses, where you could do transactions and get high performance for your BI workloads and things like that. So the lake house makes both of those come together and gives you the benefits of both. >>Talk to me, from a customer lens, about what some of the key benefits are, and how a customer who has data warehouses and data lakes goes about actually evolving to the lake house. >>You know, data warehouses have been around forever, right? And there's been some new innovation there as we've moved to the cloud, but fundamentally they are a very closed and very proprietary architecture that gets very expensive quickly. With a data warehouse, you have to take your data and load it into the warehouse, whether that's Teradata or Snowflake or any other database out there. That's what you do: you bring the data into the engine. The data lake house is a really different architecture. It's one where you have data as its own tier, stored in open formats, things like Parquet files and Iceberg tables, and you're basically bringing the engines to the data instead of the data to the engine.
And so now, all of a sudden, you can start to take advantage of all this innovation that's happening on the same set of data without having to copy it and move it around. So whether that's Dremio for high-performance BI workloads and SQL-type analysis, Spark for batch processing and machine learning, or Flink for streaming, there are lots of different technologies that you can use on the same data, and the data stays in the customer's own account, right? So S3 effectively becomes their new data warehouse. >>Okay. So I can imagine, during the last 22 months of this scattered work, and we're still in this work-from-anywhere environment, with so much data being generated at the edge and the edge expanding, that bringing the engines to the data is probably now more timely than ever. >>Yeah. I think the growth in data, you see it everywhere, right? That's the reason so many companies like ourselves are doing so well. There's so much new data, so many new use cases, and every company wants to be data-driven. They all want to democratize data within the organization, but you need the platforms to be able to do that. And that's very hard if you have to constantly move data around: if you have to take your data, which maybe is landing in S3, and move subsets of it into a data warehouse, and then from there move subsets of that into BI extracts, right? Tableau extracts, Power BI imports. And you have to create cubes and lots of copies within the data warehouse. There's no way you're going to be able to provide self-service and data democratization that way. And so it really requires a new architecture.
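
As a rough illustration of the "bring the engines to the data" idea described above, here is a minimal Python sketch that uses only the standard library. It is not Dremio's implementation or API. The temporary directory stands in for a bucket in the customer's own account, the CSV file stands in for an open format such as Parquet or Iceberg, and the two functions stand in for independent engines; all names (`write_lake_file`, `bi_engine_total_by_region`, `batch_engine_large_orders`) are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_lake_file(lake_dir):
    """Write one shared data file (a stand-in for Parquet/Iceberg data in S3)."""
    path = Path(lake_dir) / "orders.csv"
    rows = [("us", 120.0), ("eu", 80.0), ("us", 30.0), ("apac", 50.0)]
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows(rows)
    return path


def bi_engine_total_by_region(path):
    """Hypothetical 'BI engine': aggregates by scanning the shared file in place."""
    totals = defaultdict(float)
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["amount"])
    return dict(totals)


def batch_engine_large_orders(path, threshold):
    """Hypothetical 'batch engine': a different computation over the same file."""
    with path.open(newline="") as f:
        return [row for row in csv.DictReader(f)
                if float(row["amount"]) >= threshold]


def demo():
    with tempfile.TemporaryDirectory() as lake_dir:
        path = write_lake_file(lake_dir)
        totals = bi_engine_total_by_region(path)
        large = batch_engine_large_orders(path, 80.0)
        # Both engines ran, yet the "lake" still holds exactly one copy of the data.
        files = sorted(p.name for p in Path(lake_dir).iterdir())
        return totals, large, files
```

The point of the sketch is only the shape of the architecture: both "engines" read the one file where it lives, so no extract or second copy is created, in contrast to the warehouse pattern of loading subsets of the data into each tool.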
And that's one of the main things that we've been focused on at Dremio: really taking the lake house and the lake and making it not just something that data scientists use for really advanced use cases; even your production BI workloads can now run on the lake house when you're using a SQL technology. >>And that's really critical, because as you talked about, every company these days is a data company. If they're not, they have to be, or there's a competitor in the rear-view mirror that is going to be able to take over what they're doing. So this is really critical, especially considering another thing that we learned in the last 22 months: real-time data access is no longer a nice-to-have. It's really essential for businesses in any organization. >>I think we see it even in our own company, right? The folks that are joining the workforce now learned SQL in school. They don't want a report on their desk, printed out every Monday morning. They want access to the database: how do I connect whatever tool I want, or even type SQL by hand? I want access to the data and I want to just use it. And I want the performance, of course, to be fast, because otherwise I'll get frustrated and I won't use it, which has been the status quo for a long time. And that's basically what we're solving. >>So the lake house, versus a data warehouse, is better able to really facilitate data democratization across an organization? >>Yeah, because people don't talk a lot about the story before the story, right? With a data warehouse, the data never starts there. You typically first have your data in something like S3, or perhaps in other databases, and then you have to ETL it all into that warehouse. And that's a lot of work.
And typically only a small subset of the data gets ETL'd into that data warehouse. And then the user wants to query something that's not in the warehouse, and somebody from engineering has to spend a month or two responding to that ticket and wiring up some new ETL to get the data in. So it's a big problem, right? And so if you can have a system that can query the data directly in S3 and even join it with sources outside of that, things like your Oracle database, your SQL Server database, MongoDB, et cetera, well, now you really have the ability to expose data to your users within the company and make it very self-service. They can query any data at any time and get a fast response time. That's what they need. >>And self-service is key there. Speaking of self-service and things that are new, I know you guys launched Dremio Cloud recently, a new SaaS offering. Talk to me about that. What's going on there? >>Yeah, Dremio Cloud. We spent about two years working on that internally, and really the goal was to simplify how we deliver all of the benefits that we've had in our product: sub-second response times on the lake, a semantic layer, the ability to connect to multiple sources, but take away the pain of having to install and manage software. And so we did it in a way that the user doesn't have to think about versions. They don't have to think about upgrades. They don't have to monitor anything. It's basically like using Gmail: you log in and you get to use it. You don't have to be very sophisticated, and there's not a lot of administration you have to do. It basically makes it a lot simpler. >>And what's the adoption been like so far? >>It's been great. It's been in limited availability, but we've been onboarding customers every week now.
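
The earlier point about querying data where it lives and joining it with other sources in a single query can be sketched with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions, not how Dremio connects to S3, Oracle, SQL Server, or MongoDB: two separate SQLite files play the roles of the lake and an operational database, SQLite's `ATTACH DATABASE` plays the role of the federation layer, and the table and function names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_sources(base_dir):
    """Create two independent database files that stand in for separate systems."""
    lake_path = Path(base_dir) / "lake.db"  # stand-in for event data in the lake
    crm_path = Path(base_dir) / "crm.db"    # stand-in for an operational database

    lake = sqlite3.connect(str(lake_path))
    lake.execute("CREATE TABLE clicks (customer_id INTEGER, page TEXT)")
    lake.executemany("INSERT INTO clicks VALUES (?, ?)",
                     [(1, "home"), (1, "pricing"), (2, "home")])
    lake.commit()
    lake.close()

    crm = sqlite3.connect(str(crm_path))
    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Acme"), (2, "Globex")])
    crm.commit()
    crm.close()
    return lake_path, crm_path


def federated_click_counts(lake_path, crm_path):
    """Join across both sources in one query, without copying one into the other."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("ATTACH DATABASE ? AS lake", (str(lake_path),))
        conn.execute("ATTACH DATABASE ? AS crm", (str(crm_path),))
        return conn.execute(
            "SELECT c.name, COUNT(*) "
            "FROM lake.clicks AS k "
            "JOIN crm.customers AS c ON c.id = k.customer_id "
            "GROUP BY c.name ORDER BY c.name"
        ).fetchall()
    finally:
        conn.close()


def demo_join():
    with tempfile.TemporaryDirectory() as base_dir:
        lake_path, crm_path = build_sources(base_dir)
        return federated_click_counts(lake_path, crm_path)
```

Calling `demo_join()` returns one row per customer with a click count. What the sketch is meant to show is that the user writes a single SQL statement and no ETL job has to be wired up first; a real query engine would additionally push work down to each source and handle scale, which this toy does not attempt.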
Many startups, many of the world's largest companies. So that's been really exciting, actually. >>So quite a range of customers. And it sounds like Dremio has grown itself during the pandemic. We've seen acceleration of startups, of a lot of companies, of cloud adoption, of migration. How have your customer conversations changed in the last 22 months, as businesses in every industry kind of scrambled in the beginning to survive and now are realizing that they need to modernize to thrive, to be competitive, and to have competitive advantage? >>I think I've seen a few different trends here. One is certainly that there's been a lot of acceleration of movement to the cloud, right? With how different businesses have been impacted, it's required them to be more agile, more elastic. They don't necessarily know how much workload they're going to have at any point in time. So having that flexibility in terms of the technology matters. With Dremio Cloud, for example, we scale infinitely: you can have one query a day, or you can have a thousand queries a second, and the system just takes care of it. And so that's really important to these companies that are being impacted in various different ways, right? You had the companies, you know, the Pelotons and Zooms of the world, whose business was exploding. >>And then of course, you know, the travel and hospitality industries, where business went to zero all of a sudden. It's been recovering nicely since then. But so that flexibility has been really important to customers. I think the other thing is they've realized that they have to leverage data, right? Because in parallel to this pandemic there has also really been a boom in technology, right?
And so every industry is being disrupted by new startups, whether it's the insurance industry or financial services, a lot of insurtech, fintech, you know, different companies that are trying to take advantage of data. So if you, as an enterprise, are not doing that, you know, that's a problem. >>It is a problem. It's definitely something that I think every business in every industry needs to be very acutely aware of, because from a competitive advantage perspective, there's someone in that rear view mirror who is going to be focused on data, who has a real solid, modern data strategy, and who's going to be able to take over if a company is resting on its laurels at all. So here we are at re:Invent. They talked a lot about... I just came off of Adam Selipsky's keynote. But talk to me about the Dremio-AWS partnership. I know the AWS partner ecosystem is huge. You're one of the partners, but talk to me about what's going on with the partnership. How long have you guys been partners? What are the advantages for your customers? >>You know, we've been very close partners with AWS for a number of years now, and it spans many different parts of AWS, starting with the engineering organization. So we have a very close relationship with the S3 team and the EC2 team. I was just having dinner last night with Kevin Miller, the GM of S3. And so that's one side of things, the engineering integration. You know, we're the first technology to integrate with AWS Lake Formation, which is Amazon's data lake security technology. So we do a lot of work together on upcoming features that Amazon is releasing. And then they've also been really helpful on the go-to-market side of things, on sales and marketing, whether it's, you know, blogs on the Amazon blog or their sales teams actually promoting Dremio to their customers to help them be successful.
So it's really been a good partnership. >>And every time I talk to somebody from Amazon, we always talk about their customer-first focus, their customer obsession. It sounds like there's deep alignment from the technical engineering perspective and on sales and marketing. Talk to me a little bit about cultural alignment, because when you're going into customer conversations, I imagine they want to see one unified team. >>Yeah. You know, I think Amazon does have that customer-first focus, and obviously we do as well. And we have to, right? As a startup, for us, if a customer has a problem, the whole company will jump on that problem. So that's what we call customer obsession internally. And I think that's very much what we've seen with AWS as well: the desire to make the customer successful comes before, okay, how does this affect a specific Amazon product? Because anytime a customer is using Dremio on AWS, they're also consuming many different AWS services and they're bringing data into AWS. And so I think for both of us, it's all about how we solve customer problems and make them successful with their data, in this case. >>Yup. Solving those customer problems is the whole reason that we're all here, right? Talk to me a little bit, as we have just a few more minutes here: when we hear terms like future-proof, I always want to dig in with folks like yourself, chief product officers. What does it actually mean? How do you enable businesses to create these future-proof data architectures that are going to allow them to scale and be really competitive? >>Sure. So yeah, I think many companies have experienced what's known as lock-in, right? They invest in some technology. You know, we've seen this with databases and data warehouses, right?
You start using that, and you can really never get off, and prices go up, and you find out that you're spending 10 times more than you thought you were going to be spending, especially now with the cloud data warehouses. And at that point it becomes very difficult, right? What do you do? And so one of the great things about the data lake and the lakehouse architecture is that the data stays stored in the customer's own account, right? It's in their S3 buckets, in open source formats like Parquet files and Iceberg tables. And they can use many different technologies on that. So, you know, today the best technology for SQL and for powering your mission-critical BI is Dremio, but tomorrow there might be something else, right? >>And that company can then take that new technology, point it at the same data, and start using it, right? They don't have to go through some really crazy migration process. And, you know, we see that with Teradata and Oracle, right? With the old-school vendors, that's always been a pain. And now with the newer cloud data warehouses, you see a lot of complaints around that too. So the lakehouse is fundamentally designed for that. Especially if you choose open source formats like Iceberg tables, as opposed to, say, Delta, you're really future-proofing yourself, right? >>Got it. Talk to me about some of the things, as we wrap up here, that attendees can learn and see and touch and feel and smell at the Dremio booth at this re:Invent. >>Yeah. I think there are a few different things. They can watch a demo or play around with Dremio Cloud, and they can talk to our team about what we're doing with Apache Iceberg.
Iceberg, to me, is one of the more exciting projects in this space because, you know, it was created by Netflix, Apple, Salesforce. AWS just announced support for Iceberg with their products Athena and EMR. So it's really emerging as the standard table format, the way to represent data in open formats in S3. We've been behind Iceberg for a while now, and so that to us is very exciting. We're happy to chat with folks at the booth about that. Nessie is another project that we created, an open source project for really providing a Git-like experience for your data, where you have version control and branching, and kind of trying to reinvent data engineering and data management. So that's another cool project that we can talk about at the booth. >>So lots of opportunity there for attendees to learn. Thank you, Tomer, for joining me on the program today, talking about the difference between a data warehouse, a data lake, and the lakehouse. You did a great job explaining that, Dremio Cloud, what's going on, and how you guys are deepening that partnership with AWS. We appreciate your time. >>Thank you. Thanks for having me. >>My pleasure. For Tomer Shiran, I'm Lisa Martin. You're watching theCUBE. Our coverage of AWS re:Invent continues after this.
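
The contrast drawn in this conversation, between ETL into a warehouse and querying data where it already sits, can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not how Dremio works internally: the CSV text stands in for files in S3, an in-memory SQLite database stands in for an external source such as Oracle or SQL Server, and every table name, column name, and value is hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical "lake" file. In practice this would be Parquet or Iceberg data in S3.
LAKE_CSV = """order_id,customer_id,amount
1,10,25.00
2,11,40.50
3,10,12.25
"""


def load_lake_rows(text):
    """Parse rows straight from the file contents, with no prior warehouse load."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(r["order_id"]), int(r["customer_id"]), float(r["amount"]))
        for r in reader
    ]


def build_operational_db():
    """Stand-in for an external operational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(10, "Acme"), (11, "Globex")]
    )
    return conn


def spend_by_customer(conn, lake_rows):
    """Join the file rows with the database table at query time and total the spend.

    The temporary table is only scratch space for this one query; it stands in
    for a query engine scanning the file, not for a permanent ETL copy.
    """
    conn.execute(
        "CREATE TEMP TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", lake_rows)
    cur = conn.execute(
        "SELECT c.name, SUM(o.amount) FROM orders o "
        "JOIN customers c ON c.customer_id = o.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return cur.fetchall()


result = spend_by_customer(build_operational_db(), load_lake_rows(LAKE_CSV))
print(result)
```

The file remains the only durable copy of the order data in this sketch, which is the property the lock-in discussion above depends on: a different engine could be pointed at the same file later without a migration.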

Published Date : Nov 30 2021


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Kevin Miller | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Tom | PERSON | 0.99+
10 times | QUANTITY | 0.99+
10 times | QUANTITY | 0.99+
Tomer | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Elizabeth | PERSON | 0.99+
two months | QUANTITY | 0.99+
Tomer Shiran | PERSON | 0.99+
Teradata | ORGANIZATION | 0.99+
Netflix | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
Lipsey | PERSON | 0.99+
Dremio | PERSON | 0.99+
tomorrow | DATE | 0.99+
apple | ORGANIZATION | 0.99+
a month | QUANTITY | 0.99+
One | QUANTITY | 0.99+
fall of 2021 | DATE | 0.98+
today | DATE | 0.98+
Eddie | PERSON | 0.98+
one | QUANTITY | 0.98+
both worlds | QUANTITY | 0.98+
Adam psyllid | PERSON | 0.98+
Gmail | TITLE | 0.98+
S3 | TITLE | 0.97+
next decade | DATE | 0.97+
SQL | TITLE | 0.97+
a year ago | DATE | 0.97+
three times | QUANTITY | 0.97+
two live sets | QUANTITY | 0.97+
2022 | DATE | 0.97+
this week | DATE | 0.96+
iceberg | TITLE | 0.96+
Dremio | ORGANIZATION | 0.96+
first | QUANTITY | 0.96+
about two years | QUANTITY | 0.95+
Apache | ORGANIZATION | 0.95+
Tableau | TITLE | 0.95+
Monday morning | DATE | 0.94+
SAS | ORGANIZATION | 0.94+
one query | QUANTITY | 0.94+
Jemena | ORGANIZATION | 0.94+
earlier this summer | DATE | 0.93+
second | QUANTITY | 0.93+
first focus | QUANTITY | 0.92+
last 22 months | DATE | 0.91+
Delta | ORGANIZATION | 0.9+
zero | QUANTITY | 0.9+
2021 | DATE | 0.89+
last night | DATE | 0.87+
a thousand queries | QUANTITY | 0.85+
Mongo | ORGANIZATION | 0.85+
a day | QUANTITY | 0.84+
first technology | QUANTITY | 0.82+
pandemic | EVENT | 0.81+
a second | QUANTITY | 0.8+

Greg Rokita, Edmunds.com & Joel Minnick, Databricks | AWS re:Invent 2021


 

>>Welcome back to theCUBE's coverage of AWS re:Invent 2021, the industry's most important hybrid event. Very few hybrid events, of course, in the last two years. And theCUBE is excited to be here. This is our ninth year covering AWS re:Invent; this is the 10th re:Invent. We're here with Joel Minnick, who is the vice president of product and partner marketing at smoking-hot company Databricks, and Greg Rokita, who is executive director of technology at Edmunds. If you're buying a car or leasing a car, you've got to go to Edmunds. We're going to talk about busting data silos. Guys, great to see you again. >>Welcome. Welcome. Glad to be here. >>All right. So Joel, what the heck is a lakehouse? This is all over the place. Everybody's talking about lakehouse. What is it? >>Well, in a nutshell, a lakehouse is the ability to have one unified platform to handle all of your traditional analytics workloads, so your BI and reporting, the workloads that you would have for your data warehouse, on the same platform as the workloads that you would have for data science and machine learning. And so if you think about the way that most organizations have built their infrastructure in the cloud today, generally customers will land all their data in a data lake, and a data lake is fantastic because it's low cost, it's open, and it's able to handle lots of different kinds of data. But the challenge that data lakes have is that they don't necessarily scale very well. It's very hard to govern data in a data lake. It's very hard to manage that data in a data lake. >>And so what happens is that customers then move the data out of a data lake into downstream systems, and what they tend to move it into are data warehouses, to handle those traditional reporting kinds of workloads that they have.
And they do that because data warehouses are really great at delivering scale and performance. The challenge, though, is that data warehouses really only work for structured data. And regardless of what kind of data warehouse you adopt, all data warehouse platforms today are built on some kind of proprietary format. So once you've put that data into the data warehouse, that is what you're locked into. The promise of the data lakehouse was to say, look, what if we could strip away all of that complexity of having to move data back and forth between all these different systems, and keep the data exactly where it is today, and where it is today is in the data lake? >>And then apply a transaction layer on top of that. In the Databricks case, we do that through an open source technology called Delta Lake. And what Delta Lake allows us to do is, when you need it, apply that performance, that reliability, that quality, that scale that you would expect out of a data warehouse directly on your data lake. And if I can do that, then what I'm able to do now is operate from one single source of truth that handles all of my analytics workloads, both my traditional analytics workloads and my data science and machine learning workloads, and have all of those workloads on one common platform. It means that now not only do I get much, much simpler in the way that my infrastructure works, and am therefore able to operate at much lower cost and get things to production much, much faster. >>But I'm also now able to leverage open source in a much bigger way, given that the lakehouse is inherently built on an open platform. So I'm no longer locked into any kind of data format.
And finally, probably one of the most lasting benefits of a lakehouse is that all the roles that have to touch my data, from my data engineers to my data analysts to my data scientists, are all working on the same data, which means that the collaboration that has to happen to go answer really hard problems with data, I'm now able to do much, much more easily, because those silos that traditionally exist inside of my environment no longer have to be there. And so that is the promise of the lakehouse: to have one single source of truth, one unified platform for all of my data. >>Okay, great. Thank you for that very cogent description of what a lakehouse is. Now I want to hear from the customer to see, okay, is what he just said true? So actually, let me ask you this, Greg, because the other problem that you didn't mention about the data lake is that with no schema on write, it gets messy, and Databricks, I think, correct me if I'm wrong, has begun to solve that problem, right? Through a series of tooling and AI. That's what Delta Lake does. It's a managed service. Everybody thought you were going to be like the Cloudera of Spark, and it was a brilliant move to create a managed service. And it's worked great. Now everybody has a managed service. But so can you paint a picture at Edmunds as to what you're doing? Maybe take us through your journey, the early days of Hadoop, a data lake: oh, that sounds good, throw it in there. Paint a picture as to how you guys are using data, and then tie it into what Joel just said. >>As Joel said, Delta Lake simplifies the architecture quite a bit. In a modern enterprise, you have to deal with a variety of different data sources: structured, semi-structured, and unstructured in the form of images and videos. And with Delta Lake, you can have one system that handles all those data sources.
So what that does is basically remove the issue of multiple systems that you have to administer. It lowers the cost, and it provides consistency. If you have multiple systems that deal with data, the issue always arises as to which data has to be loaded into which system. And then you have issues with consistency. Once you have issues with consistency, business users and analysts will stop trusting your data. So it was very critical for us to unify data handling in one place. >>Additionally, you have massive scalability. I went to a talk from Apple saying that they can process two years' worth of data instead of just two days. At Edmunds, we have this use case of backfilling data. Often we change the logic, and then we need to reprocess massive amounts of data. With the lakehouse, we can reprocess months' worth of data in a matter of minutes or hours. Additionally, the data lakehouse is based on open standards like Parquet, which allowed us to basically hook open source and third-party tools on top of the Delta lakehouse. For example, Amundsen: we use Amundsen for data discovery. And finally, the lakehouse approach allows people with different skill sets to work on the same source data. We have analysts, we have data engineers, we have statisticians and data scientists using their own programming languages but working on the same core data sets, without worrying about duplicating data and consistency issues between the teams. >>So what are the primary use cases where you're using the lakehouse and Delta? >>So we have several use cases. One of the more interesting and important use cases is vehicle pricing. You have used Edmunds.
So, you know, you go to our website and you use it to research vehicles, but it turns out that pricing, and knowing whether you're getting a good or bad deal, is critical for our business. So with the lakehouse, we were able to develop a data pipeline that ingests the transactions, curates the transactions, cleans them, and then feeds that curated feed into the machine learning model that is also deployed on the lakehouse. So you have one system that handles this huge complexity. And as you know, it's very hard to find unicorns that know all those technologies, but because we have the flexibility of using Scala, Java, Python, and SQL, we have different people working on different parts of that pipeline on the same system and on the same data. So having the lakehouse really enabled us to be very agile, and allowed us to deploy new sources easily when they arrived and fine-tune the model to decrease the error rates for the price prediction. So that process is ongoing, and it's a very agile process that takes advantage of the different skill sets of different people on one system. >>Because, you know, you guys democratized car buying, well, at least the data around car buying, because as a consumer now, you know, I know what they're paying and I can go in. Of course, they changed their algorithms as well. I mean, the dealers got really smart, and then they got kickbacks from the manufacturer. So you had to get smarter. So it's a moving target, I guess. >>Right. The pricing is actually very complex. I don't have time to explain it to you, but especially in this crazy inflationary market, where used car prices are like 38% higher year over year and new car prices are like 10% higher, and they're changing rapidly, having a very responsive pricing model is extremely critical. I don't know if you're familiar with Zillow.
I mean, they almost went out of business because they mispriced their houses. So if you own their stock, you probably got the short end of it, but, you know... >>No, but it's true, because my lease came up in the middle of the pandemic and I went to Edmunds to say, what's this car worth? It was worth like $7,000 more than the buyout cost, the residual value. I said, I'm taking it, can't pass up that deal. And so you have to be flexible. You're saying the premise, though, is that open source technology and Delta Lake and the lakehouse enable that flexibility. >>Yes, we are able to ingest new transactions daily, recalculate our model in less than an hour, and deploy the new model with new pricing, you know, almost in real time. So in this environment, it's very critical that you keep up to date and ingest the latest transactions as prices change, and recalculate your model that predicts future prices. >>How do the business lines inside of Edmunds interact with the data teams? You mentioned data engineers, data scientists, analysts. How do the business people get access to their data? >>Originally, we only had a core team that was using the lakehouse, but because the usage was so powerful and easy, we were able to democratize it across our units. So other teams within software engineering picked it up, and then analysts picked it up. And then even business users started using the dashboarding and seeing, you know, how the price has changed over time and seeing other metrics within the... >>What did that do for data quality? Because I feel like if I'm a business person, I might have context on the data that an analyst might not have, if they're part of a team that's servicing all these lines of business. Did you find that the collaboration affected data quality? >>The biggest thing for us was the fact that we don't have multiple systems now. So you don't have to load the data.
Whenever you have to load data from one system to another, there is always a lag. There's always a delay. There is always a problematic job that didn't do the copy correctly. And the quality is uncertain. You don't know which system tells you the truth. Now we just have one layer of data. Whether you do reports, whether you do data processing, or whether you do modeling, they all read the same data. And the second thing is the dashboarding capabilities. For people that were not very technical, before we could only use Tableau, and Tableau is not the easiest thing to use if you're not technical. Now they can use it. So anyone can see how our pricing data looks, whether you're an executive, an analyst, or a casual business user. >>Hey, so many questions. You guys are going to have to come back. I'm going to run out of time. But you now allow a consumer to buy a car directly. Yes? Right? So that's a new service that you launched. I presume that required new data. >>We give consumers offers. Yes. And that offer... >>You offered to buy my lease. >>Exactly. And that offer leverages the pricing that we develop on top of the lakehouse. So the most important thing is accurately giving you a very good offer price, right? If we give you a price that's not so good, you're going to go somewhere else. If we give you a price that's too high, we're going to go bankrupt like Zillow did, right? >>To enable that, you're working off the same dataset. Yes? Did you have to spin up... did you have to inject new data? Was there a new data source that you were working on? >>Once we curate the data sources and once we clean them, we feed them directly to the model. And all of those components are running on the lakehouse, whether you're curating the data, cleaning it, or running the model. The nice thing about the lakehouse is that machine learning is a first-class citizen.
If you use something like Snowflake, and I'm not going to slam Snowflake here, but you... >>You have two different use cases. >>You have to load it into a different system later. So, like, good luck doing machine learning on Snowflake, right? >>Whereas with Databricks, that's kind of your raison d'etre. >>So what are you, you're a data engineer? I feel like I should be a salesman or something. Yeah, I'm not saying that just because, you know, I was told to. I'm saying it because that's our use case. >>Your use case. So a question for each of you: what business results did you see when you went from pre-lakehouse to post-lakehouse? Are there any metrics you can share? And then I wonder, Joel, if you could share more broadly what you're seeing across your customer base. But Greg, what can you tell us? >>Well, before the lakehouse, we had two different systems. We had one for processing, which was still Databricks, and a second one for serving, and we iterated over Netezza and Redshift. But we figured that maintaining two different systems and loading data from one to the other was a huge overhead: administration, security, costs, and the fact that you had consistency issues. So the fact that you can have one system with centralized data solves all those issues. You have one security mechanism, one administrative mechanism, and you don't have to load the data from one system to the other. You don't have to make compromises. >>And scale is not a problem because of the cloud? >>Because you can spin up clusters at will for different use cases. So your clusters are independent. You have processing clusters that are not affecting your serving clusters. In the past, if you were running serving, say, on Netezza or Redshift, and you were doing heavy processing, your reports would be affected, but now all those clusters are separated.
>>So a data consumer can take that data from the producer independently? >>Using its own cluster. >>Okay. Yeah. I'll give you the final word, Joel. I know I said you guys have got to come back. What have you seen broadly? >>Yeah. Well, I mean, I think Greg's point about scale is an interesting one. So if you look across the entire Databricks platform, the platform is launching 9 million VMs every day, and we're in total processing over nine exabytes a month. So in terms of just how much data the platform is able to have flow through it and still maintain extremely high performance, it's bar none out there. And then if you look at the macro environment of what's happening out there, I think what's been most exciting to watch is what customers are experiencing on the traditional data warehouse kinds of workloads, because I think that's where the promise of the lakehouse really comes into its own: saying, yes, I can run these traditional data warehousing workloads that require high concurrency, high scale, and high performance directly on my data lake. >>And I think probably the two most salient data points to raise there: just last month, Databricks announced that it set the world record for the TPC-DS 100-terabyte benchmark. That benchmark is built to measure data warehouse performance, and the lakehouse beat data warehouses at their own game in terms of overall performance. And then from a price-performance standpoint, customers on Databricks right now are able to enjoy that level of performance at 12x better price performance than what cloud data warehouses provide. So not only are we delivering this extremely high scale and performance, but we're able to do it much, much more efficiently.
>>We're going to need a whole other segment to talk about benchmarking, guys. Thanks so much. Really interesting session. Thank you, and best of luck to both. Enjoy the show. >>Thank you for having us. >>Very welcome. Okay. Keep it right there, everybody. You're watching theCUBE, the leader in high-tech coverage, at AWS re:Invent 2021.
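
The pricing pipeline described in this conversation (ingest transactions, curate them, clean them, then feed the curated data to a model on the same system) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not Edmunds' or Databricks' implementation: the records, the field names `vin`, `mileage`, and `price`, and the choice of a one-variable least-squares line as the "model" are all hypothetical.

```python
from statistics import mean

# Hypothetical raw transactions; every value here is made up for illustration.
RAW_TRANSACTIONS = [
    {"vin": "A1", "mileage": 10000, "price": 30000},
    {"vin": "A2", "mileage": 30000, "price": 26000},
    {"vin": "A3", "mileage": 50000, "price": 22000},
    {"vin": "A3", "mileage": 50000, "price": 22000},  # duplicate record
    {"vin": "A4", "mileage": 20000, "price": None},   # missing price
]


def curate(rows):
    """Keep the first record seen for each VIN."""
    seen = set()
    curated = []
    for row in rows:
        if row["vin"] not in seen:
            seen.add(row["vin"])
            curated.append(row)
    return curated


def clean(rows):
    """Drop records that lack a price or a mileage."""
    return [r for r in rows if r["price"] is not None and r["mileage"] is not None]


def fit_line(rows):
    """Ordinary least squares for price = intercept + slope * mileage."""
    xs = [r["mileage"] for r in rows]
    ys = [r["price"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict(model, mileage):
    """Apply the fitted line to a new mileage value."""
    slope, intercept = model
    return intercept + slope * mileage


# Each stage reads the output of the previous one; nothing is copied elsewhere.
curated = curate(RAW_TRANSACTIONS)
cleaned = clean(curated)
model = fit_line(cleaned)
print(round(predict(model, 40000)))
```

Rerunning the last four lines on a fresh batch of transactions is the toy equivalent of the daily ingest-and-recalculate loop mentioned above; a real pricing model would of course use far more features and data.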

Published Date : Nov 30 2021



Did HPE GreenLake Just Set a New Bar in the On-Prem Cloud Services Market?


 

>> Welcome back to theCUBE's coverage of HPE's GreenLake announcements. My name is Dave Vellante and you're watching theCUBE. I'm here with Holger Mueller, who is an analyst at Constellation Research. And Matt Maccaux is the global field CTO of Ezmeral software at HPE. We're going to talk data. Gents, great to see you. >> Holger: Great to be here. >> So, Holger, what do you see happening in the data market? Obviously data's hot, you know, digital, I call it the forced march to digital. Everybody realizes wow, digital business, that's a data business. We've got to get our data act together. What do you see in the market as the big trends, the big waves? >> We are all young enough or old enough to remember when people were saying data is the new oil, right? Nothing has changed, right? Data is the key ingredient, which matters to the enterprise, which they have to store, which they have to enrich, which they have to use for their decision-making. It's the foundation of everything. If you want to go into machine learning or (indistinct) It's growing very fast, right? We have the capability now to look at all the data in the enterprise, which we weren't able to do 10 years ago. So data is central to everything. >> Yeah, it's even more valuable than oil, I think, right? 'Cause oil, you can only use once. Data, you can, it's kind of polyglot. I can go in different directions and it's amazing, right? >> It's the beauty of digital products, right? They don't get consumed, right? They don't get fired up, right? And no carbon footprint, right? "Oh wait, wait, we have to think about carbon footprint." Different story, right? So to get to the data, you have to spend some energy. >> So it's that simple, right? I mean, it really is. Data is fundamental. It's got to be at the core. And so Matt, what are you guys announcing today, and how does that play into what Holger just said? >> What we're announcing today is that organizations no longer need to make a difficult choice. 
Prior to today, organizations were thinking if I'm going to do advanced machine learning and really exploit my data, I have to go to the cloud. But all my data's still on premises because of privacy rules, industry rules. And so what we're announcing today, through GreenLake Services, is a cloud services way to deliver that same cloud-based analytical capability. Machine learning, data engineering, through hybrid analytics. It's a unified platform to tie together everything from data engineering to advanced data science. And we're also announcing the world's first Kubernetes-native object store that is hybrid cloud enabled. Which means you can keep your data connected across clouds in a data fabric, or Dave, as you say, mesh. >> Okay, can we dig into that a little bit? So, you're essentially saying that, so you're going to have data in both places, right? Public cloud, edge, on-prem, and you're saying, HPE is announcing a capability to connect them, I think you used the term fabric. I'm cool, by the way, with the term fabric, we can, we'll parse that out another time. >> I'd love for you to discuss textiles. Fabrics vs. mesh. For me, every fabric breaks down to mesh if you put it under a microscope. It's the same thing. >> Oh wow, now that's really, that's too detailed for my brain, right this moment. But, you're saying you can connect all those different estates because data by its very nature is everywhere. You're going to unify that, and what, you can manage that through sort of a single view? >> That's right. So, the management is centralized. We need to be able to know where our data is being provisioned. But again, we don't want organizations to feel like they have to make the trade off. If they want to use cloud service A in Azure, and cloud service B in GCP, why not connect them together? Why not allow the data to remain in sync or not, through a distributed fabric? Because we use that term fabric over and over again. 
But the idea is let the data be where it most naturally makes sense, and exploit it. Monetization is an old tool, but exploit it in a way that works best for your users and applications. >> In sync or not, that's interesting. So it's my choice? >> That's right. Because the back of an automobile could be a teeny tiny, small edge location. It's not always going to be in sync until it connects back up with a training facility. But we still need to be able to manage that. And maybe that data gets persisted to a core data center. Maybe it gets pushed to the cloud, but we still need to know where that data is, where it came from, its lineage, what quality it has, what security we're going to wrap around that, that all should be part of this fabric. >> Okay. So, you've got essentially a governance model, at least maybe you're working toward that, and maybe it's not all baked today, but that's the north star. Is this fabric connect, single management view, governed in a federated fashion? >> Right. And it's available through the most common API's that these applications are already written in. So, everybody today's talking S3. I've got to get all of my data, I need to put it into an object store, it needs to be S3 compatible. So, we are extending this capability to be S3 native. But it's optimized for performance. Today, when you put data in an object store, it's kind of one size fits all. Well, we know for those streaming analytical capabilities, those high performance workloads, it needs to be tuned for that. So, how about I give you a very small object on the very fastest disk in your data center and maybe that cheaper location somewhere else. And so we're giving you that balance as part of the overall management estate. >> Holger, what's your take on this? I mean, Frank Slootman says we'll never, we're not going halfway house. We're never going to do on-prem, we're only in the cloud. So that basically says, okay, he's ignoring a pretty large market by choice. 
You're not, Matt, you must love those words. But what do you see as the public cloud players, kind of the moves on-prem, particularly in this realm? >> Well, we've seen lots of cloud players who were only cloud coming back towards on-premise, right? We call it the next generation compute platform where I can move data and workloads between on-premise and ideally, multiple clouds, right? Because I don't want to be locked into public cloud vendors. And we see two trends, right? One trend is the traditional hardware supplier of on-premise has not scaled to cloud technology in terms of big data analytics. They just missed the boat for that in the past, this is changing. You guys are a traditional player and changing this, so congratulations. The other thing is there's been no innovation for the on-premise tech stack, right? The only technology stack to run modern applications has been invested in for a long time in the cloud. So what we see since two, three years, right? With the first one being Google with Kubernetes, with GKE On-Prem, then Anthos, right? Bringing their tech stack with compromises to on-premises, right? Acknowledging exactly what we're talking about, the data is everywhere, data is important. Data gravity is there, right? It's just the network's fault, where the networks are too slow, right? If you could just move everything anywhere we want like juggling two balls, then we'd be in a different place. But there's not enough investment from the traditional IT players for that stack, and the modern stack being there. And now every public cloud player has an on-premise offering with different flavors, different capabilities. >> I want to give you guys Dave's story of kind of history and you can kind of course correct, and tell me how this, Matt, maybe fits into what's happened with customers. 
So, you know, before Hadoop, obviously you had to buy a big Oracle database and you know, you're running Unix, and you buy some big storage subsystem, and if you had any money left over, you know, you maybe, you know, do some actual analytics. But then Hadoop comes in, lowers the cost, and then S3 kneecaps the entire Hadoop market, right? >> I wouldn't say that, I wouldn't agree. Sorry to jump on your history. Because the fascinating thing, what Hadoop brought to the enterprise for the first time, you're absolutely right, affordable, right, to do that. But it's not only about affordability because S3 has the affordability. The big thing is you can store information without knowing how to analyze it, right? So, you mentioned Snowflake, right? Before, it was like an Oracle database. It was star schema for data warehouse, and so on. You had to make decisions how to store that data because compute capabilities, storage capabilities, were too limited, right? That's what Hadoop blew away. >> I agree, no schema on write. But then that created data lakes, which created data swamps, and that whole mess, and then Spark comes in and helps clean it out, okay, fine. So, we're cool with that. But the early days of Hadoop, you had, companies would have a Hadoop monolith, they probably had their data catalog in Excel or Google Sheets, right? And so now, my question to you, Matt, is there's a lot of customers that are still in that world. What do they do? They got an option to go to the cloud. I'm hearing that you're giving them another option? >> That's right. So we know that data is going to move to the cloud, as I mentioned. So let's keep that data in sync, and governed, and secured, like you expect. But for the data that can't move, let's bring those cloud native services to your data center. And so that's a big part of this announcement is this unified analytics. 
So that you can continue to run the tools that you want to today while bringing those next generation tools based on Apache Spark, using libraries like Delta Lake so you can go anything from Tableau through Presto SQL, to advanced machine learning in your Jupyter notebooks on-premises where you know your data is secured. And if it happens to sit in an existing Hadoop data lake, that's fine too. We don't want our customers to have to make that trade off as they go from one to the other. Let's give you the best of both worlds, or as they say, you can eat your cake and have it too. >> Okay, so. Now let's talk about sort of developers on-prem, right? They've been kind of... If they really wanted to go cloud native, they had to go to the cloud. Do you feel like this changes the game? Do on-prem developers, do they want that capability? Will they lean into that capability? Or will they say no, no, the cloud is cool. What's your take? >> I love developers, right? But it's about who makes the decision, who pays the developers, right? So the CXOs in the enterprises, they need exactly, this is why we call it the next-gen computing platform, that you can move your code assets. It's very hard to build software, so it's very valuable to an enterprise. I don't want to have it limited to one single location or certain computing infrastructure, right? Luckily, we have Kubernetes to be able to move that, but I want to be able to deploy it on-premise if I have to. I want to be able to deploy in the multiple clouds which are available. And that's the key part. And that makes developers happy too, because the code you write has got to run multiple places. So you can build more code, better code, instead of building the same thing multiple places, because a little compiler change here, a little compiler change there. Nobody wants to do portability testing and rewriting, recertifying for certain platforms. 
>> The head of application development or application architecture and the business are ultimately going to dictate that, number one. Number two, you're saying that developers shouldn't care because they can write once, run anywhere. >> That is the promise, and that's the interesting thing which is available now, 'cause people know, thanks to Kubernetes as a container platform and the abstraction which containers provide, and that makes everybody's life easier. But it goes much higher than the Head of Apps, right? This is the digital transformation strategy, the next generation application the company has to build as a response to a pandemic, as a pivot, as digital transformation, as digital disruption capability. >> I mean, I see a lot of organizations basically modernizing by building some kind of abstraction to their backend systems, modernizing it through cloud native, and then saying, hey, as you were saying Holger, run it anywhere you want, or connect to those cloud apps, or connect across clouds, connect to other on-prem apps, and eventually out to the edge. Is that what you see? >> It's so much easier said than done though. Organizations have struggled so much with this, especially as we start talking about those data-intensive apps and workloads. Kubernetes and Hadoop? Up until now, organizations haven't been able to deploy those services. So, what we're offering as part of these GreenLake unified analytics services is a Kubernetes runtime. It's not ours. It's top of branch open source. And open source operators like Apache Spark, bringing in Delta Lake libraries, so that if your developer does want to use cloud native tools to build those next generation advanced analytics applications, but prod is still on-premises, they should just be able to pick that code up, and because we are deploying 100% open-source frameworks, the code should run as is. >> So, it seems like the strategy is to basically build, now that's what GreenLake is, right? It's a cloud. 
It's like, hey, here's your options, use whatever you want. >> Well, and it's your cloud. That's what's so important about GreenLake, is it's your cloud, in your data center or co-lo, with your data, your tools, and your code. And again, we know that organizations are going to go to a multi or hybrid cloud location and through our management capabilities, we can reach out. If you don't want us to control those, not necessarily, that's okay, but we should at least be able to monitor and audit the data that sits in those other locations, the applications that are running. Maybe I register your GKE cluster. I don't manage it, but at least through a central pane of glass, I can tell the Head of Applications what that person's utilization is across these environments. >> You know, and you said something, Matt, that struck, resonated with me, which is this is not trivial. I mean, not as simple to do. I mean what you see, you see a lot of customers or companies, what they're doing, vendors, they'll wrap their stack in Kubernetes, shove it in the cloud, it's essentially a hosted stack, right? And, you're kind of taking a different approach. You're saying, hey, we're essentially building a cloud that's going to connect all these estates. And the key is you're going to have to keep, and you are, I think that's probably part of the reason why we're here, announcing stuff very quickly. A lot of innovation has to come out to satisfy that demand that you're essentially talking about. >> Because we've oversimplified things with containers, right? Because containers don't have what matters for data, and what matters for enterprise, which is persistence, right? I have to be able to turn my systems down, or I don't know when I'm going to use that data, but it has to stay there. And that's not solved in the container world by itself. 
And that's what's coming now, the heavy lifting is done by people like HPE, to provide that persistence of the data across the different deployment platforms. And then, there's just a need to modernize my on-premise platforms. Right? I can't run on a server which is two, three years old, right? It's no longer safe, it doesn't have trusted identity, all the good stuff that you need these days, right? It cannot be operated remotely, or whatever happens there. Two, three years is long enough for a server to have run its course, right? >> Well you're a software guy, you hate hardware anyway, so just abstract that hardware complexity away from you. >> Hardware is the necessary evil, right? It's like TSA. I want to go somewhere, but I have to go through TSA. >> But that's a key point, let me buy a service, if I need compute, give it to me. And if I don't, I don't want to hear about it, right? And that's kind of the direction that you're headed. >> That's right. >> Holger: That's what you're offering. >> That's right, and specifically the services. So GreenLake's been offering infrastructure, virtual machines, IaaS, as a service. And we want to stop talking about that underlying capability because it's a dial tone now. What organizations and these developers want is the service. Give me a service or a function, like I get in the cloud, but I need to get going today. I need it within my security parameters, access to my data, my tools, so I can get going as quickly as possible. And then beyond that, we're going to give you those cloud billing practices. Because, just because you're deploying a cloud native service, if you're still being deployed via CapEx, you're not solving a lot of problems. So we also need to have that cloud billing model. >> Great. Well Holger, we'll give you the last word, bring us home. >> It's very interesting to have the cloud qualities of subscription-based pricing maintained by HPE as the cloud vendor from somewhere else. 
And that gives you that flexibility. And that's very important because data is essential to enterprise processes. And there's three reasons why data doesn't go to the cloud, right? We know that. It's privacy residency requirement, there is no cloud infrastructure in the country. It's performance, because network latency plays a role, right? Especially for critical appraisal. And then there's not invented here, right? Remember Charles Phillips saying how old the CIO is? I know if they're going to go to the cloud or not, right? So, it was not invented here. These are the things which keep data on-premise. You know that load, and HP is coming on with a very interesting offering. >> It's physics, it's laws, it's politics, and sometimes it's cost, right? Sometimes it's too expensive to move and migrate. Guys, thanks so much. Great to see you both. >> Matt: Dave, it's always a pleasure. >> All right, and thank you for watching theCUBE's continuous coverage of HPE's big GreenLake announcements. Keep it right there for more great content. (calm music begins)

Published Date : Sep 28 2021



Steven Mih, Ahana and Sachin Nayyar, Securonix | AWS Startup Showcase


 

>> Voiceover: From theCUBE's Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE Conversation. >> Welcome back to theCUBE's coverage of the AWS Startup Showcase. Next Big Thing in AI, Security and Life Sciences featuring Ahana for the AI Track. I'm your host, John Furrier. Today, we're joined by two great guests, Steven Mih, Ahana CEO, and Sachin Nayyar, Securonix CEO. Gentlemen, thanks for coming on theCUBE. We're talking about the Next-Gen technologies on AI, Open Data Lakes, et cetera. Thanks for coming on. >> Thanks for having us, John. >> Thanks, John. >> What a great line up here. >> Sachin: Thanks, Steven. >> Great, great stuff. Sachin, let's get in and talk about your company, Securonix. What do you guys do? Take us through, I know you've got a slide to help us through this, I want to introduce your stuff first then jump in with Steven. >> Absolutely. Thanks again, Steven, and the Ahana team, for having us on the show. So Securonix, we started the company in 2010. We are the leader in security analytics and response capability for the cyber market. So basically, this is a category of solutions called SIEM, Security Incident and Event Management. We are the quadrant leaders in Gartner, we now have about 500 customers today and have been plugging away since 2010. Started the company just really focused on analytics using machine learning and advanced analytics to really find the needle in the haystack, then moved from there to needle in the needle stack using more algorithms, analysis of analysis. And then kind of, I evolved the company to run on cloud and become sort of the biggest security data lake on cloud and provide all the analytics to help companies with their insider threat, cyber threat, cloud solutions, application threats, emerging internally and externally, and then response and have a great partnership with Ahana as well as with AWS. So looking forward to this session, thank you. >> Awesome. 
I can't wait to hear the news on that Next-Gen SIEM leadership. Steven, Ahana, talk about what's going on with you guys, give us the update, a lot of stuff happening. >> Yeah. Great to be here and thanks for that, Sachin, and we appreciate the partnership as well with both Securonix and AWS. Ahana is the open source company based on PrestoDB, which is a project that came out of Facebook and is widely used, one of the fastest growing projects in data analytics today. And we make a managed service for Presto easily on AWS, all cloud native. And we'll be talking about that more during the show. Really excited to be here. We believe in open source. We believe in all the challenges of having data in the cloud and making it easy to use. So thanks for having us again. >> And looking forward to digging into that managed service and why that's been so successful. Looking forward to that. Let's get into the Securonix Next-Gen SIEM leadership first. Let's share the journey towards what you guys are doing here. As the Open Data Lakes on AWS has been a hot topic, the success of data in the cloud, no doubt is on everyone's mind especially with the edge coming. It's just, I mean, just incredible growth. Take us through Sachin, what do you guys got going on? >> Absolutely. Thanks, John. We are hearing about cyber threats every day. No question about it. So in the past, what was happening is companies, what we have done as enterprise is put all of our eggs in the basket of solutions that were evaluating the network data. With cloud, obviously there is no more network data. Now we have moved into focusing on EDR, right thing to do on endpoint detection. But with that, we also need security analytics across on-premise and cloud. 
And your other solutions like your OT, IoT, your mobile, bringing it all together into a security data lake and then running purpose-built analytics on top of that, and then having a response so we can prevent some of these things from happening or detect them in real time versus investigating for hours or weeks and months, which is obviously too late. So with some of the recent events happening around Colonial and others, we all know cybersecurity is on top of everybody's mind. First and foremost, I also want to. >> Steven: (indistinct) slide one and that's all based off on top of the data lake, right? >> Sachin: Yes, absolutely. Absolutely. So before we go into Securonix, I also want to congratulate everything going on with the new cyber initiatives with our government and just really excited to see some of the things that the government is also doing in this space to bring, to have stronger regulation and bring together the government and the private sector. From a Securonix perspective, today, we have one third of the Fortune 500 companies using our technology. In addition, there are hundreds of small and medium sized companies that rely on Securonix for their cyber protection. So what we do is, again, we are running the solution on cloud, and that is very important. It is not just important for hosting, but in the space of cybersecurity, you need to have a solution where we can update the threat models and we can use the intelligence or the intel that we gather from our customers, partners, and industry experts and roll it out to our customers within seconds and minutes, because the game is real time in cybersecurity. And that you can only do in cloud where you have the complete telemetry and access to these environments. 
When we go on-premise traditionally, what you will see is customers are even thinking about pushing the threat models through their standard Dev test life cycle management, which is just completely defeating the purpose. So in any event, Securonix on the cloud brings together all the data, then runs purpose-built analytics on it. Helps you find very few, we are today pulling in several million events per second from our customers, and we provide just a very small handful of events and reduce the false positives so that people can focus on them. Their security command center can focus on that and then configure response actions on top of that. So we can take action for known issues and have intelligence in all the layers. So that's kind of what Securonix is focused on. >> Steven, he just brought up probably the most important story in technology right now. That's ransomware more than, first of all, cybersecurity in general, but ransomware, he mentioned some of the government efforts. Some are saying that the ransomware marketplace is bigger than some governments, nation state governments. There's a business model behind it. It's highly active. It's dominating the scene and it's a real threat. This is the new world we're living in, cloud creates the refactoring capabilities. We're hearing that story here with Securonix. How do Presto and Securonix work together? Because I'm connecting the dots here in real time. I think you're going to go there. So take us through because this is like the most important topic happening. >> Yeah. So as Sachin said, there's all this data that needs to go into the cloud and it's all moving to the cloud. And there's massive amounts of data, hundreds of terabytes, petabytes of data that's moving into the data lakes, and that's the S3-based data lakes, which are the easiest, cheapest, commodified place to put all this data. 
But in order to deliver the results that Sachin's company is driving, which is intelligence on when there's a ransomware or possibility, you need to have analytics on them. And so Presto is the open source project that is an open source SQL query engine for data lakes and other data sources. It was created by Facebook and is now part of the Linux Foundation, something called the Presto Foundation. And it was built to replace the complicated Hadoop stack in order to then drive analytics with lightning-fast queries on large, large sets of data. And so Presto fits in with this Open Data Lake analytics movement, which has made Presto one of the fastest growing projects out there. >> What is an Open Data Lake? Real quick for the audience who wants to learn what it means. Does it mean it's open source in the Linux Foundation or open meaning it's open to multiple applications? What does that even mean? >> Yeah. Open Data Lake analytics means that, first of all, your data lake has open formats. So it is made up of, say, something called ORC or Parquet. And these are formats that any engine can be used against. That's really great, instead of having locked in data types. Data lakes can have all different types of data. It can have unstructured, semi-structured data. It's not just the structured data, which is typically in your data warehouses. There's a lot more data going into the Open Data Lake. And then you can, based on what workload you're looking to get benefit from, the insights come from that, and actually slide two covers this pictorially. If you look on the left here on slide two, the Open Data Lake is where all the data is pulling. And Presto is the layer in between that and the insights which are driven by the visualization, reporting, dashboarding, BI tools or applications like in Securonix's case. 
And so analytics are now being driven by every company, not just for the security industry, but also for every industry out there, retail, e-commerce, you name it. There's healthcare, financials, all are looking at driving more analytics for their SaaSified applications as well as for their own internal analysts, data scientists, and folks that are trying to be more data-driven. >> All right. Let's talk about the relationship now with where Presto fits in with Securonix because I get the open data layer. I see value in that. I also get what we're talking about with the cloud and being faster with the datasets. So how do, Sachin, Securonix and Ahana fit in together? >> Yeah. Great question. So I'll tell you, we have two customers. I'll give you an example. We have two Fortune 10 customers. One has moved most of their operations to the cloud and another customer is in the process, early stage. The amount of data that we are getting from the customer who's moved fully to the cloud is 20 times more than from the customer who's in the early stages of moving to the cloud. That is because of the ability to add this level of telemetry in the cloud, in this case, it happens to be AWS, Office 365, Salesforce and several other hyperscalers across several other cloud technologies. But the level of logging, the telemetry that we are able to get, is unbelievable. So what it does is it allows us to analyze more, protect the customers better, protect them in real time, but there is a cost and scale factor to that. So like I said, when you are trying to pull in billions of events per day from a customer, what the customers are looking for is all of that data goes in, all of that data gets enriched so that it makes sense to a normal analyst and all of that data is available for search, sometimes 90 days, sometimes 12 months. And then all of that data is available to be brought back into a searchable format for up to seven years.
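The retention tiers mentioned here (searchable for 90 days or 12 months, restorable into a searchable format for up to seven years) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the tier names, the `retention_tier` function, and the day thresholds are hypothetical and are not taken from the Securonix product.

```python
from datetime import date

# Hypothetical thresholds based on the figures quoted in the interview.
HOT_SEARCH_DAYS = 90      # "sometimes 90 days"; other customers may use 12 months
ARCHIVE_DAYS = 7 * 365    # "up to seven years", approximated with 365-day years


def retention_tier(event_date: date, today: date, hot_days: int = HOT_SEARCH_DAYS) -> str:
    """Classify an event by age into an assumed retention tier."""
    age_days = (today - event_date).days
    if age_days < 0:
        raise ValueError("event_date is in the future")
    if age_days <= hot_days:
        return "searchable"   # directly available for search
    if age_days <= ARCHIVE_DAYS:
        return "restorable"   # can be brought back into a searchable format
    return "expired"          # past the assumed retention window
```

Passing `hot_days=365` models the customers who keep a full 12 months directly searchable.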
So think about the amount of data we are dealing with here, and we have to provide a solution for this problem at a price that is affordable to the customer and that a medium-sized company as well as a large organization can afford. So after a lot of our analysis on this, and again, Securonix is focused on cyber, bringing in the data, analyzing it, so after a lot of our analysis, we zeroed in on S3 as the core bucket where this data needs to be stored because of the price point, the reliability, and all the other functions available on top of that. And with that, with S3, we've created a great partnership with AWS as well as with Snowflake that is providing this, from a data lake perspective, a bigger data lake, enterprise data lake perspective. So now for us to be able to provide customers the ability to search that data. So data comes in, we are enriching it. We are putting it in S3 in real time. Now, this is where Presto comes in. In our research, Presto came out as the best search engine to sit on top of S3. The engine is supported by companies like Facebook and Uber, and it is open source. So open source, like you asked the question. So for companies like us, we cannot depend on a very small technology company to offer mission critical capabilities because what if that company gets acquired, et cetera. In the case of open source, we are able to adopt it. We know there is a community behind it and it will be available for us to use and we will be able to contribute to it for the long term. Number two, from an open source perspective, we have a strong belief that customers own their own data. Traditionally, like Steven used the words locked in, it's a key term, customers have been locked into proprietary formats in the past and those days are over. You own the data and you should be able to use it with us and with other systems of choice. So now you get into a data search engine like Presto, which scales independently of the storage.
And then when we start looking at Presto, we came across Ahana. So for every open source system, you definitely need a sort of a for-profit company that invests in the community and then that takes the community forward. Because without a company like this, the community will die. So we are very excited about the partnership with Presto and Ahana. And Ahana provides us the ability to take Presto and cloudify it, or make the cloud operations work plus be our conduit to the Ahana community. Help us speed up certain items on the roadmap, help our team contribute to the community as well. And then you have to take a solution like Presto, you have to put it in the cloud, you have to make it scale, you have to put it on Kubernetes. Standard thing that you need to do in today's world to offer it as sort of a micro service into our architecture. So in all of those areas, that's where our partnership is with Ahana and Presto and S3 and we think, this is the search solution for the future. And with something like this, very soon, we will be able to offer our customers 12 months of data, searchable at extremely fast speeds at very reasonable price points and you will own your own data. So it has very significant business benefits for our customers with the technology partnership that we have set up here. So very excited about this. >> Sachin, it's very inspiring, a couple things there. One, decentralize on your own data, having a democratized, that piece is killer. Open source, great point. >> Absolutely. >> Company goes out of business, you don't want to lose the source code or get acquired or whatever. That's a key enabler. And then three, a fast managed service that has a commercial backing behind it. So, a great, and by the way, Snowflake wasn't around a couple of years ago. So like, so this is what we're talking about. This is the cloud scale. Steven, take us home with this point because this is what innovation looks like. Could you share why it's working? 
What are some of the things that people could walk away with and learn from, as the new architecture for the NextGen cloud is here? This is a big part of it, so share how this works. >> That's right. As you heard from Sachin, every company is becoming data-driven and analytics are central to their business. There's more data and it needs to be analyzed at lower cost without the lock-in, and people want that flexibility. And so slide three talks about what Ahana Cloud for Presto does. It's the best Presto out of the box. It's very easy to use for your operations team. So it can be one or two people just managing this and they can get up to speed very quickly, in 30 minutes be up and running. And that jump-starts their movement into an Open Data Lake analytics architecture. That architecture is the one that is at Facebook, Uber, Twitter, other large web scale, internet scale companies. And with the amount of data that's occurring, that's now becoming the standard architecture for everyone else in the future. And so just to wrap, we're really excited about making that easy, giving an open source solution because the open source data stack based off of data lake analytics is really happening. >> I got to ask you, you've seen many waves in the industry. Certainly, you've been through the big data waves, Steven. Sachin, you're on the cutting edge and just the cutting edge billions of signals from one client alone is pretty amazing scale and refactoring that value proposition is super important. What's different from 10 years ago when the Hadoop, you mentioned Hadoop earlier, which is RIP, obviously the cloud killed it. We all know that. Everyone kind of knows that. But like, what's different now? I mean, skeptics might say, I don't believe you, it's just crazy. There's no way it works. S3 costs way too much. Why is this now so much more of an attractive proposition? What do you say to the naysayers out there?
Steve, we'll start with you and then Sachin, I want you to weigh in too. >> Yeah. Well, if you think about the Hadoop era and if you look at slide three, it was a very complicated system that was done mainly on-prem. And you'd have to go and set up a big data team and rack and stack a bunch of servers and then try to put all this stuff together and candidly, the results and the outcomes of that were very hard to get unless you had the best possible teams and invested a lot of money in this. What you saw in this slide was that right hand side which shows the stack. Now you have a separate compute, which is based off of Intel based instances in the cloud. We run the best in that and they're part of the Presto Foundation. And that's now data lakes. Now the distributed compute engines are the ones that have become very much easier. So the big difference in what I see is it's no longer called big data. It's just called data analytics because it's now become commodified as being easy and the bar is much, much lower, so everyone can get the benefit of this across industries, across organizations. I mean, that's good for the world, reduces the security threats, the ransomware, in the case of Securonix and Sachin here. But every company can benefit from this. >> Sachin, this is really an example in my mind and you can comment too on if you believe it or not, but replatform with the cloud, that's a no brainer. People do that. They did it. But the value is refactoring in the cloud. It's thinking differently with the assets you have and making sure you're using the right pieces. I mean, it's a no brainer, you know it's good. If it costs more money to stand up something than to get value out of something that's operating at scale, much easier equation. What's your thoughts on this? Go back 10 years and where we are now, what's different? I mean, replatforming, refactoring, all kinds of things happening. What's your take on all this? >> Agreed, John.
So we have been in business now for about 10 to 11 years. And when we started my hair was all black. Okay. >> John: You're so silly. >> Okay. So this, everything that has happened here, is the transition from Hadoop to cloud. Okay. This is what the result has been. So people can see it for themselves. So when we started off with deep partnerships with the Hadoop providers and again, Hadoop is the foundation, which has now become EMR and everything else that AWS and other companies have picked up. But when you start with some basic premise, first, the racking and stacking of hardware, companies having to project their entire data volume upfront, bringing the servers and have 50, 100, 500 servers sitting in their data centers. And then when there are spikes in data, or like I said, as you move to the cloud, your data volume will increase between five and 20x and projecting for that. And then think about the agility, that it will take you three to six months to bring in new servers and then bring them into the architecture. So big issue. Number two big issue is that the backend of that was built for HDFS. So Hadoop in my mind was built to ingest large amounts of data in batches and then perform some Spark jobs on it, some analytics. But we are talking in security about real time, high velocity, high variety data, which has to be available in real time. It wasn't built for that, to be honest. So what was happening is, again, even if you look at the Hadoop companies today as they have kind of defined their next generation, they have moved from HDFS to now kind of a cloud based platform capability and have discarded the traditional HDFS architecture because it just wasn't scaling, wasn't searching fast enough, wasn't searching fast enough for hundreds of analysts at the same time. And then obviously, the servers, et cetera weren't working.
Then when we worked with the Hadoop companies, they were always two to three versions behind for the individual services that they had brought together. And again, when you're talking about this kind of a volume, you need to be on the cutting edge always of the technologies underneath that. So even while we were working with them, we had to support our own versions of Kafka, Solr, Zookeeper, et cetera to really bring it together and provide our customers this capability. So now when we have moved to the cloud with solutions like EMR behind us, AWS has invested in solutions like EMR to make them scalable, to have scale and then scale out, which traditional Hadoop did not provide because they missed the cloud wave. And then on top of that, again, rather than throwing data in that traditional older HDFS format, we are now taking the same format, the Parquet format that it supports, putting it in S3 and now making it available and using all the capabilities like you said, the refactoring of that is critical. That rather than on-prem having servers and redundancies, with S3, we get built in redundancy. We get built in life cycle management, high degree of confidence data reliability. And then we get all this innovation from groups like Presto, companies like Ahana sitting on top of that S3. And the last item I would say is in the cloud we are now able to have multiple resilient options on our side. So for example, with us, we still have some premium searching going on with solutions like Solr and Elasticsearch, then you have Presto and Ahana providing majority of our searching, but we still have Athena as a backup in case something goes down in the architecture. Our queries will spin back up to Athena, the AWS service on Presto, and customers will still get served. So we have all of these options, and Athena doesn't cost us anything if we don't use it, but none of these options are available on-prem.
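The layered search setup described here, with a primary engine and a backup that only serves queries when the primary is down, can be sketched as a simple ordered fallback. This is a minimal sketch, not Securonix's actual routing logic: `run_with_fallback`, `primary_down`, and `backup_ok` are hypothetical names, and the engines are plain stand-in functions rather than real Presto or Athena clients.

```python
from typing import Callable, List, Sequence, Tuple

# An "engine" is any callable that takes a SQL string and returns rows.
# Real Presto or Athena clients are deliberately not used; these are stand-ins.
QueryEngine = Callable[[str], list]


def run_with_fallback(sql: str, engines: Sequence[Tuple[str, QueryEngine]]) -> Tuple[str, list]:
    """Try each engine in order; return (engine_name, rows) from the first that succeeds."""
    errors: List[str] = []
    for name, engine in engines:
        try:
            return name, engine(sql)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


def primary_down(sql: str) -> list:
    # Simulates the primary engine being unavailable.
    raise ConnectionError("primary cluster unreachable")


def backup_ok(sql: str) -> list:
    # Simulates a backup engine answering the same SQL.
    return [("rows for", sql)]
```

Calling `run_with_fallback("SELECT 1", [("presto", primary_down), ("athena", backup_ok)])` returns the backup's rows tagged with the name `"athena"`. The backup is only invoked after the primary raises, which mirrors the point that an on-demand backup costs nothing while it sits unused.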
So in my mind, I mean, it's a whole new world we are living in. It is a world where now we have made it possible for companies, even enterprises, to even think about having true security data lakes, which are useful and having real-time analytics. From my perspective, I don't even sign up today for a large enterprise that wants to build a data lake on-prem because I know that is going to be a very difficult project to make successful. So we've come a long way and there are several details around this that we've kind of endured through the process, but very excited where we are today. >> Well, we'll certainly follow up with theCUBE on all your endeavors. Quickly on Ahana, why them, why their solution? In your words, what would be the advice you'd give me if I'm like, okay, I'm looking at this, why do I want to use it, and what's your experience? >> Right. So the standard SQL query engine for data lake analytics, more and more people have more data, want to have something that's based on open source, based on open formats, gives you that flexibility, pay as you go. You only pay for what you use. And so it proved to be the best option for Securonix to create a self-service system that has all the speed and performance and scalability that they need, which is based off of the innovation from the large companies like Facebook, Uber, Twitter. They've all invested heavily. We contribute to the open source project. It's a vibrant community. We encourage people to join the community and even Securonix will be having engineers that are contributing to the project as well. I think, is that right Sachin? Maybe you could share a little bit about your thoughts on being part of the community. >> Yeah. So also why we chose Ahana, like John said. The first reason is you see Steven is always smiling. Okay. >> That's for sure. >> That is very important. I mean, jokes apart, you need a great partner. You need a great partner.
You need a partner with a great attitude because this is not a sprint, this is a marathon. So the Ahana founders, Steven, the whole team, they're world-class, they're world-class. The depth that the CTO has, his experience, the depth that Dipti has, who's running the cloud solution. These guys are world-class. They are very involved in the community. We evaluated them from a community perspective. They are very involved. They have the depth of really commercializing an open source solution without making it too commercial. The right balance, where the founding companies like Facebook and Uber, and hopefully Securonix in the future as we contribute more and more, will have our say and they act like the right stewards in this journey and then contribute as well. So and then they have chosen the right niche rather than taking portions of the product and making it proprietary. They have put in the effort towards the cloud infrastructure of making that product available easily on the cloud. So I think it's sort of a no-brainer from our side. Once we chose Presto, Ahana was the no-brainer and just the partnership so far has been very exciting and I'm looking forward to great things together. >> Likewise Sachin, thanks so much for that. And we've only found your team, you're world-class as well, and working together and we look forward to working in the community also in the Presto Foundation. So thanks for that. >> Guys, great partnership. Great insight and really, this is a great example of cloud scale, cloud value proposition as it unlocks new benefits. Open source, managed services, refactoring the opportunities to create more value. Steven, Sachin, thank you so much for sharing your story here on open data lakes. Open always wins in my mind. This is theCUBE, we're always open and we're showcasing all the hot startups coming out of the AWS ecosystem for the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching. (bright music)

Published Date : Jun 24 2021



Clayton Coleman, Red Hat | Red Hat Summit 2021 Virtual Experience


 

>> Yes, welcome back to theCUBE's coverage of Red Hat Summit 2021 Virtual, which we wish we were in person for this year, but we're still remote. We've still got COVID, and coming around the corner, soon to be post-COVID. Got a great guest here, Clayton Coleman, architect at Red Hat, CUBE alum, who's been on many times, expanded role again this year. More cloud, more cloud action. Great to see you. Thanks for coming on. >> It's a pleasure to be here. >> So great to see you. We were just riffing before we came on camera about distributed computing and the future of the internet, how it's all evolving, how much fun it is, how it's all changing. Still, the game is still the same, all that good stuff. But here at Red Hat Summit, and we're going to get into that, I want to just get into the hard news and the real big opportunities here. You're announcing with Red Hat a new managed cloud services portfolio, take us through that. >> Sure. We're continuing to evolve our OpenShift managed offerings, which have grown now to include the Red Hat OpenShift Service on Amazon to complement our Azure Red Hat OpenShift service. That means that, along with our partnership on IBM Cloud and OpenShift Dedicated on both AWS and GCP, we now have managed OpenShift on all of the major clouds. And along with that we are bringing in and introducing, I think, really the first step in what we see as growing and evolving the hybrid cloud ecosystem on top of OpenShift. And there's many different ways to slice that, but it's about bringing capabilities on top of OpenShift in multiple environments and multiple clouds in ways that make developers and operations teams more productive, because at the heart of it, that's our goal for OpenShift. And for the broader open source ecosystem, it's to do what makes all of us safer, more productive and able to deliver business value. >> Yeah. And that's a great stake you guys put in the ground.
And that's great messaging, great marketing, great value proposition. I want to dig into it a little bit with you. I mean, you guys have, I think, the only native offering on all the clouds out there that I know of, is that true? I mean, it's not just, you know, you support AWS, Azure, IBM and GCP, but native offerings. >> We do not have a native offering on GCP, but we do offer the same service. And this is actually interesting as we've evolved our approach. You know, when we talk about hybrid, hybrid is dealing with the realities of the computing world we live in, working with each of the major clouds, trying to deliver the best integration possible in a way that drives that consistency across those environments. And so actually our OpenShift Dedicated on AWS service gave us the inspiration and a lot of the basic foundations for what became the integrated native service. And we've worked with Amazon very closely to make sure that that does the right thing for customers who have chosen Amazon. And likewise, we're trying to continue to deliver the best experience, the best operational reliability that we can, so that the choice of where you run your cloud, where you run your applications, matches the decisions you've already made and where your future investments are going to be. So we want to be where customers are, but we also want to give you that consistency. That has been a hallmark of OpenShift since the beginning. >> Yeah. And thanks for clarifying, I appreciate that, because the managed service is on GCP and the rest are native. Let me ask about the application services, because Jeff Barr from AWS posted a few weeks ago that Amazon celebrated their 15th birthday. They're still teenagers, relatively speaking. But one comment he made was interesting to me, and this applies to this cloud native megatrend that's happening: he says the APIs are basically the same, and this brings up the hybrid environment.
You guys have always been into the API side of the management with the cloud services and supporting all that. As you guys look at this ecosystem in open source, what is the role of APIs and these integrations? Because without solid integration all these services could break down, and certainly in open source, more and more people are coding. So take me through how you guys look at these application services, because many people are predicting more services are going to be onboarding faster than ever before. >> It's interesting. So for us, working across multiple cloud environments, there are many similarities in those APIs, but for every similarity there is a difference, and those differences are actually what drive costs and drive complexity when you're integrating. And I think a lot of the role of this is, you know, it's irresponsible to talk about the role of an individual company in the computing ecosystem moving to cloud native, because as many of these capabilities are unlocked by large cloud providers and transformations in the kinds of software that we run at scale, everybody is a participant in that. But then you look at the broad swath of the developer and operator ecosystem, and it's the communities of people who paper over those differences, who write runbooks and build the policies and who build the experience and the automation, not just in individual products or in individual clouds, but across the open source ecosystem. Whether it's technologies like Ansible or Terraform, whether it's best practices websites around running Kubernetes, every part of the community is really involved in driving consistency, driving predictability and driving reliability. And what we try to do is actually work within those constraints to take the ecosystem and to push it a little bit further. So the APIs may be similar, but over time those differences can trip you up.
And a lot of what I think we talked about, where the industry is going, where we want to be, is that everyone ultimately is going to own some responsibility for keeping their services running and making sure that their applications and their businesses are successful. The best outcome would be that the APIs are the same and they're open, and that both the cloud providers and the open source ecosystem and vendors and partners who drive many of these open source communities are actually all working together to have the most consistent environment, to make portability a true strength. But when someone does differentiate and has a true best-of-breed service, we don't want to build artificial walls between those. I mean, that's hybrid cloud: you're going to make choices that make sense for you. If we tell people that their choices don't work or they can't integrate or, you know, an open source project doesn't support this vendor or that vendor, we're actually leaving a lot of the complexity buried in those organizations. So I think this is a great time, as we turn over for cloud native, to look at how we, as much as possible, try to drive those APIs closer together and the consistency underneath them, as both a community and a vendor. And for Red Hat, part of what we do as a core mission is trying to make sure that that consistency is actually real. You don't have to worry about those details when you're ignoring them. >> That's a great point. Before I get into some architectural impact, I want to get your thoughts on this trend that's going on. Everyone jumps on the bandwagon. You know, you say, oh yeah, I want a data cloud, you know, everything is like the new, you know, they saw the Snowflake IPO, I gotta have some of that data. You've got streaming data services, you've got data services native in these platforms. But a lot of these companies think you're just going to get a data cloud, it's so easy.
They might try something and then they get stuck with it or they have to refactor. How do you look at that as an architect? When you have these new hot trends, like say a data cloud, how should customers be thinking about kicking the tires on services like that? And how should they think holistically around architecting that? >> There's a really interesting mindset, you know, we deal with this a lot. I've been with Red Hat for 10 years now, and on OpenShift all 10 years of that. We've gone through a bunch of transformations. And I've talked to the same companies and organizations over the last 10 years. At each point in their evolution, they're making decisions that are the right decision at the time. They're choosing a new capability. So platform as a service is a great example of a capability that allowed a lot of really large organizations to standardize. That ties into digital transformation. CI/CD is another big trend where it's an obvious win. But depending on where you jumped on the bandwagon, depending on when you adopted, you're going to make a bunch of different trade-offs. And in that process, the question is how do we improve the ability to keep all of the old stuff moving forward as well? And so open APIs and open standards are a big part of that, but equally it's understanding the trade-offs that you're going to make and clearly communicating those. So with data lakes, there were kind of the first and second iterations of data lakes. In the early days these capabilities were new, and they were based around open source software. A lot of the Hadoop and big data ecosystem, you know, started based on some of these key papers from Amazon and Google and others, taking infrastructure ideas and bringing them to scale.
We went through a whole evolution of that, and the input and the output of that basically led us into the next phase, which I think is the second phase of data lakes, which is: we have this data, and our tools are so much better because of that first phase and the investments we made the first time around, but we're going to have to pay another investment to make that transformation. And so I never want to caution someone not to jump early, but it has to be the right jump and it has to be something that really gives you a competitive advantage. With a lot of infrastructure technology, you should make one or two big bets, and sometimes people call this using their innovation tokens. You need to make the bets on big technologies that let you operate more effectively at scale. It is somewhat hard to predict that. I'd certainly say that I've missed quite a few of the exciting transformations in the field just because it wasn't always obvious that it was going to pay off to the degree that customers would need. >> So I gotta ask you on the real time applications side of it, that's been a big trend, certainly in cloud. But as you look at hybrid cloud environments, for instance, streaming data has been a big issue. Any updates there from you on your managed service? >> That's right. So we have three managed services that are closely aligned with data in three different ways. And so one of them is Red Hat OpenShift Streams for Apache Kafka, which is a managed cloud service that focuses on bringing that streaming data and letting you run it across multiple environments. And I think that, you know, we get to the heart of it: the purpose of managed services is to reduce operational overhead and to take on responsibilities, which allows users to focus on the things that actually matter for them.
So for us, managed OpenShift Streams is really about the flow of data between applications in different environments, whether that's from the edge to an on-premise data center, or from an on-premise data center to the cloud. And increasingly these services, which we're running in the public cloud, have elements that run in the public cloud, but also key elements that run close to where your applications are. And I think that bridge is actually really important for us. That's a key component of hybrid, connecting the different locations and different footprints. So for us the focus is really how do we get data moving to the right place. That complements our API management service, which is an add-on for OpenShift Dedicated, which means once you've brought the data and you need to expose it back out to other applications in the environment, you can build those applications on OpenShift, and you can leverage the capabilities of OpenShift API Management to expose them more easily, both to end customers or to other applications. And then our third service is Red Hat OpenShift Data Science. That is an integration that makes it easy for data scientists in a Kubernetes environment, on OpenShift, to easily bring together the data, to analyze it and to help route it as appropriate. So those three facets for us are pretty important. They can be used in many different ways, but that focus on the flow of data across these different environments is really a key part of our longer term strategy. >> You know, all the customer checkboxes are there, you mentioned them earlier. I mean, I'll just summarize what you said: obviously value, faster application velocity, time to value. Those are like the checkboxes, the Gartner-type analysts check those. Lower complexity, oh, we do the heavy lifting, all cloud benefits, so that's all cool.
Everyone kind of gets that; everyone who's been around cloud knows DevOps, and all those things come into play. Right now, the innovation focuses on operations, and day-two operations are becoming much more specific. When people say, hey, I've done some lift and shift, I've done some greenfield, born in the cloud, now it's like, whoa, this stuff, I haven't seen this before, as you start scaling. So this brings up that concept, and then you add in multi-cloud and hybrid cloud, and you gotta have a unified experience. So these are the hot areas right now this year. I would say, you know, that day-two operations has been around for a while, but this idea of unification around environments, to be fully distributed for developers, is huge. How do you architect for that? This is the number one question I get. And I tease out, when people are kind of talking about their environments, their challenges, their opportunities, that they're really trying to architect, you know, the foundation that they're building to be future-proof. They don't want to get screwed over when they realize they made a decision and they weren't thinking about day-two operations, or they didn't think about the unified experience across clouds, across environments and services. This is huge. What's your take on this? >>So this is probably one of the hardest questions I think I could get asked, which is, looking into the crystal ball, what are the aspects of today's environments that are accidental complexity, that's really just a result of the slow accretion of technologies, where we all need to make bets when the time is right within the business, and which parts of it are essential? What are the fundamental hard problems, and so on.
The accidental complexity side, for Red Hat, is really about that consistent environment through OpenShift, bringing capabilities, our connection to open source, and making sure that there's an open ecosystem where community members, users, and vendors can all work together to find solutions that work for them, because there's no way to solve for all of computing. It's just impossible. I think that's our development process, and that's what helps make that accidental complexity all dissolve away over time. But on the essential complexity side, data is tied to location; data has gravity. Data lakes are a great example of that: because data has gravity, the more data that you bring together, the bigger the scale of the tools you can bring, and you can invest in more specialized tools. I almost view that as specialization and centralization. There's a ton of centralization going on right now, at the same time that these new technologies are available to make it easier and easier, whether that's large-scale automation with config management technologies, or Kubernetes, deploying it in multiple sites and multiple locations, with OpenShift bringing consistency so that you can run the apps the same way. But even further than that is concentrating more of what would have typically been a specialist problem, something that you'd build a one-off around in your organization to work through the problem. We're really getting to a point where pretty soon there is a technology or a service for everyone. How do you get the data into that service and out? How do you secure it? How do you glue it together?
I think, you know, some people might call this the ultimate integration problem, which is: we're going to have all of this stuff in all of these places, so what are the core concepts? Location, security, placement, topology, latency, where data resides, who's accessing that data. We think of these as kind of the building blocks of where we're going next. So for us, we're trying to make investments in how we make Kubernetes work better across lots of environments. I have a KubeCon talk coming up this KubeCon; it's really exciting for me to talk about where we're going with, you know, the evolution of Kubernetes, bringing the different pieces more closely together across multiple environments. But likewise, when we talk about our managed services, we've approached the strategy for managed services as: it's not just the service in isolation, it's how it connects to the other pieces. What can we learn in the community, in our services, working with users, that benefits that connectivity? So I mentioned OpenShift Streams connecting up environments; we'd really like to improve how applications connect across disparate environments. That's a fundamental property: if you're going to have data in one geographic region and you need to move services closer to that, well, those services need to know and encode and have that behavior to get closer to where the data is, whether it's one data lake or 10. We gotta have that flexibility in place. And so those abstractions are really... >>And to your point about the building blocks, you've got to factor in those building blocks, because you're gonna need to understand the latency impact, which is going to impact how you're gonna handle the compute piece; all these things are coming into play. So, again, if you're mindful of the building blocks, just as a cloud concept, then you're okay. >>We hear this a lot.
Actually, there are real challenges in the ecosystem. We see a lot of the problem of: I want to help someone automate and improve, but the more balkanized, the more spread out, the more individual solutions are in play, the harder it is for someone to bring their technology to bear to help solve the problem. So we're looking for ways that we can, you know, grease the skids to build the glue. I think open source works best when it's defining de facto solutions that everybody agrees on; that openness and easy access is a key property that makes de facto standards emerge from open source. What can we do to grow de facto standards around multi-cloud and application movement and application interconnect? I think it's already happening, and what can we do to accelerate it? That's it. >>Well, I think you bring up a really good point. This is probably a follow-up, maybe a Clubhouse talk, or you guys will do a separate session on this. But I've been riffing on this idea of today's silo, tomorrow's component, right, or module. Most people don't realize that these silos can be problematic if not thought through. So you have to kill the silos to bring in kind of an open policy. So if you're open, not closed, you can leverage a monolith. Today's monolithic app or full stack could be tomorrow's building block, unless you don't open up. So this is where an interesting design question comes in, which is: it's okay to have pre-existing stuff if you're open about it. But if you stay siloed, you're gonna get really stuck >>and there's going to be more and more pre-existing stuff. I think, you know, even with the data lake, for every data lake there is a huge problem of how to get data into the data lake, or taking existing applications that came from the previous data lake. And so there's a natural evolutionary process where we say, let's focus on the mechanisms that actually move that data, to get that data flowing.
I think we're still in the early phases of thinking about huge amounts of applications. Microservices are, you know, 10 years old in the sense of being a fairly common industry talking point; before that we had service-oriented architecture. But the difference now is that we're encouraging and building it so that one developer or one team might run several services. They might use three or four different SaaS vendors. They might depend on five or 10 or 15 cloud services. Those integration points make them easier. But it's a new opportunity for us to say, well, what are the differences? To go back to the point: you can keep your silos, we just want to have great integration in and out of >>those. Exactly, they don't have to; you don't have to break down the silos. So again, it's a tried and true formula: integration, interoperability, and abstracting away the complexity with some sort of new software abstraction layer. You bring that to play, as long as you can paddle with that, you apply the new building blocks, you're classified. >>It sounds so simple, doesn't it? It does. And you know, of course it'll take us 10 years to get there. And, you know, after cloud native there will be galactic native or something like that. You know, there's always going to be a new concept that we need to work in. I think the key concepts we're really going after are that everyone is trying to run resilient and reliable services, and the clouds give and the clouds take away. They give us those opportunities to have some of those building blocks, like location of geographic hardware resources, but there will always be data that's spread out. And again, you still have to apply those principles to the cloud to get the service guarantees that you need. I think there's a completely untapped area for helping software developers and software teams understand the actual availability and guarantees of the underlying environment. It's a property of the services you run with.
If you're using a disk in a particular availability zone, that's a property of your application. I think there's a rich area that hasn't been mined yet, of helping you understand what your effective service level goals are, and which of those can be met and which cannot. It doesn't make a lot of sense in a single-cluster or single-machine or single-location world. But the moment you start to talk about, well, I have my data lake: well, what are the ways my data lake can fail? How do we look at your complex web of interdependencies and say, well, clearly if you lose this cloud provider, you're going to lose not just the things that you have running there, but these other dependencies? There's a lot of next steps that we're just learning: what happens when a major cloud goes down for a day, or a region of a cloud goes down for a day? You still have to design and work around those >>cases. It's distributed computing. And again, I love the space reference, galactic cloud; you got SpaceX, where's Cloud X? I mean, you know, space is the next frontier. You know, you've got all kinds of action happening in space. Great space reference there, Clayton. Great insight. Thanks for coming on. Clayton Coleman, architect at Red Hat. Clayton, thanks for coming on. >>My pleasure. >>Always a great chat, talking under the hood about what's going on in Red Hat's new managed cloud service portfolio. Again, the world's getting complex; abstract away the complexities with software. Interoperate, integrate. That's the key formula with the cloud building blocks. I'm John Furrier with theCUBE. Thanks for watching.

Published Date : Apr 28 2021

SUMMARY :

We still got the Covid coming around the corner. So great to see you were just riffing before we came on camera about distributed computing in and introducing the first, I think really the first step what we see as uh I mean, you guys have, it's not just, you know, you support AWS as so that the choice of where you run your cloud, um, So take me through how you guys Um and I think a lot of the role of this is, you know, the irresponsible to I want a data cloud, you know, everything is like the new, you know, they saw Snowflake Apollo, I gotta have some, But depending on where you jumped on the bandwagon, depending on when you adopted, you're going to make a bunch of different trade offs. So I gotta ask you on the real time applications side of it, that's been a big trend, And I think that, you know, we get to the heart of what's the purpose of You know, all the customer checkboxes there you mentioned earlier. you know, the foundation that building to be um future proof, shift, bringing consistency so that you can run the apps the same way. latency impact, that's going to impact how you're gonna handle the compute piece, that's gonna handle all you know, grease the skids to build the glue. So you have to kill the silos to bring in kind and there's going to be more and more pre existing stuff I think, you know, uh even the data lake for You bring that to play as long as you can paddle with that, you apply the new building blocks, the things that you have running there, but these other dependencies, there's a lot of, there's a lot of next I mean, you know, space is the next frontier. That's the key formula with the cloud building blocks.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff Barr | PERSON | 0.99+
five | QUANTITY | 0.99+
amazon | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
Clayton | PERSON | 0.99+
Gardner | PERSON | 0.99+
10 years | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Covid | PERSON | 0.99+
1st | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
Clayton Coleman | PERSON | 0.99+
first phase | QUANTITY | 0.99+
three facets | QUANTITY | 0.99+
10 | QUANTITY | 0.99+
first time | QUANTITY | 0.99+
Today | DATE | 0.99+
john ferry | PERSON | 0.99+
four | QUANTITY | 0.99+
one team | QUANTITY | 0.99+
Red | ORGANIZATION | 0.99+
google | ORGANIZATION | 0.99+
two big bets | QUANTITY | 0.99+
2nd iterations | QUANTITY | 0.99+
second phase | QUANTITY | 0.99+
first | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
single machine | QUANTITY | 0.99+
15 cloud services | QUANTITY | 0.98+
15th birthday | QUANTITY | 0.98+
this year | DATE | 0.98+
red Hat | ORGANIZATION | 0.98+
both | QUANTITY | 0.98+
each point | QUANTITY | 0.98+
each | QUANTITY | 0.98+
third services | QUANTITY | 0.98+
one comment | QUANTITY | 0.98+
today | DATE | 0.98+
a day | QUANTITY | 0.97+
IBM | ORGANIZATION | 0.97+
first step | QUANTITY | 0.97+
red hat summit 2021 | EVENT | 0.96+
three different ways | QUANTITY | 0.96+
Red Hat | ORGANIZATION | 0.96+
Apache | ORGANIZATION | 0.95+
Cloud X | TITLE | 0.95+
one developer | QUANTITY | 0.95+
single cluster | QUANTITY | 0.94+
Snowflake Apollo | TITLE | 0.94+
three managed services | QUANTITY | 0.9+
SpaceX | ORGANIZATION | 0.87+
Red Hat Summit 2021 Virtual Experience | EVENT | 0.85+
W S | ORGANIZATION | 0.83+
few weeks ago | DATE | 0.82+
red hats | ORGANIZATION | 0.82+
one data lake | QUANTITY | 0.78+
GCB | ORGANIZATION | 0.77+
A. P. R. | ORGANIZATION | 0.77+
Greenfield | ORGANIZATION | 0.74+
single location | QUANTITY | 0.72+
G C P. | ORGANIZATION | 0.71+
GCPD | TITLE | 0.7+
Ci CD | TITLE | 0.68+
last 10 years | DATE | 0.66+
G C P | ORGANIZATION | 0.63+
B M | COMMERCIAL_ITEM | 0.62+
hat | ORGANIZATION | 0.58+
A. P. I. S. | ORGANIZATION | 0.56+
red | ORGANIZATION | 0.54+
them | QUANTITY | 0.5+
Hadoop | TITLE | 0.43+

Kieran Taylor, Kevin Surace and Isaac Sacolick | BizOps Chaos to Clarity 2021


 

(upbeat music) >> Welcome to this BizOps Manifesto Power Panel, Data Lake or Data Landfill. We're going to be talking about that today. I've got three guests joining me. We're going to dive through that. Kieran Taylor is here, the CMO of Broadcom's Enterprise Software Division. Kieran, great to have you on the program. >> Thank you, Lisa. >> Kevin Surace is here as well, Chairman and CTO of Appvance. Hey, Kevin. >> Hey Lisa. >> And Isaac Sacolick, author and CEO of StarCIO. Isaac, welcome. >> Hi Lisa, thanks for having me. >> So we're going to spend the next 25 to 30 minutes talking about the challenges and the opportunities that data brings to organizations. You guys are going to share some of your best practices for how organizations can actually sort through all this data to make data-driven decisions. We're also going to be citing some statistics from the Inaugural BizOps Industry Survey of the State of Digital Business, in which 519 business and technology folks were surveyed across five nations. Let's go ahead and jump right in. The first one in that survey that I just mentioned: 97% of organizations say, we've got data-related challenges limiting the amount of information that we actually have available to the business. Big conundrum there. How do organizations get out of that conundrum? Kieran, we're going to start with you. >> Thanks Lisa. You know, I think, I don't know if it's so much limiting information as it is limiting answers. There's no real shortage of data, I don't think, being captured. I recently met with an unnamed auto manufacturer who's collecting petabytes of data from their connected cars, and they're doing that because they don't really yet know what questions they have of the data. So I think you get out of this Data Landfill conundrum by first understanding what questions to ask. It's not algorithms, it's not analytics. It's not, you know, math that's going to solve this problem.
It's really, really understanding your customer's issues and what questions to ask of the data. >> Understanding what questions to ask of the data. Kevin, what are your thoughts? >> Yeah, look, I think it gets down to what questions you want to ask and what you want out of it, right? So there's questions you want to ask, but what are the business outcomes you're looking for, which is the core of BizOps anyway, right? What are the business outcomes, and what business outcomes can I act upon? So there are so many business outcomes you can get from data and you go, well, I can't legally act upon that. I can't practically act upon that. I can't, whether it's lay off people or hire people or whatever it is, right? So what are the actionable items? There is plenty of data. We would argue too much data. Now we could say, is the data good? Is the data bad? Is it poorly organized? Is it noisy? There's all other problems, right? There's plenty of data. What do I do with it? What can I do that's actionable? If I was an automaker and I had lots of sensors on the road, I had petabytes, as Kieran says, and I'd probably be bringing in petabytes potentially every day. Well, I could make my self-driving systems better. That's an obvious place to start, or that's what I would do, but I could also potentially use that to change people's insurance and say, if you drive in a certain way, something we've never been able to do, if you drive in a certain way, based on the sensors, you get a lower insurance rate. Nobody's done that. But now there's interesting business opportunities for that data that you didn't have one minute ago and I just gave away. So, (laughs) it's all about the actionable items in the data. How do you drive something to the top line and the bottom line? 'Cause in the end, that's how we're all measured. >> And Isaac, I know you say data is the lifeblood. What are your thoughts on this conundrum?
>> Well, I think, you know, they gave you the start and the end of the equation: start with a question. What are you really trying to answer? What is it you don't understand, that you want to learn about your business? Connect it to an outcome that is valuable to you. And really what most organizations struggle with is a process that goes through discovery, learning what's in the data, addressing data quality issues, loading new data sources if required, and really doing that iteratively, and we're all agile people here at BizOps, right? So doing it iteratively, getting some answers out and understanding what the issues are with the underlying data, and then going back and revisiting and reprioritizing what you want to do next. Do you want to go look at another question? Is the answer heading down a path that you can drive outcomes? Do you got to go cleanse some data? So it's really that, how do you put it together so that you can peel the onion back and start looking at data and getting insights out of it. >> Great advice. Another challenge, though, that the survey identified was that nearly 70% of the respondents, and again, 519 business and technology professionals from five countries, said, we are struggling to create business metrics from our data, with so much data, so much that we can't access. Can you guys share best practices for how organizations would sort through and identify the best data sources from which they can identify the ideal business metrics? Kieran, take it away. >> Sure thing, I guess I'll build on Isaac's statements. Every company has some gap in data, right? And so when you do that data gap analysis, I think you really, I don't know. It's like Alice in Wonderland: begin at the beginning, right? You start with that question, like Isaac said. And I think the best questions are really born from an understanding of what your customers value.
And if you dig into that, you understand what the customers value, you build it off of actual customer feedback, market research then you know what questions to ask and then from that, hey, what inputs do I need to really understand how to solve that particular business issue or problem. >> Kevin, what are your thoughts? >> Yeah, I'm going to add to that, completely agree but, look, let's start with sales data, right? So sales data is something, everybody watching this understands, even if they're not in sales, they go well, okay, I understand sales data. What's interesting there is we know who our customers are. We could probably figure out if we have enough data, why they buy, are they buying because of a certain sales person? Are they buying because it's a certain region? Are they buying because of some demographic that we don't understand, but AI can pull out, right? So I would like to know, who's buying and why they're buying. Because if I know that I might make more of what more of those people want whatever that is, certain fundamental sales changes or product changes or whatever it is. So if you could certainly start there, if you start nowhere else, say I sell X today. I'd like to sell X times 1.2 by next year. Okay, great. Can I learn from the last five years of sales, millions of units or million or whatever it is, how to do that better and the answer is for sure yes and yes there's problems with the data and there's holes in the data as Kieran said and there's missing data. It doesn't matter, there's a lot of data around sales. So you can just start there and probably drive some top line growth, just doing what you're already doing but doing it better and learning how to do it better. >> Learning how to do it better. Isaac, talk to us about what your thoughts are here with respect to this challenge. 
>> Well, when you look at that percentage, 70% struggling with business metrics, you know, what I see is some companies struggling when they have too few metrics and, you know, their KPIs really don't translate well to people doing work for a customer, for an application, responding to an issue. So when you have too few and they're too disconnected from the work, people don't understand how to use them. And then on the flip side I see other organizations trying to create metrics around every single part of the operation, you know, dozens of different ways of measuring user experience and so forth. And that doesn't work, because now we don't know what to prioritize. So I think the art of this is management coming back and saying, what are the metrics where we want to see impact and changes in a short amount of time, over the next quarter, over the next six months, and to pick a couple in each category, certainly starting with the customer, certainly looking at sales, but then also looking at operations and looking at quality and looking at risk, and say to the organization, these are the two or three we're going to focus on in the next six months. And then I think that's what simplifies it for organizations. >> Thanks, Isaac. So something that I found interesting, it's not surprising, is that the survey found that too much data is one of the biggest challenges that organizations have, followed by the limitations that we just talked about in terms of identifying what are the ideal business metrics. But a whopping 74% of survey respondents said, we fail to have key data available in real time, which is a big inhibitor for data-driven decision-making. Can you guys offer some advice to organizations? How can they harness this data and glean insights from it faster? Kieran, take it away.
>> Yeah, I think there are probably five steps to establishing business KPIs, and Lisa, your first two questions and these gentlemen's answers laid out the first two, that is: define the questions that you want answers for, and then identify what those data inputs would be. You know, if you've got a formula in mind, what data inputs do you need? The remaining three steps: one is, you know, to evaluate the data you've got and then identify what's missing, you know, what do you need to then fetch? And then for that fetching, you need to think about the measurement method and the frequency. I think Isaac mentioned, you know, this concept of tool sprawl. We have too many tools to collect data. So the measurement method and frequency is important, standardizing on tools and automating that collection wherever possible. And then the last step, this is really the people component of the formula. You need to identify stakeholders that will own those business KPIs and even communicate them within the organization. That human element is sometimes forgotten, and it's really important. >> It is important, it's one of the challenges as well. Kevin, talk to us about your thoughts here. >> Yeah, again, I mean, for sure, in the end you've got the human element. You can give people all kinds of KPIs; as Isaac said, often it's too many. You have now KPI'd the business to death, and nobody can get out and do anything. That doesn't work. Obviously you can't improve things until you measure them. So you have to measure, we get that. But this question of live data is interesting. My personal view is only certain kinds of data are interesting absolutely live, in the moment. So I think people get in their mind, oh, well, if I could deploy IOT everywhere and get instantaneous access within one second to the amalgam of that data, I'm making up words too. That would be interesting. Are you sure that'd be interesting?
I might rather analyze the last week of real, real data, really deep analysis, right? Build you know, a real model around that and say, okay for the next week, you ought to do the following. Now I get that if you're in the high-frequency stock trading business you know, every millisecond counts, okay? But most of our businesses do not run by the millisecond and we're not going to make a business decision especially humans involved in a millisecond anyway. We make business decisions based on a fair bit of data, days and weeks. So this is just my own personal opinion. I think people get hung up on this. I've got to have all this live data. No, you want great data analysis using AI and machine learning to evaluate as much data as you can get over whatever period of time that is a week, a month a year and start making some rational decisions off of that information. I think that is how you run a business that's going to crush your competition. >> Good advice, Isaac what are your thoughts on these comments? >> Yeah, I'm going to pair off of Kevin's comments. You know, how do you chip away at this problem at getting more real time data? And I'll share two insights first, from the top down, you know, when StarCIO works with a group of CEO and their executive group, you know how are they getting their data? Well, they're getting it in a boardroom with PowerPoints with spreadsheets behind those PowerPoints, with analysts doing a lot of number crunching and behind all that are all the systems of record around the CRM and the ERP and all the other systems that are telling them how they're performing. And I suggest to them for a month, leave the world of PowerPoint and Excel and bring your analysts in to show you the data live in the systems, ask questions and see what it's like to work with real time data. That first changes the perspective in terms of all the manual work that goes into homogenizing that data for them. 
But then they start getting used to looking at the tools where the data is actually living. So that's an exercise from the top down. From the bottom up, when we talk to the IT groups, you know, so much of our data technologies were built at a time when batch processing in our data centers was the only way to go. We ran these things overnight to move data from point A to point B, and with the Cloud, with data streaming technologies, it's really a new game in town. And so it's really time for many organizations to modernize and think about how they're streaming data. Doesn't necessarily have to be real time. It's not really IOT, but it's really saying, I need to have my data updated on a regular basis with an SLA against it so that my teams and my businesses can make good decisions around things. >> So let's talk now about digital transformation. We've been talking about that for years. We talked a lot in 2020 about the acceleration of digital transformation, for obvious reasons. But when organizations are facing this data conundrum that we talked about, this sort of data disconnect, too much, can't get what we need right away, do we need it right away? How do they flip the script on that so that it doesn't become an impediment to digital transformation but becomes an accelerant? Kieran? >> You know, a lot of times you'll hear vendors talk about technology as being the answer, right? So AI, ML, my math is better than your math, et cetera. And technology is important, but it's only effective to the point at which people can actually interpret, understand and use the data. And so I would put forth this notion of having data at all levels throughout an organization. Too often what you'll see is that, I think Isaac mentioned it, you know, the data is delivered to the C-suite via PowerPoint and it's been sanitized and scrubbed, et cetera. But heck, by the time it gets to the C-suite it's three weeks old.
Data at all levels is making sure that throughout the organization, the right people have real-time access to data and can make actionable decisions based upon that. So I think that's a real vital ingredient to successful digital transformation. >> Kevin. >> Well, I like to think of digital transformation as looking at all of your relatively manual or paper-based or other processes, whatever they are, throughout the organization and saying, is this something that can now be done, for lack of a better word, by a machine, right? And that machine could be algorithms. It could be computers, it could be humans, it could be Cloud, it could be AI, it could be IOT, doesn't really matter. (clears throat) And so there's a reason to do that, and of course, the basis of that is the data. You've got to collect data to say, this is how we've been performing. This is what we've been doing. So a simple example of digitalization is people doing RPA around customer support. Now you collect a lot of data on how customer support has been supporting customers. You break that into tiers and you say, here's the easiest, lowest tier. I'd have farmed that out to probably some other country 20 years ago or 10 years ago. Can I, even with the systems in place, automate that with a set of processes, Robotic Process Automation, that digitizes that process? Now there still might be, you know, 20 different screens that click on all different kinds of things, whatever it is, but can I do that? Can I do it with some Chatbots? Can I do it with it? No, I'm not going to do all the customer support that way, but I could probably do a fair bit. Can I digitize that process? Can I digitize the process? Great example we all know is insurance companies taking claims. Okay, I have a phone.
Can I take a picture of my car that just got smashed, send it in, let AI analyze it and frankly, do an ACH transfer within the hour? Because if it costs the insurance company on average 300 to $500, depending on who they are, to process a claim, it's cheaper to just send me the $500 than even question it. And if I did it two or three times, well, then I'm trying to steal their money and I should go to jail, right? So I'm giving these as examples 'cause they're examples that everyone who is watching this would go, oh, I understand, you're digitizing a process. So now when we get to much more complex processes that we're digitizing in data or hiring or whatever, those are a little harder to understand, but I just tried to give those as, like, everyone understands, yes, you should digitize those. Those are obvious, right? >> Now those are great examples, you're right. They're relatable across the board here. Isaac, talk to me about what your thoughts are about, okay, let's do the conundrum. How do we flip the script and leverage data, access to it, insights, to drive and facilitate digital transformation rather than impede it? >> Well, remember, you know, digital transformation is really about changing the business model, changing how you're working with customers and what markets you're going after. You're being forced to do that because of the pace at which digital technologies are enabling competitors to outpace you. And so we really like starting digital transformations with a vision. What does this business need to do better, differently, more of? What markets are we going to go after? What types of technologies are important? And we're going to create that vision, but we know long-term planning doesn't work. We know multi-year planning doesn't work. So we're going to send our teams out on an agile journey over the next sprint, over the next quarter, and we're going to use data to give us information about whether we're heading in the right direction.
Should we do more of something? Is this feature higher priority? Is there a certain customer segment that we need to pay attention to more? Is there a set of defects happening in our technology that we have to address? Is there a new competitor stealing market share? All that kind of data is what the organization needs to be looking at on a very regular basis to say, do we need to pivot what we're doing? Do we need to accelerate something? Are we heading in the right direction? Should we give ourselves high fives and celebrate a quick win because we've accomplished something? 'Cause so much of transformation is what we're doing today. We're going to change what we're doing over the next three years, and then guess what? There's going to be a new set of technologies. There's going to be another disruption that we can't anticipate, and we want our teams sitting on their toes, waiting to look at data and saying, what should we do next? >> That's a great segue, Isaac, into our last question, which is around culture. That's always one of those elephants in the room, right? Because so much cultural transformation is necessary, but it's incredibly difficult. So, a question for you guys, and Kieran, we'll start with you: would you advise that leadership should really create a company-wide culture around data? What do you think? >> Absolutely. I mean, this reminds me of DevOps in many ways, and, you know, the data has to be shared at all levels and has to empower people to make decisions at their respective levels so that we're not, you know, kind of siloed in our knowledge or our decision-making. It's through that collective intelligence that I think organizations can move forward more quickly, but they do have to change the culture and they've got to have everyone in the room. Everyone's got a stake in driving business success, from the C-suite down to the individual contributor. >> Right. Kevin, your thoughts? >> You know what? Kieran's right.
Data silos are one of the biggest brick walls in all of our way, all the time. You know, SecOps says there is no way I'm going to share that database because it's got PII. Okay, well, how about if we strip the PII? Well, then that won't be good for something else. And you're getting these huge arguments if you're not driving it from the top, certainly the CIO, maybe the CFO, maybe the CEO. I would argue the CEO drives it from the top, 'cause the CEO drives company culture. And, you know, we talk BizOps, and the first word of that is Biz. It's the business, right? It's Ops being driven by business goals, and the CEO has to set the business goals. It's not really up to the CIO to set business goals. They're setting operational goals; it's up to the CEO. So the CEO comes out and says our business goals are to drive up sales by this, drive down cost by this, drive up speed of product development, whatever it is, and we're going to digitize all of our processes to do that. We're going to set KPIs. We're going to measure everything that we do, and everybody's going to work around this table. By the way, just like we did with DevOps a decade ago, right? We said, Dev, you actually have to work with Ops now, and they go, those dangerous guys way over in that other building, we don't even know who they are. But in time people realized that we're all on the same team and that if developers develop something that operations can't host and support and keep alive, it's junk, right? And we used to do that, and now we're much better at it. And whether it's DevSecOps or Dev two-way Ops, whatever, all those teams are working together. Now we're going to spread that out and make it a bigger part of the company, and it starts with the CEO. And when the CEO makes it a directive for the company, I think we're all going to be successful. >> Isaac, what are your thoughts? >> I think we're really talking about a culture of transformation and a culture of collaboration.
I mean, again, with everything that we're doing now, we're going to build, we're going to learn. We're going to use data to pivot what we're doing. We're going to release a product to customers. We're going to get feedback. We're going to continue to iterate over those things. Same thing when it comes to sales, same thing with, you know, the experiments that we do for marketing. With what we're doing today, we're constantly learning. We're constantly challenging our assumptions. We're trying to throw out the sacred cows of the status quo, 'cause we know there's going to be another island that we have to go after, and that's the transformation part. The collaboration part is really, you know, what you're hearing. Multiple teams, not just Dev and Ops and not just data and Dev, but really the spectrum of business, of product, of stakeholders, of marketing and sales, working with technologists and saying, look, these are the things that we need to go after over these time periods, and working collaboratively and iteratively around them. And again, the data is the foundation for this, right? And we talk about a learning culture as part of that. The data is a big part of that learning; learning new skills, and what new skills to learn, is part of that. But when I think about culture, you know, the thing that slows down organizations is when they're not transforming fast enough, or they're going in five or six different directions, or they're not collaborative enough, and the data is the element in there that is an equalizer. It's what you show everybody to say, look, what we're doing today is not going to make us survive over the next three years. >> The data equalizer, that sounds like it could be a movie coming out in 2021. (laughing) Gentlemen, thank you for walking us through some of those interesting metrics coming out of the BizOps Inaugural Survey. Yes, there are challenges with data.
Many of them aren't surprising, but there's also a lot of tremendous opportunity, and I liked how you kind of brought it around to a cultural perspective. It's got to start from that C-suite, to Kieran's point, all the way down. I know we could keep talking, but we're out of time. We'll have to keep following this, as it's a very interesting topic, one that is certainly pervasive across industries. Thanks, guys, for sharing your insights. >> Thank you. >> Thank you, Lisa. >> Thank you, Lisa. >> For Kieran Taylor, Kevin Surace and Isaac Sacolick, I'm Lisa Martin. Thanks for watching. (upbeat music)

Published Date : Apr 22 2021

SUMMARY :

Kieran, great to have you on the program. Chairman and CTO of Appvance, hey Kevin. Author and CEO of StarCIO. and the first one in that So I think you get out of questions to ask of the data. and what you want out of it, right? And Isaac, I know you and the end of the equation, and identify the best data sources And so when you do that, but doing it better and learning how to do it better. Learning how to do it better. the operation, you know, dozens in that the survey found and then identify what's missing, you know of the challenges as well. You have now KPI the business to death and behind all that are all the systems to digital transformation it gets to the C-suite and of course, the basis Isaac, talk to me about what We're going to change what we're doing elephants in the room, right? from the C-suite down to and the CEO has to set the business goals. and Dev, but really the and I liked how you kind Surace and Isaac Sacolick.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Kevin | PERSON | 0.99+
Isaac Sacolick | PERSON | 0.99+
Isaac | PERSON | 0.99+
Kieran | PERSON | 0.99+
Kevin Surace | PERSON | 0.99+
Lisa | PERSON | 0.99+
Kieran Taylor | PERSON | 0.99+
Kevin Surace | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
two | QUANTITY | 0.99+
Issac Sacolick | PERSON | 0.99+
Kiernan Taylor | PERSON | 0.99+
$500 | QUANTITY | 0.99+
2021 | DATE | 0.99+
five | QUANTITY | 0.99+
2020 | DATE | 0.99+
74% | QUANTITY | 0.99+
97% | QUANTITY | 0.99+
70% | QUANTITY | 0.99+
Excel | TITLE | 0.99+
three | QUANTITY | 0.99+
PowerPoint | TITLE | 0.99+
today | DATE | 0.99+
five countries | QUANTITY | 0.99+
next year | DATE | 0.99+
Broadcom | ORGANIZATION | 0.99+
three times | QUANTITY | 0.99+
a week | QUANTITY | 0.99+
three weeks | QUANTITY | 0.99+
five nations | QUANTITY | 0.99+
20 different screens | QUANTITY | 0.99+
next week | DATE | 0.99+
five steps | QUANTITY | 0.99+
one second | QUANTITY | 0.99+
StarCIO | ORGANIZATION | 0.98+
Alice in Wonderland | TITLE | 0.98+
first two questions | QUANTITY | 0.98+
each category | QUANTITY | 0.98+
three guests | QUANTITY | 0.98+
One | QUANTITY | 0.98+
first word | QUANTITY | 0.98+
PowerPoints | TITLE | 0.98+
Appvance | ORGANIZATION | 0.98+
first one | QUANTITY | 0.98+
first two | QUANTITY | 0.98+
10 years ago | DATE | 0.98+
20 years ago | DATE | 0.98+
dozens | QUANTITY | 0.98+
first | QUANTITY | 0.98+
one | QUANTITY | 0.97+
BizOps | ORGANIZATION | 0.97+
a decade ago | DATE | 0.97+
last week | DATE | 0.97+
Inaugural BizOps Industry Survey | EVENT | 0.96+
a month a year | QUANTITY | 0.96+
nearly 70% | QUANTITY | 0.96+

JG Chirapurath, Microsoft | theCUBE on Cloud 2021


 

>> From around the globe, it's theCUBE, presenting theCUBE on Cloud, brought to you by SiliconANGLE. >> Okay, we're now going to explore the vision of the future of cloud computing from the perspective of one of the leaders in the field. JG Chirapurath is the vice president of Azure Data, AI and Edge at Microsoft. JG, welcome to theCUBE on Cloud. Thanks so much for participating. >> Well, thank you, Dave, and it's a real pleasure to be here with you. And I just wanna welcome the audience as well. >> Well, JG, judging from your title, we have a lot of ground to cover, and our audience is definitely interested in all the topics that are implied there. So let's get right into it. You know, we've said many times on theCUBE that the new innovation cocktail comprises machine intelligence, or AI, applied to troves of data, with the scale of the cloud. It's no longer, you know, that we're driven by Moore's law. It's really those three factors, and those ingredients are gonna power the next wave of value creation in the economy. So, first, do you buy into that premise? >> Yes, absolutely, we do buy into it. And I think, you know, one of the reasons why we put data, analytics and AI together is because all of that really begins with the collection of data and managing it and governing it, unlocking analytics in it. And we tend to see things like AI, the value creation that comes from AI, as being on that continuum, having started off with really things like analytics and proceeding to, you know, machine learning and the use of data. >> I'd like to get some more thoughts around data and how you see the future of data and the role of cloud, and maybe how Microsoft's, you know, strategy fits in there. I mean, your portfolio, you've got SQL Server, Azure, Azure SQL. You've got Arc, which is kinda Azure everywhere, for people that aren't familiar with that. You've got Synapse.
Which, of course, is all the integration, the data warehouse, getting things ready for BI and consumption by the business, and the whole data pipeline. And a lot of other services: Azure Databricks, you've got Cosmos in there, uh, blockchain. You've got open source services like Postgres and MySQL. So lots of choices there. And I'm wondering, you know, how do you think about the future of cloud data platforms? It looks like your strategy is right tool for the right job. Is that fair? >> It is fair, but it's also worth stepping back to look at it. Fundamentally, what we see in this market today is that customers seek really a comprehensive proposition. And when I say a comprehensive proposition, it is sometimes not just about saying, hey, listen, we know you're a SQL Server company. We absolutely trust that you have the best Azure SQL database in the cloud, but tell us more. We've got data that's sitting in Hadoop systems. We've got data that's sitting in Postgres, in things like MongoDB, right? So that open source proposition today in data and data management and database management has become front and center. So our real sort of push there, when it comes to migration, management, and modernization of data, is to present the broadest possible choice to our customers so we can meet them where they are. However, when it comes to analytics, one of the things they ask for is, give us a lot more convergence. You know, really, it isn't about having 50 different services. It's really about having that one comprehensive service that is converged. That's where things like Synapse fit in, where you just land any kind of data in the lake and then use any compute engine on top of it to drive insights from it. So, fundamentally, you know, it is that flexibility that we really sort of focus on to meet our customers where they are, and really not pushing our dogma and our beliefs on it.
But to meet our customers according to the way they have deployed stuff like this. >> So that's great. I want to stick on this for a minute because, you know, I know when I have guests on like yourself, you never want to talk about, you know, the competition. But that's all we ever talk about. That's all your customers ever talk about. Because the counter to that right tool for the right job, and that, I would say, is really kind of Amazon's approach, is that you've got the single unified data platform, the mega database that does it all. And that's kind of Oracle's approach. It sounds like you wanna have your cake and eat it too. So you've got the right-tool-for-the-right-job approach, but you've got an integration layer that allows you to have that converged database. I wonder if you could add color to that and confirm or deny what I just said. >> No, that's a very fair observation, but I'd say there's a nuance in what I sort of describe. When it comes to data management, when it comes to apps, we give customers the broadest choice. Even in that perspective, we also offer convergence. So, case in point, when you think about Cosmos DB, under that one sort of service you get multiple engines, but with the same properties, right: global distribution, the five nines availability. It gives customers the ability to basically choose, when they have to build that new cloud native app, to adopt Cosmos DB, and adopt it in a way that they choose an engine that is most flexible to them. However, you know, when it comes to, say, you know, modernizing a SQL Server, for example, sometimes you just want to lift and shift it into things like IaaS. In other cases, you want to completely rewrite it. So you need to have the flexibility of choice there that is presented by a legacy of what sits on premises. When you move into things like analytics, we absolutely believe in convergence, right?
So we don't believe that, look, you need to have a relational data warehouse that is separate from a Hadoop system that is separate from, say, a BI system that is just, you know, a bolt-on. For us, we love the proposition of really building things that are so integrated that once you land data, once you prep it inside the lake, you can use it for analytics. You can use it for BI. You can use it for machine learning. So I think, you know, our sort of differentiated approach speaks for itself there. >> Well, that's interesting, because essentially, again, you're not saying it's an either/or, and you're seeing a lot of that in the marketplace. You've got some companies saying, no, it's the data lake, and others saying, no, no, put it in the data warehouse, and that causes confusion and complexity around the data pipeline, and a lot of cost. And I'd love to get your thoughts on this. A lot of customers struggle to get value out of data, and specifically, data product builders are frustrated that it takes too long to go from, you know, this idea of, hey, I have an idea for a data service and it could drive monetization. But to get there, you've gotta go through this complex data lifecycle and pipeline and beg people to add new data sources. Do you feel like we have to rethink the way that we approach data architectures? >> Look, I think we do in the cloud, and I think the place where I see the most amount of rethink, the most amount of push from our customers to really rethink, is the area of analytics and AI. It's almost as if what worked in the past will not work going forward, right? So when you think about analytics in the enterprise today, you have relational systems, you have Hadoop systems. You've got data marts. You've got data warehouses. You've got enterprise data warehouses. You know, those large honking databases that you use, uh, to close your books with, right?
But when you start to modernize it, what people are saying is that we don't want to simply take all of that complexity that we've built over, say, you know, three, four decades and simply migrate it en masse exactly as it is into the cloud. What they really want is a completely different way of looking at things. And I think this is where services like Synapse provide a completely differentiated proposition to our customers. What we say there is, land the data in any way, shape or form inside the lake. Once you've landed it inside the lake, you can essentially use Synapse Studio to prep it in the way that you like, use any compute engine of your choice, and operate on this data in any way that you see fit. So, case in point, if you want to hydrate a relational data warehouse, you can do so. If you want to do ad hoc analytics using something like Spark, you can do so. If you want to invoke Power BI on that data, or BI on that data, you can do so. If you want to bring in a machine learning model on this prepped data, you can do so. So when customers buy into this proposition, what it solves for them and what it gives them is complete simplicity, right? One way to land the data, multiple ways to use it. >> Should we think of Synapse as an abstraction layer that abstracts away the complexity of the underlying technology? Is that a fair way to think about it? >> Yeah, you can think of it that way. It abstracts away, Dave, a couple of things. It takes away the type of data, you know, sort of the complexities related to the type of data. It takes away the complexity related to the size of data. It takes away the complexity related to creating pipelines around all these different types of data, and fundamentally puts it in a place where it can now be consumed by any sort of entity inside the Azure proposition. And by that token, even Databricks.
You know, you can, in fact, use Databricks in sort of an integrated way with Synapse. >> Right. Well, so that leads me to this notion, and I wonder if you buy into it. So my inference is that a data warehouse or a data lake could just be a node inside of a global data mesh, and then it's Synapse sort of managing, uh, that technology on top. Do you buy into that global data mesh concept? >> We do. And we actually do see our customers using Synapse and the value proposition that it brings together in that way. Now, it's not where they start. Oftentimes a customer comes and says, look, I've got an enterprise data warehouse, I want to migrate it, or I have a Hadoop system, I want to migrate it. But from there, the evolution is absolutely interesting to see. I'll give you an example. You know, one of the customers that we're very proud of is FedEx. And what FedEx is doing is completely reimagining its logistics system, basically the system that delivers, what is it, three million packages a day, and doing so in these COVID times, with the view of basically delivering our COVID vaccines. One of the ways they're doing it is basically using Synapse. Synapse is essentially that analytic hub where they can get a complete view into their logistics processes, the way things are moving, understand things like delays, and really put all that together in a way that they can essentially get our packages and these vaccines delivered as quickly as possible. Another example, you know, and it's one of my favorites: we see that once customers buy into it, they essentially can do other things with it. So an example of this, and it's really my favorite story, is Peace Parks Initiative. It is the premier, uh, white rhino conservancy in the world. They essentially are using data that has landed in Azure, images in particular.
So, basically, you know, they use drones over the vast area that they patrol and use machine learning on this data to really figure out where there is an issue and where there isn't an issue, so that this park, with about 200 rangers, can scramble surgically versus having to range across the vast area that they cover. So what you see here is, you know, the importance is really getting your data in order: land it consistently, whatever the kind of data it is, build the right pipelines, and then the possibilities of transformation are just endless. >> Yeah, that's very nice how you worked in some of the customer examples. I appreciate that. I wanna ask you, though: some people might say that putting in that layer, while it clearly adds simplification, and I think that's a great thing, there begins over time to be a gap, if you will, between the ability of that layer to integrate all the primitives and all the piece parts, and that you lose some of that fine-grained control and it slows you down. What would you say to that? >> Look, I think that's what we excel at, and that's what we completely sort of buy into. It's our job to basically provide that level of integration and that granularity. It's an art, I absolutely admit it's an art. There are areas where people crave simplicity and not a lot of, you know, sort of knobs and dials and things like that. But there are areas where customers want flexibility, right? So I think, just to give you an example of both of them: in landing the data consistently, in building pipelines, they want simplicity. They don't want complexity. They don't want 50 different places to do this, just one place to do it. When it comes to computing and reducing this data, analyzing this data, they want flexibility. This is one of the reasons why we say, hey, listen, you want to use Databricks? If you're buying into that proposition and you're absolutely happy with them, you can plug it into it.
You want to use BI and, you know, essentially do a small data mart? You can use BI. If you say, look, I've landed it in the lake, I really only want to use ML, bring in your ML models and party on. So that's where the flexibility comes in. So that's sort of how we really think about it. >> Well, I like the strategy because, you know, one of our guests, Zhamak Dehghani, is, I think, one of the foremost thinkers on this notion of the data mesh, and her premise is that data builders, data product and service builders, are frustrated because the big data system is generic to context. There's no context in there. But by having context in the big data architecture and system, you could get products to market much, much, much faster. So that seems to be your philosophy. But I'm gonna jump ahead to my ecosystem question. You've mentioned Databricks a couple of times. There's another partner that you have, which is Snowflake. They're kind of trying to build out their own, uh, data cloud, if you will, a global mesh. On the one hand, they're a partner. On the other hand, they're a competitor. How do you sort of balance and square that circle? >> Look, when I see Snowflake, I actually see a partner. You know, this is where I sort of step back and look at Azure as a whole, and in Azure as a whole, companies like Snowflake are vital in our ecosystem, right? I mean, there are places we compete, but, you know, effectively, by helping them build the best Snowflake service on Azure, we essentially are able to, you know, differentiate and offer a differentiated value proposition compared to, say, a Google or an AWS. In fact, that's been our approach with Databricks as well, where, you know, they are effectively on multiple clouds, and our opportunity with Databricks is to essentially integrate them in a way where we offer the best experience.
The best integrations on Azure, bar none. That's always been a focus. >> That's hard to argue with as a strategy. Our data, with our data partner ETR, shows Microsoft is both pervasive and, impressively, has a lot of momentum, spending velocity, within the budget cycles. I wanna come back to AI a little bit. It's obviously one of the fastest growing areas in our survey data. As I said, clearly Microsoft is a leader in this space. What's your vision of the future of machine intelligence and how Microsoft will participate in that opportunity? >> Yeah, so fundamentally, you know, we've built on decades of research around, you know, essentially vision, speech and language. Those have been the three core building blocks, and for a really focused period of time we focused on essentially ensuring human parity. So if you ever wondered what the keys to the kingdom are, it's the moat we've built in ensuring that, the research posture that we've taken there. What we've then done is essentially a couple of things. We've focused on essentially looking at the spectrum that is AI, both from saying that, holistically, you know, it's gotta work for data analysts who are looking to basically use machine learning techniques, to developers who are essentially, you know, coding and building machine learning models from scratch. So that proposition manifests for us as, you know, really AI focused on all skill levels. The other core thing we've done is that we've also said, look, it will only work as long as people trust their data and they can trust their AI models. So there's a tremendous body of work and research we do in things like responsible AI. So if you ask me where we sort of push on, it is fundamentally to make sure that we never lose sight of the fact that the spectrum of AI can sort of come together for any skill level, and we keep that responsible AI proposition absolutely strong. Now, against that canvas, Dave,
I'll also tell you that, you know, as edge devices get way more capable, right, where they can take input on the edge, say a camera or a mic or something like that, you will see us pushing a lot more of that capability onto the edge as well. But to me, that's sort of a modality. The core really is all skill levels and that responsible AI. >> Yeah, so that brings me to this notion of, I wanna bring in edge and hybrid cloud, and understand how you're thinking about hybrid cloud, multi-cloud. Obviously one of your competitors, Amazon, won't even say the word multi-cloud. You guys have a, you know, different approach there. But what's the strategy with regard to hybrid? You know, do you see the cloud, you bringing Azure to the edge? Maybe you could talk about that and talk about how you're different from the competition. >> Yeah, I think on the edge, you know, I'll be the first one to say that the word edge itself is conflated. Okay? But I will tell you, just focusing on hybrid, this is one of the places where, you know, I would say 2020, if I look back from a corporate perspective in particular, has been the most informative, because we absolutely saw customers digitizing, moving to the cloud. And we really saw hybrid in action. 2020 was the year that hybrid sort of really became real from a cloud computing perspective. And an example of this is, we understood it's not all or nothing. So sometimes customers want Azure consistency in their data centers. This is where things like Azure Stack come in. Sometimes they basically come to us and say, we want the flexibility of adopting flexible, you know, platforms like, say, containers, orchestrating with Kubernetes, so that we can essentially deploy wherever we want. And so when we designed things like Arc, it was built with that flexibility in mind. So here is the beauty of what something like Arc can do for you.
If you have a Kubernetes endpoint anywhere, we can deploy an Azure service onto it. That is the promise, which means if, for some reason, the customer says, hey, I've got this Kubernetes endpoint in AWS and I love Azure SQL, you will be able to run Azure SQL inside AWS. There's nothing that stops you from doing it. So, inherently, you remember, our first principle is always to meet our customers where they are. So from that perspective, multi-cloud is here to stay. You know, we're never going to be the people that say, I'm sorry, we will never see it. But it is a reality for our customers. >> So I wonder if we could close, and thank you for that, by looking back and then ahead. And I wanna put forth, maybe it's a criticism, but maybe not. Maybe it's an art of Microsoft. But first, you know, Microsoft has done an incredible job of transitioning its business. Azure is omnipresent, as I said; our data shows that. So, two-part question. First, Microsoft got there by investing in the cloud, really changing its mindset, I think, and leveraging its huge software estate and customer base to put Azure at the center of its strategy. And many have said, me included, that you got there by creating products that are good enough. You know, you do a 1.0, it's not that great, and then the 2.0, and maybe it's not the best, but it's acceptable for your customers. And that's allowed you to grow very rapidly and expand your market. How do you respond to that? Is that a fair comment? Are you more than good enough? I wonder if you could share your thoughts. >> Dave, you hurt my feelings with that question. >> Don't hate me, JG. Just getting it out there. >> So, first of all, thank you for asking me that. You know, I am absolutely the biggest cheerleader you'll find at Microsoft.
I absolutely believe, you know, that I represent the work of almost 9000 engineers, and we wake up every day worrying about our customer and worrying about the customer condition, and to absolutely make sure we deliver the best the first time that we do. So when you take the platter of products we've delivered in Azure, be it Azure SQL, be it Azure Cosmos DB, Synapse, Azure Databricks, which we did in partnership with Databricks, Azure Machine Learning, and recently, when we previewed, we sort of, you know, offered the world's first comprehensive data governance solution in Azure Purview, I would humbly submit to you that we're leading the way, and we're essentially showing how the future of data, AI and the edge will work in the cloud. >> I'd be disappointed if you capitulated in any way, JG. So thank you for that. And the kind of last question is looking forward, and how you're thinking about the future of cloud. The last decade was a lot about cloud migration, simplifying infrastructure management and deployment, SaaSifying my enterprise, a lot of simplification and cost savings. And, of course, the redeployment of resources toward digital transformation and other valuable activities. How do you think this coming decade will be defined? Will it be sort of more of the same? Or is there something else out there? >> I think that the coming decade will be one where customers start to unlock outsized value out of this. You know, what happened in the last decade was people laid the foundation, and people essentially looked at the world and said, look, we've got to make the move, you know, it'll be largely hybrid, but we're going to start making steps to basically digitize and modernize our platforms. I would tell you that with the amount of data that people are moving to the cloud, just as an example, you're going to see use of analytics and AI for business outcomes explode.
You're also going to see a huge sort of focus on things like governance. You know, people need to know where the data is, what the data catalog contains, how to govern it, how to trust this data, and, given all the other privacy and compliance regulations out there, essentially their compliance posture. So I think first, the unlocking of outcomes versus simply "Hey, I've saved money." Second, really putting this comprehensive sort of, you know, governance regime in place. And then, finally, security and trust. It's going to be more paramount than ever before. >> Yeah, nobody's going to use the data if they don't trust it. I'm glad you brought up security. It's a topic that hits number one on the CEO list. JG, great conversation. Obviously the strategy is working, and thanks so much for participating in Cube on Cloud. >> Thank you. Thank you, David. I appreciate it, and thank you to everybody who's tuning in today. >> All right. And keep it right there. I'll be back with our next guest right after this short break.

Published Date : Jan 22 2021


Brian Bohan and Chris Wegmann | AWS Executive Summit 2020


 

>> Announcer: From around the globe, it's theCUBE. With digital coverage of AWS re:Invent Executive Summit 2020, sponsored by Accenture and AWS. >> Hello and welcome back to theCUBE's coverage of AWS re:Invent 2020. This is special programming for the Accenture Executive Summit, where all the thought leaders are going to extract the signal from the noise and share with you their perspective on this year's re:Invent conference as it respects the customers' digital transformation. Brian Bohan is the director and head of the Accenture AWS Business Group at Amazon Web Services. Brian, great to see you. And Chris Wegmann is the Accenture Amazon Business Group technology lead at Accenture. Guys, this conversation is about technology vision. Chris, I want to start with you because of Andy Jassy's keynote. You heard about the strategy of digital transformation, how you've got to lean into it. You've got to have the guts to go for it and you've got to decompose. He went everywhere. (chuckles) So what did you hear? What was striking about the keynote? Because he covered a lot of topics. >> Yeah. It was epic, as always, from Andy. A lot of topics, a lot to cover in the three hours. There were a couple of things that stood out for me. First of all, hybrid. The new concept of hybrid and how Andy talked about it, bringing the compute and the power to all parts of an enterprise, whether it be at the edge or in the big public cloud, whether it be in an Outpost or wherever it may be, right, with containerization now. Being able to do Amazon containerization in my data center, that's awesome. I think that's going to make a big difference. All that being underneath the Amazon console and billing and things like that, which is great. I'll also say the chips, right?
I know compute is always something that we kind of take for granted, but I think again this year, Amazon and Andy really focused on what they're doing with the chips and compute, and compute is still at the heart of everything in cloud. And that continued advancement is making an impact and will continue to make a big impact. >> Yeah, I would agree. I think one of the things that really... I mean, the container thing was, I think, really kind of a nuanced point. When you've got Deepak Singh on the opening day with Andy Jassy, and he runs the container group over there. When we need a small little team, he's on the front stage. That really is the key to the hybrid. I think this showcases this new layer. We're taking advantage of the Graviton2 chips, which I thought was huge. Brian, this is really a key part of the platform change, not change, but the continuation of AWS. Higher-level services, >> Yep. >> building blocks that provide more capabilities, heavy lifting as they say, but the new services that are coming on top really speak to hybrid and speak to the edge. >> It does. Yeah. I think, like Andy talks about and we talked about, we really want to provide choice to our customers, first and foremost. And you can see that in the array of services we have, and you can see it in the hybrid options that Chris talked about. Being able to run your containers through ECS or EKS anywhere. It just gets to the customer's choice. And one of the things that I'm excited about, as you talk about going up the stack and on the edge, are things, most certainly, like Outpost, right? So now, Outpost was launched last year, but then with the new form factors, and then you look at services like Panorama, right? Being able to take computer vision and embed machine learning and computer vision, and do that as a managed capability at the edge for customers. And so we see this across a number of industries.
And so what we're really thinking about is that customers no longer have to make trade-offs and have to think about those choices, that they can really deploy natively in the cloud, and then they can take those capabilities, train those models, and then deploy them where they need to, whether that's on premises or at the edge, whether it be in a factory or retail environment. I think we're really well positioned when, hopefully next year, we start seeing the travel industry rebound, and the need more than ever really to kind of rethink how we monitor and make those environments safe. Having this kind of capability at the edge is really going to help our customers as we come out of this year and hopefully rebound next year. >> Chris, I want to go back to you for a second. It's hard to pick your favorite innovation from the keynote because Brian just reminded me of some things I forgot happened. It was like a buffet of innovation. Some keynotes have one or two; there were like 20. You've got the industrial piece, that was huge. Computer vision, machine learning, that's just a game changer. The Connect thing came out of nowhere, in my opinion. I mean, it's call center technology, so it's boring as hell, what are you going to do with that? (Brian and Chris chuckle) It turns out it's a game changer. It's not about the calls but the contact, and that's disintermediating in the stack as well. So again, a feature that looks old is actually new and relevant. What was your favorite innovation announcement? >> It's hard to say. I will say my personal favorite was the Mac OS. I think that is just a phenomenal addition, right? And the fact that AWS has worked with Apple to integrate the Nitro chip into the iMac and offer that out. A lot of people are doing development for iOS and that stuff, and that's just been a huge benefit for the development teams. But I will say, I'll come back to Connect. You mentioned it, but you're right.
It's a boring area, but it's an area where we've seen huge success since Connect was launched, and with the additional features that Amazon continues to bring. Obviously, with the pandemic, that customer engagement through the phone, through omni-channel, has just been critical for companies, right? And to be able to have those agents at home, working from home versus being in the office, was a huge advantage for several customers that are using Connect. We did some great stuff with some different customers, but the continued technology, like you said, the call translation, and during a call to be able to pop up those keywords and have a supervisor listen, is awesome. And some of that was already being done, but we were stitching multiple services together. Now that's right out of the box. And that consolidation is only going to make that go faster and let us innovate faster for that piece of the business. >> It's interesting, not to get all nerdy and business school-like, but you've got systems of record, systems of engagement. If you look at the call center and the Connect thing, what got my attention was not only the model of disintermediating that part of the engagement in the stack, but what cloud actually does to something that's a feature or something that could be an element, like, say, the call center, the old days of calling the 800 number and getting some support. You got infra chip, you have machine learning, you actually have stuff in the stack that makes that different now. The thing that impressed me was Andy saying you could have machine learning detect pauses, voice inflections. So now you have technology making that more relevant and better and different. So, a lot going on. This is just one example of many things that are happening from a disruption and innovation standpoint. What do you guys think about that? Am I getting it right? Can you share other examples?
>> I think you are right, and I think what's implied there in what you're saying, and even in the other Mac OS example, is the ability... We're talking about features, right? Which by themselves, you're saying, "Oh, wow, what's so unique about that?" But because it's on AWS, whether you're a developer working with Mac iOS, you now have access to the 175-plus services that you can then weave into your new application. Talk about the Connect scenario. Now we're embedding that kind of inference and machine learning to do what you say, but then your data lake is also most likely running in AWS, right? And then the other channels, whether they be mobile channels or web channels or in-store physical channels, that data can be captured and that same machine learning could be applied there to get that full picture across the spectrum, right? So that's the power of bringing it together on AWS: the access to all those different capabilities and services, and then also where the data is, and pulling all that together for that end-to-end view. >> Can you guys give some examples of work you've done together? I know there's stuff we've reported on; in the last session we talked about some of the Connect stuff, but that kind of encapsulates where this is all going with respect to the tech. >> Yeah. I think one of them, it was called out at Doug's Partner Summit, is an SAP Data Lake Accelerator, right? Almost every enterprise has SAP, right? And getting data out of SAP has always been a challenge, right? Whether it be through data warehouses and AWS, or sorry, SAP BW. What we've focused on is getting that data, when you have SAP on AWS, into the data lake, right? Getting it into a model where you can pull the value out and the customers can pull the value out and use those AI models. So that's one thing we worked on in the last 12 months. Super excited about seeing great success with customers. A lot of customers had ideas.
They want to do this, they had different models. What we've done is make it very simple: a framework which allows customers to do it very quickly, get the data out there and start getting value out of it and iterating on that data. We saw customers spending way too much time trying to stitch it all together and trying to get it to work technically. And we've now cut all of that out, and they can immediately start getting down to the data and taking advantage of those different services that are out there from AWS. >> Brian, do you want to weigh in on things you see as relevant builds that you guys have done together that kind of tease out the future and connect the dots to what's coming? >> I'm going to use a customer example. We worked with, it just came out, Unilever around their Blueair connected smart air purifier. And what I think is interesting about that is it touches on some of the themes we're talking about, as well as some of the themes we talked about in the last session, which is that we started that program before the pandemic, but Unilever recognized that they needed to differentiate their product in the marketplace and move to more of a services-oriented business, which we're seeing as a trend. We enabled this capability. So now it's a smart air purifier that can be remotely managed. And when the pandemic hit, they were in a really good position, obviously, with a very relevant product and capability to be used. And so that data, as we were talking about, is going to reside in the cloud. And so the learning that can now happen about usage and about filter changes, et cetera, can find its way back into future iterations of that product. And I think that's in keeping with what Chris is talking about, where we might have systems of record, like in SAP. How do we bring those in and then start learning from that data so that we can get better on our future iterations?
>> Hey, Chris, in the last segment we did, on the business mission session, Andy Tay from your team talked about partnerships within Accenture and working with other folks. I want to take that now to the technical side, because one of the things that we heard from Doug's keynote and during the partner day was integrations and data; those were two big themes. When you're in the cloud technically, the integrations are different. You're going to get unique things in the public cloud that you're just not going to get on premises: access to other cloud-native technologies and companies. How do you see the partnering of Accenture with people within your ecosystem, and how do the data and the integration play together? What's your vision? >> Yeah. I think there are two parts to it. One, there's the commercial standpoint, right? So, the marketplace. You heard Dave talk about that in the partner summit, right? That marketplace is now bringing together this ecosystem in a way that is very easy to consume by the customers and by the users, and bringing multiple partners together. And we're working with our ecosystem to put more products out in the marketplace that are integrated together already. I think there's one from a technical perspective, though. If you look at Salesforce, I talked a little earlier about Connect. Another good example, technically, underneath the covers, is how we've integrated Connect and Salesforce, some of it being pre-built by AWS and Salesforce, with other things that we've added on top of it. I think those are good examples. And I think as these ecosystems, these ISVs, put their products out there and start exposing more and more APIs on the Amazon platform, opening it up, having those pre-built network connections there between the different VPCs of the different areas within a customer's network, and having them all opened up and connected and having all that networking done underneath the covers.
It's one thing to call the APIs, it's one thing to have access to those, and that's now a big focus of a lot of ISVs and customers who build those APIs and expose them, but having that network infrastructure underneath and being able to stay within the cloud, within AWS, to make those connections and pass that data. We always talk about scale, right? It's one thing if I just need to pass a simple user ID back and forth, right? That's fine. We're now talking massive data sets, whether it be seismic data or whatever it may be, and passing those large data sets between customers across the Amazon network is going to open up the world. >> Yeah, I see huge possibilities there and would love to keep on this story. I think it's going to be important and something to keep track of. I'm sure you guys will be on top of it. One of the things I want to dig into with you guys now is that Andy had kind of this philosophical thing in his keynote, talking about societal change and how tough the pandemic is. Everything's on full display, and this kind of brings out where we are and the truth. If you look at the truth, it's a virtual event. I mean, it's a website and you've got some sessions out there, we're doing remote the best we can, and you've got software and you've got technology, and the other concept of a mechanism: it's software, it does something. It serves a purpose. Accenture, you guys have a concept called Living Systems, a growth strategy powered by technology. How do you take the concept of a living organism or a system and replace the mechanism staleness of computing and software? And this is kind of interesting because we're on the cusp of a major inflection point post-COVID. I get the digital transformation being slow. Yes, that's happening. There are other things going on in society. What do you guys think about this Living Systems concept? >> Yeah. I'll start.
I think the living system concept started out very much thinking about how you rapidly change your system, right? And because of cloud, because of DevOps, because of all these software technologies and processes that we've created, that's where it started: making it much easier, making it much faster to change rapidly. But you're right. I think if you now bring in more technologies, the AI technology, self-healing technologies. Again, you heard Andy in his keynote talk about the systems and services they're building to detect problems and resolve those problems, right? Obviously automation is a big part of that. Living Systems is being able to bring that all together and to be able to react in real time, either when a customer asks or through the AI models that have been generated, and turning those AI models around much faster and being able to get all the information that came in the last 20 minutes, right? Society is moving fast and changing fast, and even in one part of the world, something can change in 10 minutes. And being able to have systems react to that, learn from that and pass that on to the next country, especially in this world of COVID, with things changing very quickly, and diagnosis and medical response all changing so quickly, to be able to react to that and have systems pass that information and learn from that information is going to be critical. >> That's awesome. Brian, one of the things that comes up every year is, "Oh, the cloud's scalable." This year, I think, we've talked on theCUBE before, years ago, certainly with Accenture and Amazon. I think it was like three or four years ago. Yeah. The cloud's horizontally scalable but vertically specialized at the application layer. But if you look at the data lake stuff that you guys have been doing, where you have machine learning, the data is horizontally scalable, and then the specialization in the app changes the whole vertical thing.
You don't need to have a whole vertical solution, or do you? So, how has this year's cloud news impacted vertical industries? Because it used to be, "Oh, oil and gas, financial services. They've got a team for that. We've got a stack for that." Not anymore. Is it going away? What's changing? >> Well, it's a really good question. I think what we're seeing, and I was just on a call this morning talking about banking and capital markets, and I do think the challenges are still pretty sector specific. But what we do see is a kind of commonality when we start looking at, and we talked about this, the industry solutions that we're building as a partnership. Most of them follow the pattern of ingesting data, analyzing that data, and then being able to provide insights and then actions, right? So if you think about creating that kind of common chassis of that ingest, the data lake and then the machine learning, and you talk about the nuances around SageMaker and being able to manage these models, what changes then, really, are the very industry-specific algorithms that you're writing, right, within that framework. And so we're doing a lot, and Connect is a good example of this too, where you look at it and yeah, customer service is a horizontal capability that we're building out, but then when you stamp it into insurance or retail banking or utilities, there are nuances that we then extend and build so that we meet the unique needs of those industries, and that's usually around those models. >> Yeah. I think this year was the first re:Invent where I saw real products coming out that actually solved that problem. I mean, it was there last year, SageMaker was kind of moving up the stack, but now you have apps embedding machine learning directly in, and users don't even know it's in there. I mean, 'cause this is kind of where it's going, right? I mean-- >> You saw that in the announcements, right? How many announcements had machine learning just embedded in?
I mean, CodeGuru, DevOps Guru, the Panorama we talked about, it's just there. >> Yeah. I mean, having that knowledge about the linguistics and the metadata, knowing the business logic, those are important specific use cases for the vertical, and you can get to it faster. Chris, how is this changing on the tech side, from your perspective? >> Yeah. I keep coming back to AWS and cloud making it easier, right? All this stuff can be done, and some of it has been done, but what Amazon continues to do is make it easier to consume by the developer, by the customer, and to actually embed it into applications much more easily than it would be if I had to go set up the stack and build it all on my own and embed it, right? So it's shortcutting that process, and again, as these products continue to mature, right, and some of this stuff is embedded, it makes that process so much faster. It reduces the amount of work required by the developers and the engineers to get there. So I'm expecting you're going to see more of this, right? I think you're going to see more and more of these multi-connected services from AWS that have a lot of the AI, ML, preconfigured data lakes, all that kind of stuff embedded in those services. So you don't have to do it yourself and can continue to go up the stack. And we always talk about how Amazon's built for builders, right? But builders have been super specialized, and as engineers we're being asked to be bigger and bigger and to be able to do more stuff, and I think these kinds of integrated services are going to help us do that. >> And it's certainly needed more now, when you have hybrid and edge, that they're going to be operating with microservices on a cloud model, with all those advantages that are going to come around the corner for being in the cloud. I mean, I think there's going to be a whole clarity around benefits in the cloud with all these capabilities. Cloud Guru, I think, is my favorite this year because it just points to why that could happen.
I mean, that happens because of the cloud data. (laughs) If you're on premises, you may not have a little Cloud Guru. You are going to get more data, but they're all different. Edge certainly will come in too. Your vision on the edge, Chris: how do you see that evolving for customers? Because that could be complex, new stuff. How is it going to get easier? >> Yeah. It's super complex now, right? I mean, you've got to design for all the different edge 5G protocols that are out there, and solutions, right? Amazon's simplifying that. Again, I come back to simplification, right? I can build an app that works on any 5G network that's been integrated with AWS, right? I don't have to set up all the different layers to get back to my cloud or back to my bigger data set. And that's kind of choking. I don't even know what to call the cloud anymore. I've got the big cloud, which is central, and then I go down and you've got a cloud at the edge, right? So what do I call that? >> Brian: It's just really computing. (laughing) >> Exactly. So again, I think this next generation of technology comes with the edge, right, and we put more and more data at the edge. We're asking for more and more compute at the edge, right? Whether it be industrial or for personal use or consumer use, that processing is going to get more and more intense. To be able to maintain it under a single console, under a single platform, and be able to move the code that I developed across that entire platform, whether I have to go all the way down to the very edge at the 5G level, right, or all the way back into the bigger cloud and do that processing in there, being able to do that seamlessly is going to allow the speed of development that's needed. >> Wow. You guys have done a great job, and there's no better time to be a techie or interested in technology or computer science or social science, for that matter. This is a really perfect storm. A lot of problems to solve, a lot of change happening, positive change opportunities, a lot of great stuff.
Final question, guys. Five years working together now on this partnership with AWS and Accenture. Congratulations, you guys are in pole position for the next wave coming. What's exciting you guys? Chris, what's on your mind? Brian, what's getting you guys pumped up? >> Well, again, I come back to what Andy mentioned in his keynote, right? We're seeing customers move now, right? Five years ago we knew customers were going to do this. We built a partnership to enable these enterprise customers to make that journey, right? But now, even more, we're seeing them move at such great speed, right? Which super excites me, right? Because I can see... Being in this for a long time now, I can see the value on the other end. We've been wanting to push our customers as fast as they can through the journey, and now they're moving. Now they're getting the religion, they're getting there. They see they need to do it to change their business, so that's what excites me. It's just the speed at which we're going to see the movement. >> Yeah. >> Yeah. I'd agree with that. I mean, I just think getting customers to the cloud is super important work, and we're obviously doing that and helping accelerate that. It's what we've been talking about: when we're there, all the possibilities that become available, right? Through the common data capabilities, the access to the 175-some-odd AWS services. I also think, and this has kind of permeated through this week at re:Invent, there is the opportunity, especially in those industries that do have an industrial aspect, a manufacturing aspect, or a really strong physical aspect, of bringing together IT and operational technology and the business with all these capabilities, and I think edge, and pushing machine learning and analytics down to the edge, is really going to help us do that. And so I'm super excited by all that possibility, because I feel like we're just scratching the surface there.
>> It's a great time to be building out, and this is the time for reconstruction, reinvention. Big theme, so many storylines in the keynote and the events. It's going to keep us busy here at SiliconANGLE on theCUBE for the next year. Gentlemen, thank you for coming on. I really appreciate it. Thanks. >> Thank you. >> All right. Great conversation. We're getting technical. We're going to go another 30 minutes. A lot to talk about. A lot of storylines here at AWS re:Invent 2020 at the Accenture Executive Summit. I'm John Furrier. Thanks for watching. (upbeat music)
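
As a rough illustration of the "common chassis" pattern Brian describes in this conversation (ingest data, analyze it, surface insights, then act, with only the model changing per industry), the following is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name run_chassis, the two toy scoring functions, the field names and the 0.5 threshold are hypothetical and do not come from the interview or from any AWS or Accenture product.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record shape: a flat mapping of field names to numbers.
Record = Dict[str, float]


def run_chassis(
    raw_records: Iterable[Record],
    industry_model: Callable[[Record], float],
    threshold: float,
) -> List[str]:
    """Shared pipeline: only `industry_model` differs between industries."""
    actions: List[str] = []
    for record in raw_records:                    # ingest
        score = industry_model(record)            # analyze (industry-specific)
        if score >= threshold:                    # insight: score crosses threshold
            customer = int(record["customer_id"])
            actions.append(f"follow up on customer {customer}")  # action
    return actions


# Two toy, made-up "industry models" plugged into the same chassis.
def insurance_model(record: Record) -> float:
    return record["claims_last_year"] / 5.0


def retail_banking_model(record: Record) -> float:
    return record["overdrafts_last_year"] / 10.0


records = [
    {"customer_id": 1, "claims_last_year": 4, "overdrafts_last_year": 1},
    {"customer_id": 2, "claims_last_year": 0, "overdrafts_last_year": 9},
]

insurance_actions = run_chassis(records, insurance_model, threshold=0.5)
banking_actions = run_chassis(records, retail_banking_model, threshold=0.5)
print(insurance_actions)
print(banking_actions)
```

The point of the sketch is only that the ingest, threshold and action steps stay the same while the scoring function is swapped per industry, which is one plausible reading of "what changes are the industry-specific algorithms within that framework."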

Published Date : Dec 16 2020


Rahul Pathak, AWS | AWS re:Invent 2020


 

>> From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS. >> Welcome back to theCUBE's ongoing coverage of AWS re:Invent virtual. theCUBE has gone virtual, along with most events these days (or all events), and continues to bring our digital coverage of re:Invent. With me is Rahul Pathak, who is the vice president of analytics at AWS. Rahul, it's great to see you again. Welcome, and thanks for joining the program. >> Hey Dave, great to see you too, and always a pleasure. Thanks for having me on. >> You're very welcome. Before we get into your leadership discussion, I want to talk about some of the things that AWS has announced in the early parts of re:Invent. I want to start with Glue Elastic Views. Very notable announcement, allowing people to, you know, essentially share data across different data stores. Maybe tell us a little bit more about Glue Elastic Views, kind of where the name came from and what the implication is. >> Sure. So, yeah, we're really excited about Glue Elastic Views and, as you mentioned, the idea is to make it easy for customers to combine and use data from a variety of different sources and pull them together into one or many targets. And the reason for it is that we're really seeing customers adopt what we're calling a lake house architecture, which is, at its core, a data lake for making sense of data and integrating it across different silos, typically integrated with a data warehouse, and not just that, but also a range of other purpose-built stores like Aurora for relational workloads or DynamoDB for non-relational ones. And while customers typically get a lot of benefit from using purpose-built stores, because you get the best possible functionality, performance and scale for a given use case, you often want to combine data across them to get a holistic view of what's happening in your business or with your customers.
And before Glue Elastic Views, customers would have to either use ETL or data integration software, or they'd have to write custom code that could be complex to manage, and could be error-prone and tough to change. And so, with Elastic Views, you can now use SQL to define a view across multiple data sources and pick one or many targets. And then the system will actually monitor the sources for changes and propagate them into the targets in near real time. And it manages the end-to-end pipeline and can notify operators if anything changes. And so the components of the name are pretty straightforward. Glue is our serverless ETL and data integration service, and Glue Elastic Views is about data integration. They're views because you can define these virtual tables using SQL, and elastic because it's serverless and will scale up and down to deal with the propagation of changes. So we're really excited about it, and customers are as well. >> Okay, great. So my understanding is I'm going to be able to take what's called, in the parlance, materialized views, which in my layperson's terms assumes I'm going to run a query on the database and take that subset. And then I'm going to be able to copy that and move it to another data store. And then you're going to automatically keep track of the changes and keep everything up to date. Is that right? >> Yes. That's exactly right. So you can imagine you had a product catalog, for example, that's being updated in DynamoDB, and you can create a view that will move that to Amazon Elasticsearch Service. You could search through a current version of your catalog, and we will monitor your DynamoDB tables for any changes and make sure those are all propagated in near real time. And all of that is taken care of for our customers as soon as they define the view, and the data will just be kept in sync as long as the view is in effect.
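The product catalog example above (define a view, watch the source for changes, push only those changes into a search target) can be pictured with a small in-memory sketch. This is not the Glue Elastic Views API; `View`, `Target` and `propagate` are hypothetical names invented for illustration, and the "view" is assumed to be a simple filter plus a column list rather than real SQL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
# A change event is (operation, key, row); row is None for deletes.
Change = Tuple[str, str, Optional[Row]]


@dataclass
class View:
    """Hypothetical stand-in for a SQL view: a row filter plus a column list."""
    predicate: Callable[[Row], bool]
    columns: List[str]

    def project(self, row: Row) -> Row:
        return {c: row[c] for c in self.columns}


@dataclass
class Target:
    """Hypothetical target store (for example a search index), modeled as a dict."""
    rows: Dict[str, Row] = field(default_factory=dict)


def propagate(changes: List[Change], view: View, target: Target) -> None:
    """Apply only the changed rows to the target instead of reloading everything."""
    for op, key, row in changes:
        if op == "delete" or row is None or not view.predicate(row):
            # The row was removed, or it no longer belongs in the view.
            target.rows.pop(key, None)
        else:
            target.rows[key] = view.project(row)


# View: in-stock products, exposing only the searchable columns.
view = View(predicate=lambda r: bool(r["in_stock"]), columns=["name", "price"])
index = Target()

propagate(
    [
        ("upsert", "p1", {"name": "kettle", "price": 30, "in_stock": True, "cost": 12}),
        ("upsert", "p2", {"name": "toaster", "price": 25, "in_stock": False, "cost": 9}),
    ],
    view,
    index,
)
# Later changes: p1 gets a new price, p2 comes back in stock, then p1 is deleted.
propagate(
    [
        ("upsert", "p1", {"name": "kettle", "price": 28, "in_stock": True, "cost": 12}),
        ("upsert", "p2", {"name": "toaster", "price": 25, "in_stock": True, "cost": 9}),
        ("delete", "p1", None),
    ],
    view,
    index,
)
print(index.rows)
```

The point of the sketch is the shape of the work: the target is touched once per change event, so the cost follows the volume of changes rather than the size of the catalog.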
>> I see this being really valuable for a person who's building what I like to think of in terms of data services or data products that are going to help me, you know, monetize my business. Maybe it's as simple as a dashboard, but maybe it's actually a product. It might be some content that I want to develop, and I've got transaction systems, I've got unstructured data, maybe in a NoSQL database, and I want to actually combine those and build new products, and I want to do that quickly. So take me through what I would have to do. You sort of alluded to it with, you know, a lot of ETL, but take me through in a little bit more detail how I would do that before this innovation. And maybe you could give us a sense as to what the possibilities are with Glue Elastic Views? >> Sure. So, before we announced Elastic Views, a customer would typically have to think about using ETL software, so they'd have to write an ETL pipeline that would extract data periodically from a range of sources. They'd then have to write transformation code that would do things like match up types and make sure you didn't have any invalid values, and then you would combine it and periodically write that into a target. And so once you've got that pipeline set up, you've got to monitor it. If you see an unusual spike in data volume, you might have to add more resources to the pipeline to make it complete on time. And then, if anything changed in either the source or the destination that prevented that data from flowing in the way you would expect it, you'd have to manually figure that out and have data quality checks and all of that in place to make sure everything kept working. But with Elastic Views it just gets much simpler. So instead of having to write custom transformation code, you write a view using SQL, and SQL is, you know, widely popular with data analysts and folks that work with data, as you well know.
And so you can define that view in SQL. The view will look across multiple sources, and then you pick your destination, and then Glue Elastic Views essentially monitors both the source for changes as well as the source and the destination for any issues. For example, did the schema change? Did the shape of the data change? Is something briefly unavailable? And it can monitor all of that and handle any errors that it can recover from automatically. Or, if it can't, say someone dropped an important table in the source that was part of your view, you can actually get alerted and notified to take some action to prevent bad data from getting through your system or to prevent your pipeline from breaking without your knowledge. And then the final piece is the elasticity of it. It will automatically deal with adding more resources if, for example, you had a spiky day in the markets. Maybe you're building a financial services application and you needed to add more resources to process those changes into your targets more quickly. The system would handle that for you. And then, if you're monetizing data services on the back end, you've got a range of options for folks subscribing to those targets. So we've got capabilities like AWS Data Exchange, where people can exchange and monetize data sets. So it allows this end-to-end flow in a much more straightforward way than was possible before. >> Awesome. So a lot of automation, especially if something goes wrong. So if something goes wrong, you can automatically recover. And if, for whatever reason, you can't, what happens? You quiesce the system and let the operator know, hey, there's an issue, you've got to go fix it? How does that work? >> Yes, exactly right. So if we can recover, say, for example, for a short period of time you can't read the target database, the system will keep trying until it can get through. But say someone dropped a column from your source.
That was a key part of your ultimate view and destination. You just can't proceed at that point. So the pipeline stops and then we notify using APIs or an SMS alert, so that programmatic action can be taken. So this effectively provides a really great way to enforce the integrity of data that's going between the sources and the targets. >> All right, make it kindergarten-proof. So let's talk about another innovation you guys announced, QuickSight Q, kind of speaking to the machine in my natural language. But give us some more detail there. What is QuickSight Q and how do I interact with it? What kind of questions can I ask it? >> So QuickSight Q is essentially a deep learning based semantic model of your data that allows you to ask natural language questions in your dashboard. So you'll get a search bar in your QuickSight dashboard, and QuickSight is our serverless BI service that makes it really easy to provide rich dashboards to whoever needs them in the organization. And what Q does is it automatically develops relationships between the entities in your data, and it's able to actually reason about the questions you ask. So unlike earlier natural language systems, where you have to predefine your models and predefine all the calculations that you might ask the system to do on your behalf, Q can actually figure it out. So you can say, show me the top five categories for sales in California, and it'll look in your data and figure out what that is. It will present you with how it parsed that question, and then, inline, in seconds, pop up a dashboard of what you asked and actually automatically try and pick a chart or visualization for that data that makes sense. And you could then start to refine it further and say, how does this compare to what happened in New York? And it'll be able to figure out that you're trying to overlay those two data sets and it'll add them.
And unlike other systems, it doesn't need to have all of those things predefined. It's able to reason about it because it's building a model of what your data means on the fly, and we pre-trained it across a variety of different domains, so you can ask a question about sales or HR or any of that. Another great part of Q is that when it presents to you what it's parsed, you're actually able to correct it if it needs it and provide feedback to the system. So, for example, if it got something slightly off, you could actually select from a drop-down, and then it will remember your selection for the next time, and it will get better as you use it. >> I saw a demo in Swami's keynote on December 8. Basically you were able to ask QuickSight Q the same question, but in different ways, you know, like compare California and New York, and then the data comes up, or give me the top five and then California, New York, and it's the same exact data. So is that how I can check and see if the answer that I'm getting back is correct, is to ask different questions? I don't have to know the schema, is what you're saying. I don't have to have knowledge of that as the user. I can triangulate from different angles and then look and see if that's correct. Is that how you verify, or are there other ways? >> So that's one way to verify. You could definitely ask the same question a couple of different ways and ensure you're seeing the same results. I think the third option would be to, you know, potentially click and drill and filter down into that data through the dashboard, and then the other step would be at data ingestion time. Typically, data pipelines will have some quality controls, but when you're interacting with Q, I think the ability to ask the question multiple ways and make sure that you're getting the same result is a perfectly reasonable way to validate.
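The validation idea above, asking the same question two ways and checking that the numbers agree, can be sketched with plain Python over made-up data. Nothing here is QuickSight Q itself: `SALES`, `top_categories` and `compare_states` are hypothetical, and each function simply stands in for one phrasing of the question.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sales records: (state, category, amount).
SALES: List[Tuple[str, str, int]] = [
    ("CA", "books", 120), ("CA", "games", 300), ("CA", "books", 80),
    ("NY", "books", 50), ("NY", "games", 90), ("CA", "music", 40),
]


def top_categories(state: str, n: int) -> List[Tuple[str, int]]:
    """Phrasing 1: filter to one state, then total sales by category."""
    totals: Dict[str, int] = defaultdict(int)
    for s, category, amount in SALES:
        if s == state:
            totals[category] += amount
    # Largest totals first; ties broken by category name.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def compare_states(states: List[str]) -> Dict[str, Dict[str, int]]:
    """Phrasing 2: total sales by state and category for several states at once."""
    totals: Dict[str, Dict[str, int]] = {s: defaultdict(int) for s in states}
    for s, category, amount in SALES:
        if s in totals:
            totals[s][category] += amount
    return {s: dict(t) for s, t in totals.items()}


ca_top = top_categories("CA", 5)
comparison = compare_states(["CA", "NY"])
# The California figures should agree no matter which way the question was asked.
consistent = dict(ca_top) == comparison["CA"]
print(ca_top, consistent)
```

The check only holds here because the sample has fewer than five California categories; with more categories, the second result would need to be trimmed to the same top five before comparing.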
>> You know what I like about that answer that you just gave, and I wonder if I could get your opinion on this, because you've been in this business for a while and you work with a lot of customers. If you think about our operational systems, you know, things like sales or ERP systems, we've contextualized them. In other words, the business lines have injected context into the system. I mean, they kind of own it, if you will. They own the data, and I put "own" in quotes, but they do. They feel like they're responsible for it. There's not this constant argument, because it's their data. It seems to me that if you look back over the last 10 years, a lot of the data architecture has been sort of genericized. In other words, the experts, whether it's the data engineer or the quality engineer, don't really have the business context. But the example that you just gave, the drill-down to verify that the answer is correct, it seems to me, just in listening again to Swami's keynote the other day, that you're really trying to put data in the hands of business users who have the context and the domain knowledge. And that seems to me to be a change in mindset that we're going to see evolve over the next decade. I wonder if you could give me your thoughts on that change in the data architecture, the data mindset. >> David, I think you're absolutely right. I mean, we see this across all the customers that we speak with. There's an increasing desire to get data broadly distributed into the hands of the organization in a well-governed and controlled way. But customers want to give data to the folks that know what it means and know how they can take action on it to do something for the business, whether that's finding a new opportunity or looking for efficiencies.
And I think we're seeing that increasingly, especially given the unpredictability that we've all gone through in 2020. Customers are realizing that they need to get a lot more agile, and they need to get a lot more data about their business and their customers, because you've got to find ways to adapt quickly. And, you know, that's not going to change anytime in the future. >> And I've said many times on theCUBE, you know, our industry, the technology industry, used to be all about the products, and in the last decade it was really platforms, whether it's SaaS platforms or AWS cloud platforms. And it seems like innovation in the coming years, in many respects, is going to come from the ecosystem and the ability to share data. We've had some examples today. But you hit on one of the key challenges, of course, which is security and governance. And can you automate that, if you will, and protect the users from doing things they shouldn't, whether it's data access or corporate edicts for governance and compliance? How are you handling that challenge? >> That's a great question, and it's something that I really emphasized in my leadership session. The notion of what customers are doing and what we're seeing is that there's the lake house architecture concept. So you've got a data lake and purpose-built stores, and customers are looking for easy data movement across those. And so we have things like Glue Elastic Views or some of the other Glue features we announced. But they're also looking for unified governance, and that's why we built AWS Lake Formation. And the idea here is that it can quickly discover and catalog customer data assets and then allow customers to define granular access policies centrally around that data. And once you have defined that, it then sets customers free to give broader access to the data, because they've put the guardrails in place. They've put the protections in place.
So you can tag columns as being private so nobody can see them, and we announced a couple of new capabilities where you can provide row-based control. So only a certain set of users can see certain rows in the data, whereas a different set of users might only be able to see, you know, a different set. And so, by creating this fine-grained but unified governance model, this actually sets customers free to give broader access to the data, because they know that their policies and compliance requirements are being met, and it gets them out of the way of the analyst or someone who can actually use the data to drive some value for the business. >> Right. They can really focus on driving value. And I always talk about monetization. However, monetization could be, you know, a generic term. It could be saving lives, the mission of the business or the organization. I meant to ask you about Q. Can customers embed Q into their own apps? >> Yes, absolutely. So one of QuickSight's key strengths is its embeddability. And then it's also serverless, so you can embed it at a really massive scale. And so we see customers, for example, like Blackboard, that's embedding QuickSight dashboards into the information it's providing to thousands of educators, to provide data on the effectiveness of online learning, for example. And you could embed Q into that capability. So it's a really cool way to give a broad set of people the ability to ask questions of data without requiring them to be fluent in things like SQL. >> If I could ask you a question. We've talked a little bit about data movement. I think at last year's re:Invent you guys announced RA3. I think it reached general availability this year. And I remember Andy speaking about it, talking about, you know, the importance of having big enough pipes when you're moving data around. Of course you're doing tiering.
You also announced AQUA, the Advanced Query Accelerator, which kind of reduces, it brings the compute to the data, I guess, is how I would think about that, reducing that movement. But then we're talking about, you know, Glue Elastic Views, where you're copying and moving data. How are you ensuring, you know, that you're maintaining that maximum performance for your customers? I mean, I know it's an architectural question, but as an analytics professional, you have to be comfortable that that infrastructure is there. So what's AWS's general philosophy in that regard? >> So there's a few ways that we think about this, and you're absolutely right. I think data volumes are going up, and we're seeing customers going from terabytes to petabytes and even people heading into the exabyte range. There's really a need to deliver performance at scale. And the reality of customer architectures is that customers will use purpose-built systems for different best-in-class use cases. And if you're trying to do a one-size-fits-all thing, you're inevitably going to end up compromising somewhere. And so the reality is that customers will have more data, they're going to want to get it to more people, and they're going to want their analytics to be fast and cost effective. And so we look at strategies to enable all of this. So, for example, Glue Elastic Views. It's about moving data, but it's about moving data efficiently. So what we do is we allow customers to define a view that represents the subset of their data they care about, and then we only look to move changes as efficiently as possible. So you're reducing the amount of data that needs to get moved and making sure it's focused on the essential. Similarly, with AQUA, what we've done, as you mentioned, is we've taken the compute down to the storage layer, and we're using our Nitro chips to help with things like compression and encryption.
And then we have FPGAs in line to allow filtering and aggregation operations. So again, you're trying to quickly and effectively get through as much data as you can, so that you're only sending back what's relevant to the query that's being processed. And that again leads to more performance. If you can avoid reading a byte, you're going to speed up your queries. And that's what AQUA is trying to do. It's trying to push those operations down so that you're really reducing data as close to its origin as possible and focusing on what's essential. And that's what we're applying across our analytics portfolio. I would say one other piece we're focused on with performance is really about innovating across the stack. So you mentioned network performance. We've got 100 gigabits per second throughput now with the next-gen instances, and then with things like Graviton2 you're able to drive better price performance for customers for general purpose workloads. So it's really innovating at all layers. >> It's amazing to watch. I mean, you guys, it's an incredible engineering challenge as you build this hyper-distributed system that's now, of course, going to the edge. I want to come back to something you mentioned, and I do want to hit on your leadership session as well. But you mentioned the one-size-fits-all system. And I've asked Andy Jassy about this. I've had a discussion with many folks about it, and of course you mentioned the challenges: you're going to have to make tradeoffs if it's one size fits all. The flip side of that is, okay, it's simple. It's, you know, the Swiss Army knife of databases, for example. But your philosophy at Amazon is you want to have fine-grained access to the primitives, so that in case the market changes, you're able to move quickly. So that puts more pressure on you to then simplify. You're not going to build this big hairball abstraction layer. That's not what you're going to do.
You know, I think about layers and layers of paint. I live in a very old house. So that's not your approach. So it puts greater pressure on you to constantly listen to your customers, and they're always saying, hey, I want to simplify, simplify, simplify. We certainly heard that again in Swami's presentation the other day, all about, you know, minimizing complexity. So that really is your tradeoff. It puts pressure on Amazon engineering to continue to raise the bar on simplification. Is that a fair statement? >> Yeah, I think so. I mean, I think any time we can do work so our customers don't have to, that's a win for both of us, because I think we're delivering more value and it makes it easier for our customers to get value from their data. We absolutely believe in using the right tool for the right job. And you talked about an old house. You're not going to build or renovate a house with a Swiss Army knife. It's just the wrong tool. It might work for small projects, but you're going to need something more specialized to handle the things that matter. And that's really what we see with that set of capabilities. So we want to provide customers with the best of both worlds. We want to give them purpose-built tools so they don't have to compromise on performance or scale or functionality. And then we want to make it easy to use these together, whether it's about data movement or things like federated queries, where you can reach into each of them through a single query and through a unified governance model. So it's all about stitching those together. >> Yeah, so far you've been on the right side of history. I think it serves you well and your customers well. I want to come back to your leadership session. What else could you tell us about what you covered there?
>> So we've actually had a bunch of innovations on the analytics stack. So some of the highlights are in EMR, which is our managed Spark and Hadoop service. We've been able to achieve 1.7x better performance than open source with our Spark runtime. So we've invested heavily in performance, and now EMR is also available for customers who are running in containerized environments. So we announced EMR on EKS, and then an integrated development environment and studio for EMR called EMR Studio. So we're making it easier both for people at the infrastructure layer to run EMR on their EKS environments and make it available within their organizations, but also simplifying life for data analysts and folks working with data, so they can operate in that studio and not have to mess with the details of the clusters underneath. And then a bunch of innovation in Redshift. We talked about AQUA already, but then we also announced data sharing for Redshift. So this makes it easy for Redshift clusters to share data with other clusters without putting any load on the central producer cluster. And this also speaks to the theme of simplifying getting data from point A to point B, so you could have central producer environments publishing data, which represents the source of truth, say, into other departments within the organization. And they can query the data and use it. It's always up to date, but it doesn't put any load on the producers, and that enables these really powerful data sharing and downstream data monetization capabilities like you've mentioned. In addition, like Swami mentioned in his keynote, there's Redshift ML, so you can now essentially train and run models that were built in SageMaker and optimized from within your Redshift clusters. And then we've also automated all of the performance tuning that's possible in Redshift.
So we really invested heavily in price performance, and now we've automated all of the things that make Redshift the best-in-class data warehouse service from a price performance perspective, up to three x better than others. But customers can just set Redshift to auto, and it'll handle workload management, data compression and data distribution. So it's about making it easier to access all that performance. And then the other big one was in Lake Formation. We announced three new capabilities. One is transactions, so enabling consistent ACID transactions on data lakes so you can do things like inserts and updates and deletes. We announced row-based filtering for fine-grained access control in that unified governance model, and then automated storage optimization for data lakes. So if customers are dealing with unoptimized small files that are coming off streaming systems, for example, Lake Formation can auto-compact those under the covers, and you can get a 7 to 8x performance boost. It's been a busy year for analytics. >> I'll say. Is that it? No, great job, Rahul. Thanks so much for coming back on theCUBE and, you know, sharing the innovations, and great to see you again. And good luck in the coming year. >> Well, thank you very much. Great to be here. Great to see you. And I hope we get to see each other in person again. >> I hope so. All right. And thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll be right back right after this short break.
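The governance model described in the conversation, private columns plus row-based filtering defined once and applied to every reader, can be pictured with a short sketch. This is an illustration only and not the Lake Formation API: `Policy`, `read`, the group names and the sample table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    """Hypothetical central policy: which rows and columns a group may read."""
    row_filter: Callable[[Row], bool]
    hidden_columns: FrozenSet[str]


def read(table: List[Row], policy: Policy) -> List[Row]:
    """Return only the permitted rows, with private columns stripped out."""
    return [
        {k: v for k, v in row.items() if k not in policy.hidden_columns}
        for row in table
        if policy.row_filter(row)
    ]


ORDERS: List[Row] = [
    {"id": 1, "region": "US", "total": 40, "email": "a@example.com"},
    {"id": 2, "region": "EU", "total": 75, "email": "b@example.com"},
]

# One policy per group, defined in a single place rather than in each query.
POLICIES: Dict[str, Policy] = {
    "us_analysts": Policy(lambda r: r["region"] == "US", frozenset({"email"})),
    "eu_analysts": Policy(lambda r: r["region"] == "EU", frozenset({"email"})),
}

us_view = read(ORDERS, POLICIES["us_analysts"])
eu_view = read(ORDERS, POLICIES["eu_analysts"])
print(us_view)
print(eu_view)
```

Keeping the policies in one table, separate from the queries, mirrors the point made in the interview: once the guardrails are defined centrally, access can be widened without each analyst re-implementing the rules.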

Published Date : Dec 10 2020

SUMMARY :

Rahul Pathak, vice president of analytics at AWS, joins Dave Vellante to discuss the analytics announcements from AWS re:Invent 2020. He covers Glue Elastic Views, which uses SQL views to combine data across stores and propagate changes to targets in near real time; QuickSight Q, which answers natural language questions in dashboards; unified governance in Lake Formation, including row-based filtering and transactions; AQUA for Redshift; and updates to EMR and Redshift, including data sharing and Redshift ML.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Rahul Pathak | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
David | PERSON | 0.99+
California | LOCATION | 0.99+
New York | LOCATION | 0.99+
Andy | PERSON | 0.99+
Swiss Army | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
December 8 | DATE | 0.99+
Dave Volonte | PERSON | 0.99+
last year | DATE | 0.99+
2020 | DATE | 0.99+
third option | QUANTITY | 0.99+
Swami | PERSON | 0.99+
each | QUANTITY | 0.99+
both | QUANTITY | 0.99+
A. W | PERSON | 0.99+
this year | DATE | 0.99+
10 instances | QUANTITY | 0.98+
A three | COMMERCIAL_ITEM | 0.98+
78 x | QUANTITY | 0.98+
two petabytes | QUANTITY | 0.98+
five | QUANTITY | 0.97+
Amazon Engineering | ORGANIZATION | 0.97+
Red Shift ML | TITLE | 0.97+
Formacion | ORGANIZATION | 0.97+
11 | QUANTITY | 0.96+
one | QUANTITY | 0.96+
one way | QUANTITY | 0.96+
Intel | ORGANIZATION | 0.96+
One | QUANTITY | 0.96+
five categories | QUANTITY | 0.94+
Aqua | ORGANIZATION | 0.93+
Elasticsearch | TITLE | 0.93+
terabytes | QUANTITY | 0.93+
both worlds | QUANTITY | 0.93+
next decade | DATE | 0.92+
two data sets | QUANTITY | 0.91+
Lake Formacion | ORGANIZATION | 0.9+
single query | QUANTITY | 0.9+
Data Lake | ORGANIZATION | 0.89+
thousands of educators | QUANTITY | 0.89+
Both stores | QUANTITY | 0.88+
Thio | PERSON | 0.88+
agile | TITLE | 0.88+
Cuba | LOCATION | 0.87+
dynamodb | ORGANIZATION | 0.86+
1.7 x | QUANTITY | 0.86+
Swamis | PERSON | 0.84+
EMR | TITLE | 0.82+
one size | QUANTITY | 0.82+
Red Shift | TITLE | 0.82+
up to three X | QUANTITY | 0.82+
100 gigabits per second | QUANTITY | 0.82+
Marnie | PERSON | 0.79+
last decade | DATE | 0.79+
reinvent 2020 | EVENT | 0.74+
Invent | EVENT | 0.74+
last 10 years | DATE | 0.74+
Cube | COMMERCIAL_ITEM | 0.74+
today | DATE | 0.74+
A Ro | EVENT | 0.71+
three new capabilities | QUANTITY | 0.71+
two | QUANTITY | 0.7+
E T Elling | PERSON | 0.69+
Eso | ORGANIZATION | 0.66+
Aqua | TITLE | 0.64+
Cube | ORGANIZATION | 0.63+
Query | COMMERCIAL_ITEM | 0.63+
SAS | ORGANIZATION | 0.62+
Aurora | ORGANIZATION | 0.61+
Lake House | ORGANIZATION | 0.6+
Sequel | TITLE | 0.58+
P. | PERSON | 0.56+