Mohan Rokkam & Greg Gibby | 4th Gen AMD EPYC on Dell PowerEdge: Virtualization

(cheerful music) >> Welcome to theCUBE's continuing coverage of AMD's 4th Generation EPYC launch. I'm Dave Nicholson, and I'm here in our Palo Alto studios talking to Greg Gibby, senior product manager, data center products from AMD, and Mohan Rokkam, technical marketing engineer at Dell. Welcome, gentlemen. >> Mohan: Hello, hello. >> Greg: Thank you. Glad to be here. >> Good to see each of you. Just really quickly, I want to start out. Let us know a little bit about yourselves. Mohan, let's start with you. What do you do at Dell exactly? >> So I'm a technical marketing engineer at Dell. I've been with Dell for around 15 years now and my goal is to really look at the Dell PowerEdge servers and see how customers can take advantage of some of the features we have, especially with the AMD EPYC processors that have just come out. >> Greg, and what do you do at AMD? >> Yeah, so I manage our software-defined infrastructure solutions team, and really it's cradle to grave, where we work with the ISVs in the market, so VMware, Nutanix, Microsoft, et cetera, to integrate the features that we're putting into our processors and make sure they're ready to go and enabled. And then we work with our valued partners like Dell on putting those into actual solutions that customers can buy and then we work with them to sell those solutions into the market. >> Before we get into the details on the 4th Generation EPYC launch and what that means and why people should care, Mohan, maybe you can tell us a little about the relationship between Dell and AMD, how that works, and then Greg, if you've got commentary on that afterwards, that'd be great. Yeah, Mohan. >> Absolutely. Dell and AMD have a long-standing partnership, right? Especially now with the EPYC series. We have had products since EPYC first generation. We have been doing solutions across the whole range of the Dell ecosystem. We have integrated AMD quite thoroughly and effectively and we really love how performant these systems are. 
So, yeah. >> Dave: Greg, what are your thoughts? >> Yeah, I would say the other thing we need to point out is that we both have really strong relationships across the entire ecosystem. So memory vendors, the software providers, et cetera, we have technical relationships. We're working with them to optimize solutions so that ultimately when the customer buys that, they get a great user experience right out of the box. >> So, Mohan, I know that you and your team do a lot of performance validation testing as time goes by. I suspect that you had early releases of the 4th Gen EPYC processor technology. What have you been seeing so far? What can you tell us? >> AMD has definitely knocked it out of the park. Time and again, in the past four generations, in the past five years alone, we have done some database work where in five years, we have seen 5X performance. And across the board, AMD is the leader in benchmarks. We have done virtualization where we would consolidate from five into one system. We have world records in AI, we have world records in databases, we have world records in virtualization. The AMD EPYC solutions have been absolutely performant. I'll leave you with one number here. When we went from top-of-stack Milan to top-of-stack Genoa, we saw a performance bump of 120%. And that number just blew my mind. >> So that prompts a question for Greg. Often we, as industry insiders, think in terms of performance gains over the last generation or the current generation. A lot of customers in the real world, however, are N - 2. They're a ways back, so I guess two points on that. First of all, the kinds of increases the average person is going to see when they move to this architecture, correct me if I'm wrong, but it's even more significant than a lot of the headline numbers because they're moving two generations, number one. Correct me if I'm wrong on that, but then the other thing is the question to you, Greg. 
I like very long complicated questions, as you can tell. The question is, is it okay for people to skip generations or make the case for upgrades, I guess is the problem? >> Well, yeah, so a couple thoughts on that first too. Mohan talked about that 5X improvement over the generations that we've seen. The other key point with that too is that we've made significant process improvements along the way, moving to seven nanometer and now five nanometer, and that's really reducing the total amount of power, or improving the performance per watt, that customers can realize as well. And when we look at why would a customer want to upgrade, right? I want to rephrase that as, why aren't you? And there is a real cost of not upgrading. And so when you look at infrastructure, the average age of a server in the data center is over five years old. And if you look at the most popular processors that were sold in that timeframe, it's 8, 10, 12 cores. So now you've got a bunch of servers that you need in order to deliver the applications and meet your SLAs to your end users, and all those servers pull power. They require maintenance. They have the opportunity to go down, et cetera. You got to pay licensing and service and support costs and all those. And when you look at all the costs that roll up, even though the hardware is paid for, just to keep the lights on, and not even talking about the soft costs of unplanned downtime, and, "I'm not meeting your SLAs," et cetera, it's very expensive to keep those servers running. Now, if you refresh, and now you have processors that have 32, 64, 96 cores, now you can consolidate that infrastructure and reduce your total power bill. You can reduce your CapEx, you reduce your ongoing OpEx, you improve your performance, and you improve your security profile. So it really is more cost effective to refresh than not to refresh. >> So, Mohan, what has your experience been double clicking on this topic of consolidation? 
I know that we're going to talk about virtualization and some of the results that you've seen. What have you seen in that regard? Does this favor better consolidation and virtualized environments? And are you both assuring us that the ROI and TCO pencil out on these new big, bad machines? >> Greg definitely hit the nail on the head, right? We are seeing tremendous savings really, if you're consolidating from two generations old. We went from, as I said, five to one. You're going from five full servers, probably paid off, down to one single server. That itself is, if you look at licensing costs, which again, with things like VMware, does get pretty expensive. If you move to a single system, yes, we are at 32, 64, 96 cores, but if you compare to the licensing costs of 10 cores, two sockets, that's still pretty significant, right? That's one huge thing. Another thing which actually really drives the thing is we are looking at security, and in today's environment, security becomes a major driving factor for upgrades. Dell has its own setup, cyber-resilient architecture, as we call it, and that really is integrated from processor all the way up into the OS. And those are some of the features which customers really can take advantage of and help protect their ecosystems. >> So what kinds of virtualized environments did you test? >> We have done virtualization primarily, of course, with VMware, but also Azure Stack, and we have looked at Nutanix. PowerFlex is another one within Dell. We have vSAN Ready Nodes. All of these, OpenShift, we have a broad variety of solutions from Dell and AMD really fits into almost every one of them very well. >> So where does hyper-converged infrastructure fit into this puzzle? We can think of a server as something that contains not only AMD's latest architecture but also latest PCIe bus technology and all of the faster memory, faster storage cards, faster NICs, all of that comes together. 
But how does that play out in Dell's hyper-converged infrastructure or HCI strategy? >> Dell is a leader in hyper-converged infrastructure. We have the very popular VxRail line, we have the PowerFlex, which is now going into the AWS ecosystem as well, Nutanix, and of course, Azure Stack. With all these, when you look at AMD, we have up to 96 cores coming in. We have PCIe Gen 5, which means you can now connect dual-port 100 and 200 gig NICs and get line rate on those so you can connect to your ecosystem. And I don't know if you've seen the news, 200, 400 gig routers and switches are selling out. That's not slowing down. The network infrastructure is booming. If you want to look at the AI/ML side of things, the VDI side of things, accelerator cards are becoming more and more powerful, more and more popular. And of course they need that higher end data path that PCIe Gen 5 brings to the table. DDR5 is another huge improvement in terms of performance and latencies. So when we take all this together, you talk about hyper-converged, all of them add into making sure that A, with hyper-converged, you get ease of management, but B, just 'cause you have ease of management doesn't mean you need to compromise on anything. And the AMD servers effectively are a no compromise offering that we at Dell are able to offer to our customers. >> So Greg, I've got a question a little bit from left field for you. We covered Supercompute Conference 2022. We were in Dallas a couple of weeks ago, and there was a lot of discussion of the current processor manufacturer battles, and a lot of buzz around 4th Gen EPYC being launched and what's coming over the next year. Do you have any thoughts on what this architecture can deliver for us in terms of things like AI? We talk about virtualization, but if you look out over the next year, do you see this kind of architecture driving significant change in the world? >> Yeah, yeah, yeah, yeah. 
It has the real potential to do that from just the building blocks. So we have our chiplet architecture, we call it. So you have an IO die and then you have your core complexes that go around that. And we integrate it all with our Infinity Fabric. That architecture allows us, if we wanted to, to replace some of those CCDs with specific accelerators. And so when we look two, three, four years down the road, that architecture and that capability is already built into what we're delivering and can easily be moved in. We just need to make sure that when you look at doing that, that the power that's required to do that and the software, et cetera, and those accelerators actually deliver better performance as a dedicated engine versus just using standard CPUs. The other thing that I would say too is if you look at emerging workloads. So data center modernization is one of the buzzwords in cloud native, right? And these container environments, well, AMD's architecture really just screams support for those type of environments, right? Where you get into these larger core counts and the consolidation that Mohan talked about. Now when I'm in a container environment, that blast radius, so a lot of customers have concerns around, "Hey, having a single point of failure and having more than X number of cores concerns me." If I'm in containers, that becomes less of a concern. And so when you look at cloud native, containerized applications, data center modernization, AMD's extremely well positioned to take advantage of those use cases as well. >> Yeah, Mohan, and when we talk about virtualization, I think sometimes we have to remind everyone that yeah, we're talking about not only virtualization that has a full-blown operating system in the bucket, but also virtualization where the containers have microservices and things like that. I think you had something to add, Mohan. >> I did, and I think going back to the accelerator side of business, right? 
When we are looking at the current technology and looking at accelerators, AMD has done a fantastic job of adding in features like AVX-512, we have the bfloat16 and INT8 features. And some of what these do is they're effectively built-in accelerators for certain workloads, especially in the AI and media spaces. And some of these use cases we look at, for example, are inference. Traditionally we have used external accelerator cards, but for some of the entry level and mid-level use cases, CPU is going to work just fine, especially with the newer CPUs that we are seeing this fantastic performance from. The accelerators just help get us to the point where if I'm at the edge, if I'm in certain use cases, I don't need to have an accelerator in there. I can run most of my inference workloads right on the CPU. >> Yeah, yeah. You know the game. It's an endless chase to find the bottleneck. And once we've solved the puzzle, we've created a bottleneck somewhere else. Back to the supercompute conversations we had, specifically about some of the AMD EPYC processor technology and the way that Dell is packaging it up and leveraging things like connectivity. That was one of the things that was also highlighted. This idea that increasingly connectivity is critically important, not just for supercomputing, but for high-performance computing that's finding its way out of the realms of Los Alamos and down to the enterprise level. Gentlemen, any more thoughts about the partnership or maybe a hint at what's coming in the future? I know that the original AMD announcement was announcing and previewing some things that are rolling out over the next several months. So let me just toss it to Greg. What are we going to see in 2023 in terms of rollouts that you can share with us? >> That I can share with you? Yeah, so I think look forward to seeing more advancements in the technology at the core level. 
I think we've already announced our product code name Bergamo, where we'll have up to 128 cores per socket. And then as we look at how we continually address this demand for data, this demand for "I need actionable insights immediately," look for us to continue to drive performance leadership in our products that are coming out and address specific workloads and accelerators where appropriate and where we see a growing market. >> Mohan, final thoughts. >> On the Dell side, of course, we have four very rich and configurable options with AMD EPYC servers. But beyond that, you'll see a lot more solutions. Some of what Greg has been talking about around the next generation of processors or the next updated processors, you'll start seeing some of those, and you'll definitely see more use cases from us and how customers can implement them and take advantage of the features that... It's just exciting stuff. >> Exciting stuff indeed. Gentlemen, we have a great year ahead of us. As we approach possibly the holiday seasons, I wish both of you well. Thank you for joining us. From here in the Palo Alto studios, again, Dave Nicholson here. Stay tuned for our continuing coverage of AMD's 4th Generation EPYC launch. Thanks for joining us. (cheerful music)
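
Editor's note: the claim in the conversation that PCIe Gen 5 lets a dual-port 200 gig NIC run at line rate can be checked with simple arithmetic. The sketch below is illustrative only. It assumes a x16 slot, uses the published per-lane signalling rates with 128b/130b encoding, and ignores packet and protocol overhead, so real throughput would be somewhat lower. The function names are made up for this note.

```python
# Rough per-direction bandwidth check for a NIC in a PCIe slot.
# Assumptions (not from the interview): x16 slot, no packet/protocol overhead.

# Raw signalling rate per lane in GT/s for each PCIe generation.
PCIE_GT_PER_LANE = {"gen3": 8.0, "gen4": 16.0, "gen5": 32.0}

# PCIe 3.0 and later carry 128 payload bits in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def slot_gbps(gen: str, lanes: int = 16) -> float:
    """Approximate usable bit rate of a slot in Gb/s, per direction."""
    return PCIE_GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY


def fits_line_rate(gen: str, ports: int, port_gbps: float, lanes: int = 16) -> bool:
    """True if all NIC ports together fit within the slot's bit rate."""
    return ports * port_gbps <= slot_gbps(gen, lanes)


if __name__ == "__main__":
    for gen in ("gen4", "gen5"):
        for speed in (100, 200):
            verdict = "fits" if fits_line_rate(gen, 2, speed) else "does not fit"
            print(f"{gen} x16 ({slot_gbps(gen):.0f} Gb/s): dual-port {speed}G {verdict}")
```

Under these assumptions a Gen 4 x16 slot (about 252 Gb/s) covers dual-port 100 gig but not dual-port 200 gig, while a Gen 5 x16 slot (about 504 Gb/s) covers both.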

Published Date : Dec 14 2022



Krishna Mohan & Sowmya Rajagopalan, Tata Consultancy Services | AWS re:Invent 2022

(corporate electronic xylophone jingle intro) >> Good afternoon and welcome back to our very last segment of Tuesday's live broadcast here on theCUBE from AWS re:Invent in fabulous Las Vegas, Nevada. My name is Savannah Peterson and I am joined here by the brilliant Paul Gillin. Paul, end of our first day. You holding up, are you still feeling overwhelmed with fire hose... >> Savannah, yet my feet are killing me. (savannah laughs) >> Yeah, we've done so much walking in these chairs. >> 14,000 steps already today. It's not even dinner time. >> Hey, well, at least you've earned your dinner, Paul. I love that. I love that. I'm very excited about our next guests. We have Krishna and Sowmya joining us from Tata Consultancy Services. Now, I was impressed when I was doing my background research on you all. The Tata Group has locations in 150 different spots, 46 different countries. You have over 600,000 employees on the team. We are talking about absolutely massive scale here but, today we're going to be focused specifically on the Tata Consultancy Services. Sowmya, can you tell me what you all do? What is that team specifically in charge of? >> Yeah, TCS, first of all, thank you very much for inviting us. >> Savannah: Our pleasure. >> Maybe the last session but, we'll make it very lively. >> Savannah: It's going to be the best session. That's the best part of the day. >> Yes, that's the attitude. From a company standpoint, we are a 50 plus year old company. Part of the Tata group. We focus on IT services. We are categorized as industry verticals and we have horizontal services where AWS is one of the horizontal services that we have. And, when I talk about TCS, we focus a lot more on growth and transformation of our customers. That is one of the key objectives of the current company's growth, I would say. So, that is TCS in a nutshell. >> Extraordinarily important topic to be focused on right now. Growth, transformation, pretty much the core topics of the show. 
I know you're on the hospitality and transportation side of the business, which is very exciting. And, we're going to dig into that a little bit more. Krishna, you're overseeing the world. Tell us a little bit more about your role within the whole ecosystem. >> Yeah, thank you for the opportunity. Great meeting all of you. It's been awesome experience here. re:Invent is coming back, catching up, right? 50,000 people compared to 25,000 last year. So, great to see and meet all of you. Coming to my role, I am responsible for AWS Business Unit within TCS. That means I am responsible for anything that happens on cloud, on AWS. It's a Full Stack unit. I have the global responsibility. That's whether it's a applications, data, infrastructure, transformation that happens, as well as OT at the edge. So, that's my responsibility. >> Savannah: Well, I love talking about the edge. One of my favorite. >> Transformation is a theme of what you do. We heard that the pandemic accelerated digital transformation initiatives at many companies. How did you see the pandemic affecting your business, affecting the customers you were working with? >> Pandemic definitely kind of accelerated a lot of cloud adoption, right? A lot of companies initially focused on resiliency, coming back to handling the pandemic, the situation. But, it also drove a lot of innovation in the business models. They had to think on their feet, re-look at their business models, change the channels and that continued. Pandemic is thankfully gone by but, the transformation actually continued. The way that we actually see on cloud, especially transformation, it has evolved. What we call as Cloud 2.0. Now, cloud is actually more focused on future-proofing the businesses. And, the initial days it was more about future-proofing the technology and technology architecture. But, it has evolved to future-proofing businesses. That means implementing new business models, bringing in agility, measuring the business value. 
And, that's where we see a significant traction. >> So, it's not about technology then. It's not about infrastructure. >> It is about technology but, really delivering business value. It's about, how can I improve the customer experience? >> Well, can you give us a couple of examples of companies you work with that embody this idea? >> I can imagine in the travel and hospitality zone. Probably few communities more sensitive than when someone's having a disruption or frustration within that process. And, perhaps few time periods less chaotic than the last few years. Tell us about your experience and what you've seen. >> Absolutely. To answer your question, first of all, coming out of pandemic, right? Many customers in the travel and hospitality industry were legacy, did not modernize for the last decade or so because, there have been many ups and downs in the industry. So, during pandemic, post-pandemic, one of the ways they wanted to rebound was, can we do the transformation? First of all, cloud as a technology adoption, but, beyond that, how do customers derive value, business value? That is one of the key aspects of the whole transformation. And, if you take, I can give a couple of examples. Avis Car Rental, they had monolith mainframe applications and, that was there for almost a couple of decades, right? But, over a period of time, they were not able to have the availability of those applications. There were many outages. As a result, businesses could not do the bookings. Like OTAs, customers could not do the bookings, the application was not available most of the time. And, it's all legacy, right? So, that is where we all came in, TCS. How do we first of all, simplify the complexity of the landscape? That is one. Then, second is, modernize the legacy application. That's the second thing. Third is, how do you scale it? Because, everyone wants to go faster, right? How do you scale it? 
That is where we partnered with AWS as well, to bring in some specific solutions. One example for Avis, their Rent Shop. Because of the lack of availability, because it's a monolith application and legacy application, it was not available. So, as a result, we partnered and we brought in our contextual knowledge of the car rental industry to kind of transform, move it to cloud. And, today, as a result of it, Avis was able to save millions of dollars from a MIPS standpoint. Second, in terms of availability, that was 99.9% availability. As a result, they had an uptick in their business revenue as well. So, this is one of the ways that it's helped. The second example I want to quote is United Airlines. Here again, we've been present for a long time. We have a deep industry knowledge of the airline industry. So, we brought in our airline contextual knowledge and the United landscape to bring in a TCS solution that we developed. It's called Aviana. It's an intelligent operations solution for the airline industry, which we have developed. It's on AWS as well, and that is being implemented in United. As a result, the ground staff, they have to take decisions in the moment when there is an irregular operation. That could be flight delays, as a result, customers' connections will be lost. >> Savannah: Baggage. >> Baggage, right? Baggage delays. >> So many variables. The complexity... >> exactly >> in this matrix is wild. >> So, leveraging the Aviana solution, the ground staff were able to take decisions based on exceptions. They were able to take decisions quickly so that, they improved the customer experience. I think that was one of the key successes for United in the recent times. So, those two are the examples that I would call where customers have the right business value. So, cloud was not just for technology. They all are deriving a lot of business value as well. I would say. 
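
Editor's note: for a sense of what the 99.9% availability figure quoted above allows, the short sketch below converts an availability percentage into permitted downtime. It assumes the figure refers to uptime measured over a full, non-leap year, which the speaker does not state, and the function name is made up for this note.

```python
# Convert an availability percentage into the downtime it permits.
# Assumption (not from the interview): availability is measured over one
# non-leap calendar year of 8,760 hours.

HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in the period at the given availability."""
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% availability allows about {hours:.2f} hours of downtime per year")
```

On that assumption, 99.9% works out to a little under nine hours of downtime per year.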
>> How important do you think it is for companies facing these unique challenges and scaling to work with partners like TCS? And, I'm sure you would say very important, but, tell me a little bit more why it's so important and those core benefits that they're going to get. Krishna, let's start off with you. >> Yeah, let me take again the AWS cloud transformation, right? TCS formed the AWS Business Unit two years back. So, we are a covid baby in a way. We have been working with AWS for more than a decade but, we formed a dedicated Full-Stack Unit to drive cloud transformation on AWS. In these last two years, we've grown 3X and we have added 400 new customers. >> Nicely done. Just want to see you there. That's huge. Especially during these times. Congratulations. >> So, it's basically about the scale that we bring in. What we have done as a differentiation is, if you look at the entire cloud journey, right from taking a decision which cloud is right, all the way to the cloud migration, modernization and running operations. So, we have built complete platforms. AI/ML-based platforms, where we have taken our delivery wisdom and codified it onto these platforms. So, we support around a thousand plus customers on AWS in varying capacity. All of that knowledge is codified and, that is what we bring to the table, to the customers. And, so, customers obviously appreciate that value, the best practices that are coming. And, coupled with that, the industry knowledge that we have on banking, life sciences, healthcare, automotive. So, it's partly the IT, it is the industry transformation as well. Because, we are working on connected cars, for example, in automotive. We are working on accelerated drug development platforms. We're working on complete banks as a platform that we have. TCS has built on AWS. So, 400 customers are there. It's the complete banking and insurance platform. 
So, this is the combination of the technical expertise that is digitized using platforms, as well as the industry knowledge, is the reason why customers work with us on the cloud transformation. >> So, we're seeing you talk about the vertical industry knowledge. AWS also has its own vertical industry plays. How do you, I guess, coordinate with them or, do you compete with them or, do you stay out of each other's way? >> No, we actually collaborate aggressively. >> Savannah: I like that (laughs) >> Right, so, it's not... >> Savannah: With vigor. >> With vigor. TCS supports approximately 14 verticals. With AWS, we went with the focused industry play. We said we look at financial services, travel, transportation, hospitality, healthcare, life sciences and automotive, to start with. And, we have Go Big plans with AWS. Very focused. The collaboration is actually at the industry solutions because, AWS is a great platform, ever evolving, keeps you on your toes to really adapt it. But, that is always going on, the collaboration. But, the industry, I'm actually glad AWS last year took a pivot on focusing on industries. Now, we talk the same language when we go in front of a board or a CEO or COO. Present it. We are talking about the future of the industry not just the future of the technology. So, it's a win-win. >> You are also developing products on top of AWS that are not industry verticals, that build on the platform. What kinds of products are those? >> For cloud transformation, for example, consulting. We have a product called Cloud Counsel. We have a decision engine on the data side. We have something called Cloud Foundation, Mason. CloudMason. It's just the foundation, right? And, entire migration and modernization factory. And, the last one on cloud operations is actually Cloud Exponence. So, these are time tested. You have Fortune 500 customers using this regularly, actively leveraging that. And, these are all AWS Well-Architected Framework certified. 
So, they work well and they're designed to work on cloud, not only in the native environment, but, also legacy environment. Because, enterprises is not just only native, cloud-native. There is a lot of legacy. Sowmya spoke about the mainframe model... >> So much legacy, we were talking about it. >> So, you have to have a combination of solutions. So, the platforms that we're building, the products we're building, work in both the environments. >> Yeah, and that agility and ability to help customers navigate that prioritization. I mean, there's so many options. We talk about how many new companies there are every year. New solutions. Our adoption of technology is accelerating. As, McKinsey said, we went through 10 years of technological evolution and workplace evolution over the first six months of the pandemic. So, really everything's moving at unprecedented velocity unlike ever before. We have a new game here on theCUBE specifically for this show. And, we are challenging our guests, prompting our guests, to give us a 30 second sizzly sound bite with your hot take on the most important themes of this year's show. Think of it as a thought leadership moment. Opportunity to plug if you really want it. Krishna, you've just given me the nod. I'm going to start with you first and then we'll then we'll pass it along, yeah >> Sure. I think on thought leadership, the way that on cloud, business value is the focus, not the technology. Technology is important, but business value is the focus. And, the way that I see it evolving is with quantum computing coming out more and more, becoming relevant, and Edge is actually becoming quite active as well. All this while on cloud, we focused on business value at the centralized place at the corporate. But, I think the real value of cloud is when you deliver the results, business results, where the customers consume it, that is at the edge. 
I think that's basically the combination of centralized and the edge is where the real value of cloud is, right. And, I also... I know you said 30 seconds but, give me 30 more seconds. >> I like your answer right now. So, I'm going to give you a little more time. >> Yeah, thank you. >> You've earned more time. (laughs) >> So, I like the way Adam said in the keynote, if you look at it broadly, I categorize it into two things. There are a lot of offerings that are becoming comprehensive, like AWS Connect, bringing in workforce management into it, making it a complete end to end product. Similarly, Security Lake, bringing in the entire security and compliance under one, similarly data. So, there are a lot of things that he announced where it is an end to end comprehensiveness of the thing. But, what I love about it is, what Amazon is known for, supply chain. So, they rolled out AWS Supply Chain offering. Walk Out technology. So, the Amazon proposition is actually being brought to AWS as a core proposition. I think that's very futuristic and I think we can see more and more customers, enterprise customers, adopting AWS more to drive transformation >> Badly needed right now. Supply chain resiliency. >> Supply chain really having its moment the last two years. File under two words. No one knew, many of us did who worked in it before this. And, here we are, soon as we lost our toilet paper, everyone's freaked out. I love that you talked about business value and also that the end customer is on the edge and, everyone kind of forgets we are essentially the edge device. This is the edge device, it's all around us. And, all the technology that we're all using that you're even talking about is built right inside here from my airlines app to my car rentals to all of it. All right Sowmya, give us your 30 second hot take, roughly. >> Taking the cue from Krishna, right? Today, things are available on AWS Marketplace. 
So, tomorrow, if somebody wants to start an airline, they just have to come and plug and play the apps that are available in the marketplace. Especially your supply chain. Amazon is known for that. And, if a small or medium business wants to start something, right, a .com, it's very easy. So, that's something that we are all looking for. The future is going to be very, very bright and great for the businesses, is what I would say, because most of it could be plug and play with all the solutions. >> Paul: It's already been built. >> On the cloud, so, we are looking forward to it. The second thing I would talk about is, we have to take it to scale. How can more and more people leverage AWS, right? The talent is very important and, that is where partners like us focus on re-skilling our talent. We have 600,000 people, right? We are not just... >> 600,000 people! That's basically as many people as live in the San Francisco Bay Area, for context for our listeners. How many people work for Walmart? >> It's 1.2 million in Walmart? >> Is it really? >> It is, yes, yes. That's work for Walmart, sidebar. >> So from that standpoint, as the company, we are focusing on re-skilling, up-skilling our talent in order to work on AWS cloud and so on, so that they can go and support our customers. That is something that is very important and that's going to be the future as well. Bring it to scale, go faster. >> I love that you just touched on the fact that you essentially have to practice what you preach because, you've got to think about those 600,000 people in 100 locations across 40 plus different countries. I love it. Sowmya, I'm going to close on that note. The future is bright, just like your fabulous blazer. >> Thank you so much. Krishna, Sowmya, thank you so much for being here with us. We can't wait to see what happens next, who you help next, and how Tata continues to transform. Thank all of you for tuning in today.
A full jam packed day of coverage live here from Las Vegas, Nevada. We are at AWS re:Invent with Paul Gillin. I'm Savannah Peterson. We're theCUBE, the leader in High-Tech Coverage. (corporate electronic xylophone jingle outro)

Published Date : Nov 30 2022

ENTITIES

Entity	Category	Confidence
Savannah	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Paul Gillin	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Savannah Peterson	PERSON	0.99+
Adam	PERSON	0.99+
Krishna	PERSON	0.99+
Paul	PERSON	0.99+
Tata Consultancy Services	ORGANIZATION	0.99+
Sowmya	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
30 second	QUANTITY	0.99+
1.2 million	QUANTITY	0.99+
two	QUANTITY	0.99+
Sowmya Rajagopalan	PERSON	0.99+
400 new customers	QUANTITY	0.99+
400 customers	QUANTITY	0.99+
one	QUANTITY	0.99+
San Francisco Bay	LOCATION	0.99+
30 seconds	QUANTITY	0.99+
100 locations	QUANTITY	0.99+
tomorrow	DATE	0.99+
last year	DATE	0.99+
Tata Group	ORGANIZATION	0.99+
United Airlines	ORGANIZATION	0.99+
two things	QUANTITY	0.99+
14,000 steps	QUANTITY	0.99+
10 years	QUANTITY	0.99+
Second	QUANTITY	0.99+
Krishna Mohan	PERSON	0.99+
50,000 people	QUANTITY	0.99+
Tuesday	DATE	0.99+
30 more seconds	QUANTITY	0.99+
savannah	PERSON	0.99+
46 different countries	QUANTITY	0.99+
today	DATE	0.99+
600,000 people	QUANTITY	0.99+
second example	QUANTITY	0.99+
99.9%	QUANTITY	0.99+
Today	DATE	0.99+
Las Vegas, Nevada	LOCATION	0.99+
Third	QUANTITY	0.99+
pandemic	EVENT	0.99+
over 600,000 employees	QUANTITY	0.99+
Avis'	ORGANIZATION	0.99+
Avis Car Rental	ORGANIZATION	0.99+
second thing	QUANTITY	0.99+
both	QUANTITY	0.99+
Avis	ORGANIZATION	0.98+
second	QUANTITY	0.98+
three X	QUANTITY	0.98+

Sanjeev Mohan, SanjMo | MongoDB World 2022

>> Hello, everybody. Welcome to theCUBE's coverage of MongoDB World 2022. This is the first live MongoDB World since 2019. theCUBE has covered a number of Mongo shows, actually going back to when the company was called 10gen, as some of you may recall. Mongo has since done an IPO, in 2017, and it's been a rocket ship company. It'll probably do 1.2 billion in revenue this year. It's got a billion dollars in cash on the balance sheet. Despite the tech crash, it's still got a 19 or $20 billion valuation, growing above 50% a year. The company just had a really strong quarter, and it seems to be hitting on all cylinders. My name is Dave Vellante, and here to kick it off with me is Sanjeev Mohan, who is the principal at SanjMo. So great to see you. You've become a wonderful CUBE contributor. Former Gartner analyst, really sharp, knows the database space and the data space generally really well, so thanks for coming back on. >> Thank you. You know, it's just amazing how exciting the entire data space is. Like they used to say, all companies are software companies. All companies are data companies, so data has become the foundation. >> They say software is eating the world. Data is eating software. A few little quips here. But this is a good size show. Four or 5,000 people? I don't really know the exact numbers, but it's exciting. And of course, there are a lot of financial services companies here at the Javits Center. Let's lay down the basics for people. MongoDB is a document database, but they've been advancing. That's a document database as an alternative to RDBMS. Explain that, but explain also how Mongo has broadened its capabilities and is serving a lot more use cases. >> So that's my forte, databases and technology.
But before I even talk about that, I have to say I am blown away by this MongoDB World, because MongoDB, unbeknownst to all of us during the pandemic, has really come of age, and it's a billion dollar company. Now we are in this brand new Javits Center that's been built during the pandemic, and now the company is holding this event (indistinct). So I think this company has really grown. And why it has grown is because its offerings now reach more developers than just a document database. Document databases revolutionized the whole DBMS space when NoSQL came up, because, for a change, you don't need a structured schema. You could start bringing data into this document model with, like, a varying schema. But since then, they've added things like search, so they have Lucene search. They added geospatial. They added time series last year, and this year they keep adding more and more. For example, they are going to add some column store indexes. So from being purely transactional, they are now starting to address analytical. And they're starting to address more use cases, like what was announced this morning at the keynote, faceted search. So they're expanding and going deeper and deeper into these other data structures. >> Taking Lucene, making search a first class citizen. But I want to ask you some basic questions about document databases. So there's no fixed schema. You can put anything in there, so it's more data friendly. They're trying to simplify the use of data. Okay, that's pretty clear. What are the trade-offs of a document database? >> So it's not like one technology has solved every problem. Every technology comes with its own trade-offs. So in a document, you basically get rid of joining tables with primary and foreign keys, because you can have a flexible schema within a single document. So it's very easy to write and search.
But when you have a lot of repeated elements and you start getting more and more complex, your document size can start expanding quite a bit, because you're trying to club everything into a single space. So that is where the complexity goes up. >> So what does that mean for a practitioner? It means they have to think about how they are ultimately going to structure the data and how they're going to query it, so they can get the best performance, is that right? So they're going to put some time in up front in order to make it pay back at the tail end. But clearly it's working. Is that the correct way of thinking about it? >> 100%. In the SQL world, you didn't care about the SQL analytical queries. You just cared about how your data model was structured, and then SQL would basically search any model. But in the NoSQL world, you have to know your patterns before you invest in the database. So it's changed that equation, where you come in knowing what you are signing up for. >> So a couple of questions, if I can, kind of Columbo questions. Mongo talks about how it's really supporting mission critical applications, and at the same time, my understanding of the architecture of Mongo specifically, or a document database in general, is that you've got a primary database, and that is sort of the master, if you will, and then you can create secondaries. So help me square the circle between mission critical and maybe more of a focus on, say, consistency versus availability. Do customers have to think about and design in that availability? How do they do that? How are Mongo customers handling that? >> So I have to say, my experience of MongoDB was that the whole company, the whole ethos, was developer friendly. So, to be honest, I don't think MongoDB was as much focused on high availability, disaster recovery, even security, to some extent. They were more focused on developer productivity.
>> And user experience. >> Simplicity. Make it simple, make the developers productive as fast as you can. What was really an inflection point for MongoDB was the launch of Atlas, because with Atlas they were able to introduce all of these management features and abstract them away from the end users. So now they've got, you know, like 2014 is when Atlas came out, and it was in four regions. But today they're in 100 regions, so they keep expanding, on every hyperscale cloud provider, and they've abstracted that whole management. >> So Atlas, of course, is the managed database as a service in the cloud. And so it's those clouds, cloud infrastructure and cloud tooling, that have allowed them to go after those highly available applications. My other question is, when you talk about adding search, geospatial, time series, there are a lot of specialized databases. Take time series. You have time series specialists that go deep into time series. Can a company like Mongo, with an all-in-one strategy, how close can they get to that functionality? Do they have to be? You know, it's kind of the classic Microsoft approach, maybe not perfect, but good enough. I mean, can they compete with those other specialists? And what happens to those specialists if the answer is yes? What's your take on that, if that question makes sense? >> So David, this is not a MongoDB-only issue. This is an issue with, you know, any time series database, any graph database. Should I pick a graph database, or should I pick a multifunctional, multidimensional database? And I really think there is no right or wrong answer. It just really comes down to your use case. If you have an extremely, let's say, complex graph, then maybe you should go with a best-of-breed, purpose-built database.
But more and more, we're starting to see that organizations are looking to simplify their environment by going in for maybe a unified database that has multiple data structures. >> Yeah, well, it's certainly interesting when you hear Mongo speak. They don't call out Oracle specifically, but when they talk about legacy RDBMSs that don't scale and are complex and are expensive, they're talking about Oracle first. And of course, there are others. And then when they talk about bespoke databases, the horses-for-courses databases, they show a picture of that, and that's like the poster child for Amazon. Of course, they don't call out Amazon. They're a great partner of Amazon's. But those are really the two areas that Mongo is going after. Now Oracle, of course, will talk about their converged strategy, and they're taking a similar approach. So help us understand the difference there. Is it just because they're sort of a traditional RDBMS, and they have all the drawbacks associated with that? By the way, there are some benefits as well. So how do you see that all playing out? >> So, you know, it's really coming down to the origins of these databases. I think they're converging to a point where they are offering similar services. And if you look at some of the benchmark numbers, or you talk to users, from a business point of view, I don't think there's too much of a difference. Technology-wise, the difference is that MongoDB started in the document space. They were more interested in availability rather than consistency. Oracle started in the relational database space with a focus on financial services, so ACID compliance is what they're based on. And since then they've been adding other pieces, so they differ in where they started. Oracle has been in the industry since the 1970s, so they have that maturity. But then they have that legacy. >> You know, I love this.
Recently, Oracle announced the MongoDB API. So they're basically saying, why leave Oracle when you can just, you know, do Mongo? So that, to me, is a sign that MongoDB is doing well, because when Oracle calls you out, whether you're Workday or Snowflake or Mongo or whoever, that's a sign to me that you've got momentum and you're stealing share in that marketplace. And clearly Mongo is. They're growing at 50 plus percent per year. I mentioned 10gen early on. I remember one of the first Mongo conferences I went to. It was all developers. There are a lot of developers here as well. But they have really, since 2014, expanded the capabilities. You talked about Atlas, you talked about all these other types of databases that they've added. It seems like Mongo is becoming a platform company. What are your thoughts on that, in terms of them sort of up-leveling the message? They're now a billion dollar plus company. What's the next wave for Mongo? >> So, Oracle announced a MongoDB API. AWS has DocumentDB, Azure has Cosmos DB, so they all have a compatible API, not the source code, because MongoDB has its own SSPL license, so they have written their own layer on top. But at the end of the day, these companies have to keep innovating to catch up with MongoDB, because when MongoDB announces a brand new capability, all these other players have to catch up. So other cloud providers have 80% or so of the capabilities, but they'll never have 100% of what MongoDB has. So people who are diehard MongoDB fans prefer to stay on MongoDB. They are now able to write more applications. Like, you know, MongoDB bought Realm, which is their front end. If you're on a social media kind of thing, you can build your applications and sync them with Atlas.
So MongoDB is now at a point where they are adding more capabilities that more developers like. You know, 5G is coming, autonomous cars are coming, so now they can address IoT kinds of use cases. So that's why it's becoming such a juggernaut, because it's becoming a platform rather than a single document database. >> So Atlas is the near to midterm future. Today it's about 60% of revenues, but they have what we call self-serve, which is really the traditional on-premises stuff. They're connecting those worlds. You're bringing up the point that, of course, they go across clouds. You also bring up the point that they've got edge plays. We're going to talk to Verizon later on today, and they've got edge activity going on with developers. I call it Supercloud, right, this layer that floats above. Now, of course, a lot of the Supercloud concept says we're going to hide the underlying complexity. But developers might want to tap those primitives, so presumably they'll let them do that. But that hybrid, what we call Supercloud, that is a new wave of innovation, is it not? Do you agree with that? And do you see that as a real opportunity for Mongo in terms of penetrating a new TAM? >> Yes. So I see this as a new opportunity. In fact, one of the reasons MongoDB has grown so quickly is because they are addressing more markets than they had pre-pandemic. Also, there are all gradations of users. Some users want full control. They want an IaaS kind of thing, some want PaaS. And some businesses are like, you know, we don't care, we don't want to deal with the database. So today we heard that MongoDB serverless went GA. So now they have serverless capability, they have PaaS. But if you're more into Kubernetes, they have a Kubernetes operator. So they're addressing the full stack of different types of developers, different workloads, different geographical regions.
So that's why the market is expanding. >> We're seeing abstraction layers, you know, throughout. It started at physical, then virtual, containers, serverless, and eventually Supercloud. Sanjeev, great analysis. Thanks so much for taking the time to come on theCUBE. All right, keep it right there. We'll be right back, right after this short break. This is Dave Vellante from the Javits Center, MongoDB World 2022. Thank you.
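
The trade-off Sanjeev describes in this conversation, embedding everything in one document versus referencing it by key as a relational table would, can be sketched in a few lines of Python. This is a minimal illustration using plain dictionaries rather than a real MongoDB driver; the order and customer data and the helper names (`embedded_order`, `referenced_order`, `resolve`, `size_in_bytes`) are assumptions invented for the example, not anything shown at the event.

```python
import json

# Hypothetical data, invented for illustration only.
customer = {"_id": 1, "name": "Ada", "city": "Las Vegas"}
items = [{"sku": f"SKU-{i}", "qty": 1} for i in range(3)]


def embedded_order(order_id):
    """Embedded model: one self-contained document.

    Reading it needs no join, but the customer details are repeated
    in every order, so documents grow as more is clubbed together.
    """
    return {"_id": order_id, "customer": dict(customer), "items": list(items)}


def referenced_order(order_id):
    """Referenced model: the order stores only the customer's key,
    much like a foreign key in a relational table."""
    return {"_id": order_id, "customer_id": customer["_id"], "items": list(items)}


def resolve(order, customers_by_id):
    """Application-side 'join': look up the referenced customer."""
    resolved = dict(order)  # shallow copy so the stored order is untouched
    resolved["customer"] = customers_by_id[resolved.pop("customer_id")]
    return resolved


def size_in_bytes(doc):
    """Rough document size, measured as UTF-8 encoded JSON."""
    return len(json.dumps(doc).encode("utf-8"))


embedded = embedded_order(100)
referenced = referenced_order(100)
customers_by_id = {customer["_id"]: customer}

# The embedded document is larger but can be read in one step.
print(size_in_bytes(embedded), size_in_bytes(referenced))
# The referenced document needs an extra lookup to give the same view.
print(resolve(referenced, customers_by_id) == embedded)
```

Which shape is preferable depends on the query patterns known up front, which is the point made in the interview: the embedded form suits reads that always want the whole order, while the referenced form keeps documents small when the same customer appears in many orders.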

Published Date : Jun 7 2022

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Dave Volonte	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Verizon	ORGANIZATION	0.99+
Four	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
1.2 billion	QUANTITY	0.99+
2017	DATE	0.99+
Sanjeev Mohan	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
last year	DATE	0.99+
$20 million	QUANTITY	0.99+
Mongo	ORGANIZATION	0.99+
Margo	PERSON	0.99+
100%	QUANTITY	0.99+
Lucy	PERSON	0.99+
2014	DATE	0.99+
this year	DATE	0.99+
19	QUANTITY	0.99+
today	DATE	0.99+
5000 people	QUANTITY	0.99+
100 regions	QUANTITY	0.99+
one	QUANTITY	0.99+
four regions	QUANTITY	0.98+
pandemic	EVENT	0.98+
Today	DATE	0.97+
Margo	ORGANIZATION	0.97+
first	QUANTITY	0.97+
1000 people	QUANTITY	0.97+
about 60%	QUANTITY	0.97+
one technology	QUANTITY	0.97+
2019	DATE	0.95+
first conferences	QUANTITY	0.95+
above 50% a year	QUANTITY	0.94+
single space	QUANTITY	0.94+
Atlas	TITLE	0.94+
mongo DB	TITLE	0.93+
two areas	QUANTITY	0.93+
single document	QUANTITY	0.93+
atlases	TITLE	0.92+
19 seventies	DATE	0.92+
this morning	DATE	0.91+
Atlas	ORGANIZATION	0.9+
Mongo DB	TITLE	0.89+
billion dollar	QUANTITY	0.86+
one strategy	QUANTITY	0.85+
Mm.	PERSON	0.84+
50 plus percent per year	QUANTITY	0.84+
Javits Centre	LOCATION	0.83+
>100%	QUANTITY	0.82+
couple	QUANTITY	0.81+
Mongo db World 2022	EVENT	0.81+
single document database	QUANTITY	0.79+
Gartner	ORGANIZATION	0.77+
mongo db	TITLE	0.77+
10 gen	DATE	0.77+
three	QUANTITY	0.77+
Mongo DB	ORGANIZATION	0.74+
billion dollars	QUANTITY	0.74+
mongo db	TITLE	0.72+
Sanremo	LOCATION	0.72+
MongoDB World 2022	EVENT	0.69+

Tony Baer, Doug Henschen and Sanjeev Mohan, Couchbase | Couchbase Application Modernization

(upbeat music) >> Welcome to this CUBE Power Panel where we're going to talk about application modernization, also success templates, and take a look at some new survey data to see how CIOs are thinking about digital transformation, as we get deeper into the post isolation economy. And with me are three familiar VIP guests to CUBE audiences. Tony Baer, the principal at dbInsight, Doug Henschen, VP and principal analyst at Constellation Research, and Sanjeev Mohan, principal at SanjMo. Guys, good to see you again, welcome back. >> Thank you. >> Glad to be here. >> Thanks for having us. >> Glad to be here. >> All right, Doug. Let's get started with you. You know, this recent survey, which was commissioned by Couchbase, covered 650 CIOs and CTOs, and IT practitioners. So obviously very IT heavy. They responded to the following statement: "In response to the pandemic, my organization accelerated our application modernization strategy," and of course, an overwhelming majority, 94%, agreed or strongly agreed. So I'm sure, Doug, that you're not shocked by that, but in the same survey, modernizing existing technologies was second only to cybersecurity as the top investment priority this year. Doug, bring us into your world and tell us the trends that you're seeing with the clients and customers you work with in their modernization initiatives. >> Well, the survey, of course, is spot on. You know, any Constellation Research analyst, any systems integrator will tell you that we saw more transformation work in the last two years than in the prior six to eight years. A lot of it was forced, you know, a lot of movement to the cloud, a lot of process improvement, a lot of automation work, but transformation is aspirational and not every company can be a leader.
You know, at Constellation, we focus our research on those market leaders and that's only, you know, the top 5% of companies that are really innovating, that are really disrupting their markets, and we try to share that with companies that want to be fast followers. These are the next 20 to 25% of companies that don't want to get left behind, but don't want to hit some of the same roadblocks and, you know, pioneering pitfalls that the real leaders are encountering when they're harnessing new technologies. So the rest of the companies, you know, the cautious adopters, the laggards, many of them fall by the wayside, that's certainly what we saw during the pandemic. Who are these leaders? You know, the old saw examples that people cite are the Amazons, the Teslas, the Airbnbs, the Ubers and Lyfts, but new examples are emerging every year. And as a consumer, you immediately recognize these transformed experiences. One of my favorite examples from the pandemic is Rocket Mortgage. No disclaimer required, I don't own stock and they're not a client, but when I wanted to take advantage of those record low mortgage interest rates, I called my current bank and some, you know, stalwart, very established conventional banks, I'm talking to you Bank of America, Citibank, and they were taking days and weeks to get back to me. Rocket Mortgage had the locked in commitment that day, and very proactive, consistent communications across web, mobile, email, all customer touchpoints. I closed in a matter of weeks, an entirely digital, seamless process. This is back in the gloves and masks days, and the loan officer came, parked in our driveway, wiped down an iPad, handed us that iPad, we signed all those documents digitally, completely electronic workflow. The only wet signatures required were those demanded by the state. So it's easy to spot these transformed experiences.
You know, Rocket had most of that in place before the pandemic, and that's why they captured 8% of the national mortgage market by 2020 and they're on track to hit 10% here in 2022. >> Yeah, those are great examples. I mean, I'm not a shareholder either, but I am a customer. I even went through the same thing in the pandemic. It was all done digitally, it was a piece of cake, and I happened to have to do another one with a different firm and stuck with that firm for a variety of reasons, and it was night and day. So to your point, it was a forced march to digital. If you were there beforehand, you had a real advantage, it could accelerate your lead during the pandemic. Okay, now Tony Baer. Mr. Baer, I understand you're skeptical about all this buzz around digital transformation. So in that same survey, the data shows that the majority of respondents said that their digital initiatives were largely reactive to outside forces, the pandemic, compliance changes, et cetera. But at the same time, they indicated that the results, while somewhat mixed, were generally positive. So why are you skeptical? >> The reason being, and by the way, I have nothing against application modernization. The problem... I think the problem, as I've said, is that it often gets conflated with digital transformation, and digital transformation itself has become such a buzzword and so overused that it's really hard, if not impossible, to pin down (coughs) what digital transformation actually means. And very often what you'll hear from, let's say a C level, you know, (mumbles) we want to run like Google, regardless of whether or not that goal is realistic, you know, for that organization (coughs). The thing is that we've been using, you know, businesses have been using digital data since the days of the mainframe, since the... Sorry, that data has been digital.
What really has changed though, is just the degree of how businesses interact with their customers, their partners, with the whole rest of the ecosystem and how their business... And how in many cases, if you take a look at the auto industry, the nature of the business, you know, is changing. So there is real change afoot, the question is, I think we need to get more specific in our goals. And when you look at it, if we can boil it down to a couple, maybe, you know, boil it down like really over simplistically, it's really all about connectedness. No, I'm not saying connectivity 'cause that's more of a physical thing, but connectedness. Being connected to your customer, being connected to your supplier, being connected to the, you know, to the whole landscape that you operate in. And of course today we have many more channels with which we operate, you know, with customers. And in fact also if you take a look at what's happening in the automotive industry, for instance, I was just reading an interview with Bill Ford, you know, their... Ford is now rapidly ramping up their electric, you know, their electric vehicle strategy. And what they realize is it's not just a change of technology, you know, it is a change in their business, it's a change in terms of the relationship they have with their customer. Their customers have traditionally been automotive dealers who... And the automotive dealers have, you know, traditionally and in many cases by state law now have been the ones who own the relationship with the end customer. But when you go to an electric vehicle, the product becomes a lot more of a software product. And in turn, that means that Ford would have much more direct interaction with its end customers. So that's really what it's all about. It's about, you know, connectedness, it's also about the ability to act, you know, we can say agility, it's about ability not just to react, but to anticipate and act. And so...
And of course with all the proliferation, you know, the explosion of data sources and connectivity out there and the cloud, which allows much more, you know, access to compute, it changes the whole nature of the ball game. The fact is that we have to avoid being overwhelmed by this and make our goals more, I guess, tangible, more strictly defined. >> Yeah, now... You know, great points there. And I want to just bring in some survey data. Again, two thirds of the respondents said their digital strategies were set by IT and only 26% by the C-suite, 8% by the line of business. Now, this was largely a survey of CIOs and CTOs, but, wow, that doesn't seem like the right mix. It's to Doug's point about, you know, leaders and laggards. My guess is that at Rocket Mortgage, their digital strategy was led by the chief digital officer, potentially. But at the same time, you would think, Tony, that application modernization is a prerequisite for digital transformation. But I want to go to Sanjeev on this one. In the survey, respondents said that on average, they want 58% of their IT spend to be in the public cloud three years down the road. Now, again, this is CIOs and CTOs, but (mumbles), but that's a big number. And there was no ambiguity because the question wasn't worded as cloud, it was worded as public cloud. So Sanjeev, what do you make of that? What's your feeling on cloud as flexible architecture? What does this all mean to you? >> Dave, 58% of IT spend in the cloud is a huge change from today. Today, most estimates peg cloud IT spend to be somewhere around five to 15%. So what this number tells us is that the cloud journey is still in its early days, so we should buckle up. We ain't seen nothing yet, but let me add some color to this. CIOs and CTOs may be ramping up their cloud deployment, but they still have a lot of problems to solve.
I can tell you from my previous experience, for example, when I was at Gartner, I used to talk to a lot of customers who were in a rush to move into the cloud. So if we were to plot, let's say a maturity model, typically a maturity model in any discipline in IT would have something like crawl, walk, run. So what I was noticing was that these organizations were jumping straight to run because in the pandemic, they were under the gun to quickly deploy into the cloud. So now they're kind of coming back down to, you know, to crawl, walk, run. So basically they did what they had to do under the circumstances, but now they're starting to resolve some of the very, very important issues. For example, security, data privacy, governance, observability, these are all very big ticket items. Another huge problem that now we are noticing, more than we've ever seen, is the rising costs. Cloud makes it so easy to onboard new use cases, but it leads to all kinds of unexpected increases and spikes in your operating expenses. So what we are seeing is that organizations are now getting smarter about where the workloads should be deployed. And sometimes it may be in more than one cloud. Multi-cloud is no longer an aspirational thing. So that is a huge trend that we are seeing and that's why you see there's so much increased planning to spend money in public cloud. We do have some issues that we still need to resolve. For example, multi-cloud sounds great, but we still need some sort of single pane of glass, a control plane, so we can have some fungibility and move workloads around. And some of this may also not be in public cloud, some workloads may actually be done in a more hybrid environment. >> Yeah, definitely. I call it Supercloud. People wince sometimes-- >> Supercloud. >> At that term, but it's above multi-cloud, it floats, you know, on top. But so you clearly identified some potholes.
So I want to talk about the evolution of the application experience 'cause there's some potholes there too. 81% of the respondents in that survey said, "Our development teams are embracing the cloud and other technologies faster than the rest of the organization can adopt and manage them." And that was an interesting finding to me because you'd think that infrastructure as code and designing in security and containers and Kubernetes would be a great thing for organizations, and it is I'm sure in terms of developer productivity, but what do you make of this? Does the modernization path also have some potholes, Sanjeev? What are those? >> So, first of all, Dave, you mentioned in your previous question, there's no ambiguity, it's a public cloud. This one, I feel it has quite a bit of ambiguity because it talks about cloud and other technologies, that sort of opens up the kimono, it's like that's everything. Also, it says that the rest of the organization is not able to adopt and manage. Adoption is a business function, management is an IT function. So I feel this question is a bit loaded. We know that app modernization is here to stay, developing in the cloud removes a lot of traditional barriers of procuring and instantiating infrastructure. In addition, developers today have so many more advanced tools. So they're able to develop the application faster because they have like low-code/no-code options, they have notebooks to write the machine learning code, they have the entire DevOps CI/CD tool chain that makes it easy to version control and push changes. But there are potholes. For example, are developers really interested in fixing data quality problems, or data privacy, data access, data governance? How about monitoring? I doubt developers want to get encumbered with all of these operationalization management pieces. Developers are very keen to deliver new functionality. 
So what we are now seeing is that it is left to the data team to figure out all of these operationalization, productionization things that the developers have... You know, are not truly interested in. So which actually takes me to this topic that, Dave, you've been quite actively covering and we've been talking about, see, the whole data mesh. >> Yeah, I was going to say, it's going to solve all those data quality problems, Sanjeev. You know, I'm a sucker for data mesh. (laughing) >> Yeah, I know, but see, what's going to happen with data mesh is that developers are now going to have more domain resident power to develop these applications. What happens to all of the data curation, governance, quality that, you know, a central team used to do? So there's a lot of open ended questions that still need to be answered. >> Yeah, that gets automated, Tony, right? With computational governance. So-- >> Of course. >> It's not trivial, it's not trivial, but I'm still an optimist, by the end of the decade we'll start to get there. Doug, I want to go to you again and talk about the business case. We all remember, you know, the business case for modernization that is... We remember the Y2K, there was a big IT spending binge and this was before the (mumbles) of the enterprise, right? CIOs, they'd be asked to develop new applications and the business maybe helps pay for it or offset the cost with the initial work and deployment, then IT got stuck managing the sprawling portfolio for years. And a lot of the apps had limited adoption or only served a few users, so there were big pushes toward rationalizing the portfolio at that time, you know? So they had to make a decision: do I modernize, do I consolidate, do I sunset? You know, it was all based on value. So what's happening today and how are businesses making the case to modernize, are they going through a similar rationalization exercise, Doug? 
>> Well, the Y2K era experience that you talked about was back in the days of, you know, throw the requirements over the wall and then we had waterfall development that lasted months, in some cases years. We see today's most successful companies building cross functional teams. You know, the C-suite, the line of business, the operations, the data and analytics teams, the IT, everybody has a seat at the table to lead innovation and modernization initiatives and they don't start, the most successful companies don't start by talking about technology, they start by envisioning a business outcome, by envisioning a transformed customer experience. You hear the example of Amazon writing the press release for the product or service it wants to deliver and then it works backwards to create it. You got to work backwards to determine the tech that will get you there. What's very clear though, is that you can't transform or modernize by lifting and shifting the legacy mess into the cloud. That doesn't give you the seamless processes, that doesn't give you data driven personalization, it doesn't give you a connected and consistent customer experience, whether it's online or mobile, you know, bots, chat, phone, everything that we have today. That requires a modern, scalable cloud native approach and agile delivery, an iterative experience where you're collaborating with this cross-functional team and course correcting, again, making sure you're on track to what's needed. >> Yeah. Now, Tony, both Doug and Sanjeev have been, you know, talking about what I'm going to call this IT and business schism, and we've all done surveys. One of the things I'd love to see Couchbase do in future surveys is not only survey the IT heavy, but also survey the business heavy and see what they say about who's leading the digital transformation and who's in charge of the customer experience. Do you have any thoughts on that, Tony? >> Well, there's no question... 
I mean, it's kind of like, you know, the more things change. I mean, we've been talking about that IT and the business have to get together, we talked about this back during, and Doug, you probably remember this, back during the Y2K ERP days, is that you need these cross functional teams, we've been seeing this. I think what's happening today though, is that, you know, back in the Y2K era, we were basically going into like our bedrock systems and having to totally re-engineer them. And today what we're looking at is that, okay, those bedrock systems, the ones that basically are keeping the lights on, okay, those are there, we're not going to mess with that, but on top of that, that's where we're going to innovate. And that gives us a chance to be more, you know, more directed and therefore we can bring these related domains together. I mean, that's why just kind of, you know, talk... Where Sanjeev brought up the term of data mesh, I've been a bit of a cynic about data mesh, but I do think that where it can work is where we bring a bunch of these connected teams together, teams that have some sort of shared context, though it's everybody that's... Every team that's working, let's say around the customer, for instance, which could be, you know, in marketing, it could be in sales, order processing in some cases, you know, in logistics and delivery. So I think that's where I think we... You know, there's some hope and the fact is that with all the advanced, you know, basically the low-code/no-code tools, there are ways to bring some of these other players, you know, into the process who previously had to... Were sort of, you know, more at the end of like a, you know, kind of a... Sort of like they throw it over the wall type process. So I do believe, but despite all my cynicism, I do believe there's some hope. >> Thank you. Okay, last question. And maybe all of you could answer this. Maybe, Sanjeev, you can start it off and then Doug and Tony can chime in. 
In the survey, about a half, nearly half of the 650 respondents said they could tangibly show their organization's improved customer experiences that were realized from digital projects in the last 12 months. Now, again, not surprising, but we've been talking about digital experiences, but there's a long way to go judging from our pandemic customer experiences. And we, again, you know, some were great, some were terrible. And so, you know, and some actually got worse, right? Will that improve? When and how will it improve? Where do 5G and things like that fit in in terms of improving customer outcomes? Maybe, Sanjeev, you could start us off here. And by the way, plug any research that you're working on in this sort of area, please do. >> Thank you, Dave. As a resident optimist on this call, I'll get us started and then I'm sure Doug and Tony will have interesting counterpoints. So I'm a technology fan boy, I have to admit, I am in awe of all these new companies and how they have been able to rise up and handle extreme scale. In this time that we are speaking on this show, these food delivery companies would have probably handled tens of thousands of orders in minutes. So these concurrent orders, delivery, customer support, geospatial location intelligence, all of this has really become commonplace now. It used to be that, you know, large companies like Apple would be able to handle all of these supply chain issues, disruptions that we've been facing. But now in my opinion, I think we are seeing this in, Doug mentioned Rocket Mortgage. So we've seen it in FinTech and shopping apps. So we've seen the same scale and it's more than 5G. It includes things like... Even in the public cloud, we have much more efficient, better hardware, which can do like deep learning networks much more efficiently. So machine learning, a lot of natural language processing, being able to handle unstructured data. 
So in my opinion, it's quite phenomenal to see how technology has actually come to the rescue as, you know, billions of us have gone online over the last two years. >> Yeah, so, Doug, to Sanjeev's point, he's saying, basically, you ain't seen nothing yet. What are your thoughts here, your final thoughts? >> Well, yeah, I mean, there's some incredible technologies coming including 5G, but you know, it's only going to pave the cow path if the underlying app, if the underlying process is clunky. You have to modernize, take advantage of, you know, serverless scalability, autonomous optimization, advanced data science. There's lots of cutting edge capabilities out there today, but you know, rather than lifting and shifting, you got to get your hands dirty and actually modernize on that data front. I mentioned my research this year, I'm doing a lot of in depth looks at some of the analytical data platforms. You know, these lakehouses, we've had some conversations about that, and helping companies to harness their data, to have a more personalized and predictive and proactive experience. So, you know, we're talking about the Snowflakes and Databricks and Googles and Teradata and Vertica and Yellowbrick and that's the research I'm focusing on this year. >> Yeah, your point about paving the cow path is right on, especially over the pandemic, a lot of the processes were unknown. But you saw this with RPA, paving the cow path only got you so far. And so, you know, great points there. Tony, you get the last word, bring us home. >> Well, I'll put it this way. I think there's a lot of hope in that the new generation of developers that are coming in are a lot more savvy about things like data. And I think also the new generation of people in the business are realizing that we need to have data as a core competence. So I do have optimism there, that the fact is, I think there is a much greater consciousness within both the business side and the technology side of the organization of the importance of data and how to approach that. 
And so I'd like to just end on that note. >> Yeah, excellent. And I think you're right. Putting data at the core is critical. Data mesh, I think, very well describes the problem and, (mumbles) credit, lays out a solution, just the technology's not there yet, nor are the standards. Anyway, I want to thank the panelists here. Amazing. You guys are always so much fun to work with and love to have you back in the future. And thank you for joining today's broadcast brought to you by Couchbase. By the way, check out Couchbase on the road this summer at their application modernization summits, they're making up for two years of shut in and coming to you. So you got to go to couchbase.com/roadshow to find a city near you where you can meet face to face. In a moment, Ravi Mayuram, the chief technology officer of Couchbase, will join me. You're watching theCUBE, the leader in high tech enterprise coverage. (bright music)

Published Date : May 19 2022


ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Tony	PERSON	0.99+
Ravi Mayuram	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Tony Bear	PERSON	0.99+
Dave	PERSON	0.99+
Doug Henschen	PERSON	0.99+
Bank of America	ORGANIZATION	0.99+
Tony Baer	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Ford	ORGANIZATION	0.99+
iPad	COMMERCIAL_ITEM	0.99+
Sanjeev Mohan	PERSON	0.99+
Sanjeev	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
94%	QUANTITY	0.99+
Vertica	ORGANIZATION	0.99+
58%	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Yellowbrick	ORGANIZATION	0.99+
8%	QUANTITY	0.99+
2022	DATE	0.99+
today	DATE	0.99+
City Bank	ORGANIZATION	0.99+
Bill Ford	PERSON	0.99+
two years	QUANTITY	0.99+
Googles	ORGANIZATION	0.99+
81%	QUANTITY	0.99+
10%	QUANTITY	0.99+
DB InSight	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Today	DATE	0.99+
2020	DATE	0.99+
Couchbase	ORGANIZATION	0.99+
Snowflakes	ORGANIZATION	0.99+
5%	QUANTITY	0.98+
650 CIOs	QUANTITY	0.98+
Amazons	ORGANIZATION	0.98+
both	QUANTITY	0.98+
One	QUANTITY	0.98+
Lyfts	ORGANIZATION	0.98+
second	QUANTITY	0.98+
SanjMo	ORGANIZATION	0.98+
26%	QUANTITY	0.98+
Ubers	ORGANIZATION	0.98+
three years	QUANTITY	0.98+
650 respondents	QUANTITY	0.98+
pandemic	EVENT	0.97+
this year	DATE	0.97+
15%	QUANTITY	0.97+
Rocket	ORGANIZATION	0.97+
more than one cloud	QUANTITY	0.97+
25%	QUANTITY	0.97+
Tony bear	PERSON	0.97+
around five	QUANTITY	0.96+
two thirds	QUANTITY	0.96+
about a half	QUANTITY	0.96+

Aditya Nagarajan & Krishna Mohan, TCS AWS Business Unit | AWS re:Invent 2021

>> You're watching theCUBE. Welcome to our continuous coverage of AWS re:Invent 2021. I'm Dave Nicholson. We've got an amazing event that's been going on for the last four days with two live sets, two studios, more than 100 guests, and two very distinguished gentlemen here on the set with us live in Las Vegas. I'd like to welcome Krishna Mohan, Vice President and Global Head of TCS's AWS Business Unit. Welcome Krishna. >> Thank you Dave. >> Dave: And also with us Aditya Jagapal Nagarajan. >> Thank you. >> Dave: I hope I did your name justice. >> Perfect. >> Right, I tried. And Aditya is Head of Strategy and Business Operations for the TCS AWS Business Unit. Krishna, starting with you, tell us about TCS and AWS over the last year. What's been going on? >> Yeah. >> Thank you Dave for having me here. It's great to be in person actually, back in re:Invent, back in person, 25,000 people, but still we have pretty good measures, health measures that way. So I'm very happy to be here. TCS AWS business unit was formed three quarters back and we actually always had an AWS partnership, but we actually felt that it's important to kind of have a separate business unit, which is the full stack, multi dimensional unit providing cloud migration and modernization across applications, data, and infrastructure, and also a main focus on industry solutions. So it has been a great three quarters, and our partnership only enhanced significantly, predominantly what we're actually seeing in the last one year. The cloud overall transformation, I think it has kind of taken a different shape. It used to be cloud migration, modernization, cloud native development, but from there it has moved to enterprise transformation, that's happening on cloud, and specifically AWS majority of the time. So with that, we actually see a lot of customers. Broadly you can categorize them into three, cloud for IT, cloud for business, and cloud for innovation. 
And we're definitely seeing maximum traction there with our customers across the three categories. So I'm super excited to be here at re:Invent, you know, a couple of our customers were in the keynotes with both Adam and Doug. Western Union was in the keynote, Shelly covered the Western Union transformation in the partner keynote with Doug, and very happy to see Linda Cower, the transformation at United Airlines, with Adam. So it's really great to see how we are helping the customers on the transformation. That's definitely, you know, the way that we see it. And we have made significant progress overall in the last three quarters. And these kinds of wins and business transformation that has actually happened is what resulted in TCS getting the Rising Star GSA award. So I'm pretty happy to actually carry this little thing here. >> Is that what this is? >> Absolutely. So it means a lot because it is our customers kind of reinforcing the value that TCS, along with AWS, is bringing to the customer. >> So I wasn't going to say anything. I just assumed that you were a 2001: A Space Odyssey fan and you just brought, you know, a version of the monolith with you. I wasn't sure. Congratulations. >> Thank you. >> That's quite an achievement, especially in the relatively short period of time. And especially with the constraints that have been placed upon all of us. Did they give you like a schwag bag with a bunch of, with, you know, like they do at the academy awards? Are you familiar with that? >> We had a great fun event on Monday afternoon. >> Fantastic. >> Yeah. >> Aditya, talk about, you're a consultancy, your organization is a consultancy. Talk about how you engage with the customers that you are helping to bridge the divide between what their business requirements are, and the technology that AWS is delivering. 
Because I think we all agree that everything we're seeing here from AWS is wonderful, but without an organization like yours, actual end users, actual customers, have a hard time driving benefits. So, how do you approach that? >> Gladly, thank you, Dave, and thank you to theCUBE for having us here. And just borrowing from what Krishna talked about, the three layers of value creation, the cloud for IT, cloud for business and cloud for innovation. We see the journeys clients take, to start with how they look at IT modernization, and go all the way to business transformation, and look at ecosystem transformation as well. For example, we just heard about Western Union and we just came off of one with SWBC where they have completely modernized the payment systems on AWS and TCS has been the partner for transforming that for them. And that not only just means the technology layers, but also reimagining business processes in the cloud. Moving on from the financial side, if you look at the digital farming, for example, we have been working with some of the leading, the transmitter players in the healthcare industry and in the manufacturing space to look at helping farmers with AI. Right? And helping them look at how they can ensure better analytics and drone capabilities for digital farming. Drug trial development and acceleration for time to market has been front and center for all of us in the last two years, where we've been helping pharma organizations get better at drug trials and reach the end customers better with cloud. So there's various examples here. >> I want to poke on that a little bit. >> Aditya: Yeah. >> So when TCS is engaging a customer, say in farming versus pharma, how much of your interaction with them is specialized by industry vertical or specific area expertise versus the generic workings that are going to be supporting that effort in the background? What does that look like? 
Are you going in first with a pharma discussion, first with the farm discussion, as opposed to an overall discussion? >> It's a great point you mentioned, Dave, because that's the sort of essence of TCS. Because the way we look at it, we actually appeal to the industry specific. So our domain and contextual knowledge is very important to appeal to the customers and to the various stakeholders, no longer are the days where you talk about technology as a means to an end. We talk about how end customers can benefit in that context of what they're going through in that industry. And how can then technology be part of that strategy, right? So, hence, as you rightly said, domain and context first, followed by technology powering the outcome. >> Even though farm and pharma sound a lot alike. >> Right, I showed you the very difference. >> And they may share some things in common. Yes, very, very different. Krishna, talk about your go to market motion. How are clients aware of TCS? Do you have teams that engage clients directly and then bring AWS into the conversation? Or are you being brought in by AWS? Is it a combination? What does that look like? >> So, very good relevant question. So our GTM strategy is, TCS has been, you know, serving the enterprise customers and IT transformation for 52 years now. So we have a huge base. But specifically from an AWS BU perspective, we are focusing on selective verticals: banking, financial services and insurance is large, life sciences, health care, and travel, transportation and hospitality. So these are the verticals that we're actually focusing on, and given our presence in the enterprise sector, we already have direct sales teams who are engaging with the customers directly on enterprise transformation and business transformation. And once we have that conversation, we actually take all these solutions that we have built on AWS and along with AWS. 
There are a few customers in the last three quarters, after forming the AWS business unit, one thing that we did is with AWS we're proactively going and identifying the logos and the customers. And with the focus not on technology, with the focus on how to solve their problems on the business side and how to create new business models. So it's kind of both. We bring in, AWS brings in logos as well, so Greenfield accounts, and as well as our contextual knowledge of the industry is how the GTM is working out, and working out pretty good. >> You mentioned, you've been at this for 52 years. >> Aditya: Yeah. >> You must've been very young when you started doing this. Talk about the internal dynamics. So think of TCS, the larger organization. You represent the AWS business unit. TCS has been doing this for a long time, predating what we think now of as cloud. I'm sure that you have long existing relationships with customers, where you've been doing things for them that aren't cloudy, and those things keep the lights on at TCS, right? Important sources of revenue. Yet you're going in and you're consulting and saying, hey, you know, it might be better for you, Mr. Customer, to work with AWS and TCS, as opposed to maybe being at a data center that TCS manages, I mean, how do you manage that internal dynamic? You've got to have people at TCS who are saying, stay away, that's my revenue, don't move my cheese. What does that look like? >> Very valid question Dave. So the way that TCS is actually looking at it is a twin engine strategy. There's a cost and optimization strategy, which we have. We sell the customers and operations, running the BAU if you will, business as usual, then you have something called growth and transformation. So as a strategy, we are very clear that the path of business transformation is the growth and transformation channel. 
So we as a company are very comfortable cannibalizing our C&O business because we want to be relevant to the market, relevant to the customer, and relevant to the partner ecosystem. So the only way you are relevant is actually to challenge yourself, cannibalize your own business, and for the long, you know, strategy of looking at how to grow. And that's how our twin engine strategy is working. And there are a lot of customers where we have developed contextual knowledge serving the customers for 20 years, 25 years. We know how they work, what their business is actually, you know, what's going to be the future of the business. So we are in a better position to actually transform them. And as a company, we are ready to cannibalize our revenue. >> So Adi, give us an example of working with a customer and give us an idea of what that customer's perspective is in terms of their place on the spectrum of, I don't want to move anything if I don't have to versus, hey, you guys can't move fast enough to deliver what I want. Where are you seeing that spectrum of customer requirements at this point? Do you feel like you're having to lead people to water still? Where are we with that? >> Well, if you asked me this question a couple of years ago, it would be about, hey, look, here's beautiful water and the lake looks good, why don't we stand by the side and see what it tastes like? Now the question is, how much water to drink? Right? So the point being that customers have fast realized that cloud is not just an IT decision, it's a business transformation decision. So if I may just call back what Krishna talked about, the dual engine strategy. A clear testament to that is some of our relationships, most of our relationships have been over two decades with our clients. 
And that's a perfect indication of being constantly relevant for them because as their models change, as their markets change, customer expectations change, we need to constantly innovate ourselves. >> You're innovating your business just like that. >> Absolutely. >> Correct. >> So you know, as we say, you're in the boat with them and you're going through the same changes. >> And so coming back to the question which you asked, the point was we give them a point of what experience they can have with cloud by each stakeholder. The CIO wants to look at how we can look at better sustainability of their operations, keep the lights on as you said, enhance stability with more automatable capabilities, looking at DevOps. The business is completely looking at how can cloud fundamentally change my business model. And you have both these stakeholders coexisting with the same outcome towards enterprise transformation. And that's the experience which we work with them to shape. To say what the starting point is? Where would they like to go? And how can we go with them in the journey? What's interesting here is, nobody has all the answers. Neither AWS, nor the customer, nor TCS does, but we are here to create a culture of discovering the right goal and the right answers. It's very important. That's the approach to getting it working. >> Krishna, in our last minute together. You've just received the Rising Star Award, 2022 is rapidly approaching, this doesn't put any pressure on you at all for 2022 because people are going to ask, what are those rising stars going to do in 2022? What's on the horizon, what are the two of you excited about for next year? 
>> I think we are super excited with how AWS, you know, definitely in Adam's keynote, if I had to take a couple of points that I'm taking away, it is that in addition to enhancing their core cloud capabilities, they have pivoted to industry solutions, you know, the FinSpace that they have announced, and the industrial solutions that they have announced. So that is where it very clearly aligns to our strategy of TCS, helping customers look to change their business models, implement new business models, create an ecosystem play. And that's basically where we are really super excited. And another point which I took from Adam is their focus on Edge with IoT and private 5G. And that's very, very important especially when you look at both IT, as well as the IoT transformation. So we are super excited with the potential, all the new bells and whistles AWS has rolled out in the last four days, and looking forward to a few more of this. >> Congratulations again. It's a fantastic acknowledgement of what you've been able to do over the last, just three quarters as you mentioned, closing out 2021 in a very, very good way. Looking forward to 2022. Thank you gentlemen for joining us today here on theCUBE, and thank all of you for joining us, for continuing continuous Cube coverage of AWS re:Invent 2021. We are the leader in hybrid technology event coverage. I'm Dave Nicholson, stay tuned for more from theCUBE.

Published Date : Dec 2 2021


ENTITIES

Entity	Category	Confidence
Krishna	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
Dave	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Doug	PERSON	0.99+
Aditya	PERSON	0.99+
Linda Cower	PERSON	0.99+
25 years	QUANTITY	0.99+
20 years	QUANTITY	0.99+
two studios	QUANTITY	0.99+
Aditya Nagarajan	PERSON	0.99+
52 years	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
Krishna Mohan	PERSON	0.99+
Monday afternoon	DATE	0.99+
Shelly	PERSON	0.99+
Aditya Jagapal Nagarajan	PERSON	0.99+
TCS	ORGANIZATION	0.99+
SWBC	ORGANIZATION	0.99+
two	QUANTITY	0.99+
next year	DATE	0.99+
Space Odyssey	TITLE	0.99+
Adam	PERSON	0.99+
2022	DATE	0.99+
two live sets	QUANTITY	0.99+
more than 100 guests	QUANTITY	0.99+
2021	DATE	0.99+
both	QUANTITY	0.99+
25,000 people	QUANTITY	0.99+
2001	DATE	0.99+
Rising Star Award	TITLE	0.98+
first	QUANTITY	0.98+
three categories	QUANTITY	0.97+
three	QUANTITY	0.97+
each stakeholder	QUANTITY	0.97+
two very distinguished gentlemen	QUANTITY	0.97+
Raising Star GSA	TITLE	0.96+
today	DATE	0.96+
Greenfield	ORGANIZATION	0.95+
last one year	DATE	0.94+
last year	DATE	0.94+

Chris Folk & Mohan Koo

>> Welcome to theCUBE's continuing coverage of Splunk's .conf21. I'm Dave Nicholson, and I am joined by Chris Folk, director, cybersecurity policy and strategic partnerships at MITRE Corporation, as well as Mohan Koo, the co-founder and chief technology officer at Dtex Systems. Now, uh, gentlemen, we've heard this before, but I think this is going to be the best example of a conversation on this subject I've ever had. Security is a team sport. So let's talk about how that applies, where MITRE and Dtex and Splunk all come together and work as a team. Uh, starting with you, Chris. MITRE published the, the ATT&CK framework. And, just so people are clear on that, all caps, ATT, ampersand or, and sign, I should say, capital C, capital K. Looks like attack. That's how you say it. The framework was created by MITRE. Uh, it's a bit of a game changer. Now, enterprise security teams use that pretty religiously. So, so tell us about that, and tell us what we can expect next from MITRE. >> So thank you David, uh, pleasure to be here. You know, I think that the, um, what made ATT&CK resonate with users is it's based on data. It started with data that we observed in our networks and organized around, at that time, the emergent principle that Lockheed Martin had put out on the kill chain. Uh, so it gave it structure. And we have, we have been lucky that the community has sort of embraced that concept of what we started off. We got the numbers completely wrong. Uh, we, we started off with like 41 TTPs. And, um, that was because that was based on a small subset of data that we had, uh, and what's been powerful and what's made it truly wonderful is the community's adopted it. And it's, that's, what's it's added to it. It's an additive approach. Um, and but it's all based on data and it's all just a fabulous, um, opportunity for the community to come together. So, what MITRE's really focused on is understanding how data, and those, uh, problems come together. 
And then, we surround the ecosystem of that problem with things like language. So we give it a framework and we give it, um, we give it operational data so that it actually has resonance with the users of that community. >> So give me an example, uh, of the language that's used. You know, there are, there are things that are, that are under the heading of tactics as an example. Give me an example of some of those things. What did, what's the term in plain English, and what does it mean? >> So tactics are a way for, um, an adversary to go about taking care of their business. So, in the day, uh, when we were first thinking about this, we thought about it as, um, the old cartoons where you'd have the-the-the coyote and the-the sheep would check in, you know, the coyote was given his lunchbox. He was given it, um, if you think about it, as a, uh, the adversary target list. And he was given his tools, he was, he would open up his toolbox, and he would go after those targets for the day. And he would use those tools. What we realized is that in most cases, a lot of those tools were expensive to create. They were, uh, hard to, um, train up on. And so they tended to use the same basic toolkit over and over again. What changed was, perhaps one little thing that they would exploit that was always changing. And so what, you know, what I likened it to was a burglar. A burglar would show up with his bag of- of, uh, tools. He would have a crowbar, and he would have a flashlight, and he would have a bag. And what he would do is he sometimes choose to go in through the windows. Sometimes they choose to go in through the door. Sometimes he choose to go in through the basement. It didn't matter. But once he got in the house, he had that flashlight, he had that bag, and he had that crowbar, I could figure out through my sensors, what he had in his bag or with, with him, I could catch that. And then I could alert on that, and find the other pieces of that. 
And so that's what really tactics, um, are about and getting that-that concept boiled down to a language that, uh, cyber defenders could readily understand and put into practice in their businesses. >> So Mohan, tell us about Dtex; and I'm particularly interested in the, in the connection between Dtex and what Chris was just talking about; that MITRE has provided us, uh, this language that ATT&CK provides us. Um, essentially, you're- you're looking- you're listening for those things that go bump in the night. Chris has given us a language to describe them. Tell- tell us how Dtex fits here. >> Yeah. So, so what we're doing, David, um, and thank you for having me as well, um, what we're doing is we're bringing to the table a whole different type of telemetry, and it's all around human behavior. And, and how we got together with MITRE, um, is actually a direct connection to how we got together with Splunk as well. I'm actually sitting here in Adelaide, in Australia, at the Australian Cyber Collaboration Center. And this is an initiative we put together with the state government of South Australia, and federal government as well, um, to actually bring everybody onto one trusted group. So we could break down the silos and collaborate a hell of a lot better. As we all know, the bad guys collaborate extremely well. You know, they share everything, including their IP and their tactics, and their techniques, everything is shared. And that puts them at an extreme advantage to the good guys, and girls, right? And-and so we have to do a much better job at that collaboration. And-and when we came together and were introduced to MITRE here at the Australian Cyber Collaboration Center, we decided that taking MITRE's expertise, and they've got like 15, more than 15 years' worth of dedicated experience around behavioral science, and how it contributes to insider threats and studying that in some depth.
Putting that together with the data that we're collecting for our enterprise customers was something that was really, really important, and actually, you know, it was here in the Australian Cyber Collaboration Center that we first got together with Splunk. And Splunk started to identify a problem statement amongst their customers too, that, you know, the data that exists out there for security operations teams just doesn't have that cleanliness and, it doesn't have the context when it comes to human behavior. And that's really what we're bringing to the, to the table here. >> So give me an example of a human behavior that you're looking for, or, you know, so, so Splunk is- Splunk is providing this data that's being gathered from logs. These events are being rolled up and, uh, and-and Dtex is analyzing them. Can you give us an example that doesn't educate adversaries of-of behaviors that you look at? >> Yeah, absolutely. And I'll-I'll just touch on it. And then I'll hand over to Chris cause, cause uh, MITRE are truly the experts of this stuff. But- but what I will say is that a lot of organizations, when they think about human behavior and the insider threat, per se, they always think about the malicious actor, right? The, the Snowden type character that's, that's maliciously, and intentionally, trying to get access to take stuff. But it's, it's much more than that. It's, it's also insiders that do negligent things, and it's insiders that are victims of-of their own lack of understanding of things that they're facing. And when outsiders are cleverer, or more technically proficient, they can find ways to-to usurp the insider, and get them to do bad things without them even knowing they're doing it. And so understanding intent, and we call it, at Dtex, we call it, indicators of intent, are really important for us to know.
Those indicators are what we've been working with MITRE on for the last year or so; kind of understanding what the newest, most complicated indicators of intent are. And how do we determine those to be able to know the difference between a malicious insider, versus somebody that's just doing the wrong thing without even knowing about it? I-I don't know, Chris, if-if you wanted to touch on that a little bit. >> Yeah, yeah, yeah Chris, absolutely. You've, you know, uh, Mohan's joining us from Australia, Chris, you and MITRE have done a ton of work with the U.S. Federal Government around detection, and prevention of those insider threats. Talk to us, talk us through that. And, and more specifically, tell us how that is applicable to nongovernmental agencies. >> Yeah, well, so I mean, I think at the, at the core of it, human behavior is human behavior. And whether those are being applied to, uh, critical infrastructures, whether they're being applied to working at a federal government organization, or a state, local, uh, government organization, it doesn't matter. Humans, humans have behaviors. Every human has behaviors. What makes them unique, is understanding the context behind those behaviors. And then looking for, uh, indicators that are distinguishable from an individual doing his, or her, job. Right? So, one of the challenges that you have with insider behavior is that, you know, data collection is everyone's job, at every organization, right? You're always trying to put together the numbers for the spreadsheet to-to brief to your boss. Well, when you're doing that data collection, it can look like normal work. And you can't trigger on something like that, because otherwise you're going to be triggering, uh, every individual doing their job every day.
So you have to add additional context, and behavioral indicators to that, to understand how the individual is doing that differently in a case where they are up to-up to no good, we'll say, as opposed to under circumstances of doing their job in a regular course of action. So, what we have long held as beliefs about how people behave are actually manifesting themselves differently in online behavior; how fast they click, um, what kinds of tools they use to do legitimate work, versus the kinds of tools that they use to do, uh, I'll call it illicit collection. Uh, literally those kinds of subtle nuances. So while they might do the same collection activities, how fast they do it, um, where they put that information, um, how they, how often they go back to the same site, those are indicators that when taken with that behavioral context really matter. And that's what distinguishes them from just normal, typical user behavior. >> So how much does that context vary between private entities, governmental entities, and across private entities? Is this the classic 80/20 situation where, you know, 80-80% of it's the same, 20% very different? What, what does that look like? >> Yeah, I would say that, you know, an 80/20 is a very good rule. I'd probably put it up closer to 90 to 95 to five, right? So behaviors work the same. Now, the protocols that organizations have are going to drive some of that, right? So a-a government organization is going to have certain things in place that a private company may or may not. So, you know, how, how locked down the systems are, the kinds of access, um, things that, that you allow. So do you allow USB drives? Do you allow, um, those kinds of-of capabilities in your organization? So, if you're a private sector organization, but even within a private sector organization, they'd run the gamut, right? You have very locked down environments like banks, and regulated industries and then, you have very unregulated industries as well.
So it really isn't about government and industry. It's about the kind of, um, protocols that are already in place for other reasons that really drive the differences between that. And then you have, again, you have those additional safeguards that you have, say with a-with a government organization and that you've got, uh, security vetting, right? So you've done security vetting of a lot of your employees, whether even if it's not security clearance, it's a- it's a personnel vetting. And so, it's an additional level, um, but all it does is change the-the emphasis of-of where you place the value in your security mechanisms. >> So, you mentioned a variety of contexts. Mohan, we've had a mass shift to remote working, obviously. Um, Splunk has shared with us that, uh, that the customers are concerned about, you know, giving- giving people visibility without compromising privacy. And I, and I-I say Splunk like Splunk is a person (man laughing) We like to personalize everything here at theCUBE, but how is Dtex helping with this challenge, this challenge of not being intrusive, yet, uh, getting the important work done that needs to be done? >> Yeah, that's a, that's a great question. And-and for us, you know, we, as Dtex, we kind of grew up in-in Europe, that's kind of where we became an international organization. So, employee privacy is at the heart of everything that we do. And-and, we build privacy by design into everything that we do. So, we're actually able to, uh, pseudo-anonymize every bit of data that we're collecting, so that you're actually really, truly looking for bad behaviors or unusual behaviors. You're not looking for bad people or unusual people, right? Like it's, it's a very clear distinction; and being able to do it in a way that gives you the visibility, gives the organization the visibility to prevent against risk and to de-risk the organization without infringing on anyone's privacy is, is really critical.
And, you know, as Chris was mentioning, even if you go to the private sector, you know, you've got those very regulated banks or healthcare organizations that are typically quite locked down, but we're dealing more and more with, with high-tech companies, right? A lot of Bay Area firms, Silicon Valley companies, which have always required the flexibility for their workforce, right? They want them to be innovative. They want them to do different things. And in order to do that, they need the ability to have any tools they need to get their job done. But in those environments, you can't have too many hard and fast controls. So how do we actually provide that visibility to the organization without infringing privacy? That is absolutely what the game is about. And so, you know, not kind of having to scrape screens, and log keystrokes, and do video capture, you know, that's the old school way of doing it. You know, in some cases maybe you do need that level of surveillance, but in most cases you absolutely do not. And so, you know, for many, many years, a lot of enterprise security organizations have been collecting way more data than they need to and taking way more intrusive approaches. And we're about backing that off and kind of getting the right balance between security and privacy, because what we truly believe is where you overlap security and privacy, that Venn diagram that you get in the middle is where you get safety. And we really see it as, as an extension of health and safety. >> So Mohan, if we do all of these things correctly, between Splunk, MITRE, and Dtex, you get the perfect scenario where you're catching bad actors and you're not inconveniencing good actors. So what's your view of this? Dystopian future, Utopian future, a mix of both? >> Well, uh, look, I think-I think that the future really is, you know, as the title to this discussion is it's a team sport, right?
Like, and, and I think the, the approach that Splunk is taking right now is absolutely the right one. Like we, we need to all come together. We can't be everything to everyone. I don't think there is a one size fits all solution in enterprise security today. And those organizations that understand that and recognize that, but neither is it, are we able to continue just kind of investing in hundreds of point solutions across the enterprise and layering them across the business like band-aids. We need that consolidation, but we do need to take best of breed solution providers to, to focus on those integrations and doing it properly. And that's what we've really enjoyed about working with Splunk over the last couple of years is kind of taking a very holistic approach and realizing that we all need to come together to play this team sport because, you know, we, as Dtex, we bring together a very clean data set that gives you that human telemetry and then MITRE brings the behavioral science capability and behavioral science understanding. And Splunk provides that big data platform to bring everything together and show it and visualize it. And, and really that's, that's, that's, that's one way of looking at it. And I, and I think, you know, going forward those vendors or those organizations that don't recognize that that proper integration actual true integration has to be done collectively. And it has to be done in a way that's light and easy for anybody to consume. >> Perfect way to wrap this CUBE Conversation. Thank you, Mohan. Thank you, Chris. And thank all of you for joining us on this CUBE Conversation. Our continuing coverage of Splunk's .conf21 continues. I'm Dave Nicholson. Thanks for joining.

Published Date : Oct 19 2021

SUMMARY :

And, just so people are clear on that Ca- that we observed in our of the language that's used. And so what, you know, what I in the connection between DTex and and how it contributes to insider threats behaviors that you look at? and get them to do bad things without You've, you know, uh, Mohan's So, one of the challenges that you have additional safeguards that you have, done that needs to be done? get in the middle is where you So Mohan, if we do all And it has to be done in a And thank all of you for

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
Chris	PERSON	0.99+
Chris Faulk	PERSON	0.99+
Adelaide	LOCATION	0.99+
Europe	LOCATION	0.99+
15	QUANTITY	0.99+
Australian Cyber Collaboration Center	ORGANIZATION	0.99+
DTex	ORGANIZATION	0.99+
Mohan Koo	PERSON	0.99+
Mohan	PERSON	0.99+
MITRE	ORGANIZATION	0.99+
U.S. Federal Government	ORGANIZATION	0.99+
Snowden	PERSON	0.99+
Splunk	ORGANIZATION	0.99+
Australia	LOCATION	0.99+
Lockheed Martin	ORGANIZATION	0.99+
20%	QUANTITY	0.99+
Dtex	ORGANIZATION	0.99+
last year	DATE	0.99+
Chris Folk	PERSON	0.99+
both	QUANTITY	0.98+
five	QUANTITY	0.98+
more than 15 years	QUANTITY	0.98+
English	OTHER	0.98+
D	ORGANIZATION	0.98+
90	QUANTITY	0.98+
95	QUANTITY	0.98+
first	QUANTITY	0.96+
AndSign	ORGANIZATION	0.96+
Myers	PERSON	0.96+
one	QUANTITY	0.95+
41 TTPs	QUANTITY	0.95+
Ampersand	ORGANIZATION	0.95+
MITREs	ORGANIZATION	0.95+
ATT	ORGANIZATION	0.93+
Australian Cyber Collaboration Center	ORGANIZATION	0.91+
Silicon valley	LOCATION	0.91+
today	DATE	0.9+
hundreds	QUANTITY	0.9+
theCUBE	ORGANIZATION	0.89+
splunk.com	OTHER	0.88+
Splunk	PERSON	0.87+
one little thing	QUANTITY	0.86+
one size	QUANTITY	0.84+
one way	QUANTITY	0.83+
last couple of years	DATE	0.82+
South Australia	ORGANIZATION	0.76+
80-80%	QUANTITY	0.74+
point	QUANTITY	0.74+
one trusted group	QUANTITY	0.71+
21	OTHER	0.43+

Sanjeev Mohan, SanjMo & Nong Li, Okera | AWS Startup Showcase

(cheerful music) >> Hello everyone, welcome to today's session of theCUBE's presentation of AWS Startup Showcase, New Breakthroughs in DevOps, Data Analytics, Cloud Management Tools, featuring Okera from the cloud management migration track. I'm John Furrier, your host. We've got two great special guests today, Nong Li, founder and CTO of Okera, and Sanjeev Mohan, principal at SanjMo, and former research vice president of big data and advanced analytics at Gartner. He's a legend, been around the industry for a long time, seen the big data trends from the past, present, and knows the future. Got a great lineup here. Gentlemen, thank you for this, so, life in the trenches, lessons learned across compliance, cloud migration, analytics, and use cases for Fortune 1000s. Thanks for joining us. >> Thanks for having us. >> So Sanjeev, great to see you, I know you've seen this movie, I was saying that in the open, you've at Gartner seen all the visionaries, the leaders, you know everything about this space. It's changing extremely fast, and one of the big topics right out of the gate is not just innovation, we'll get to that, that's the fun part, but it's the regulatory compliance and audit piece of it. It's keeping people up at night, and frankly if not done right, slows things down. This is a big part of the showcase here, is to solve these problems. Share us your thoughts, what's your take on this wide-ranging issue? >> So, thank you, John, for bringing this up, and I'm so happy you mentioned the fact that, there's this notion that it can slow things down. Well I have to say that the old way of doing governance slowed things down, because it was very much about control and command. But the new approach to data governance is actually in my opinion, it's liberating data.
If you want to democratize or monetize, whatever you want to call it, you cannot do it 'til you know you can trust said data and it's governed in some ways, so data governance has actually become very interesting, and today if you want to talk about three different areas within compliance regulatory, for example, we all know about the EU GDPR, we know California has CCPA, and in fact California is now getting even a more stringent version called CPRA in a couple of years, which is more aligned to GDPR. That is a first area we know we need to comply to that, we don't have any way out. But then, there are other areas, there is insider trading, there is how you secure the data that comes from third parties, you know, vendors, partners, suppliers, so Nong, I'd love to hand it over to you, and see if you can maybe throw some light into how our customers are handling these use cases. >> Yeah, absolutely, and I love what you said about balancing agility and liberating, in the face of what may be seen as things that slow you down. So we work with customers across verticals with old and new regulations, so you know, you brought up GDPR. One of our clients is using this to great effect to power their ecosystem. They are a very large retail company that has operations and customers across the world, obviously the importance of GDPR, and the regulations that imposes on them are very top of mind, and at the same time, being able to do effective targeting analytics on customer information is equally critical, right? So they're exactly at that spot where they need this customer insight for powering their business, and then the regulatory concerns are extremely prevalent for them. So in the context of GDPR, you'll hear about things like consent management and right to be forgotten, right? I, as a customer of that retailer should say "I don't want my information used for this purpose," right? "Use it for this, but not this." 
And you can imagine at a very, very large scale, when you have a billion customers, managing that, all the data you've collected over time through all of your devices, all of your telemetry, really, really challenging. And they're leveraging Okera embedded into their analytics platform so they can do both, right? Their data scientists and analysts who need to do everything they're doing to power the business, not have to think about these kind of very granular customer filtering requirements that need to happen, and then they leverage us to do that. So that's kind of new, right, GDPR, relatively new stuff at this point, but we obviously also work with customers that have regulations from a long long time ago, right? So I think you also mentioned insider trading and that supply chain, so we'll talk to customers, and they want really data-driven decisions on their supply chain, everything about their production pipeline, right? They want to understand all of that, and of course that makes sense, whether you're the CFO, if you're going to make business decisions, you need that information readily available, and supply chains as we know get more and more and more complex, we have more and more integrated into manufacturing and other verticals. So that's your, you're a little bit stuck, right? You want to be data-driven on those supply chain analytics, but at the same time, knowing the details of all the supply chain across all of your dependencies exposes your internal team to very high blackout periods or insider trading concerns, right? For example, if you knew Apple was buying a bunch of something, that's maybe information that only a select few people can have, and the way that manifests into data policies, 'cause you need the ability to have very, very scalable, per employee kind of scalable data restriction policies, so they can do their job easier, right? 
If we talk about speeding things up, instead of a very complex process for them to get approved, and approved on SEC regulations, all that kind of stuff, you can now go give them access to the part of the supply chain that they need, and no more, and limit their exposure and the company's exposure and all of that kind of stuff. So one of our customers was able to do this, getting two orders of magnitude, a 100x reduction in the policies to manage the system like that. >> When I hear you talking like that, I think the old days of "Oh yeah, regulatory, it kind of slows down innovation, got to go faster," pretty basic variables, not a lot of combination of things to check. Now with cloud, there seems to be combinations, Sanjeev, because how complicated has the regulatory compliance and audit environment gotten in the past few years, because I hear security in a supply chain, I hear insider threats, I mean these are security channels, not just compliance department G&A kind of functions. You're talking about large-scale, potentially combinations of access, distribution, I mean it seems complicated. How much more complicated is it now, just than it was a few years ago? >> So, you know the way I look at it is, I'm just mentioning these companies just as an example, when PayPal or eBay, all these companies started, they started in California. Anybody who ever did business on eBay or PayPal, guess where that data was? In the US in some data center. Today you cannot do it. Today, data residency laws are really tough, and so now these organizations have to really understand what data needs to remain where. On top of that, we now have so many regulations. You know, earlier on if you were healthcare, you needed to be HIPAA compliant, or banking PCI DSS, but today, in the cloud, you really need to know, what data I have, what sensitive data I have, how do I discover it? So that data discovery becomes really important.
What roles I have, so for example, let's say I work for a bank in the US, and I decide to move to Germany. Now, the old school is that a new rule will be created for me, because of German... >> John: New email address, all these new things happen, right? >> Right, exactly. So you end up with this really, a mass of rules and... And these are all static. >> Rules and tools, oh my god. >> Yeah. So Okera actually makes a lot of this dynamic, which reduces your cloud migration overhead, and Nong used some great examples, in fact, sorry if I take just a second, without mentioning any names, there's one of the largest banks in the world is going global in the digital space for the first time, and they're taking Okera with them. So... >> But what's the point? This is my next topic in cloud migration, I want to bring this up because, complexity, when you're in that old school kind of data center, waterfall, these old rules and tools, you have to roll this out, and it's a pain in the butt for everybody, it's a hassle, huge hassle. Cloud gives the agility, we know that, and cloud's becoming more secure, and I think now people see the on-premise, certainly things that'd be on-premises for secure things, I get that, but when you start getting into agility, and you now have cloud regions, you can start being more programmatic, so I want to get you guys' thoughts on the cloud migration, how companies who are now lifting and shifting, replatforming, what's the refactoring beyond that, because you can replatform in the cloud, and still some are kind of holding back on that. Then when you're in the cloud, the ones that are winning, the companies that are winning are the ones that are refactoring in the cloud. Doing things different with new services. Sanjeev, you start. >> Yeah, so you know, in fact lot of people tell me, "You know, we are just going to lift and shift into the cloud." But you're literally using cloud as a data center. 
You still have all the, if I may say, junk you had on-prem, you just moved it into the cloud, and now you're paying for it. In cloud, nothing is free. Every storage, every processing, you're going to pay for it. The most successful companies are the ones that are replatforming, they are taking advantage of the platform as a service or software as a service, so that includes things like, you pay as you go, you pay for exactly the amount you use, so you scale up and scale down or scale out and scale in, pretty quickly, you know? So you're handling that demand, so without replatforming, you are not really utilizing your- >> John: It's just hosting. >> Yeah, you're just hosting. >> It's basically hosting if you're not doing anything right there. >> Right. The reason why people sometimes resist replatforming is because there's a hidden cost that we don't really talk about, PaaS adds 3x to IaaS cost. So, some organizations that are very mature, and they have a few thousand people in the IT department, for them, they're like "No, we just want to run it in the cloud, we have the expertise, and it's cheaper for us." But in the long run, to get the most benefit, people should think of using cloud as a service. >> Nong what's your take, because you see examples of companies, I'll just call one out, Snowflake for instance, they're essentially a data warehouse in the cloud, they refactored and they replatformed, they have a competitive advantage with the scale, so they have things that others don't have, that just hosting. Or even on-premise. The new model developing where there's real advantages, and how should companies think about this when they have to manage these data lakes, and they have to manage all these new access methods, but they want to maintain that operational stability and control and growth? >> Yeah, so. No? Yeah. >> There's a few topics that are all (indistinct) this topic.
(indistinct) enterprises moving to the cloud, they do this maybe for some cost savings, but a ton of it is agility, right? The motor that the business can run at is just so much faster. So we'll work with companies in the context of cloud migration for data, where they might have a data warehouse they've been using for 20 years, and building policies over that time, right? And it taking a long time to go approve access and those kinds of things made more sense, right? If it took you months to procure a physical infrastructure, get machines shipped to your data center, then this data access taking so long feels okay, right? That's kind of the same rate that everything is moving. In the cloud, you can spin up new infrastructure instantly, so you don't want approvals for getting policies, creating rules, all that stuff that Sanjeev was talking about, that being slow is a huge, huge problem. So this is a very common environment that we see where they're trying to do that kind of thing. And then, for replatforming, again, they've been building these roles and processes and policies for 20 years. What they don't want to do is take 20 years to go migrate all that stuff into the cloud, right? That's probably an experience nobody wants to repeat, and frankly for many of them, people who did it originally may or may not be involved in this kind of effort. So we work with a lot of companies like that, they have their, they want stability, they got to have the business running as normal, they got to get moving into the new infrastructure, doing it in a new way that, you know, with all the kind of lessons learned, so, as Sanjeev said, one of these big banks that we work with, that classical story of on-premise data warehousing, maybe a little bit of Hadoop, moved onto AWS, S3, Snowflake, that kind of setup, extremely intricate policies, but let's go reimagine how we can do this faster, right?
What we like to talk about is, you're an organization, you need a design that, if you onboarded 1000 more data users, that's got to be way, way easier than the first 10 you onboarded, right? You got to get it to be easier over time, in a really, really significant way. >> Talk about the data authorization safety factor, because I can almost imagine all the intricacies of these different tools creates specialism amongst people who operate them. And each one might have their own little authorization nuance. Trend is not to have that siloed mentality. What's your take on clients that want to just "Hey, you know what? I want to have the maximum agility, but I don't want to get caught in the weeds on some of these tripwires around access and authorization." >> Yeah, absolutely, I think it's real important to get the balance of it, right? Because if you are an enterprise, or if you have diverse teams, you want them to have the ability to use tools as best of breed for their purpose, right? But you don't want to have it be so that every tool has its own access and provisioning and whatever, that's definitely going to be a security, or at least, a lot of friction for you to get things going. So we think about that really hard, I think we've seen great success with things like SSO and Okta, right? Unifying authentication. We think there's a very, very similar thing about to happen with authorization. You want that single control plane that can integrate with all the tools, and still get the best of what you need, but it's much, much easier (indistinct). >> Okta's a great example, if people don't want to build their own thing and just go with that, same with what you guys are doing. That seems to be the dots that are connecting you, Sanjeev. The ease of use, but yet the stability factor. >> Right. Yeah, because John, today I may want to bring up a SQL editor to go into Snowflake, just as an example. Tomorrow, I may want to use the Azure Bot, you know?
I may not even want to go to Snowflake, I may want to go to an underlying piece of data, or I may use Power BI, you know, for some reason, and come from the Azure side, so the point is that, unless we are able to control, in some sort of a centralized manner, we will not get that consistency. And security, you know, is all or nothing. You cannot say "Well, I secured my Snowflake, but if you come through HDFS, Hadoop, or some, you know, that is outside of my realm, or my scope," what's the point? So that is why it is really important to have a watertight way, in fact I'm using just a few examples, maybe tomorrow I decide to use a data catalog, or I use Denodo as my data virtualization and I run a query. I'm the same identity, but I'm using different tools. I may use it from home, over VPN, or I may use it from the office, so you want this kind of flexibility, all encompassed in a policy, rather than a separate rule if you do this and this, if you do that, because then you end up with literally thousands of rules. >> And it's never going to stop, either, it's like fashion, the next tool's going to come out, it's going to be cool, and people are going to want to use it, again, you don't want to have to then move the train from the compliance side this way or that way, it's a lot of hassle, right? So we have that one capability, you can bring on new things pretty quickly. Nong, am I getting it right, this is kind of like the trend, that you're going to see more and more tools and/or things that are relevant or, certain use cases that might justify it, but yet, AppSec review, compliance review, I mean, good luck with that, right? >> Yeah, absolutely, I mean we certainly expect tools to continue to get more and more diverse, and better, right? Most innovation in the data space, and I think we... This is a great time for that, a lot of things that need to happen, and so on and so forth. 
So I think one of the early goals of the company, when we were just brainstorming, is we don't want data teams to not be able to use the tools because it doesn't have the right security (indistinct), right? Often those tools may not be focused on that particular area. They're great at what they do, but we want to make sure they're enabled, they do some enterprise investments, they see broader adoption much easier. A lot of those things. >> And I can hear the sirens in the background, that's someone who's not using your platform, they need some help there. But that's the case, I mean if you don't get this right, there are some consequences, and I think one of the things I would like to bring up on next track is, to talk through with you guys is, the persona pigeonhole role, "Oh yeah, a data person, the developer, the DevOps, the SRE," you start to see now, developers and with cloud developers, and data folks, people, however they get pigeonholed, kind of blending in, okay? You got data services, you got analytics, you got data scientists, you got more democratization, all these things are being kicked around, but the notion of a developer now is a data developer, because cloud is about DevOps, data is now a big part of it, it's not just some department, it's actually blending in. Just a cultural shift, can you guys share your thoughts on this trend of data people versus developers now becoming kind of one, do you guys see this happening, and if so, how? >> So when, John, I started my career, I was a DBA, and then a data architect. Today, I think you cannot have a DBA who's not a developer. That's just my opinion. Because there is so much of CICD, DevOps, that happens today, and you know, you write your code in Python, you put it in version control, you deploy using Jenkins, you roll back if there's a problem. And then, you are interacting, you're building your data to be consumed as a service. 
People in the past, you would have a thick client that would connect to the database over TCP/IP. Today, people don't want to connect over TCP/IP necessarily, they want to go by HTTP. And they want an API gateway in the middle. So, if you're a data architect or DBA, now you have to worry about, "I have a REST API call that's coming in, how am I going to secure that, and make sure that people are allowed to see that?" And that was just yesterday. >> Exactly. Got to build an abstraction layer. You got to build an abstraction layer. The old days, you have to worry about schema, and do all that, it was hard work back then, but now, it's much different. You got serverless, functions are going to show way... It's happening. >> Correct, GraphQL, and semantic layer, that just blows me away because, it used to be, it was all in database, then we took it out of database and we put it in a BI tool. So we said, like BusinessObjects started this whole trend. So we're like "Let's put the semantic layer there," well okay, great, but that was when everything was surrounding BusinessObjects and Oracle Database, or some other database, but today what if somebody brings Power BI or Tableau or Qlik, you know? Now you don't have a semantic layer access. So you cannot have it in the BI layer, so you move it down to its own layer. So now you've got a semantic layer, then where do you store your metrics? Same story repeats, you have a metrics layer, then the data scientists want to do feature engineering, where do you store your features? You have a feature store. 
And before you know, this stack has disaggregated over and over and over, and then you've got layers and layers of specialization that are happening, there's query accelerators like Dremio or Trino, so you've got your data here, which Nong is trying really hard to protect, and then you've got layers and layers and layers of abstraction, and networks are fast, so the end user gets great service, but it's a nightmare for architects to bring all these things together. >> How do you tame the complexity? What's the bottom line? >> Nong? >> Yeah, so, I think... So there's a few things you need to do, right? So, we need to re-think how we express security permissions, right? I think you guys have just maybe in passing (indistinct) talked about creating all these rules and all that kind of stuff, that's been the way we've done things forever. We've got to think about policies and mechanisms that are much more dynamic, right? You need to really think about not having to do any additional work, for the new things you add to the system. That's really, really core to solving the complexity problem, right? 'Cause that gets you those orders of magnitude reduction, system's got to be more expressive and map to those policies. That's one. And then second, it's got to be implemented at the right layer, right, to Sanjeev's point, close to the data, and it can service all of those applications and use cases at the same time, and have that uniformity and breadth of support. So those two things have to happen. >> Love this universal data authorization vision that you guys have. Super impressive, we had a CUBE Conversation earlier with Nick Halsey, who's a veteran in the industry, and he likes it. That's a good sign, 'cause he's seen a lot of stuff, too, Sanjeev, like yourself. 
This is a new thing, you're seeing compliance being addressed, and with programmatic, I'm imagining there's going to be bots someday, very quickly with AI that's going to scale that up, so they kind of don't get in the innovation way, they can still get what they need, and enable innovation. You've got cloud migration, which is only going faster and faster. Nong, you mentioned speed, that's what CloudOps is all about, developers want speed, not things in days or hours, they want it in minutes and seconds. And then finally, ultimately, how's it scale up, how does it scale up for the people operating and/or programming? These are three major pieces. What happens next? Where do we go from here, what's, the customer's sitting there saying "I need help, I need trust, I need scale, I need security." >> So, I just wrote a blog, if I may diverge a bit, on data observability. And you know, so there are a lot of these little topics that are critical, DataOps is one of them, so to me data observability is really having a transparent view of, what is the state of your data in the pipeline, anywhere in the pipeline? So you know, when we talk to these large banks, these banks have like 1000, over 1000 data pipelines working every night, because they've got that hundred, 200 data sources from which they're bringing data in. Then they're doing all kinds of data integration, they have, you know, we talked about Python or Informatica, or whatever data integration, data transformation product you're using, so you're combining this data, writing it into an analytical data store, something's going to break. So, to me, data observability becomes a very critical thing, because it shows me something broke, walk me down the pipeline, so I know where it broke. Maybe the data drifted. And I know Okera does a lot of work in data drift, you know? So this is... Nong, jump in any time, because I know we have use cases for that. 
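The observability idea described above (show that something broke, walk down the pipeline to where it broke, and notice when the data has drifted) can be illustrated with a minimal sketch. It is not any vendor's implementation: the names `Stage`, `first_broken_stage`, and `has_drifted` are hypothetical, and the drift test is a deliberately simple mean-shift check chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One step in a data pipeline, with a health check (hypothetical model)."""
    name: str
    check: Callable[[], bool]  # returns True when the stage's output looks healthy


def first_broken_stage(stages: List[Stage]) -> Optional[str]:
    """Walk the pipeline in order and return the name of the first failing stage."""
    for stage in stages:
        if not stage.check():
            return stage.name
    return None  # every stage passed its check


def has_drifted(baseline: List[float], current: List[float], threshold: float = 3.0) -> bool:
    """Flag drift when the current mean moves more than `threshold`
    baseline standard deviations away from the baseline mean.
    This is an assumed, simplistic definition of drift."""
    spread = pstdev(baseline)
    if spread == 0:
        return mean(current) != mean(baseline)
    return abs(mean(current) - mean(baseline)) / spread > threshold
```

A caller would build one `Stage` per pipeline step (ingest, transform, load, and so on) and run `first_broken_stage` after each nightly run; real observability tools track far more signals than a single mean.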
>> Nong, before you get in there, I just want to highlight a quick point. I think you're onto something there, Sanjeev, because we've been reporting, and we believe, that data workflows is intellectual property. And has to be protected. Nong, go ahead, your thoughts, go ahead. >> Yeah, I mean, the observability thing is critically important. I would say when you want to think about what's next, I think it's really effectively bridging tools and processes and systems and teams that are focused on data production, with the data analysts, data scientists, that are focused on data consumption, right? I think bridging those two, which cover a lot of the topics we talked about, that's kind of where security almost meets, that's kind of where you got to draw it. I think for observability and pipelines and data movement, understanding that is essential. And I think broadly, on all of these topics, where all of us can be better, is if we're able to close the loop, get the feedback loop of success. So data drift is an example of the loop rarely being closed. It drifts upstream, and downstream users can take forever to figure out what's going on. And we'll have similar examples related to buy-ins, or data quality, all those kind of things, so I think that's really a problem that a lot of us should think about. How do we make sure that loop is closed as quickly as possible? >> Great insight. Quick aside, as the founder CTO, how's life going for you, you feel good? I mean, you started a company, doing great, it's not drifting, it's right in the stream, mainstream, right in the wheelhouse of where the trends are, you guys have a really crosshairs on the real issues, how you feeling, tell us a little bit about how you see the vision. 
>> Yeah, I obviously feel really good, I mean we started the company a little over five years ago, there are kind of a few things that we bet would happen, and I think those things were out of our control, I don't think we would've predicted GDPR security and those kind of things being as prominent as they are. Those things have really matured, probably as best as we could've hoped, so that feels awesome. Yeah, (indistinct) really expanded in these years, and it feels good. Feels like we're in the right spot. >> Yeah, it's great, data's competitive advantage, and certainly has a lot of issues. It could be a blocker if not done properly, and you're doing great work. Congratulations on your company. Sanjeev, thanks for kind of being my cohost in this segment, great to have you on, been following your work, and you continue to unpack it at your new place that you started. SanjMo, good to see your Twitter handle taking on the name of your new firm, congratulations. Thanks for coming on. >> Thank you so much, such a pleasure. >> Appreciate it. Okay, I'm John Furrier with theCUBE, you're watching today's session presentation of AWS Startup Showcase, featuring Okera, a hot startup, check 'em out, great solution, with a really great concept. Thanks for watching. (calm music)

Published Date : Sep 22 2021


ENTITIES

Entity	Category	Confidence
Nick Halsey	PERSON	0.99+
John	PERSON	0.99+
John Furrier	PERSON	0.99+
California	LOCATION	0.99+
US	LOCATION	0.99+
Nong Li	PERSON	0.99+
Apple	ORGANIZATION	0.99+
Germany	LOCATION	0.99+
Ebay	ORGANIZATION	0.99+
PayPal	ORGANIZATION	0.99+
20 years	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Tomorrow	DATE	0.99+
two	QUANTITY	0.99+
GDPR	TITLE	0.99+
Sanjeev Mohan	PERSON	0.99+
Today	DATE	0.99+
One	QUANTITY	0.99+
yesterday	DATE	0.99+
Snowflake	TITLE	0.99+
today	DATE	0.99+
Python	TITLE	0.99+
Gartner	ORGANIZATION	0.99+
Tableau	TITLE	0.99+
first time	QUANTITY	0.99+
3x	QUANTITY	0.99+
both	QUANTITY	0.99+
100x	QUANTITY	0.99+
one	QUANTITY	0.99+
Okera	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.98+
two orders	QUANTITY	0.98+
Nong	ORGANIZATION	0.98+
SanjMo	PERSON	0.98+
second	QUANTITY	0.98+
Power BI	TITLE	0.98+
1000	QUANTITY	0.98+
tomorrow	DATE	0.98+
two things	QUANTITY	0.98+
Qlik	TITLE	0.98+
each one	QUANTITY	0.97+
thousands of rules	QUANTITY	0.97+
1000 more data users	QUANTITY	0.96+
Twitter	ORGANIZATION	0.96+
first 10	QUANTITY	0.96+
Okera	PERSON	0.96+
AWS	ORGANIZATION	0.96+
hundred, 200 data sources	QUANTITY	0.95+
HIPAA	TITLE	0.94+
EU	ORGANIZATION	0.94+
CCPA	TITLE	0.94+
over 1000 data pipelines	QUANTITY	0.93+
single	QUANTITY	0.93+
first area	QUANTITY	0.93+
two great special guests	QUANTITY	0.92+
BusinessObjects	TITLE	0.92+

Deepak Mohan, Veritas | VMworld 2020

>> From around the globe, it's theCUBE with digital coverage of VMworld 2020, brought to you by VMware and its ecosystem partners. >> Welcome back. I'm Stu Miniman, and this is theCUBE's coverage of VMworld 2020, our 11th year at VMworld. And of course, we've been watching VMware, they're doing a lot more in the cloud the last few years. Big partnership with AWS. And part of that is they bring their ecosystem with them. So just as they've had hundreds of companies working with them in the data center, when they do VMware Cloud on AWS, in Azure, Oracle, all the cloud service providers, the data protection companies can come along and continue to partner with them. That's part of what we're going to be discussing. Happy to welcome back to the program, it's been a few years, Deepak Mohan. He's the executive vice president of the products organization at Veritas. Deepak, thank you so much for joining us. You've got a beautiful Veritas facility behind you there. >> Yeah. Nice to meet you, Stu. Yeah, we're really excited about the VMworld event and happy to be on the show with you. >> Yes. So before we dig into data resiliency and all the other pieces, you know, the Veritas and VMware relationship goes way back. I mean, I think back to the early aughts, you know, talk about the software companies. You know, Veritas was the software company in the industry that really got a lot of it started. Yeah, a little company that you and I both know, EMC, picked up VMware, the rest is history there. But that partnership has been there since the early, early days of VMware. So just refresh our viewers a little bit on that partnership. >> Yeah, so VMware and Veritas have been partners for, like, 20 years. In fact, I'll say, both companies were founded about the same time. 
We're neighbors in Silicon Valley, and Veritas was actually one of the first companies to have introduced the concept of the software-defined data center, software-defined storage. In fact, even before, you know, vSAN and all came into the picture. But as VMware progressed with the virtualization of the infrastructure, it was really important for enterprise customers to ensure that both their applications stay resilient and highly available, and all that data remains protected. So 87% of the global Fortune 500 are Veritas customers. They're all using VMware in their infrastructures. So any time VMware introduces a technology, we have to ensure it is available, it's protected. So that partnership goes a long way, where every VMware platform is supported on day one by the Veritas solution. So very tight partnership. We get to see each other frequently and make sure that our solutions are joined at the hip. >> Yeah, Deepak, the term we hear from Veritas, we talked about data resiliency. And as you laid out there, you know, some things have changed. You know, 20 years ago, we weren't talking about cloud native environments, and you know, all of these various pieces. It was really multi-vendor, heterogeneous environments that Veritas lived in. But even in all of these environments, of course, you know, data resiliency, making sure my data is protected, making sure things are secure, is still top of mind and so important for organizations. So, you know, talk to us a little bit about what that means here in 2020 with Veritas. >> Yes. So I'll say, 20 years ago, we had one application, one server. Life was fairly simple. You know, then came VMware. You know, now we have the hybrid private clouds, public clouds, hybrid clouds. 
So the infrastructure is shifting into these other models, but the need for application resiliency and data resiliency is getting more and more complex because now we have applications that are running on prem. They're running in virtual machines. They're running in hybrid environments. They're running in private clouds. They're running in infrastructure as a service. SaaS applications. So they're all over the place. Now, think about the job of the CIO. First, you have to make sure all these applications are up and running 24 by seven. Second, these applications have to be protected, which means, in case of a disaster, in case of an issue, you have to be able to recover them. And third, how do you stay compliant with regulations and things? So customers now have to have visibility into their infrastructure. So the job of the CIO is becoming super complex, to keep a handle on everything. And that's where companies like Veritas, who are doing application resiliency and data resiliency, have become really important. I mean, as an example, last year on the VMworld show floor, I actually counted the number of backup vendors compared to storage vendors. And there were actually more data protection and resiliency vendors on the floor than there were actual storage vendors. >> Yeah, Deepak, you're absolutely right. We saw that, you know, for years we used to call it storage world because they had all come in to partner with VMware. But data protection is so important here. One of the big conversations this year, of course, is the rollout of Project Pacific with vSphere 7 Update 1 just right ahead of VMworld. I'm assuming Veritas is just keeping in lockstep with VMware, but, you know, talk a bit about how that fits into the portfolio. >> Oh, absolutely. 
So one of the keys for Veritas's success over the last 20 years is that we have kept up with all the technology transformations and all the technology disruptions that happened. And with this hybrid cloud disruption that's happening, you mentioned Project Pacific, but you know, it's the Tanzu platform. We are one of the design partners with VMware to ensure the data protection layers are done correctly. So we are definitely working with VMware on the Tanzu resiliency as well as leveraging the Velero platform. So we'll make sure that as customers are deploying these new solutions, the Veritas solutions are there to offer them the resiliency and data protection needed. >> Deepak, we've watched that real maturation of what VMware was doing in the cloud. Of course, the partnership, you know, first with IBM at VMworld a few years ago, right after VMworld, it was with AWS. And there was a lot of interest. But we are seeing that customer adoption. I wonder if you can talk about how closely you've worked with them. Do you have any, you know, maybe anonymous customers that you can talk about? You know, what they're seeing in the cloud? Why VMware and Veritas when they go to this environment? >> Yes. So we have several customers who are moving into the cloud space, leveraging VMC or now with the Azure VMware Solution. So what happens is, with these customers, we have large financials, for example, who are now using VMware and migrating their workloads into the cloud. So they may be deploying virtual machines there. But the need for HA and data resilience and backup actually gets a little bit more complex because the old environments are still there on prem. Some workloads are now moving to the cloud, and they're leveraging the Veritas solutions, one, to support the migration, and second, to offer the resiliency, leveraging the Veritas Resiliency Platform or NetBackup or Veritas InfoScale. 
I'll use an example of an airline customer, reservation systems now moving to AWS within two availability zones. The application availability comes with the Veritas solution. So Veritas is (indistinct) on their journey to the cloud, helping enterprise customers work in these hybrid use cases. >> Deepak, since you've got so many customers and they're going through their cloud journeys, Veritas works across all the environments. You get a good viewpoint as to where we are. One of the things we're really trying to help clarify for people: we throw out these terms hybrid cloud and multi cloud. Most customers I talk to, do you have a cloud strategy and do you use more than one cloud? Yes. Is portability the big concern? Well, no, I'm not moving things all the time. I don't wake up and say, you know, I'm checking the stock market and therefore I'm going to, you know, move to one or the other, but I need to have my multiple environments. It's difficult on them with different skill sets. And you know, we're seeing companies like Veritas and VMware, you know, living where the customer is. So give us a little insight as to what you're seeing from the customers, this whole hybrid, multi cloud environment. What does it mean to your customers? >> So what I'd say is, you know, we have a variety of customers and, you know, invariably, when we talk to them, each one of them has a little bit different journey to the cloud. You know, some customers, I'd say maybe more mid market, want to move completely towards a platform as a service approach and leverage either Azure or AWS. But I'll say most of the enterprise customers are looking at taking workloads. It could be one of the applications. Some are further ahead in the journey, and they're now taking a mission-critical application. Okay, you know, it could be an SAP workload. 
It could be some mission-critical, you know, billing system, reservation systems, and then using VMware as the mechanism to go into the cloud with it. And when they do that, they're looking for the same level of tools for both availability and data protection. So I'll say that we have lots of different examples between utilities, healthcare companies, financials, government. I'll say the common theme is now the harder workloads are moving to the cloud. And now they're absolutely leveraging tools from Veritas. They want to make sure that our solutions actually support those complex and highly scalable use cases. And we're absolutely doing that with the solutions. >> Deepak, you talk about some of the challenges that customers have. You know, some things have changed in 2020. One thing that has not changed is that security is top of mind. We often see, you know, data protection and security, some of those pieces go hand in hand. I remember years ago talking at the Veritas conference, it was GDPR and ransomware that were the big things that we talked about with every single customer, as to how they were defending and preparing for that. So give us the state of your environment. We know that even when everybody's working from home, unfortunately, the bad actors, they're actually working overtime. >> Oh, yes. So I'll say the problem of ransomware has actually gotten a whole lot worse over the last couple of years. So, as we think about ransomware, we have the security layer, which means, you know, first, you have to make sure your infrastructure is protected. You know, the second layer is detection, which means, how do you know if there's ransomware sitting in your environment? Because it could have come in and it may actually kick in at a much later time. And the third is recovery. 
And to be able to recover, you need really good data protection and backup policies within the company to be able to recover. So, of course, most companies invest a lot in the security software, but we know that ransomware still gets in. It can get in through a phishing attack. It can get in through email, one of the employees at home clicks on something. You know, ransomware is in. So the backup and the data protection is the last line of defense to be able to recover. So now you have it. You're stuck. What do you do? You want to find the last best copy, be able to recover very, very quickly, and the problem is really serious. I was actually talking to one of our tech support leaders, and we get at least one call a day from one of our customers that has been hit with ransomware, and we help them through the recovery process. So that's a heavy investment area for Veritas. With the NetBackup software, Backup Exec software, but also with the hardened Veritas appliances, we provide a very solid way for our customers to be able to protect and recover from ransomware. The only thing I suggest is, you know, once you have been hit, and if you don't have a good backup... You know, I talked about that huge estate, that entire estate has to be protected also from ransomware, which means standardization is key. So when something happens, are you going to look at nine products to recover from, or do you want all your catalogs, all your data, all your insights in one place, so you can then go quickly, come back online and not have to pay the ransom? >> All right. Well, Deepak, let's bring it home. We're here at VMworld. We talked at the beginning about the long partnership. You were there, you know, day zero with the vSphere 7 activity. What do you want people to take away from VMworld 2020 when it comes to Veritas? >> My key message 
to our mutual customers is that Veritas is here to support your journey to the hybrid cloud, to the cloud. We are investing heavily in the solutions. Our goal is to continue providing day zero support for all VMware solutions and releases. And we're working very closely with VMware on the Tanzu platform rollout. We are a design partner with VMware there, as well as leveraging the right APIs, whether it be VADP, VAIO (indistinct). We're certified on every latest version of the VMware portfolio. We have several hundred engineers that work just to make sure that we support these platforms. You know, in addition, I'll say, as VMware connects to AWS and to Azure, those solutions are also extremely well certified. So Veritas works very closely with AWS, we were the first to be certified on the AWS solutions. >> You're talking about, like, Outposts, I believe. >> Oh, yes. Outposts. Yeah, so we just got the Outposts Ready certification. You know, works extremely well with the VMware solutions as well as AVS, Azure VMware Solution, so heavy areas of investment for us. So the same way that our customers have depended on us over the last 20 years, we are riding the technology disruptions to help our customers into the next wave with the same set of solutions working on prem, hybrid, and clouds. >> Yeah, Deepak, I'm having flashbacks. You and I remember the things when it was the VxFS and the VxVM. And now we've got, you know, all the various VMware versions on AVS and Google Cloud VMware Engine. It gets a little confusing out there. But, hey, I really appreciate you giving us some clarity as to how you're helping customers with their data resiliency, supporting against ransomware, and the deep and long partnership that Veritas and VMware have. Thanks so much for joining us. >> Thank you. Thank you, Stu. >> Alright, stay tuned. Lots more coverage from VMworld 2020. 
I'm Stu Miniman, and thank you for watching theCUBE.

Published Date : Sep 29 2020


ENTITIES

Entity	Category	Confidence
Veritas	ORGANIZATION	0.99+
Deepak	PERSON	0.99+
Deepak Mohan	PERSON	0.99+
IBM	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
2020	DATE	0.99+
Silicon Valley	LOCATION	0.99+
87%	QUANTITY	0.99+
20 years	QUANTITY	0.99+
Second	QUANTITY	0.99+
both companies	QUANTITY	0.99+
First	QUANTITY	0.99+
2021	DATE	0.99+
VM World	ORGANIZATION	0.99+
veritas	ORGANIZATION	0.99+
Aziz	PERSON	0.99+
last year	DATE	0.99+
nine products	QUANTITY	0.99+
second layer	QUANTITY	0.99+
today	DATE	0.99+
one application	QUANTITY	0.99+
one	QUANTITY	0.99+
KWS	ORGANIZATION	0.99+
VM Ware	ORGANIZATION	0.99+
100 engineers	QUANTITY	0.99+
24	QUANTITY	0.99+
11th year	QUANTITY	0.99+
One server	QUANTITY	0.99+
this year	DATE	0.99+
20 years ago	DATE	0.99+
Stew	PERSON	0.98+
both	QUANTITY	0.98+
VM	ORGANIZATION	0.98+
VM World 2020	EVENT	0.98+
first	QUANTITY	0.98+
VM World 2020	EVENT	0.98+
stew	PERSON	0.98+
William	PERSON	0.97+
two availability zones	QUANTITY	0.97+
seven	QUANTITY	0.97+
one place	QUANTITY	0.97+
third	QUANTITY	0.97+
VM World Show	EVENT	0.96+
each one	QUANTITY	0.96+
VM World	EVENT	0.95+
Veritas	EVENT	0.95+
more than one cloud	QUANTITY	0.95+
zero support	QUANTITY	0.94+
Project Pacific	ORGANIZATION	0.94+
One	QUANTITY	0.94+
hundreds of companies	QUANTITY	0.93+
first companies	QUANTITY	0.93+
VMworld 2020	EVENT	0.92+

Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps

>> Narrator: From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> Enterprise tech practitioners, like most of us, want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best-of-breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top-line revenue and bottom-line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear, and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis, we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks, that's right, Supercloud 2 is here. As of this recording, it's just about four days away: 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests. Whereas Supercloud 22 in August was all about refining the definition of Supercloud, testing its technical feasibility, and understanding various deployment models, Supercloud 2 features practitioners, technologists, and analysts discussing what customers need, with real-world examples of Supercloud, and will expose thinking around a new breed of cross-cloud apps, data apps if you will, that change the way machines and humans interact with each other. 
Now the example we'd use: if you think about applications today, say a CRM system, what are sales reps doing? They're entering data into opportunities, they're choosing products, they're importing contacts, et cetera. And sure, the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and/or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging where data resides on different clouds, in different data stores, databases, lakehouses, et cetera. And the machine uses AI to inspect the e-commerce system, the inventory data, supply chain information, and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places, and things, like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of data mesh, is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Saks will be featured and talk about data sharing across clouds and, you know, what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union, Ionis Pharmaceuticals, and Warner Media. We've got deep, deep technology dives with folks like Bob Muglia, David Flynn, Tristan Handy of dbt Labs, and Nir Zuk, the founder of Palo Alto Networks, focused on security. Thomas Hazel is going to talk about a new type of database for Supercloud. There are several analysts, including Keith Townsend, Maribel Lopez, George Gilbert, Sanjeev Mohan, and so many more guests we don't have time to list them all. 
They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail, starting with the Walmart Cloud Native Platform; they call it WCNP. We definitely see this as a Supercloud, and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack. "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do was take a do-it-yourself approach to build a Supercloud, for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure, and GCP. No surprise, there's no Amazon at Walmart, for obvious reasons. And what they do is create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronika Durgin of Saks thinks about data sharing across clouds. Data sharing, we think, is a potential killer use case for Supercloud. In fact, let's hear it in Veronika's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data share? You know, I work with data, you know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think Supercloud comes to me, in my mind, is like practical applications. How do we create that mesh, that network, that we can easily share data with each other? 
>> Now data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani live in studio talking about what standards are missing to make this vision a reality across the Supercloud. Now one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft exec, was kind enough to spend some time looking at the community's supercloud definition, and he felt that it needed to be simplified. So in near real time he came up with the following definition that we're showing here. I'll read it. "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition, he stressed that the Supercloud is a platform versus an architecture, implying that the platform provider, e.g. Snowflake, VMware, Databricks, Cohesity, et cetera, is responsible for determining the architecture. Now interestingly, in the shared Google doc that the working group uses to collaborate on the supercloud definition, Dr. Nelu Mihai, who is actually building a Supercloud, responded as follows to Bob's assertion: "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model and how users will interact with Supercloud?" What does this seemingly nuanced point tell us and why does it matter? 
Well, history suggests that de facto standards will emerge more quickly to resolve real-world practitioner problems and catch on more quickly than consensus-based architectures and standards-based architectures. But in the long run, the latter may serve customers better. So we'll be exploring this topic in more detail in Supercloud 2, and of course we'd love to hear what you think: platform, architecture, both? Now one of the real technical gurus that we'll have in studio at Supercloud 2 is David Flynn. He's one of the people behind the movement that enabled enterprise flash adoption, that craze. And he did that with Fusion-io, and he is now working on a system to enable read-write data access to any user in any application in any data center or on any cloud anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Floyer and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that was, we thought, relevant. "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are, one, the process scheduler and, two, the file system. The strongest argument for supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So a couple of implications here that we'll be exploring with David Flynn in studio. First, we're inferring from his comment that he's in the platform camp, where the platform owner is responsible for the architecture, and there are obviously trade-offs there and benefits, but we'll have to clarify that with him. And second, he's basically saying you kill the concept the further you move up the stack. So the further you move up the stack, the weaker the supercloud argument becomes, because it's just becoming SaaS. 
Now this is something we're going to explore to better understand his thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I riffed with ahead of Supercloud 2. Tristan Handy, he's the founder and CEO of dbt Labs, and he has a highly opinionated and technical mind. Here's what he said: "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case, because it involves a lot of data engineering pipeline and other work to make these available. So if you really want to make it easy to create these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs, to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to." There are a lot of implications to this statement that we'll explore at Supercloud 2. First, Zhamak Dehghani's data mesh comes into play here, with her critique of hyper-specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure, which Kit Colbert is likely going to touch upon. Veronika Durgin of Saks and her ideal state for data sharing, along with Harveer Singh of Western Union. They've got to deal with 200 locations around the world, data privacy issues, data sovereignty: how do you share data safely? Same with Nick Taylor of Ionis Pharmaceuticals. And not to blow your mind, but Thomas Hazel and Bob Muglia posit that to make data apps a reality across the Supercloud, you have to rethink everything. You can't just let in-memory databases and caching architectures take care of everything in a brute-force manner. 
Rather, you have to get down to really detailed levels, even things like how data is laid out on disk, i.e. flash, and think about rewriting applications for the Supercloud and the ML/AI era. All of this and more at Supercloud 2, which wouldn't be complete without some data. So we pinged our friends from ETR, Erik Bradley and Daren Brabham, to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud 2. Now, many of you are familiar with this graphic. Here we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum, and on the horizontal axis is market presence, or pervasiveness in the data. So Net Score versus what they call overlap, or N, in the data. And the table inset shows how the dots are plotted. Now, not to steal ETR's thunder, but the first point is you really can't have supercloud without the hyperscale cloud platforms, which are shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity, as do Databricks, Hashi, and Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately, due to a scheduling conflict, we weren't able to get Red Hat on the program, but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well, because backup is a likely use case across clouds and on-premises. And now one other callout that we drill down on at Supercloud 2 is Cloudflare, which actually uses the term supercloud, maybe in a different way. They look at Supercloud really as, you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud 2, along with many others. Okay, so why should you attend Supercloud 2? What's in it for me, kind of thing? 
So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services, for monetizing data, how your peers are doing data sharing, how some of your peers are actually building out a Supercloud, you're going to get real-world input from practitioners. If you're a technologist and you're trying to figure out various ways to solve problems around data, data sharing, and cross-cloud service deployment, there are going to be a number of deep technology experts who are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud, with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, have kind of thrown up their hands and are saying, hey, we're going mono cloud. And we'll talk about the potential implications and dangers and risks of doing that. And also some of the benefits. You know, there's a question, right? Is Supercloud the same wine in a new bottle, or is it truly something different that can drive substantive business value? So look, go to supercloud.world. It's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout-out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor, and ChaosSearch, Prosimo, and Alkira as well. For contributing to the effort, I want to thank Alex Myerson, who's on production and manages the podcast. Ken Schiffman is his supporting cast as well. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. Thank you all. Remember, these episodes are all available as podcasts. Wherever you listen, we really appreciate the support that you've given. We just saw some stats from Buzzsprout: we hit the top 25%, and we're almost at 400,000 downloads last year. So we really appreciate your participation. 
All you've got to do is search "Breaking Analysis podcast" and you'll find those. I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me, you can email me directly at David.Vellante@siliconangle.com, or DM me @DVellante, or comment on our LinkedIn post. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud 2 or next time on Breaking Analysis. (light music)

Published Date : Jan 14 2023

SUMMARY :

with Dave Vellante of the things that we're So if you know I want to get data and on the horizontal

ENTITIES

Entity	Category	Confidence
Bob Muglia	PERSON	0.99+
Alex Myerson	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
David Flynn	PERSON	0.99+
Veronica	PERSON	0.99+
Jack	PERSON	0.99+
Nelu Mihai	PERSON	0.99+
Zhamak Dehghani	PERSON	0.99+
Thomas Hazel	PERSON	0.99+
Nick Taylor	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jack Greenfield	PERSON	0.99+
Kristen Martin	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Veronica Durgin	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Rob Hof	PERSON	0.99+
Warner Media	ORGANIZATION	0.99+
Tristan Handy	PERSON	0.99+
Veronika Durgin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Ionis Pharmaceutical	ORGANIZATION	0.99+
David Floyer	PERSON	0.99+
DBT Labs	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Bob	PERSON	0.99+
Palo Alto	LOCATION	0.99+
21 sessions	QUANTITY	0.99+
Daren Brabham	PERSON	0.99+
33 guests	QUANTITY	0.99+
Nir Zuk	PERSON	0.99+
Boston	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Harveer Singh	PERSON	0.99+
Kit Colbert	PERSON	0.99+
Databricks	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Supercloud 2	TITLE	0.99+
Snowflake	ORGANIZATION	0.99+
last year	DATE	0.99+
Western Union	ORGANIZATION	0.99+
Cohesity	ORGANIZATION	0.99+
Supercloud	ORGANIZATION	0.99+
200 locations	QUANTITY	0.99+
August	DATE	0.99+
Keith Townsend	PERSON	0.99+
Data Mesh	ORGANIZATION	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
David.Vellante@siliconangle.com	OTHER	0.99+
next week	DATE	0.99+
both	QUANTITY	0.99+
one	QUANTITY	0.99+
second	QUANTITY	0.99+
first point	QUANTITY	0.99+
One	QUANTITY	0.99+
First	QUANTITY	0.99+
VMware	ORGANIZATION	0.98+
Silicon Angle	ORGANIZATION	0.98+
ETR	ORGANIZATION	0.98+
Erik Bradley	PERSON	0.98+
two	QUANTITY	0.98+
today	DATE	0.98+
Sachs	ORGANIZATION	0.98+
SAKS	ORGANIZATION	0.98+
Supercloud	EVENT	0.98+
last August	DATE	0.98+
each week	QUANTITY	0.98+

Analyst Predictions 2023: The Future of Data Management

(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts: Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speak) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022. They're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. 
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. 
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. 
So, it takes a whole lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow on the prediction that graph databases take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really over a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context, as well as proper tools, in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and Oracle providing graph support in Oracle Database this past year. Those things are, as I said, growing gradually. There are other companies, like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases. I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have, and maybe out to the edge. 
>> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. Stream processing is another type of specialized data processing, like the graph databases Carl was talking about, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, and continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, I didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there are still some complications in the data pipeline. And so, I think you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. 
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this, in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, and IBM all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2023, in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse-side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, and IBM all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating it to end users. It's very cutting edge. I'd say the top, leading-edge 5% of companies have really embraced the lakehouse. I think we're now seeing the fast followers, the next 20 to 25% of firms, embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer making the announcement about Iceberg, and he asked for a show of hands: any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. 
It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. 
It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors, bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? 
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. 
And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see the Informaticas and all the other players there, the Fivetrans, have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that where data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process of going from Aurora to Redshift. You've seen the same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. 
So, there's some good moves in this direction. I expect to see more of this this year. Part of it's from basically the SaaS platforms adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats, to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for one, but in other words, get two products or two services for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL HeatWave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. 
So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. 
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You're debating in the class and you were forced to take one side and defend it. 
So, I was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse as being the icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. They came with lakehouse and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for ACID. And what was the best way to do it? It was through a relational table structure. 
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say, facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. 
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming a much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. 
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? 
I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. 
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and take action. >> And Dave Menninger, today, if you want analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. 
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. 
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. 
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL HeatWave. 
There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023


ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Doug Henschen	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Doug	PERSON	0.99+
Carl	PERSON	0.99+
Carl Olofson	PERSON	0.99+
Tony Baer	PERSON	0.99+
Tony	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
Curt Monash	PERSON	0.99+
Sanjeev Mohan	PERSON	0.99+
Christian Kleinerman	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Sanjeev	PERSON	0.99+
Constellation Research	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Ventana Research	ORGANIZATION	0.99+
2022	DATE	0.99+
Hazelcast	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Tony Bear	PERSON	0.99+
25%	QUANTITY	0.99+
2021	DATE	0.99+
last year	DATE	0.99+
65%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
today	DATE	0.99+
five-year	QUANTITY	0.99+
TigerGraph	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two services	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
David	PERSON	0.99+
RisingWave Labs	ORGANIZATION	0.99+

Why Should Customers Care About SuperCloud

Hello and welcome back to Supercloud 2 where we examine the intersection of cloud and data in the 2020s. My name is Dave Vellante. Our Supercloud panel, our power panel is back. Maribel Lopez is the founder and principal analyst at Lopez Research. Sanjeev Mohan is former Gartner analyst and principal at Sanjeev Mohan. And Keith Townsend is the CTO advisor. Folks, welcome back and thanks for your participation today. Good to see you. >> Okay, great. >> Great to see you. >> Thanks. Let me start, Maribel, with you. Bob Muglia, we had a conversation as part of Supercloud the other day. And he said, "Dave, I like the work, you got to simplify this a little bit." So he said, quote, "A Supercloud is a platform." He said, "Think of it as a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." And then Nelu Mihai said, "Well, wait a minute. This is just going to create more stove pipes. We need more standards in an architecture," which is kind of what Berkeley Sky Computing initiative is all about. So there's a sort of a debate going on. Is supercloud an architecture, a platform? Or maybe it's just another buzzword. Maribel, do you have a thought on this? >> Well, the easy answer would be to say it's just a buzzword. And then we could just kill the conversation and be done with it. But I think the term, it's more than that, right? The term actually isn't new. You can go back to at least 2016 and find references to supercloud in Cornell University or assist in other documents. So, having said this, I think we've been talking about Supercloud for a while, so I assume it's more than just a fancy buzzword. But I think it really speaks to that undeniable trend of moving towards an abstraction layer to deal with the chaos of what we consider managing multiple public and private clouds today, right? 
So one definition of the technology platform speaks to a set of services that allows companies to build and run that technology smoothly without worrying about the underlying infrastructure, which really gets back to something that Bob said. And some of the question is where that lives. And you could call that an abstraction layer. You could call it cross-cloud services, hybrid cloud management. So I see momentum there, like legitimate momentum with enterprise IT buyers that are trying to deal with the fact that they have multiple clouds now. So where I think we're moving is trying to define what are the specific attributes and frameworks of that that would make it so that it could be consistent across clouds. What is that layer? And maybe that's what the supercloud is. But one of the things I struggle with with supercloud is, what are we really trying to do here? Are we trying to create differentiated services in the supercloud layer? Is a supercloud just another variant of what AWS, GCP, or others do? You've spoken to Walmart about its cloud native platform, and that's an example of somebody deciding to do it themselves because they need to deal with this today and not wait for some big standards thing to happen. So whatever it is, I do think it's something. I think trying to create an architecture out of it would maybe be a better way of saying it, so that it does get to that set of principles, but it also needs to be edge aware. I think whenever we talk about supercloud, we're always talking about like the big centralized cloud. And I think we need to think about all the distributed clouds that we're looking at in edge as well. So that might be one of the ways that supercloud evolves. >> So thank you, Maribel. Keith, Brian Gracely, Gracely's law, things kind of repeat themselves. We've seen it all before. And so what Muglia brought to the forefront is this idea of a platform where the platform provider is really responsible for the architecture. 
Of course, the drawback is then you get a bunch of stovepipe architectures. But practically speaking, that's kind of the way the industry has always evolved, right? >> So if we look at this from the practitioner's perspective and we talk about platforms, traditionally vendors have provided the platforms for us, whether it's a distribution of Linux managed by or provided by Red Hat, Windows servers, .NET, databases, Oracle. We think of those as platforms, things that are fundamental that we can build on top of. Supercloud isn't that today. It is a framework or idea, kind of a visionary goal to get to a point that we can have a platform or a framework. But what we're seeing repeated throughout the industry in customers, whether it's the Walmarts that have kind of supersized the idea of supercloud, or if it's regular end user organizations that are coming out with platform groups, groups who normalize cloud native infrastructure, AWS multi-cloud, VMware resources to look like one thing internally to their developers. We're seeing this trend that there's a desire for a platform that provides the capabilities of a supercloud. >> Thank you for that. Sanjeev, we often use Snowflake as a supercloud example, and that presumably would be a platform with an architecture that's determined by the vendor. Maybe Databricks is pushing for a more open architecture, maybe more of that nirvana that we were talking about before to solve for supercloud. But regardless, the practitioner discussions show that, at least currently, there's not a lot of cross-cloud data sharing. I think it could be a killer use case, but egress charges are a barrier. But how do you see it? Will that change? Will we hide that underlying complexity and start sharing data across clouds? Is that something that you think Snowflake or others will be able to achieve? >> So I think we are already starting to see some of that happen. Snowflake is definitely one example that gets cited a lot. 
But even though we don't talk about MongoDB in this light, you could have a MongoDB cluster, for instance, with nodes sitting in different cloud providers. So there are companies that are starting to do it. The advantage that these companies have, let's take Snowflake as an example, it's a centralized proprietary platform. And they are building the capabilities that are needed for supercloud. So they're building things like you can push down your data transformations. They have the entire security and privacy suite. Data ops, they're adding those capabilities. And if I'm not mistaken, it'll be very soon, we will see them offer data observability. So it all works great as long as you are in one platform. And if you want resilience, then Snowflake, Supercloud, great example. But if your primary goal is to choose the most cost-effective service irrespective of which cloud it sits in, then things start falling sideways. For example, I may be a very big Snowflake user. And I like Snowflake's resilience. I can move from one cloud to another cloud. Snowflake does it for me. But what if I want to train a very large model? Maybe Databricks is a better platform for that. So how do I move my workload from one platform to another platform? That tooling does not exist. So we need some sort of hybrid, cross-cloud data ops platform. Walmart has done a great job, but they built it by themselves. Not every company is Walmart. Like Maribel and Keith said, we need standards, we need reference architectures, we need some sort of a cost control. I was just reading recently, Accenture has been public about their AWS bill. Every time they get the bill, it's tens of millions of lines, tens of millions 'cause there are over a thousand teams using AWS. If we have not been able to corral the usage of a single cloud, now we're talking about supercloud, we've got multiple clouds, and hybrid, on-prem, and edge. 
So till we've got some cross-platform tooling in place, I think this will still take quite some time for it to take shape. >> It's interesting. Maribel, Walmart would tell you that their on-prem infrastructure is cheaper to run than the stuff in the cloud. But at the same time, they want the flexibility and the resiliency of their three-legged stool model. So the point that Sanjeev was making about hybrid. It's an interesting balance, isn't it, between getting your lowest cost and at the same time having best of breed and scale? >> It's basically what you're trying to optimize for, as you said, right? And by the way, to the earlier point, not everybody is at Walmart's scale, so not everybody has the purchasing power to make it cheaper to run on-prem than in the cloud. But I think what you see almost every company, large or small, moving towards is this concept of like, where do I find the agility? And is the agility in building the infrastructure for me? And typically, the thing that gives you outsized advantage as an organization is not how you constructed your cloud computing infrastructure. It might be how you structured your data analytics as an example, which cloud is related to that. But how do you marry those two things? And getting back to sort of Sanjeev's point. We're in a real struggle now where on one hand we want to have best of breed services and on the other hand we want it to be really easy to manage, secure, do data governance. And those two things are really at odds with each other right now. So if you want all the knobs and switches of a service like geospatial analytics in BigQuery, you're going to have to use Google tools, right? Whereas if you want visibility across all the clouds for your application estate and to understand the security and governance of that, you're kind of looking for something that's more cross-cloud tooling at that point. 
But whenever you talk to somebody about cross-cloud tooling, they look at you like that's not really possible. So it's a very interesting time in the market. Now, we're kind of layering this concept of supercloud on it. And some people think supercloud's about basically multi-cloud tooling, and some people think it's about a whole new architectural stack. So we're just not there yet. But it's not all about cost. I mean, cloud has not been about cost for a very, very long time. Cloud has been about how do you really make the most of your data. And this gets back to cross-cloud services like Snowflake. Why did they even exist? They existed because we had data everywhere, but we need to treat data as a unified object so that we can analyze it and get insight from it. And so that's where some of the benefit of these cross-cloud services are moving today. Still a long way to go, though, Dave. >> Keith, I reached out to my friends at ETR given the macro headwinds. And you're right, Maribel, cloud hasn't really been just about cost savings. But I reached out to the ETR guys: what's your data show in terms of how customers are dealing with the economic headwinds? And they said, by far, their number one strategy to cut cost is consolidating redundant vendors. And a distant second, but still notable, was optimizing cloud costs. Maybe using reserved instances, or using more volume buying. Nowhere in there do we see repatriation, and I asked them, "Could you go look and see if you can find it?" And you hear this a lot. You hear people whispering as analysts, "You better look into that repatriation trend. It's pretty big." You can't find it. But some of the Walmarts in the world, maybe even not repatriating, but they maybe have better cost structure on-prem. Keith, what are you seeing from the practitioners that you talk to in terms of how they're dealing with these headwinds? 
>> Yeah, I just got into a conversation about this just this morning with (indistinct) who is an analyst over at GigaOm. He's reading the same headlines. Repatriation is happening at large scale. I think this is kind of, we have these quiet terms now. We have quiet quitting, we have quiet hiring. I think we have quiet repatriation. Most people haven't done away with their data centers. They're still there. Whether they're completely on-premises data centers, and they own assets, or they're partnerships with QTS, Equinix, et cetera, they have these private cloud resources. What I'm seeing practically is a rebalancing of workloads. Do I really need to pay AWS for this instance of SAP that's on 24 hours a day versus just having it on-prem, moving it back to my data center? I've talked to quite a few customers who were early on to moving their static SAP workloads onto the public cloud, and they simply moved them back. Surprising, I was at VMware Explore. And we can talk about this a little bit later on. But our customers, net new, not a lot that were born in the cloud. And they get to this point where their workloads are static. And they look at something like Kubernetes, or OpenShift, or VMware Tanzu. And they ask the question, "Do I need the scalability of cloud?" I might consider being a net new VMware customer to deliver this base capability. So are we seeing repatriation as the number one reason? No, I think internal IT operations are just naturally coming to this realization. Hey, I have these resources on premises. The private cloud technologies have moved far along enough that I can just simply move this workload back. I'm not calling it repatriation, I'm calling it rightsizing for the operating model that I have. >> Makes sense. Yeah. >> Go ahead. >> If I missed something, Dave, why we are on this topic of repatriation. I'm actually surprised that we are talking about repatriation as a very big thing. 
I think repatriation is happening, no doubt, but it's such a small percentage of cloud migration that to me it's a rounding error in my opinion. I think there's a bigger problem. The problem is that people don't know where the cost is. If they knew where the cost was being wasted in the cloud, they could do something about it. But if you don't know, then the easy answer is that cloud costs a lot and to move it back on-premises. I mean, take like Capital One as an example. They got rid of all the data centers. Where are they going to repatriate to? They're all in the cloud at this point. So I think my point is that one of the reasons data observability has seen a lot of traction is cost. Data observability, when it first came into existence, it was all about data quality. Then it was all about data pipeline reliability. And now, the number one killer use case is FinOps. >> Maribel, you had a comment? >> Yeah, I'm kind of in violent agreement with both Sanjeev and Keith. So what are we seeing here? So the first thing that we see is that many people wildly overspent in the big public cloud. They had stranded cloud credits, so to speak. The second thing is, some of them still had infrastructure that was useful. So why not use it if you find the right workloads, to what Keith was talking about, if they were more static workloads, if it was already there? So there is a balancing that's going on. And then I think fundamentally, from a trend standpoint, these things aren't binary. Everybody, for a while, everything was going to go to the public cloud and then people are like, "Oh, it's kind of expensive." Then they're like, "Oh no, they're going to bring it all on-prem 'cause it's really expensive." And it's like, "Well, that doesn't necessarily get me some of the new features and functionalities I might want for some of my new workloads." So I'm going to put the workloads that have a certain set of characteristics that require cloud in the cloud. 
And if I have enough capability on-prem and enough IT resources to manage certain things on site, then I'm going to do that there 'cause that's a more cost-effective thing for me to do. It's not binary. That's why we went to hybrid. And then we went to multi just to describe the fact that people added multiple public clouds. And now we're talking about super, right? So I don't look at it as a one-size-fits-all for any of this. >> A number of practitioners leading up to Supercloud2 have told us that they're solving their cloud complexity by going monocloud. So they're putting on the blinders. Even though across the organization, there's other groups using other clouds. You're like, "In my group, we use AWS, or my group, we use Azure. And those guys over there, they use Google. We just kind of keep it separate." Are you guys hearing this in your view? Is that risky? Are they missing out on some potential to tap best of breed? What do you guys think about that? >> Everybody thinks they're monocloud. Is anybody really monocloud? It's like a group is monocloud, right? >> Right. >> This genie is out of the bottle. We're not putting the genie back in the bottle. You might think you're monocloud and you go like three doors down and figure out the guy or gal is on a fundamentally different cloud, running some analytics workload that you didn't know about. So, to Sanjeev's earlier point, they don't even know where their cloud spend is. So I think the concept of monocloud, how that's actually really realized by practitioners is primary and then secondary sources. So they have a primary cloud that they run most of their stuff on, and that they try to optimize. And we still have forked workloads. Somebody decides, "Okay, this SAP runs really well on this, or these analytics workloads run really well on that cloud." And maybe that's how they parse it. 
But if you really looked at it, there's very few companies, if you really peeked under the hood and did an analysis, where you could find an actual monocloud structure. They just want to pull it back in and make it more manageable. And I respect that. You want to do what you can to try to streamline the complexity of that. >> Yeah, we're- >> Sorry, go ahead, Keith. >> Yeah, we're doing this thing where we review an AWS service every day. Just in your inbox, learn about a new AWS service, cursory. There's 238 AWS products just on the AWS cloud itself. Some of them are redundant, but you get the idea. So the concept of monocloud, I'm in violent agreement with Maribel on this that, yes, a group might say I want a primary cloud. And that primary cloud may be AWS. But have you tried to license Oracle database on AWS? It is really tempting to license Oracle on Oracle Cloud, Microsoft on Microsoft. And I can't get RDS anywhere but Amazon. So while I'm driven to desire the simplicity, the reality is, whether it be M&A, licensing, data sovereignty, I am forced into a multi-cloud management style. But I do agree most people kind of do this one, this primary cloud, secondary cloud. And I guarantee you're going to have a third cloud or a fourth cloud whether you want to or not via shadow IT, latency, technical reasons, et cetera. >> Thank you. Sanjeev, you had a comment? >> Yeah, so I just wanted to mention, as an organization, I'm in complete agreement, no organization is monocloud, at least if it's a large organization. Large organizations use all kinds of combinations of cloud providers. But when you talk about a single workload, that's where the problem arises. As Keith said, the 238 services in AWS. How in the world am I going to be an expert in AWS, but then say let me bring GCP or Azure into a single workload? 
And that's where I think we probably will still see monocloud as being predominant because the team has developed its expertise on a particular cloud provider, and they just don't have the time of the day to go learn yet another stack. However, there are some interesting things that are happening. For example, if you look at a multi-cloud example where Oracle and Microsoft Azure have that interconnect, so that's a beautiful thing that they've done because now in the newest iteration, it's literally a few clicks. And then behind the scenes, your .NET application and your Oracle database in OCI will be configured, the identities in Active Directory are federated. And you can just start using a database in one cloud, which is OCI, and an application, your .NET in Azure. So till we see this kind of a solution coming out of the providers, I think it's unrealistic to expect the end users to be able to figure out multiple clouds. >> Well, I have to share with you. I can't remember if he said this on camera or if it was off camera so I'll hold off. I won't tell you who it is, but this individual was sort of complaining a little bit saying, "With AWS, I can take their best AI tools like SageMaker and I can run them on my Snowflake." He said, "I can't do that in Google. Google forces me to go to BigQuery if I want their excellent AI tools." So he was sort of pushing, kind of tweaking a little bit some of the vendor talk that, "Oh yeah, we're so customer-focused." Not to pick on Google, but I mean everybody will say that. And then you say, "If you're so customer-focused, why wouldn't you do a ABC?" So it's going to be interesting to see who leads that integration and how broadly it's applied. But I digress. Keith, at our first supercloud event, that was on August 9th. And it was only a few months after Broadcom announced the VMware acquisition. A lot of people, myself included said, "All right, cuts are coming." 
Generally, Tanzu is probably going to be under the radar, but at Supercloud 22 and presumably VMware Explore, the company really... Well, certainly in the US, touted its Tanzu capabilities. I wasn't at VMware Explore Europe, but I bet you heard similar things. Hock Tan has been blogging and very vocal about cross-cloud services and multi-cloud, which doesn't happen without Tanzu. So what did you hear, Keith, in Europe? What's your latest thinking on VMware's prospects in cross-cloud services/supercloud? >> So I think our friend and Cube alum host will still be even more offended at this statement than he was when I said it on the Cube. This was maybe five years ago. There's no company better suited to help industries or companies cross the cloud chasm than VMware. That's not a compliment. That's a reality of the industry. This is a very difficult, almost intractable problem. What I heard at VMware Explore Europe were customers serious about this problem, even more so than in the US. Data sovereignty is a real problem in the EU. Try being a company in Switzerland and having the Swiss data sovereignty issues. And there's no local cloud presence there large enough to accommodate your data needs. They had very serious questions about this. I talked to open source project leaders. Open source project leaders were asking me, why should I use the public cloud to host Kubernetes-based workloads, my projects that are building around Kubernetes, and the CNCF infrastructure? Why should I use AWS, Google, or even Azure to host these projects when that's undifferentiated? I know how to run Kubernetes, so why not run it on-premises? I don't want to deal with the hardware problems. So again, really great questions. And then there was always the specter of the problem, I think, we all had with the acquisition of VMware by Broadcom potentially. 4.5 billion in increased profitability in three years is an unbelievable amount of money when you look at the size of the problem. 
So a lot of the conversation in Europe was about industry at large. How do we do what regulators are asking us to do in a practical way from a true technology sense? Is VMware cross-cloud great? >> Yeah. So, VMware, obviously, to your point. OpenStack is another way of it. Actually, OpenStack uptake is still alive and well, especially in those regions where there may not be a public cloud, or there's public policy dictating that. Walmart's using OpenStack. As you know in IT, some things never die. Question for Sanjeev. And it relates to this new breed of data apps. And Bob Muglia and Tristan Handy from dbt Labs who are participating in this program really got us thinking about this. You got data that resides in different clouds, maybe even on-prem. And the machine pulls data from different systems. No humans involved, e-commerce, ERP, et cetera. It creates a plan, outcomes. No human involvement. Today, you're on a CRM system, you're inputting, you're doing forms, you're automating processes. We're talking about a new breed of apps. What are your thoughts on this? Is it real? Is it just way off in the distance? How does machine intelligence fit in? And how does supercloud fit? >> So great point. In fact, the data apps that you're talking about, I call them data products. Data products first came into the limelight in the last couple of years when Zhamak Dehghani started talking about data mesh. I am taking data products out of the data mesh concept because data mesh, whether data mesh happens or not is analogous to data products. Data products, basically, are taking a product management view of bringing data from different sources based on what the consumer needs. We were talking earlier today about maybe it's my vacation rentals, or it may be a retail data product, it may be an investment data product. So it's a pre-packaged extraction of data from different sources. But now I have a product that has a whole lifecycle. I can version it. 
I have new features that get added. And it's very business data consumer centric. It uses machine learning. For instance, I may be able to tell whether this data product has stale data. Who is using that data? Based on the usage of the data, I may have new data products that get allocated. I may even have the ability to take existing data products, mash them up into something that I need. So if I'm going to have that kind of power to create a data product, then having a common substrate underneath, it can be very useful. And that could be supercloud where I am making API calls. I don't care where the ERP, the CRM, the survey data, the pricing engine, where they sit. For me, there's a logical abstraction. And then I'm building my data product on top of that. So I see a new breed of data products coming out. To answer your question, how early we are or is this even possible? My prediction is that in 2023, we will start seeing more of data products. And then it'll take maybe two to three years for data products to become mainstream. But it's starting this year. >> Subprime mortgages were a data product, and there definitely were humans involved. All right, let's talk about some of the supercloud, multi-cloud players and what their future looks like. You can kind of pick your favorites. VMware, Snowflake, Databricks, Red Hat, Cisco, Dell, HP, Hashi, IBM, Cloudflare. There's many others. Cohesity, Rubrik. Keith, I wanted to start with Cloudflare because they actually use the term supercloud, and just simplifying what they said. They look at it as taking serverless to the max. You write your code and then you can deploy it in seconds worldwide, of course, across the Cloudflare infrastructure. You don't have to spin up containers, you don't go to provision instances. Cloudflare worries about all that infrastructure. What are your thoughts on Cloudflare's approach and their chances to disrupt the current cloud landscape? 
>> As Larry Ellison said famously once before, the network is the computer, right? I thought that was Scott McNealy. >> It wasn't Scott McNealy. I knew it was an Oracle line. >> Oracle owns that now, owns that line. >> By purpose or acquisition. >> They should have just called it cloud. >> Yeah, they should have just called it cloud. >> Easier. >> Go ahead. >> But if you think about the Cloudflare capability, Cloudflare in its own right is becoming a decent sized cloud provider. If you have compute out at the edge, when we talk about edge in the sense of Cloudflare and points of presence, literally across the globe, you have all of this excess compute, what do you do with it? First offering, let's disrupt data in the cloud. We can't start the conversation talking about data. When they say we're going to give you object-oriented or object storage in the cloud without egress charges, that's disruptive. That we can start to think about supercloud capability of having compute EC2 run in AWS, pushing and pulling data from Cloudflare. And now, I've disrupted this roach motel data structure, and that I'm freely giving away bandwidth, basically. Well, the next layer is not that much more difficult. And I think part of Cloudflare's serverless approach or supercloud approach is so that they don't have to commit to a certain type of compute. It is advantageous. It is a feature for me to be able to go to EC2 and pick a memory heavy model, or a compute heavy model, or a network heavy model. Cloudflare has taken away those knobs, and I'm just giving code and allowing that to run. Cloudflare has a massive network. If I can put the code closest using the Cloudflare Workers, if I can put that code closest to where the data is at or residing, super compelling observation. The question is, does it scale? I don't get the 238 services. While serverless is great, I have to know what I'm going to build. 
I don't have a Cognito, or RDS, or all these other services that make AWS, GCP, and Azure appealing from a builder's perspective. So it is a very interesting nascent start. It's great because now they can hide compute. If they don't have the capacity, they can outsource that maybe at a cost to one of the other cloud providers, but kind of hiding the compute behind the serverless architecture is a really unique approach. >> Yeah. And they're dipping their toe in the water. And they've announced an object store and a database platform and more to come. We got to wrap. So I wonder, Sanjeev and Maribel, if you could maybe pick some of your favorites from a competitive standpoint. Sanjeev, I felt like just watching Snowflake, I said, okay, in my opinion, they had the right strategy, which was to run on all the clouds, and then try to create that abstraction layer and data sharing across clouds. Even though, let's face it, most of it might be happening across regions if it's happening, but certainly outside of an individual account. But I felt like just observing them that anybody who's a traditional on-prem player moving into the clouds or anybody who's a cloud native, it just makes total sense to write to the various clouds. And to the extent that you can simplify that for users, it seems to be a logical strategy. Maybe as I said before, what multi-cloud should have been. But are there companies that you're watching that you think are ahead in the game, or ones that you think are a good model for the future? >> Yes, Snowflake, definitely. In fact, one of the things we have not touched upon very much, and Keith mentioned a little bit, was data sovereignty. Data residency rules can require that certain data should be written into a certain region of a certain cloud. And if my cloud provider can abstract that or my database provider, then that's perfect for me. So right now, I see Snowflake is way ahead of this pack. I would not put MongoDB too far behind. 
They don't really talk about this thing. They are in a different space, but now they have a lakehouse, and they've got all of these other SQL access and new capabilities that they're announcing. So I think they would be quite good with that. Oracle is always a dark horse. Oracle seems to have revived its cloud mojo to some extent. And it's doing some interesting stuff. Databricks is the other one. I have not seen Databricks. They've been very focused on lakehouse, Unity, data catalog, and some of those pieces. But they would be the obvious challenger. And if they come into this space of supercloud, then they may bring some open source technologies that others can rely on, like Delta Lake as a table format. >> Yeah. One of these infrastructure players, Dell, HPE, Cisco, even IBM. I mean, I would be making my infrastructure as programmable and cloud friendly as possible. That seems like table stakes. But Maribel, any companies that stand out to you that we should be paying attention to? >> Well, we already mentioned a bunch of them, so maybe I'll go a slightly different route. I'm watching two companies pretty closely to see what kind of traction they get in their established companies. One we already talked about, which is VMware. And the thing that's interesting about VMware is they're everywhere. And they also have the benefit of having a foot in both camps. If you want to do it the old way, the way you've always done it with VMware, they got all that going on. If you want to try to do a more cross-cloud, multi-cloud native style thing, they're really trying to build tools for that. So I think they have really good access to buyers. And that's one of the reasons why I'm interested in them, to see how they progress. The other thing, I think, could be a sleeping horse, oddly enough, is Google Cloud. They've spent a lot of work and time on Anthos. They really need to create a certain set of differentiators.
Well, it's not necessarily in their best interest to be the best multi-cloud player. If they decide that they want to differentiate on a different layer of the stack, let's say they want to be like the person that is really transformative, they talk about transformation cloud with analytics workloads, then maybe they do spend a good deal of time trying to help people abstract all of the other underlying infrastructure and make sure that they get the sexiest, most meaningful workloads into their cloud. So those are two people that you might not have expected me to go with, but I think it's interesting to see not just the things that might be considered either startups or more established independent companies, but how some of the traditional providers are trying to reinvent themselves as well. >> I'm glad you brought that up, because if you think about what Google's done with Kubernetes. I mean, would Google even be relevant in the cloud without Kubernetes? I could argue both sides of that. But it was quite a gift to the industry. And there's a motivation there to do something unique and different from maybe the other cloud providers. And I'd throw in Red Hat as well. They're obviously a key player in Kubernetes. And HashiCorp seems to be becoming the standard for application deployment, and Terraform, or cross-clouds, and there are many, many others. I know we're leaving lots out, but we're out of time. Folks, I got to thank you so much for your insights and your participation in Supercloud2. Really appreciate it. >> Thank you. >> Thank you. >> Thank you. >> This is Dave Vellante for John Furrier and the entire theCUBE community. Keep it right there for more content from Supercloud2.

Published Date : Jan 10 2023


ENTITIES

Entity	Category	Confidence
Keith	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jamal Duggan	PERSON	0.99+
Nelu Mihai	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Maribel	PERSON	0.99+
Bob Muglia	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
Tristan Handy	PERSON	0.99+
Keith Townsend	PERSON	0.99+
Larry Ellison	PERSON	0.99+
Brian Gracely	PERSON	0.99+
Bob	PERSON	0.99+
HP	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Equinix	ORGANIZATION	0.99+
QTX	ORGANIZATION	0.99+
Walmart	ORGANIZATION	0.99+
Maribel Lopez	PERSON	0.99+
August 9th	DATE	0.99+
Dave	PERSON	0.99+
Gracely	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Walmarts	ORGANIZATION	0.99+
Red Hat	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
Sanjeev	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Hashi	ORGANIZATION	0.99+
GigaHome	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
2023	DATE	0.99+
Hawk Tan	PERSON	0.99+
Google	ORGANIZATION	0.99+
two companies	QUANTITY	0.99+
two things	QUANTITY	0.99+
Broadcom	ORGANIZATION	0.99+
Switzerland	LOCATION	0.99+
Snowflake	TITLE	0.99+
Snowflake	ORGANIZATION	0.99+
HPE	ORGANIZATION	0.99+
two	QUANTITY	0.99+
238 services	QUANTITY	0.99+
two people	QUANTITY	0.99+
2016	DATE	0.99+
Gartner	ORGANIZATION	0.99+
tens of millions	QUANTITY	0.99+
three years	QUANTITY	0.99+
DBT Labs	ORGANIZATION	0.99+
fourth cloud	QUANTITY	0.99+

Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay. We're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago with the advent of the first data lakes, really around Hadoop, that probably was true that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses.
So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect, and what you don't want to end up having done is back yourself into a corner that then prevents you from innovating. So if you have chosen the technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage. Your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant, innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concerns that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge.
>>So Justin, let me play devil's advocate here a little bit, and I've talked to Zhamak about this and, you know, obviously her vision is that data mesh is open source, open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to sort of today: you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it reminds me kind of, again, of going back 10 or 12 years ago when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, cuz that's where they have control.
And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start, how many people love Oracle today, but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting. It reminds me of when I, you know, I see the gas price, the TSR gas price. I drive up and then I say, oh, that's the cash price. Credit card, I gotta pay 20 cents more, but okay. But so the argument then, so let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down, cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you wanna use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model.
I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape where a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it, cause I talked to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they jam us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: are the, let's call it data warehouse systems or the proprietary systems, gonna deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you.
You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake, you know, thing is always a lot of fun for analysts like me. You've got Databricks coming at it. Richard, you mentioned you have a lot of rockstar data engineers. Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world. So, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So I think my advice to people who are starting here or trying to build teams to capitalize on data assets is: begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting.
Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and in the world of Oracle, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in Oracle. It's the license cost that is by far the biggest component in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys.
Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine in a new bottle when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage.

Published Date : Aug 22 2022


ENTITIES

Entity	Category	Confidence
Jess Borgman	PERSON	0.99+
Richard	PERSON	0.99+
20 cents	QUANTITY	0.99+
six	QUANTITY	0.99+
Justin	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Kurt Monash	PERSON	0.99+
20%	QUANTITY	0.99+
Jess	PERSON	0.99+
pythons	TITLE	0.99+
seven years	QUANTITY	0.99+
Today	DATE	0.99+
Javas	TITLE	0.99+
Teradata	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.98+
millions	QUANTITY	0.98+
EVAs	ORGANIZATION	0.98+
JAK	PERSON	0.98+
Starburst	ORGANIZATION	0.98+
both	QUANTITY	0.97+
10	DATE	0.97+
12 years ago	DATE	0.97+
Starbust	TITLE	0.96+
today	DATE	0.95+
Apache iceberg	ORGANIZATION	0.94+
Google	ORGANIZATION	0.93+
12 years	QUANTITY	0.92+
single point	QUANTITY	0.92+
two worlds	QUANTITY	0.92+
10	QUANTITY	0.91+
Hudu	LOCATION	0.91+
Unix	TITLE	0.9+
one thing	QUANTITY	0.87+
trillions of records	QUANTITY	0.83+
first data lake	QUANTITY	0.82+
Starburst	TITLE	0.8+
PJI	ORGANIZATION	0.79+
years ago	DATE	0.76+
IE	TITLE	0.75+
Lie 2	TITLE	0.72+
many years ago	DATE	0.72+
over a couple times	QUANTITY	0.7+
TCO	ORGANIZATION	0.7+
Parque	ORGANIZATION	0.67+
Number two	QUANTITY	0.64+
Kubernetes	ORGANIZATION	0.59+
a decade	QUANTITY	0.58+
plus years	DATE	0.57+
Azure	TITLE	0.57+
S3	TITLE	0.55+
Delta	TITLE	0.54+
20	QUANTITY	0.49+
last	DATE	0.48+
Mohan	PERSON	0.44+
ORC	ORGANIZATION	0.27+

Starburst The Data Lies FULL V2b

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, the best minds of my generation are thinking about how to get people to click on ads. And that sucks. Let's face it, more than a decade later organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now that could mean cutting cost, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie, or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and yes, broken promises of a data past. We'll expose data lies, big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK. Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know, right? You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles, responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani. Of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with, you know, your constituencies? >>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like. People who understand what it means to use this data, particularly as the data that we use at EMIS is very private. It's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect I think is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is, you know, the data mesh model basically says, data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients, they're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team. Right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems. Maybe those are either source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that would be really meaningful for that end user, right? Each of those is useful for a different part of the business, and the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles; back to those. You got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second. 
And then in your industry, one of the most highly regulated, most sensitive, there's computational governance. How do you automate and ensure federated governance in that mesh model that Theresa was just talking about? >>Well, it absolutely depends on some of the tooling, and the processes that you put in place around those tools, to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well audited, well understood security layer. That means that we know exactly who's got access to which data fields, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization and, you know, up until recently, there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially, you know, doing analytic queries with data that's dispersed all over. How are you seeing your customers handle this challenge? >>Yeah. 
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost eCommerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so we're really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay. Guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored. 
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lake, really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. 
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is back yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. 
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this and, you know, obviously her vision is that the data mesh is open source, with open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're gonna support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and, you know, every new open source thing that comes along. They say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. 
So it reminds me, kind of, again going back 10 or 12 years ago, of when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, 'cause that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today, but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when, you know, I see the gas price, I drive up, and then I say, oh, that's the cash price. With a credit card I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, 'cause you're gonna end up... you mentioned Oracle, you mentioned Teradata. Yeah. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down. 
'Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it. 'Cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or the proprietary systems, deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back off very tight SLAs to them. 
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer; no one option will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, staying with you. You mentioned, you know, some things before that strike me. You know, the Databricks versus Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world. So, i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up and coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors. 
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open, license-free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20 plus years. And, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle. There the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage, or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. 
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine, new bottle, when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data. 
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Theresa Tung, the cloud first technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie: it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So lemme come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You know, there's a lot of money being thrown at that by venture capitalists and Snowflake, you mentioned, and its competitors. So that's different. Is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? 
>>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist, just, yes, a little bit more elastic now because the cloud offers that. >>So Theresa, let me go to you, 'cause you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack, needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? 
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing that, very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years. Cloud obviously has given a different pricing model. It de-risked experimentation. You know, we talked about the ability to scale up, scale down, but I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? 
What are some of the attributes and principles there? >>Of how it should look, or how >>It's... yeah. What it should be. >>Yeah. Yeah. Well, I think, you know, Theresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to be the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up, mature out of that ideal state. They go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. 
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So Theresa, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay. I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh; a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Theresa? >>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful for the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you. 
So Richard, you know, talking about data as product, I wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers, because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data and presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally. Yeah. But also, you know, to external folks as well. 
So you've architected that capability today? >>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that that's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, and optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent. 
Hey, I wanna thank Justin, Theresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... Or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management. Today, your teams are constantly moving and copying data, which requires time, management, and, in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7.
To unlock the value in all of your data, you need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, 'cause your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 22 2022


ENTITIES

Entity	Category	Confidence
Richard	PERSON	0.99+
Dave Lanta	PERSON	0.99+
Jess Borgman	PERSON	0.99+
Justin	PERSON	0.99+
Theresa	PERSON	0.99+
Justin Borgman	PERSON	0.99+
Teresa	PERSON	0.99+
Jeff Ocker	PERSON	0.99+
Richard Jarvis	PERSON	0.99+
Dave Valante	PERSON	0.99+
Justin Boardman	PERSON	0.99+
six	QUANTITY	0.99+
Dani	PERSON	0.99+
Massachusetts	LOCATION	0.99+
20 cents	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Jamma	PERSON	0.99+
UK	LOCATION	0.99+
FINRA	ORGANIZATION	0.99+
40 years	QUANTITY	0.99+
Kurt Monash	PERSON	0.99+
20%	QUANTITY	0.99+
two	QUANTITY	0.99+
five	QUANTITY	0.99+
Jess	PERSON	0.99+
2011	DATE	0.99+
Starburst	ORGANIZATION	0.99+
10	QUANTITY	0.99+
Accenture	ORGANIZATION	0.99+
seven years	QUANTITY	0.99+
thousands	QUANTITY	0.99+
pythons	TITLE	0.99+
Boston	LOCATION	0.99+
GDPR	TITLE	0.99+
Today	DATE	0.99+
two models	QUANTITY	0.99+
Zolando Comcast	ORGANIZATION	0.99+
Gemma	PERSON	0.99+
Starbust	ORGANIZATION	0.99+
JPMC	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Javas	TITLE	0.99+
today	DATE	0.99+
AWS	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
first lie	QUANTITY	0.99+
10	DATE	0.99+
12 years	QUANTITY	0.99+
one place	QUANTITY	0.99+
Tomorrow	DATE	0.99+

Starburst The Data Lies FULL V1

Published Date : Aug 20 2022


Breaking Analysis Further defining Supercloud W/ tech leaders VMware, Snowflake, Databricks & others

From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >>At our inaugural Supercloud 22 event, we further refined the concept of a supercloud, iterating on the definition, the salient attributes, and some examples of what is and what is not a supercloud. Welcome to this week's Wikibon CUBE Insights, powered by ETR. You know, Snowflake has always been what we feel is one of the strongest examples of a supercloud, and in this Breaking Analysis from our studios in Palo Alto, we unpack our interview with Benoit Dageville, co-founder and president of products at Snowflake, and we test our supercloud definition on the company's Data Cloud platform. And we're really looking forward to your feedback. First, let's examine how we define supercloud. And, very importantly, one of the goals of Supercloud 22 was to get the community's input on the definition and iterate on previous work. Supercloud is an emerging computing architecture that comprises a set of services which are abstracted from the underlying primitives of hyperscale clouds. We're talking about services such as compute, storage, networking, security, and other native tooling like machine learning and developer tools, to create a global system that spans more than one cloud. Supercloud, as shown on this slide, has five essential properties, x number of deployment models, and y number of service models. We're looking for community input on x and y, and on the first point as well, so please weigh in and contribute. Now, we've identified these five essential elements of a supercloud. Let's talk about these. First, the supercloud has to run its services on more than one cloud, leveraging the cloud-native tools offered by each of the cloud providers. The builder of the supercloud platform is responsible for optimizing the underlying primitives of each cloud and optimizing for the specific needs, be it cost or performance or latency or governance, data
sharing, security, etc. But those primitives must be abstracted such that a common experience is delivered across the clouds for both users and developers. The supercloud has a metadata intelligence layer that can maximize efficiency for the specific purpose of the supercloud, i.e., the purpose that the supercloud is intended for, and it does so in a federated model. And it includes what we call a super-PaaS. This is a prerequisite: a purpose-built component that enables ecosystem partners to customize and monetize incremental services while at the same time ensuring that the common experiences exist across clouds. Now, in terms of deployment models, we'd really like to get more feedback on this piece, but here's where we are so far, based on the feedback we got at Supercloud 22. We see three deployment models. The first is one where a control plane may run on one cloud but supports data plane interactions with more than one other cloud. The second model instantiates the supercloud services on each individual cloud and within regions, and can support interactions across more than one cloud, with a unified interface connecting those instantiations, those instances, to create a common experience. And the third model superimposes its services as a layer (or, in the case of Snowflake, they call it a mesh) on top of the cloud providers' region or regions, with a single global instantiation of those services which spans multiple cloud providers. This is our understanding, from the conversation with Benoit Dageville, as to how Snowflake approaches its solutions. And for now, we're going to park the service models. We need more time to flesh that out, and we'll propose something shortly for you to comment on. Now, we peppered Benoit Dageville at Supercloud 22 to test how the Snowflake Data Cloud aligns to our concepts and our definition. Let me also say that Snowflake doesn't use the term supercloud. They really want to respect, and they
don't want to denigrate the importance of their hyperscale partners, nor do we. But we do think the hyperscalers, today anyway, are not building what we call superclouds; people who are building superclouds are building on top of hyperscale clouds. That is a prerequisite. So here are the questions that we tested with Snowflake. First question: how does Snowflake architect its Data Cloud, and what is its deployment model? Listen to Dageville talk about how Snowflake has architected a single system. Play the clip. >>There are several ways to do this, you know, supercloud, as you name them. The way we picked is to create, you know, one single system, and that's very important, right? There are several ways, right? You can instantiate, you know, your solution in every region of a cloud, and, you know, potentially that region could be AWS, that region could be GCP, so you are indeed a multi-cloud solution. But Snowflake, we did it differently. We are really creating cloud regions which are superposed on top of the cloud provider's, you know, infrastructure region. So we are building our regions, but where it's very different is that each region of Snowflake is not one instantiation of our service. Our service is global by nature. We can move data from one region to the other. When you land in Snowflake, you land into one region, but you can grow from there, and you can, you know, exist in multiple clouds at the same time. And that's very important, right? It's not different instantiations of a system; it's one single instantiation which covers many cloud regions and many cloud providers. >>Snowflake chose the most advanced level of our three deployment models, as Dageville described, presumably so it could maintain maximum control and ensure that common experience, like the iPhone model. Next, we probed about the technical enablers of the Data Cloud. Listen to Dageville talk about Snowgrid. He uses the term
mesh, and this can get confusing with Zhamak Dehghani's data mesh concept, but listen to Benoit's explanation. >>Well, as I said, you know, first we start by building, you know, Snowflake regions. We have today 30 regions that span, you know, the world, so it's a worldwide system with many regions. But all these regions are connected together. They are, you know, meshed together with our technology; we name it Snowgrid. And that makes it hard because, you know, an Azure region can talk to an AWS region or GCP regions, and as a user of our cloud you don't really see these regional differences, that, you know, regions are in different, you know, potentially clouds. When you use Snowflake, your presence as an organization can be in several regions, several clouds if you want, both geographic and cloud provider. >>So I can share data irrespective of the cloud, and I'm in the Snowflake Data Cloud. Is that correct? I can do that today? >>Exactly. And that's very critical, right? What we wanted is to remove data silos. And when you instantiate a system in one single region, and that system is locked in that region, you cannot communicate with other parts of the world. You are locking the data in one region, right? And we didn't want to do that. We wanted, you know, data to be distributed the way the customer wants it to be distributed across the world, and potentially sharing data at world scale. >>Now, maybe there are many ways to skin the other cat, meaning perhaps if a platform does instantiate in multiple places, there are ways to share data. But this is how Snowflake chose to approach the problem. Next question: how do you deal with latency in this big global system? This is really important to us, because while Snowflake has some really smart people working as engineers and the like, we don't think they've solved for the speed of light problem. The best people are working on it, as we often joke. Listen to Benoit Dageville's comments on this
topic. >>So yes and no. The way we do it, it's very expensive to do that, because generally, if you want to join, you know, data which are in different regions and different clouds, it's going to be very expensive, because you need to move, you know, data every time you join it. So the way we do it is that you replicate the subset of data that you want to access from one region from other regions. So you can create this data mesh, but data is replicated to make it very cheap and very performant too. >>And Snowgrid, does that have the metadata intelligence? >>Yes. >>Can you describe that a little bit? >>Yeah. Snowgrid is both a way to exchange, you know, metadata, so each region of Snowflake knows about all the other regions of Snowflake. Every time we create a new region, you know, the metadata is distributed over our Data Cloud. Not only, you know, a region knows all the regions, but it knows, you know, every organization that exists in our clouds, where this organization is, where data can be replicated by this organization. And then, of course, it's also used as a way to exchange data, right? So you can exchange, you know, data at petabyte scale. And I was just receiving an email from one of our customers who moved more than four petabytes of data cross-region, cross, you know, cloud providers, in, you know, a few days. And, you know, it's a lot of data, so it takes, you know, some time to move, but they were able to do that online, completely online, and switch over, you know, to the other region, which for failover is very important also. >>So when he says yes and no, that probably means no. So it sounds like Snowflake is selectively pulling small amounts of data and replicating it where necessary. But you also heard him talk about the metadata layer, which is one of the essential aspects of supercloud. Okay, next we dug into security. It's one of the most important issues, and we think one of the hardest parts related to
deploying supercloud. So we've talked about how the cloud has become the first line of defense for the CISO, but now with multi-cloud you have multiple first lines of defense, and that means multiple shared responsibility models and multiple tool sets from different cloud providers, and an expanded threat surface. So listen to Benoit's explanation here. Please play the clip. >>This is a great question. Security has always been the most important aspect of Snowflake since day one, right? This is the question that every customer of ours has: you know, how can you guarantee the security of my data? And so we secure data really tightly in region. We have several layers of security. It starts by encrypting all data at rest, and that's very important. A lot of customers are not doing that, right? You hear about these attacks, for example, on cloud, you know, where someone left, you know, their buckets open, and then, you know, you can access the data because it's not encrypted. So we are encrypting everything at rest. We are encrypting everything in transit. So a region is very secure. Now, you know, from one region you never access data from another region in Snowflake. That's also why we replicate data. Now, the replication of that data across regions, or the metadata for that matter, is really highly secure. So Snowgrid ensures that everything is encrypted. You know, we have multiple, you know, encryption keys, and it's, you know, stored in hardware, you know, secure modules. So we built, you know, Snowgrid such that it's secure and it allows very secure movement of data. >>So when we heard this explanation, we immediately went to the lowest common denominator question, meaning when you think about how AWS, for instance, deals with data in motion or data at rest, it might be different from how another cloud provider deals with it. So how does Snowflake handle differences, for example, in the AWS maturity model for various, you know, cloud capabilities? You know, let's say they've
got a faster Nitro or Graviton. How does Snowflake deal with that? Do they have to slow everything else down, like, imagine a caravan cruising, you know, across the desert, so, you know, every truck can keep up? Let's listen. >>It's a great question. I mean, of course our software is abstracting, you know, all the cloud providers', you know, infrastructure, so that when you run in one region, let's say AWS or Azure, it doesn't make any difference as far as the applications are concerned. And this abstraction, of course, is a lot of work. I mean, really, really a lot of work, because it needs to be secure, it needs to be performant in, you know, every cloud, and it has, you know, to expose APIs which are uniform. And, you know, cloud providers, even though they have potentially the same concept, let's say blob storage, the APIs are completely different. The way, you know, these systems are secured is completely different. The errors that you can get and the retry, you know, mechanism is very different from one cloud to the other. Performance is also different. We discovered that when we were starting to port our software, and, you know, we had to completely rethink how to leverage blob storage in that cloud versus that cloud, just because of performance too. So we had, you know, for example, to, you know, stripe data. So all this work is work that, you know, you don't need to do as an application, because our vision really is that applications which are running in our Data Cloud can, you know, be abstracted from all these differences. And we provide all the services, all the workloads that these applications need, whether it's transactional access to data, analytical access to data, you know, managing, you know, logs, managing, you know, metrics. All of this is abstracted too, such that they are not, you know, tied to one, you know, particular service of one cloud, and distributing this application across, you know, many regions, many clouds is very seamless. >>So from that answer, we know that Snowflake takes
care of everything, but we really don't understand the performance implications in, you know, that specific case. But we feel pretty certain that the promises that Snowflake makes around governance and security within their data sharing construct will be kept. Now, another criterion that we've proposed for supercloud is a super-PaaS layer to create a common developer experience and an enabler for ecosystem partners to monetize. Please play the clip. Let's listen. >>We build it, you know, custom-built, because, as you said, you know, what exists in one cloud might not exist in another cloud provider, right? So we have to build, you know, all these components that modern applications need, and that, you know, goes to machine learning, as I said, transactional, analytical systems, and the entire thing, such that they can run in isolation, basically. >>And the objective is the developer experience will be identical across those clouds? >>Yes, right. The developers don't need to worry about the cloud provider. And actually, in our system, we didn't talk about it, but the marketplace that we have, which allows actually to deliver... >>We're getting there, yeah. Okay, now, we're not going to go deep into ecosystem today. We've talked about Snowflake's strengths in this regard. But Snowflake pretty much ticked all the boxes on our supercloud attributes and definition. We asked Benoit Dageville to confirm that this is all shipping and available today, and he also gave us a glimpse of the future. Play the clip. >>And we are still developing it. You know, the transactional, you know, Unistore, as we call it, was announced at the last Summit, so they are still, you know, working properly. But that's the vision, right? And that's important, because we talk about the infrastructure, right? You mentioned a lot about storage and compute, but it's not only that, right? When you think about applications, they need to use the transactional database, they need
to use an analytical system, they need to use, you know, machine learning. So you need to provide also all these services which are consistent across all the cloud providers. >>So you can hear Dageville talking about expanding beyond taking advantage of the core infrastructure, storage and networking, et cetera, and bringing intelligence to the data through machine learning and AI. So of course there's more to come, and there better be at this company's valuation, despite the recent sharp pullback in a tightening Fed environment. Okay, so I know it's cliche, but everyone's comparing Snowflake and Databricks. Databricks has been pretty vocal about its open source posture compared to Snowflake's, and it just so happens that we had Ali Ghodsi on at Supercloud 22 as well. He wasn't in studio; he had to do remote because I guess he's presenting at an investor conference this week, so we had to bring him in remotely. Now, I didn't get to do this interview, John Furrier did, but I listened to it and captured this clip about how Databricks sees supercloud and the importance of open source. Take a listen to Ghodsi. >>Yeah, I mean, let me start by saying we're big fans of open source. We think that open source is a force in software that's going to continue for, you know, decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. We saw that, you know, it could do that with the most advanced technology: Windows, you know, a proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the data lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, you know, data lake, machine learning, real-time stack in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud, and with it comes a really important protocol called Delta Sharing that
enables you, in an open way, actually for the first time ever, to share large data sets between organizations. But it uses an open protocol, so the great thing about that is you don't need to be a Databricks customer. You don't even have to like Databricks; you just need to use this open source project, and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently, with just one copy of the data, so you don't have to copy it if you're within the same cloud. >>So the implication of Ali Ghodsi's comments is that Databricks, with Delta Sharing, as John implied, is playing a long game. Now, I don't know enough about the Databricks architecture to comment in detail; I've got to do more research there. So I reached out to my two analyst friends, Tony Baer and Sanjeev Mohan, to see what they thought, because they cover these companies pretty closely. Here's what Tony Baer said, quote: I've viewed the divergent lakehouse strategies of Databricks and Snowflake in the context of their roots. Prior to Delta Lake, Databricks' prime focus was the compute, not the storage layer, and more specifically they were a compute engine, not a database. Snowflake approached from the opposite end of the pool, as they originally fit the mold of the classic database company rather than a specific compute engine per se. The lakehouse pushes both companies outside of their original comfort zones: Databricks to storage, Snowflake to compute engine. So it makes perfect sense for Databricks to embrace the open source narrative at the storage layer and for Snowflake to continue its walled garden approach. But in the long run, their strategies are already overlapping. Databricks is not a 100% open source company. Its practitioner experience has always been proprietary, and now so is its SQL query engine. Likewise, Snowflake has had to open up with the support of Iceberg for open data lake format. The question really becomes how serious Snowflake will be in making Iceberg a first-class citizen in its
environment, that is not necessarily officially branding a lakehouse but effectively is. And likewise, can Databricks deliver the service levels associated with walled gardens through a more brute force approach that relies heavily on the query engine? At the end of the day, those are the key requirements that will matter to Databricks and Snowflake customers. End quote. That was some deep thought by Tony. Thank you for that. Sanjeev Mohan added the following, quote: Open source is a slippery slope. People buy mobile phones based on open source Android, but it's not fully open. Similarly, Databricks' Delta Lake was not originally fully open source, and even today its Photon execution engine is not. We are always going to live in a hybrid world. Snowflake and Databricks will support whatever model works best for them and their customers. The big question is, do customers care as deeply about which vendor has a higher degree of openness as we technology people do? I believe customers' evaluation criteria is far more nuanced than just to decipher each vendor's open source claims. End quote. Okay, so I had to ask Dageville about their so-called walled garden approach and what their strategy is with Apache Iceberg. Here's what he said. >>Iceberg is very important. So just to give some context, Iceberg is an open, you know, table format, right, which was, you know, first developed by Netflix, and Netflix, you know, put it open source in the Apache community. So we embrace that open source standard because it's widely used by many, you know, companies, and also many companies have, you know, really invested a lot of effort in building, you know, big data Hadoop solutions or data lake solutions, and they want to use Snowflake, and they couldn't really use Snowflake because all their data were in open, you know, formats. So we are embracing Iceberg to help these companies move to the cloud. But why we have been reluctant with direct access to data: direct access to data is a
little bit of a problem for us, and the reason is, when you have direct access to data, now you have direct access to storage. Now you have to understand, for example, the specificity of one cloud versus the other. So as soon as you start to have direct access to data, you lose your cloud-agnostic layer. You don't access data with an API. When you have direct access to data, it's very hard to secure data, because you need to grant direct access to tools which are not protected, and you see a lot of hacking of data because of that. So direct access to data is not serving our customers well, and that's why we have been reluctant to do that, because it's not cloud-agnostic. You have to code for that; you need a lot of intelligence, versus API access. So we want open APIs. That's, I guess, the way we embrace openness: by open API versus accessing data directly. >> Here's my take. Snowflake is hedging its bets, because enough people care about open source that they have to have some open data format options, and it's good optics. And you heard Benoit Dageville talk about the risks of directly accessing the data and the complexities it brings. Now, is that maybe a little FUD against Databricks? Maybe. But the same can be said for Ali's comments, maybe FUDing the proprietariness of Snowflake. But as both analysts pointed out, open is a spectrum. Hey, I remember Unix used to equal open systems. Okay, let's end with some ETR spending data, and why not compare Snowflake and Databricks spending profiles? This is an XY graph with net score, or spending momentum, on the y-axis and pervasiveness, or overlap in the data set, on the x-axis. This is data from the January survey, when Snowflake was holding above 80 percent net score, off the charts. Databricks was also very strong in the upper 60s. Now let's fast forward to this next chart and show you the July ETR survey data, and
you can see Snowflake has come back down to earth. Now remember, anything above 40 net score is highly elevated, so both companies are doing well, but Snowflake is well off its highs, and Databricks has come down somewhat as well. Databricks is inching to the right. Snowflake rocketed to the right post its IPO, and as we know, Databricks wasn't able to get to IPO during the COVID bubble. Ali Ghodsi is at the Morgan Stanley CEO conference this week. They've got plenty of cash to withstand a long-term recession, I'm told, and they've started the message that they're a billion dollars in annualized revenue. I'm not sure exactly what that means. I've seen some numbers on their gross margins. I'm not sure what that means. I've seen some numbers on their net retention revenue, or net revenue retention. Again, I'll reserve judgment until we see an S-1, but it's clear both of these companies have momentum and they're out competing in the market, which will, as always, be the ultimate arbiter. Different philosophies, perhaps. Is it like Democrats and Republicans? Well, it could be, but they're both going after solving a data problem. Both companies are trying to help customers get more value out of their data, and both companies are highly valued, so they have to perform for their investors. To paraphrase Ralph Nader, the similarities may be greater than the differences. Okay, that's it for today. Thanks to the team from Palo Alto for this awesome Supercloud studio build. Alex Myerson and Ken Schiffman are on production in the Palo Alto studios today. Kristen Martin and Cheryl Knight get the word out to our community. Rob Hof is our editor-in-chief over at SiliconANGLE. Thanks to all. Please check out etr.ai for all the survey data. Remember, these episodes are all available as podcasts wherever you listen. Just search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com, and you can email me at david.vellante@siliconangle.com or DM me @dvellante or comment on my LinkedIn posts. And please, as I
say, ETR has got some of the best survey data in the business. We track it every quarter and are really excited to be partners with them. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis. (music)

Published Date : Aug 14 2022

SUMMARY :

and and the retry you know mechanism is

ENTITIES

Entity	Category	Confidence
netflix	ORGANIZATION	0.99+
john furrier	PERSON	0.99+
palo alto	ORGANIZATION	0.99+
tony bear	PERSON	0.99+
boston	LOCATION	0.99+
sanji mohan	PERSON	0.99+
ken shiffman	PERSON	0.99+
both	QUANTITY	0.99+
today	DATE	0.99+
ellie gotzi	PERSON	0.99+
VMware	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
siliconangle.com	OTHER	0.99+
more than four petabytes	QUANTITY	0.99+
first point	QUANTITY	0.99+
kristin martin	PERSON	0.99+
both companies	QUANTITY	0.99+
first question	QUANTITY	0.99+
rob hoff	PERSON	0.99+
more than one	QUANTITY	0.99+
second model	QUANTITY	0.98+
alex myerson	PERSON	0.98+
third model	QUANTITY	0.98+
one region	QUANTITY	0.98+
one copy	QUANTITY	0.98+
one region	QUANTITY	0.98+
five essential elements	QUANTITY	0.98+
android	TITLE	0.98+
100	QUANTITY	0.98+
first line	QUANTITY	0.98+
Databricks	ORGANIZATION	0.98+
sheryl	PERSON	0.98+
more than one cloud	QUANTITY	0.98+
first	QUANTITY	0.98+
iphone	COMMERCIAL_ITEM	0.98+
super cloud 22	EVENT	0.98+
each cloud	QUANTITY	0.98+
each	QUANTITY	0.97+
sanjay mohan	PERSON	0.97+
john	PERSON	0.97+
republicans	ORGANIZATION	0.97+
this week	DATE	0.97+
hundreds of years	QUANTITY	0.97+
siliconangle	ORGANIZATION	0.97+
each week	QUANTITY	0.97+
data lake house	ORGANIZATION	0.97+
one single region	QUANTITY	0.97+
january	DATE	0.97+
dave vellante	PERSON	0.96+
each region	QUANTITY	0.96+
one	QUANTITY	0.96+
dave vellante	PERSON	0.96+
tony	PERSON	0.96+
above 80 percent	QUANTITY	0.95+
more than one cloud	QUANTITY	0.95+
more than one cloud	QUANTITY	0.95+
data lake	ORGANIZATION	0.95+
five essential properties	QUANTITY	0.95+
democrats	ORGANIZATION	0.95+
first time	QUANTITY	0.95+
july	DATE	0.94+
linux	TITLE	0.94+
etr	ORGANIZATION	0.94+
devellante	ORGANIZATION	0.93+
dodgeville	ORGANIZATION	0.93+
each vendor	QUANTITY	0.93+
super cloud 22	ORGANIZATION	0.93+
delta lake	ORGANIZATION	0.92+
three deployment models	QUANTITY	0.92+
first lines	QUANTITY	0.92+
dejaville	LOCATION	0.92+
day one	QUANTITY	0.92+

Breaking Analysis: What we hope to learn at Supercloud22

>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> The term Supercloud is somewhat new, but the concepts behind it have been bubbling for years. Early last decade, when NIST put forth a definition of cloud computing, it said services had to be accessible over a public network, essentially cutting the on-prem crowd out of the cloud conversation. Now a guy named Chuck Hollis, who was a field CTO at EMC at the time and a prolific blogger, objected to that criterion and laid out his vision for what he termed a private cloud. Now, in that post, he showed a workload running both on premises and in a public cloud, sharing the underlying resources in an automated and seamless manner, what later became known more broadly as hybrid cloud. That vision, as we now know, really never materialized, and we were left with multi-cloud: sets of largely incompatible and disconnected cloud services running in separate silos. The point is, what Hollis laid out, i.e. the ability to abstract underlying infrastructure complexity and run workloads across multiple heterogeneous estates with an identical experience, is what Supercloud is all about. Hello and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we share what we hope to learn from Supercloud 22 next week. Next Tuesday at 9:00 AM Pacific, the community is gathering for Supercloud 22, an inclusive pilot symposium hosted by theCUBE and made possible by VMware and other founding partners. It's a one day, single track event with more than 25 speakers digging into the architectural, technical, structural and business aspects of Supercloud. This is a hybrid event with a live program in the morning running out of our Palo Alto studio and pre-recorded content in the afternoon featuring industry leaders, technologists, analysts and investors up and down the technology stack. 
Now, as I said up front, the seeds of Supercloud were sown early last decade. After the very first re:Invent, we published our Amazon gorilla post, that's seen in the upper right corner here. And we talked about how to differentiate from Amazon and form ecosystems around industries and data, and how the cloud would change IT permanently. And then up in the upper left, we put up a post on the old Wikibon Wiki. Yeah, it used to be a Wiki. Check out my hair, by the way. No gray, that's how long ago this was. And we talked about in that post how to compete in the Amazon economy. And we showed a graph of how IT economics were changing, and cloud services had marginal economics that looked more like software than hardware at scale. And this would reset, we said, opportunities for both technology sellers and buyers for the next 20 years. And this came into sharper focus in the ensuing years, culminating in a milestone post by Greylock's Jerry Chen called Castles in the Cloud. It was an inspiration and catalyst for us using the term Supercloud in John Furrier's post prior to re:Invent 2021. So we started to flesh out this idea of Supercloud, where companies of all types build services on top of hyperscale infrastructure and across multiple clouds, going beyond multicloud 1.0, if you will, which was really a symptom, as we said many times, of multi-vendor. At least that's what we argued. And despite its fuzzy definition, it resonated with people because they knew something was brewing. Keith Townsend, the CTO Advisor, even though he frankly wasn't a big fan of the buzzy nature of the term Supercloud, posted this awesome blackboard on Twitter. Take a listen to how he framed it. Please play the clip. >> Is VMware the right company to make the super cloud work, a term that Wikibon came up with to describe the taking of discrete services. 
So let's say RDS from AWS, cloud compute engines from GCP and authentication from Azure to build SaaS applications or enterprise applications that connect back to your data center. Is VMware's cross cloud vision, 'cause it is just a vision today, the right approach? Or should you be looking towards companies like HashiCorp to provide this overall capability that we all agree, or maybe you don't, that we need in an enterprise? Comment below with your thoughts. >> So I really like that Keith has deep practitioner knowledge and lays out a couple of options. I especially like the examples he uses of cloud services. He recognizes the need for cross cloud services and he notes this capability is aspirational today. Remember, this was eight or nine months ago. And he brings HashiCorp into the conversation, as they're one of the speakers at Supercloud 22, and he asks the community what they think. The thing is, we're trying to really test out this concept, and people like Keith are instrumental as collaborators. Now I'm sure you're not surprised to hear that not everyone is on board with the Supercloud meme. In particular, Charles Fitzgerald has been a wonderful collaborator just by his hilarious criticisms of the concept. After a couple of Supercloud posts, Charles put up his second rendition of "Supercloudifragilisticexpialidoucious". I mean, it's just beautiful. But to boot, he put up this picture of Baghdad Bob asking us to just stop. Bob's real name is Mohamed Said al-Sahaf. He was the minister of propaganda for Saddam Hussein during the 2003 invasion of Iraq. And he made these outrageous claims of, you know, US troops running in fear and putting down their arms and so forth. So anyway, Charles laid out several, frankly, very helpful critiques of Supercloud, which has led us to really advance the definition and catalyze the community's thinking on the topic. Now, one of his issues, and there are many, is we said a prerequisite of Supercloud was a super PaaS layer. 
Gartner's Lydia Leong chimed in, saying there were many examples of successful PaaS vendors built on top of a hyperscaler, some having the option to run in more than one cloud provider. But the key point we're trying to explore is the degree to which that PaaS layer is purpose built for a specific Supercloud function, and not only runs in more than one cloud provider, Lydia, but runs across multiple clouds simultaneously, creating an identical developer experience irrespective of estate. Now, maybe that's what Lydia meant. It's hard to say from just a tweet, and she's a sharp lady and knows more about that market, that PaaS market, than I do. But to the former point, at Supercloud 22 we have several examples we're going to test. One is Oracle and Microsoft's recent announcement to run database services on OCI and Azure, making them appear as one. Rather than use an off the shelf platform, Oracle claims to have developed a capability for developers specifically built to ensure high performance, low latency, and a common experience for developers across clouds. Another example we're going to test is Snowflake. I'll be interviewing Benoit Dageville, co-founder of Snowflake, to understand the degree to which Snowflake's recent announcement of an application development platform is purpose built for the Snowflake data cloud. Is it just a plain old PaaS, big whoop, as Lydia claims, or is it something new and innovative? By the way, we invited Charles Fitz to participate in Supercloud 22 and he declined, saying, in addition to a few other somewhat insulting things, there's definitely interesting new stuff brewing that isn't traditional cloud or SaaS, but branding it all Supercloud doesn't help either. Well, indeed, we agree with part of that, and we'll see if it helps advance thinking and helps customers really plan for the future. And that's why Supercloud 22 is going to feature some of the best analysts in the business in The Great Supercloud Debate. 
In addition to Keith Townsend, Maribel Lopez of Lopez Research and Sanjeev Mohan, former Gartner analyst and principal at SanjMo, participated in this session. Now we don't want to mislead you. We don't want to imply that these analysts are hopping on the Supercloud bandwagon, but they're more than willing to go through the thought experiment and mental exercise. And we had a great conversation that you don't want to miss. Maribel Lopez had what I thought was a really excellent way to think about this. She used TCP/IP as a historical example. Listen to what she said. >> And Sanjeev Mohan has some excellent thoughts on the feasibility of an open versus de facto standard getting us to the vision of Supercloud, what's possible and what's likely. Now, again, I don't want to imply that these analysts are out banging the Supercloud drum. They're not necessarily doing that, but they do, I think it's fair to say, believe that something new is bubbling, and whether it's called Supercloud or multicloud 2.0 or cross cloud services or whatever name you choose, it's not multicloud of the 2010s, and we chose Supercloud. So our goal here is to advance the discussion on what's next in cloud, and Supercloud is meant to be a term to describe that future of cloud, and specifically the cloud opportunities that can be built on top of hyperscale compute, storage, networking, machine learning, and other services at scale. And that is why we posted this piece on Answering the top 10 questions about Supercloud, many of which were floated by Charles Fitzgerald and others in the community. Why does the industry need another term? What's really new and different, and what is hype? What specific problems does Supercloud solve? What are the salient characteristics of Supercloud? What's different beyond multicloud? What is a super PaaS? Is it necessary to have a Supercloud? How will applications evolve on superclouds? What workloads will run? 
All these questions will be addressed in detail as a way to advance the discussion and help practitioners and business people understand what's real today and what's possible with cloud in the near future. And one other question we'll address is who will build superclouds, and what new entrants we can expect. This is an ETR graphic that we showed in a previous episode of Breaking Analysis, and it lays out some of the companies we think are building superclouds or in a position to do so. By the way, the Y axis shows net score, or spending velocity, and the X axis depicts presence in the ETR survey of more than 1200 respondents. But the key callouts to this slide, in addition to some of the smaller firms that aren't yet showing up in the ETR data, like Chaossearch and Starburst and Aviatrix and Clumio, the really interesting additions are industry players: Walmart with Azure, Capital One and Goldman Sachs with AWS, Oracle with Cerner. These, we think, are early examples bubbling up of industry clouds that will eventually become superclouds. So we'll explore these and other trends to get the community's input on how this will all play out. These are the things we hope you'll take away from Supercloud 22. And we have an amazing lineup of experts to answer your questions. Technologists like Kit Colbert, Adrian Cockcroft, Mariana Tessel, Chris Hoff, Will DeForest, Ali Ghodsi, Benoit Dageville, Muddu Sudhakar and many other tech athletes, investors like Jerry Chen and In Sik Rhee, the analysts we featured earlier, Paula Hansen talking about go to market in a multi-cloud world, Gee Rittenhouse talking about cloud security, David McJannet, Bhaskar Gorti of Platform9 and many, many more. And of course you. So please go to theCUBE.net and register for Supercloud 22, really lightweight reg. We're not doing this for lead gen. We're doing it for collaboration. If you sign in, you can get the chat and ask questions in real time. 
So don't miss this inaugural event, Supercloud 22, on August 9th at 9:00 AM Pacific. We'll see you there. Okay. That's it for today. Thanks for watching. Thank you to Alex Myerson, who's on production and manages the podcast. Kristen Martin and Cheryl Knight, they help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. Does some really wonderful editing. Thank you to all. Remember, these episodes are all available as podcasts wherever you listen, just search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com. And you can email me at David.Vellante@siliconangle.com or DM me @dvellante, or comment on my LinkedIn posts. Please do check out ETR.AI for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. And we'll see you next week in Palo Alto at Supercloud 22 or next time on Breaking Analysis. (calm music)

Published Date : Aug 5 2022

SUMMARY :

This is breaking analysis and buyers for the next 20 years. Is VMware the right company is the degree to which that PaaS layer and specifically the cloud opportunities

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
Dave Vellante	PERSON	0.99+
David McJannet	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Paula Hansen	PERSON	0.99+
Jerry Chen	PERSON	0.99+
Adrian Cockcroft	PERSON	0.99+
Maribel Lopez	PERSON	0.99+
Keith Townsend	PERSON	0.99+
Kristen Martin	PERSON	0.99+
Chuck Hollis	PERSON	0.99+
Charles Fitz	PERSON	0.99+
Charles	PERSON	0.99+
Chris Hoff	PERSON	0.99+
Keith	PERSON	0.99+
Mariana Tessel	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Ali Ghodsi	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Charles Fitzgerald	PERSON	0.99+
Mohamed Said al-Sahaf	PERSON	0.99+
Kit Colbert	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Rob Hof	PERSON	0.99+
Clumio	ORGANIZATION	0.99+
Goldman Sachs	ORGANIZATION	0.99+
Gee Rittenhouse	PERSON	0.99+
Aviatrix	ORGANIZATION	0.99+
Chaossearch	ORGANIZATION	0.99+
Benoit Dageville	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
NIST	ORGANIZATION	0.99+
Lydia Leong	PERSON	0.99+
Muddu Sudhakar	PERSON	0.99+
Bob	PERSON	0.99+
Cerner	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Sanjeev Mohan	PERSON	0.99+
Capital one	ORGANIZATION	0.99+
David.Vellantesiliconangle.com	OTHER	0.99+
Starburst	ORGANIZATION	0.99+
EMC	ORGANIZATION	0.99+
2010s	DATE	0.99+
Will DeForest	PERSON	0.99+
more than 1200 respondents	QUANTITY	0.99+
one day	QUANTITY	0.99+
VMware	ORGANIZATION	0.99+
Gartner	ORGANIZATION	0.99+
2021	DATE	0.99+
next week	DATE	0.99+
Supercloud 22	EVENT	0.99+
theCUBE.net	OTHER	0.99+
Bhaskar Gorti	PERSON	0.99+
Supercloud	ORGANIZATION	0.98+
each week	QUANTITY	0.98+
eight	DATE	0.98+
SanjMo	ORGANIZATION	0.98+
Lydia	PERSON	0.98+
theCUBE	ORGANIZATION	0.98+
PaaS	TITLE	0.98+
more than 25 speakers	QUANTITY	0.98+
Snowflake	ORGANIZATION	0.98+
Platform9	ORGANIZATION	0.97+
first	QUANTITY	0.97+
one	QUANTITY	0.97+
today	DATE	0.97+
Hollis	PERSON	0.97+
Sadam Husein	PERSON	0.97+
second rendition	QUANTITY	0.97+
Boston	LOCATION	0.97+
SiliconANGLE	ORGANIZATION	0.96+
more than one cloud provider	QUANTITY	0.96+
both	QUANTITY	0.95+
super cloud 22	EVENT	0.95+

Starburst Panel Q2

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay. We're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lakes, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file and Avro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses. 
So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or that you are fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen the technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge. 
>>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this and, you know, obviously her vision is that data mesh is open source, with open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to sort of today: you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie. And then you can say, okay, we support Apache Iceberg. We're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it reminds me, kind of, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized to get that data ingested into the data warehouse, 'cause that's where they have control. 
And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start, how many people love Oracle today, but are customers nonetheless? I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting. It reminds me, you know, I see the gas price, I drive up, and then I say, oh, that's the cash price. With a credit card I gotta pay 20 cents more, but okay. But so the argument then, so let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, 'cause you're gonna end up, you mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before, at least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down 'cause I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you wanna use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. 
I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape, that a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it, 'cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they jam us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems, or the proprietary systems, deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to? Or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud based data mesh type technologies. And so it's not a simple answer, that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. 
You mentioned, you know, some things before that strike me. You know, the Databricks, Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it, and Richard, you mentioned you have a lot of rockstar data engineers, Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering world, i.e. it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up and coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So I think my advice to people who are starting here or trying to build teams to capitalize on data assets is: begin with open license, free capabilities, because they're very cheap to experiment with, and they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. 
Again, analysts like myself, we do a lot of TCO work, and have over the last 20-plus years. Normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle: there, the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >> Yeah. So our view on how the future ultimately unfolds is that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is that we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor, and so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access control so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >> Great. Okay. Thanks, guys.
Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine in a new bottle when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage.

Published Date : Aug 2 2022

Supercloud22

(upbeat music) >> On August 9th at 9:00 am Pacific, we'll be broadcasting live from theCUBE Studios in Palo Alto, California. Supercloud22, an open industry event made possible by VMware. Supercloud22 will lay out the future of multi-cloud services in the 2020s. John Furrier and I will be hosting a star lineup, including Kit Colbert, VMware CTO, Benoit Dageville, co-founder of Snowflake, Marianna Tessel, CTO of Intuit, Ali Ghodsi, CEO of Databricks, Adrian Cockcroft, former CTO of Netflix, Jerry Chen of Greylock, Chris Hoff aka Beaker, Maribel Lopez, Keith Townsend, Sanjeev Mohan, and dozens of thought leaders. A full day track with 17 sessions. You won't want to miss Supercloud22. Go to thecube.net to mark your calendar and learn more about this free hybrid event. We'll see you there. (upbeat music)

Published Date : Jul 30 2022

The Great Supercloud Debate | Supercloud22

(music) >> Welcome to the Great Supercloud Debate, a power panel of three top technology industry analysts. Maribel Lopez is here. She's the founder and principal analyst at Lopez Research. Keith Townsend is CEO and founder of The CTO Advisor. And Sanjeev Mohan is principal at SanjMo. Supercloud is a term that we've used to describe the future of cloud architectures. The idea is that superclouds are built on top of hyperscaler capex infrastructure, and the idea is that it goes beyond multi-cloud, the premise being that multi-cloud is primarily a symptom of multi-vendor or M&A or both, and results in more stovepipes. We're going to talk about that. Supercloud is meant to connote a new architecture that leverages the underlying primitives of hyperscale clouds but hides and abstracts the complexity of each of their respective clouds, and adds new value on top of that with services and a continuous experience, a similar or identical experience across more than one cloud. People may say, hey, that's multi-cloud. We're going to talk about that as well. So with that as brief background, I'd like to first welcome our panelists. Guys, thanks so much for coming on theCUBE. It's great to see you all again. >> Great to be here. >> Thank you. Great to be here. >> So I'm going to start with Maribel. You know what I just described. What's your reaction to that? Is it just what cloud is supposed to be? Is that really what multi-cloud is? Do you agree with the premise that multi-cloud has really been, like Chuck Whitten from Dell calls it, multi-cloud by default? I call it a symptom of multi-vendor. What's your take on what this is? >> Oh wow, Dave, another term. Here we go, right? More to define for people. But okay, the reality is I agree that it's time for something new, something evolved, right? Whether we call that supercloud or something else, I don't really want to debate the term, but we need to move beyond where we are today in multi-cloud and into, if we want to call it Cloud 5, Multi-cloud 2,
whatever we want to call it. I believe that we're at the next generation and that we have to define what that next generation is. But if you think about it, we went from public to private to hybrid to multi, and every time you have a discussion with somebody about cloud, you spend 10 minutes defining what you're talking about. So this doesn't seem any different to me. So let's just go with supercloud for the moment and see where we go. And if you're interested, after everybody else makes their comments, I've got a few thoughts about what supercloud might mean as well. >> Yeah, great. And I agree with you. Like I said in a recent post, you could call it multi-cloud 2.0, but something different is happening. And Sanjeev, I know you're not a big fan of buzzwords either, but I wonder if you could weigh in on this topic. By the way, Sanjeev is at the MIT CDOIQ conference, a great conference, in Boston, and so it's a public place, so I think we'll mute his line when he's not speaking. Please go ahead. >> Yeah, so, you know, I come from a pedigree of being an analyst at firms that love inventing new terms. I am not a big fan of inventing new terms. I feel that when we come up with a new term, I spend all my time standing on a stage trying to define what it is, and it takes me away from trying to solve the problem. So I find these terms to be words of convenience. For example, big data. Big data to me may not mean anything, but big data connotes this modern way of handling vast volumes of data that traditional systems could not handle. So from that point of view, I'm completely okay with supercloud. But just inventing a new term is what I have called in my previous sessions the tyranny of jargon, where we have just too many jargons, and they resonate with IT people but do not resonate with the business people. Business people care about the problem; they don't care about
what we in IT call them. >> Yeah, and I think this is a really important point that you make. And by the way, we're not trying to create a new industry category per se. We leave that to Gartner. That's actually why I like supercloud, because nobody's going to use that. No vendor's going to use the term supercloud; it's just too buzzy. But it brings up the point about practitioners, and so Keith, I want to bring you in. I'll just share some thoughts on the problems that we see and, Keith, get your practitioner view. Most companies use multiple clouds. We all kind of agree on that, I think. And largely these clouds operate in silos. They have their own development environment, their own operating environment, different APIs, different primitives, and the functionality of a particular cloud doesn't necessarily extend to other clouds. So the problem is that this increases friction for customers, increases cost, increases security risk. And so there's this promise, Maribel, of multi-cloud 2.0, that's going to solve that problem. So Keith, my question to you is: is that an accurate description of the problem that practitioners face today? What did I miss? I wonder if you could elaborate. >> So I think we'll get into some of the detail later on why this is a problem, specifically around technologies. But if we think about it in the abstract, most customers have their hands full dealing with one cloud. If you zoom in and look at companies that have multiple clouds, or multi-cloud as a result of M&A activity, you'll see that most of that is in silos. So organizationally the customer may have multiple clouds, but within sub-org silos they're generally a single silo in a single cloud. So as you think about being able to take advantage of tooling across the multi-cloud, or what, Dave, you guys are calling the supercloud, this becomes a serious problem. It's just a skill problem. It's
too much capability across too many things that look completely different from one another. >> Okay, so Dave, can I pick up on that, please? >> I'd love it. I was gonna just go to you, Maribel. Please chime in here. >> Okay, so if we think about what we're talking about with supercloud and what Keith just mentioned, remember when we went to TCP/IP, and the whole idea was, how do we get computers to talk to each other in a more standardized way? How do we get data to move in a more standardized way? I think that the problem we have with multi-cloud right now is that we don't have that. So I think that's sort of the ground level of getting us to your supercloud premise. And, you know, Google's tried it with Anthos. Every hyperscaler has tried their own write once, run anywhere. But that abstraction layer you talk about, whatever we want to call it, is super necessary, and it's sort of the foundation. So if you really think about it, we've spent like 15 years or so building out all the various components of cloud, and now's the time to take it so that cloud is actually more of an operating model versus a place, and there's at least a base level of it that is vendor neutral. And then, to your point, there's the value that's going to be built on top of that. People have been trying to commoditize the basic infrastructure for a while now, and I think that's what you're seeing in your supercloud, multi-cloud, whatever you want to call it. The infrastructure is the infrastructure, and then what would have been traditionally that PaaS layer and above is where we're going to start to see some real innovation. But we still haven't gotten to that point where you can do visibility, observability, manageability across that really complex cloud stack that we have. >> The reason I love that TCP/IP example is because it changed the industry and it had an ecosystem effect. And Sanjeev, the first example that I used was Snowflake, a company that you're very familiar
with, that is sort of hiding all that complexity. And so we're not there yet, but please chime in on this topic. You've got a view. >> Building upon what Maribel said, to me this sounds like a multi-cloud operating system, where you need that kind of common set of primitives and layers. Because if you go in the typical multi-cloud process, you've got multiple identities, and you can't have that. How can you govern if I have multiple identities? I don't have observability. I don't know what's going on across my different stacks. So to me supercloud is that, call it a single pane of glass, or one way through which I'm unifying my experience, my technology interfaces, my integration. And I, as an end user, don't even care which cloud I'm in. It makes no difference to me. It makes a difference to the vendor. The vendor may say this is coming from AWS and this is coming from GCP or Azure, but to the end user it is a consistent experience with consistent ID and observability and governance. So that, to me, makes a big difference. >> And so one of Floyer's contributions to the conversation was that in order to have a supercloud, you've got to have a super-PaaS. I'm like, oh boy, people are going to love that. But the point is that it allows a consistent developer experience, and to Maribel's earlier point about TCP/IP, it explodes the ecosystem, because the ecosystem can now write to that super-PaaS, if you will, those APIs. So Keith, number one, do you buy that? And number two, do you see that industries like financial services and healthcare are actually going to be on clouds, or what we call superclouds? >> So Sanjeev hit on a really key aspect of this, which is identity. Let's make this real. You love to talk about data collaboration. I love Sanjeev's point that the business user kind of doesn't care if this is AWS versus supercloud, et cetera. I was collaborating with a client, and he wanted to send a video file, and the video
file, his organization's access control policy didn't allow him to upload or share the file from their preferred platform. So he had to go out to another cloud provider and create yet another identity for that data on the cloud. Same data, different identity. A proper supercloud will enable me to simply say, as an end user, here's a set of data, or data sets, and I want to share it with a collaborator. And that requires cross-identity across multiple clouds. So even before we get to the PaaS layer and the APIs, we have to solve the most basic problem, which is data. How do we stop data scientists from shipping Snowballs to a location because we can't figure out the identity? We're duplicating the same data within the same cloud because we can't share identity across customer accounts, et cetera. We have to solve these basic problems before we get to supercloud. Otherwise we get a turtles-all-the-way-down thing. So we'll get into Snowflake and what Snowflake can do, but what happens when I want to share my Snowflake data across multiple clouds to a different platform? >> Yeah, you have to go inside the Snowflake cloud. Right. So I would say, to Keith's question, Sanjeev, Snowflake, I think, is solving that problem. But then he brings up the other problem, which is: what if I want to share data outside the Snowflake cloud? So that gets to the point of is it open, is it closed. And so Sanjeev, chime in on the Snowflake example. And Maribel, I wonder if there are networking examples, because Keith's saying you've got to fix the plumbing before you get these higher-level abstractions. But Sanjeev first. >> Yeah, so I actually want to go and talk a little bit about network, but from a data and analytics point of view. So I'll build upon what Keith said. I want to give an example. Let's say I am getting fantastic web logs, and I know how much time they're spending on my web pages and which pages they're looking at. So I have all of
that. Now all of that is going into cloud A. Now it turns out that I use Google Analytics, or maybe I use Adobe's analytics suite. Now that is giving me the business view, and I'm trying to do customer journey analytics. And guess what? I now have two separate identities, two separate products, two separate clouds. As an IT person, no problem, I can solve any problem by writing tons of code. But why would I do that if I can have that super-PaaS or a multi-cloud layer where I've got a single way of looking at my network traffic and my customer metrics, and I can do my customer journey analytics? It solves a huge problem. And then I can share that data with my partners so they can see data about their products, which is a combination of data from different clouds. >> Great, thank you. Maribel, please. >> I think we're having a Lord of the Rings moment here with the one-ring-to-rule-them-all concept, and I'm not sure that anybody's actually incented to do that, right? So I think there are two levels of the stack. In the basic one, we're talking a lot about how we don't have the basic fundamentals of how you move data, authenticate data, secure data, do data lineage, all that stuff, across different clouds, right? Right now I feel like we're really just talking about the public cloud venue, and we haven't even pulled in the fact that people are doing hybrid cloud, right? So with hybrid cloud, you've got hardware vendors and you've got hyperscaler vendors, and there are two or three different ways of doing things. So I honestly think that something will emerge. If we think about where we are in technology today, it's almost like, back to that operating system that Sanjeev was talking about, we need a next-generation operating system. Nobody wants to build the cloud mouse driver of the 21st century over and over again, right? We need something like that as a foundation layer. But then on top of it, you know,
there's obviously a lot of opportunity to build differentiation. When I think back on what happened with cloud, AWS remained very powerful and popular because people invested in building things on Amazon, right? They created a platform, and it took a while for anybody else to catch up to that or to have that kind of presence. And I still feel that way when I talk to companies. But having said that, I talked to a retailer the other day, and they were like, hey, we spent a long time building an abstraction layer on top of the clouds so that our developers could basically write once and run anywhere. But they were a massive retailer with a global presence. That's not something that everybody can do. So I think that we still have a gap. I don't know if that exactly answers your question, but I do feel like we're kind of in this chicken-and-egg thing of which comes first, and nobody necessarily wants to invest in, like, oh well, Amazon has built a way to do this, so we're all just going to do it the Amazon way, right? It seems like that's not going to work either. >> But I think you bring up a really important point, which is that there is going to be no one ring to rule them all. VMware is going to solve its multi-cloud problem. Snowflake has a very specific, purpose-built system for itself. Databricks is going to do its thing, and it's going to be more open source. Companies like Aviatrix, and I would say even Cisco, are going to go out and solve this problem. Dell showed at Dell Tech World a thing called Project Alpine, which is basically storage across clouds. There are going to be many superclouds. We're going to get maybe supercloud stovepipes. But the point is, for a specific problem and a set of use cases, they will be addressing those and delivering incremental value. So Keith, maybe we won't have that single cloud operating system, but we'll have multiple ones. What are your thoughts on
that? >> Yeah, we're definitely going to have multiple ones. There is no community large enough or influential enough to push a design. Take Maribel's example of the mega retailer. They've solved it, but that's their competitive advantage. They're not going to share that with the rest of us and open source it and force it upon the industry via agreement from everyone else. So we're not going to get the level of collaboration, either originated by the cloud providers or originated from user groups, that solves this problem in a big way for us. We will get silos in which this problem is solved. We'll get groups working together inside of maybe an industry, or subgroups within the industry, to say, hey, we're going to share or federate identity across our three or four or five or a dozen organizations. We'll be able to share data. We're going to solve that data problem. But the same individual organizations, in another part of the supercloud problem, are going to again just be silos. I can't run machine learning against my web assets for the community group that I run, because that's not part of the working group, which solved a different data science problem. So yes, we're going to have these bifurcations and forks within the supercloud. The question is where the focus is for each individual organization: where do I point my smart people, and what problems do they solve? >> Okay, I want to throw out a premise and get your reaction to it, because again I go back to Maribel's TCP/IP example. It changed the industry. It opened up an ecosystem. And to me this is what digital transformation is all about. Marc Andreessen says every company is a software company. You've now got industry participants, and here are some examples. I wouldn't call them true superclouds yet, but Walmart's doing their hybrid thing with Azure. Goldman Sachs announced at the last
re:Invent that it's going to take its tools, its software, its data, which is on-prem, and connect that to the AWS cloud and actually deliver a service. Capital One, we saw, Sanjeev, at the Snowflake Summit, is taking their tooling and doing it, now granted, just within Snowflake and AWS, but I fully expect them to expand that across other clouds. These are industry examples. Capital One Software is the name of the division. The reason why I don't get so worried that we're not solving the Lord of the Rings problem that Maribel mentioned is that it opens up tremendous opportunities for companies. We've got just under five minutes left. I want to throw that out there and see what you guys think. >> Yeah, I want to build upon what Maribel said. I love what she said: you're not going to build a mouse driver. So if supercloud is a multi-cloud OS, the mouse driver would be identity, or maybe it's data quality. And to Keith's point, that data quality is not going to come from a single vendor. It is going to come from a different vendor whose job is to harmonize data, because the data might be for the same identity but at a different granularity level, so you cannot just mix and match. You need to have some sort of resolution, and that is an example of a driver for multi-cloud. >> Interesting. Okay, so Okta might be the identity cloud, or Zscaler might be the security cloud, or Collibra has its cloud, et cetera. Any thoughts on that, Keith or Maribel? >> Yeah, so let's talk about where this runs into practical challenges. We did some really great research, sponsored by one of the large cloud providers, in which we looked at all the VMware cloud solutions. When I say VMware cloud, VMware has a lot of products across multi-cloud now in its broad cloud portfolio, but we're talking about the OG solution, VMware vSphere. It would seem like, on paper, if I put VMware vSphere in each cloud, that is therefore a super
cloud. I think we would all agree to that in principle. What we found in our research was that when we put hands on keyboard, the differences between the clouds show themselves, and the training gap and the skills gap between the clouds show themselves. If I needed to expose, let's say, our favorite friend, a TCP/IP address, to the public internet, that is a different process on each one of the clouds. It needs to be done on each one of the clouds and is not abstracted in VMware vSphere. So as we look at the nuance, yes, we can give the big controls, but for the Capital Ones, or JPMorgan Chase, which just spent two billion dollars on this type of capability, where the spend and effort goes is taking it from that 80 percent to that 90 or 95 percent experience. That's where the effort and money is spent, on that last mile. >> Maribel, we're out of time, but please bring us home. Give us your closing thoughts. >> Hey, I think we're still going to be working on what the multi-cloud thing is for a while, and supercloud, I think, is a direction for the future of cloud computing. But we've got some real problems to solve around authentication, identity, data lineage, data security. So I think those are going to be the tactical things that we're working on for the next couple of years. >> Right. Guys, always a pleasure having you on theCUBE. I hope we see you around. Keith, I understand you're bringing your Airstream to VMworld, or VMware Explore, and putting it on the floor. I can't wait to see that. And Mrs. CTO Advisor, I'm sure, will be by your side, so looking forward to that. Hopefully, Sanjeev and Maribel, we'll see you on the circuit as well. >> Yes, hope to see you there. >> Looking forward to hopefully even doing some content with you guys at VMware Explore too. >> Awesome, looking forward. All right, keep it right there for more content from Supercloud22. We'll be right back. (music)

Published Date : Jul 20 2022

theCUBE on Supercloud | AWS Summit New York 2022

Welcome back to theCUBE's live coverage, coming to you from the Big Apple in New York City. We're talking all things AWS Summit, but right now I've got two powerhouses. You know them, you love them: John Furrier and Dave Vellante are going to be talking about supercloud. Guys, we've been talking a lot about this. There's a big event coming up on theCUBE on August 9th, and I've gotta start, Dave, with you, because we talk about it pretty much in every interview where it's relevant. Why supercloud? >> Yeah, so John Furrier years ago started a tradition, Lisa, prior to AWS, which was to lay down the expectation for our audiences of what they should be looking for at AWS re:Invent. Okay, John, when did that start? >> 2012, 2013. Actually, 2013 was our first, but 2015 was the first time we got access to Andy Jassy, who wasn't doing any briefings. And we realized that the whole industry had started looking at Amazon Web Services as a structural forcing function of massive change. Some say inflection point; we were saying complete redefinition. >> So you wrote the Trillion Dollar Baby. >> Yeah, right, which actually turns into probably multi-trillion dollars. We got it right on that one. Surprisingly, it was pretty obvious. >> So every year since then, John has published the seminal article prior to re:Invent. So this year we were talking, we're coming out of the isolation economy, and also Adam Selipsky was the new CEO. So we had a one-on-one with Adam. >> That's right, and that's where the convergence between Andy Jassy and Adam Selipsky kicked in, which is, essentially, those guys worked together, even though he went off and boomeranged back in, as they say at AWS. But what's interesting is that Adam Selipsky's point of view piggybacked on Jassy's, but he had a different twist. >> Yeah, so some people who hadn't really put a lot of thought into it said, oh, he's copying Microsoft, moving up the stack. We're like, no, no, no, something structural is happening again. And so John wrote the piece and he started sharing
it. We're collaborating. He said, "Hey, Dave, take a look, add your perspectives." And then Jerry Chen had just written "Castles in the Cloud," and he talked about sub-markets, and we were sort of noodling. And one of the other things was in 2018, 2019, around that time at AWS re:Invent, there was this friction between, like, Snowflake and AWS, because Redshift separated compute from storage, which was Snowflake's whole thing. Now fast forward to 2021, after we're leaving, you know, the COVID economy. >> By the way, everyone was complaining. They were asking Jassy, "Are you competing with your ecosystem?" The classic trope, right? >> And then, remember, Jassy used to use Cloudera as the example. I would like to maybe pick a better example. Snowflake became that example. And what the transition was, it went from, "Hey, we're kind of competitive," for sure there's a lot of examples, but it went from "We're competitive, they're stealing our stuff" to "You know what? We're making so much money building on top of AWS specifically, but also the clouds and cross clouds." So we said there's something new happening in the ecosystem. And then this term supercloud came up to connote a layer that floats above the hyperscale CapEx. It's not PaaS, it's not SaaS, it's the combination of those things on top of a new digital infrastructure. And we chose the term supercloud. We liked it better than multi-cloud because multi... >> At least one other point, too. I think four or five years earlier, Dave and I, across not just AWS re:Invent but all of our other events, were speculating that there might be a tier two cloud service provider model, and we've talked with Intel about this, and others, just kind of evaluating it, staring at it. And we meant by tier two, like, maybe competing against Amazon. But what happened was it wasn't a tier two cloud, it was a supercloud built on the CapEx of AWS, which meant initially a company didn't have to build AWS to be like AWS. And everybody wanted to be like AWS. So we saw
the emergence of the smart companies saying, "Hey, let's refactor our business model in the category or industry scope and dominate with cloud scale," and they did it. That then continued. That was the premise of Chen's post, which was kind of riffed on theCUBE initially, which is you can have a moat in a castle in the cloud and have a competitive advantage and a sustainable differentiation model, and that's exactly what's happening. And then you introduce the edge and hybrid. You now have a cloud operating model, and that supercloud extends as a substrate across all environments. So it's not multi-cloud, which sounds broken and disjointed, with barriers. It's hybrid cloud, which is the hybrid operating model at scale, and you don't have to be Amazon to take advantage of all the value creation, since they took care of the CapEx. Now, they win too on the other side, because they're selling EC2 and storage and ML and AI. And this is new, and this is information that people might not know about: internally at AWS there was a debate, Dave, okay? I heard this from sources. Do we go all in and compete and just own the whole category, or open the ecosystem and coexist with [ __ ]? Why do we have these other companies, or Snowflake? And guess what the decision was: let's make it an open ecosystem, and let's have our own offerings as well, and let the winner take off. >> Smart, because they can't hire enough people. >> And we just had AWS and Snowflake on theCUBE a few weeks ago talking about the partnership, the co-opetition, the value in it. But what's been driving it is the voice of the customer. But I want to ask you: paint the picture for the audience of the critical key components of supercloud. What are those? >> Yeah, so I think first and foremost, supercloud, as John was saying, is not multi-cloud. Chuck Whitten had a great phrase at Dell Tech World. He said, "multi-cloud by default," right, versus multi-cloud by design. And multi-cloud has been by default. It's been this sort of, I run in AWS and
I run my stack in Azure, or I run my stack in GCP, and it works, or I wrap my stack in a container and host it in the cloud. That's what multi-cloud has been. So the first sort of concept is it's a layer that abstracts the underlying complexity of all the clouds, all the primitives. It takes advantage of maybe Graviton or Microsoft tooling, hides all that, and builds new value on top of that. The other piece of supercloud is it's ecosystem driven. Really interesting story you just told, because literally Amazon can't hire everybody, right? So they have to rely on the ecosystem for feature acceleration. So it also includes a PaaS layer, a super-PaaS layer we call it, because you need to develop applications that are specific to the problem that the supercloud is solving. So it's not a generic PaaS like OpenShift. It's specific to, whether it's Snowflake or [ __ ] or Aviatrix, so that developers can actually build on top of it and not have to worry about that underlying complexity. >> And also there's some people that are criticizing what we're doing, in a good way, because we want to have an open concept. >> Sure. >> But here's the thing that a lot of people don't understand. They're criticizing, or trying to kind of shoot holes in, the new structural change that we're identifying by comparing it to the old. That's like saying mainframe and minicomputers. It's like saying, "Well, the mainframe does it this way, therefore there's no way that's going to be legitimate." So the old thinking, Dave, is from people that have no real foresight in the new model, right? And so they don't really get it, right? So what I'm saying is that we look at structural change. Structural change is structural change. It either happens or it doesn't. So what we're observing is the fact that a Snowflake didn't design their solution to be multi-cloud. They did it all on AWS and then said, hey, why are we going to stop there? Let's go to Azure, because Microsoft's got a boatload of customers, because they have a vertically stacking
integration for their install base. So if I'm Snowflake, why wouldn't I be on Azure? And the same for GCP, and the same for other things. So this idea that you can get the value of what Amazon did, the leverage and all that value, without paying for it up front is a huge dynamic. And that's not just saying, "Oh, that's cloud." That's saying, "I have a cloud-like scale, cloud-like value proposition," which will look like an ecosystem. So to me the acid test is, if I build on top of, say, [ __ ] or, say, Snowflake, or a supercloud by default, I'm either a category leader, I own the data at scale or I'm sharing data at scale, and I have an ecosystem, people are building on top of me. So that's a platform. >> So that's really difficult. So what's happening is these ecosystem partners are taking advantage, as John said, of all the hyperscale CapEx, and they're building out their version of a distributed global system. And then the other attribute of supercloud is it's got metadata management capability. In other words, it knows, if I'm optimizing for latency, where in the supercloud to get the data, or how to protect privacy or sovereignty, or how many copies to make to have the proper data protection, or where the air gap should be for ransomware. So these are examples of very specific purpose-built superclouds that are filling gaps that the hyperscalers aren't going after. >> What's a good example of a specific supercloud that you think really articulates what you guys are talking about? >> I think there are a lot of them. I think Snowflake is a really good example. I think VMware is building a multi-cloud management system. I think Aviatrix in virtual, you know, private cloud networking and for high performance networking. I think to a certain extent what Oracle is doing with Azure definitely looks like a supercloud. I think what Capital One is doing by taking their own tools and moving that to Snowflake. Now, they're not cross-cloud yet, but I predict that they will be. I think
what Veeam is doing in data protection, Dell, what they showed at Dell Tech World with Project Alpine. These are all early examples of supercloud. >> Well, here's an indicator. Here's how you look at the example. So to me, if you're just lifting and shifting, that was the first-gen cloud. That's not changing the business model. So I think the number one thing to look at is, is the company, whether they're in a vertical like insurance or fintech or financial, are they refactoring their spend, not as an IT cost but as a refactoring of their business model? >> Yes. >> Like what Snowflake did, Dave. Or they say, "Okay, I'm going to change how I operate, not change my business model per se, or not my business identity. If I'm going to provide financial services, I don't have to spend CapEx. It's operating expenses. I get the CapEx leverage, I redefine, I get the data at scale, and now I become a service provider to everybody else," because scale will determine the power law of who wins in the verticals and in the industry. >> So we believe that Snowflake is a data warehouse in the cloud. They call it a data cloud now. >> I don't think Snowflake would like that, Dave. >> I call them a data warehouse. >> No, a super data cloud. >> But so the other key here is, you know the old saying that Andreessen came up with, I guess, that every company's a software company? Well, what does that mean? It means every company's a software company, every company is going digital. Well, how are they going to do that? They're going to do that by taking their business, their data, their tooling, their proprietary, you know, moat, and moving that to the cloud so they can compete at scale. Every company should be, if they're not, thinking about doing a supercloud. >> Well, Walmart, I think... I think Andreessen's wrong. I would revise and say that, for Andreessen and the brain trust at Andreessen Horowitz, that's no longer relevant. Every company isn't a software company. The software industry is called open source. Everybody is an open source company, and every company will be
a supercloud that survives. >> Yeah, to me, if you're not looking at supercloud as a strategy to get value and refactor your business model, take advantage of what you're paying IT for, but you're paying now in a new way, you're building out value. So you're either going to be a supercloud or get services from a supercloud. So if you're not, it's like the old joke, Dave: if you're at the table and you don't know who the sucker is, it's probably you, right? So if you're looking at the marketplace, you're saying, "If I'm not a supercloud, I'm probably going to have to work with one, because they're going to have the data, they're going to have the insights, they're going to have the scale, they're going to have the castle in the cloud," and they will be called a supercloud. >> So in customer conversations, helping customers identify workloads to move to the cloud, what are the ideal workloads and services to run in supercloud? >> So I honestly think virtually any workload could be a candidate, and I think that it's really the business that they're in that's going to define the workload. I'll say what I mean. So there's certain businesses where low latency, high performance transactions are going to matter. That's, you know, kind of Oracle's business. There's certain businesses like Snowflake where data sharing is the objective: how do I share data in a governed way, in a secure way, in any location across the world, that I can monetize? So that's their objective. You take a data protection company like Veeam, their objective is to protect data. So they have very specific objectives that ultimately dictate what the workload looks like. Couchbase is another one. They, in my opinion, are doing some of the most interesting things at the edge, because this is where, when you really push companies in the cloud, including the hyperscalers, when they get out to the far edge, it starts to get a little squishy. Couchbase actually is developing capabilities to do that, and to me that's the big wild card,
John. I think you described it accurately. The cloud is expanding. You've got public clouds no longer just remote services. You're including on-prem and now expanding out to the near edge and the deep, what do you call it, deep edge or far edge? >> Lower Sousa called the tiny edge, right? Deep edge. Well, I mean, look at Amazon's Outposts announcement. To me, HPE has opportunity, Dell has opportunities, the hardware box guys, companies, they have an opportunity to be that gear, to be an outpost, to be their own outpost. They get better stacks, they have better gear, they've just got to run cloud on it. >> Yeah, right. >> That's an edge node, right? So that would be part of the supercloud. So this is where I think people that are looking at the old models, like operating systems or systems mindsets from the '80s, they're not understanding the new architecture. What I would say to them is, "Yeah, I hear what you're saying, but the structural change is the nodes on the network, distributed computing if you will, is going to run hybrid cloud all the way across." The fact that it's multiple clouds is just coincidence on who's got the best CapEx value that people build on for their supercloud capability. So why wouldn't I be on Azure if Microsoft's going to give me all their customers that are running Office 365 and Teams? Great. If I want to be on Amazon's kind of suite, which is their ecosystem, why wouldn't I want to tap into that? So again, you can patch it all together in the supercloud. So I think the future will be distributed computing, cloud architecture end to end. >> And we felt that was different from multi-cloud. You know, if you want to call it multi-cloud 2.0, that's fine, but, you know, frankly, sometimes we get criticized for not defining it tightly enough, but we continue to evolve that definition. I've never really seen a great definition for multi-cloud. I think multi-cloud by default was the definition: I run in multiple clouds, you know, it works in Azure. It's not a strategy, it's a
broken name. It's a symptom, right? It's a symptom of multi-vendor, is really what multi-cloud has been. And so we felt like it was a new term. >> Examples, look what we're talking about: Snowflake, Databricks. >> Databricks is another good one. These are examples. Goldman Sachs. And we felt like the term immediately connotes something bigger, something that sits above the clouds and is part of a digital platform. You know, people pooh-pooh the metaverse because it's really, you know, not well defined, but every 15 or 20 years this industry goes through... >> Dave, let me ask you a question. So, Lisa, you too. If I'm in the insurance vertical and I'm an insurance company, I have competitors. My customers can go there and do business with that company, and, you know, they all know that, they go to the same conferences. But in that sector now you have new dynamics. Your IT spend isn't going to keep the lights on and make your apps work, your back-end systems and your mobile app to get your whatever. Now it's like, I have cloud scale. So what if I refactored my business model, became a supercloud, and became the major primary service provider to all the competitors and the people that are the channel partners of the ecosystem? That means that company could change the category totally, okay, and become the dominant category leader literally in two, three years. If I'm Geico, okay, I've got business in the cloud because I've got the app and I'm doing transactions on Geico. >> But with all the data that they're collecting, there's adjacent businesses that they can get into. Maybe they're in the safety business. Maybe they can sell data to governments. Maybe they can inform logistics and highway, you know, patterns, roll up all the people that don't have the same scale they have and service them with that data, and they get subscription revenue, and they can build on top of the Geico super insurance cloud, right? >> Yes, it's unlimited opportunity. That's why it's the multi-trillion dollar
baby. >> So talk to us. You've done an amazing job of talking, which I know you would, of why supercloud, what it is, the critical components, the key workloads, great examples. Talk to us in our last few minutes about the event, theCUBE on Supercloud, August 9th. Who is the audience going to hear from? What are they going to learn? >> Yeah, so August 9th, live out of our Palo Alto studio, we're going to have a program that's going to run from 9 a.m. to 1 p.m., and we're going to have a number of industry luminaries in there. Kit Colbert from VMware is going to talk about, you know, their strategy. Benoit Dageville from Snowflake is going to be there, Gee Rittenhouse of Skyhigh Security. I don't want to give it away, but I think Steve Mullaney is going to come on. Adrian Cockcroft is coming on the panel. Keith Townsend, Sanjeev Mohan will be on. >> So we'll be running that live, and also we'll be bringing in pre-recorded interviews that we'll have prior to the show that will run post the live event. It's really a pilot virtual event. We want to do a physical event, we're thinking, but the pilot is to bring our trusted friends together, who are credible, who have industry experience, to try to understand the scope of what we're talking about and open it up and help flesh out the definition, make it an open model where it's not just our opinion. We're observing, identifying the structural changes, but bringing in smart people, our smart friends, and companies are saying, "Yeah, we get behind this," because it has legs for a reason. So we're going to zoom out and let people participate and let the conversation and the community drive the content, and that is super important to theCUBE, as you know, Dave. But I think that's what's going on, Lisa, is that it's a pilot. If it has legs, we'll do a physical event. Certainly the phones are ringing off the hook for sponsors. So we don't want to go all in on sponsorships right now because it's not
about money making, it's about getting that supercloud clarity around to help companies. >> Yeah, we want to evolve the concept and bring in outside perspectives. >> Well, the community is one of the best places to do that. >> Absolutely. >> Organic, it's an organic community where, I mean, people want to find out what's going on with the best practices of how to transform a business. And right now digital transformation is not just getting digitized, it's taking advantage of the technology to leapfrog the competition. So all the successful people we talked to at least have the same common theme: "I'm changing my game, but not changing my game to the customer. I'm just going to do it differently, better, faster, cheaper, more efficient, and have higher margins and beat the competition." >> What company doesn't want to beat the competition? Go to thecube.net, if you're not there already, to register for theCUBE on Supercloud, August 9th, 9 a.m. Pacific. You won't want to miss it. For John Furrier and Dave Vellante, I'm Lisa Martin. We're all coming at you from New York City at AWS Summit '22. I'll be right back with our next guest. [Music]

Published Date : Jul 14 2022


ENTITIES

Entity	Category	Confidence
adam silevski	PERSON	0.99+
jerry chen	PERSON	0.99+
john hedwig	PERSON	0.99+
2015	DATE	0.99+
thecube.net	OTHER	0.99+
august 9th	DATE	0.99+
lisa martin	PERSON	0.99+
amazon	ORGANIZATION	0.99+
john furrier	PERSON	0.99+
adam	PERSON	0.99+
august 9th	DATE	0.99+
2013	DATE	0.99+
2012	DATE	0.99+
2018	DATE	0.99+
new york	LOCATION	0.99+
microsoft	ORGANIZATION	0.99+
9 a.m	DATE	0.99+
adam celebski	PERSON	0.99+
john	PERSON	0.99+
dave	PERSON	0.99+
2021	DATE	0.99+
1 p.m	DATE	0.99+
dave vellante	PERSON	0.99+
august 9th 9am	DATE	0.99+
multi-trillion dollars	QUANTITY	0.99+
first time	QUANTITY	0.99+
walmart	ORGANIZATION	0.99+
multi-trillion dollar	QUANTITY	0.99+
adam zluski	PERSON	0.98+
steve mullaney	PERSON	0.98+
20 years	QUANTITY	0.98+
first	QUANTITY	0.98+
first gen	QUANTITY	0.97+
jason	PERSON	0.97+
two three years	QUANTITY	0.97+
new york city	LOCATION	0.96+
andy jassy	PERSON	0.96+
project alpine	ORGANIZATION	0.96+
aws	ORGANIZATION	0.96+
this year	DATE	0.96+
aviatrix	ORGANIZATION	0.96+
geico	ORGANIZATION	0.95+
graviton	ORGANIZATION	0.95+
super cloud	ORGANIZATION	0.92+
two	QUANTITY	0.92+
one	QUANTITY	0.92+
andreas	ORGANIZATION	0.92+
vmware	ORGANIZATION	0.91+
jassy	PERSON	0.91+
80s	DATE	0.89+
adrian	PERSON	0.89+
years	DATE	0.89+
trillion dollar	QUANTITY	0.88+
palo alto studio	ORGANIZATION	0.88+
tier two	QUANTITY	0.88+
lot of people	QUANTITY	0.87+
office 365	TITLE	0.87+
keith townsend	PERSON	0.86+
azure	TITLE	0.85+
a few weeks ago	DATE	0.84+
azure	ORGANIZATION	0.84+
every company	QUANTITY	0.8+
andreessen	PERSON	0.8+
intel	ORGANIZATION	0.8+
reinvent	EVENT	0.79+
cloudera	TITLE	0.79+

Breaking Analysis: Snowflake Summit 2022...All About Apps & Monetization

>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Snowflake Summit 2022 underscored that the ecosystem excitement which was once forming around Hadoop is being reborn, escalated and coalescing around Snowflake's data cloud. What was once seen as a simpler cloud data warehouse and good marketing with the data cloud is evolving rapidly with new workloads, a vertical industry focus, data applications, monetization, and more. The question is, will the promise of data be fulfilled this time around, or is it same wine, new bottle? Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this "Breaking Analysis," we'll talk about the event, the announcements that Snowflake made that are of greatest interest, the major themes of the show, what was hype and what was real, the competition, and some concerns that remain in many parts of the ecosystem and pockets of customers. First let's look at the overall event. It was held at Caesars Forum. Not my favorite venue, but I'll tell you it was packed. Fire marshal full, as we sometimes say. Nearly 10,000 people attended the event. Here's Snowflake's CMO Denise Persson on theCUBE describing how this event has evolved. >> Yeah, two, three years ago, we were about 1800 people at a Hilton in San Francisco. We had about 40 partners attending. This week we're close to 10,000 attendees here. Almost 10,000 people online as well, and over 200 partners here on the show floor. >> Now, those numbers from 2019 remind me of the early days of Hadoop World, which was put on by Cloudera but then Cloudera handed off the event to O'Reilly as this article that we've inserted, if you bring back that slide would say. The headline almost got it right. Hadoop World was a failure, but it didn't have to be. 
Snowflake has filled the void created by O'Reilly when it first killed Hadoop World, and killed the name and then killed Strata. Now, ironically, the momentum and excitement from Hadoop's early days, it probably could have stayed with Cloudera but the beginning of the end was when they gave the conference over to O'Reilly. We can't imagine Frank Slootman handing the keys to the kingdom to a third party. Serious business was done at this event. I'm talking substantive deals. Salespeople from a host sponsor and the ecosystems that support these events, they love physical. They really don't like virtual because physical belly to belly means relationship building, pipeline, and deals. And that was blatantly obvious at this show. And in fairness, so were all theCUBE events that we've done this year, but this one was more vibrant because of its attendance and the action in the ecosystem. Ecosystem is a hallmark of a cloud company, and that's what Snowflake is. We asked Frank Slootman on theCUBE, was this ecosystem evolution by design or did Snowflake just kind of stumble into it? Here's what he said. >> Well, when you are a data cloud and you have data, people want to do things with that data. They don't just want to run data operations, populate dashboards, run reports. Pretty soon they want to build applications and after they build applications, they want to build businesses on it. So it goes on and on and on. So it drives your development to enable more and more functionality on that data cloud. Didn't start out that way, you know, we were very, very much focused on data operations. Then it becomes application development and then it becomes, hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many ways. >> So it sounds like it was maybe a little bit of both. 
The Facebook analogy is interesting because Facebook is a walled garden, as is Snowflake, but when you come into that garden, you have assurances that things are going to work in a very specific way because a set of standards and protocols is being enforced by a steward, i.e. Snowflake. This means things run better inside of Snowflake than if you try to do all the integration yourself. Now, maybe over time, an open source version of that will come out but if you wait for that, you're going to be left behind. That said, Snowflake has made moves to make its platform more accommodating to open source tooling in many of its announcements this week. Now, I'm not going to do a deep dive on the announcements. Matt Sulkins from Monte Carlo wrote a decent summary of the keynotes and a number of analysts like Sanjeev Mohan, Tony Baer and others are posting some deeper analysis on these innovations, and so we'll point to those. I'll say a few things though. Unistore extends the type of data that can live in the Snowflake data cloud. It's enabled by a new feature called hybrid tables, a new table type in Snowflake. One of the big knocks against Snowflake was it couldn't handle transaction data. Several database companies are creating this notion of a hybrid where both analytic and transactional workloads can live in the same data store. Oracle's doing this for example, with MySQL HeatWave and there are many others. We saw Mongo earlier this month add an analytics capability to its transaction system. Mongo also added SQL, which was kind of interesting. Here's what Constellation Research analyst Doug Henschen said about Snowflake's moves into transaction data. Play the clip. >> Well with Unistore, they're reaching out and trying to bring transactional data in. 
Hey, don't limit this to analytical information and there's other ways to do that like CDC and streaming but they're very closely tying that again to that marketplace, with the idea of bring your data over here and you can monetize it. Don't just leave it in that transactional database. So another reach to a broader play across a big community that they're building. >> And you're also seeing Snowflake expand its workload types in its unique way and through Snowpark and its Streamlit acquisition, enabling Python so that native apps can be built in the data cloud and benefit from all that structure and the features that Snowflake has built in. Hence that Facebook analogy, or maybe the App Store, the Apple App Store as I propose as well. Python support also widens the aperture for machine intelligence workloads. We asked Snowflake senior VP of product, Christian Kleinerman which announcements he thought were the most impactful. And despite the who's your favorite child nature of the question, he did answer. Here's what he said. >> I think the native applications is the one that looks like, eh, I don't know about it on the surface but it has the biggest potential to change everything. That's to create an entire ecosystem of solutions for within a company or across companies that I don't know that we know what's possible. >> Snowflake also announced support for Apache Iceberg, which is a new open table format standard that's emerging. So you're seeing Snowflake respond to these concerns about its lack of openness, and they're building optionality into their cloud. 
They also showed some cost optimization tools both from Snowflake itself and from the ecosystem, notably Capital One which launched a software business on top of Snowflake focused on optimizing cost and eventually the rollout of data management capabilities, and all kinds of features that Snowflake announced at the show around governance, cross cloud, what we call super cloud, a new security workload, and they reemphasized their ability to read non-native on-prem data into Snowflake through partnerships with Dell and Pure and a lot more. Let's hear from some of the analysts that came on theCUBE this week at Snowflake Summit to see what they said about the announcements and their takeaways from the event. This is Dave Menninger, Sanjeev Mohan, and Tony Baer, roll the clip. >> Our research shows that the majority of organizations, the majority of people do not have access to analytics. And so a couple of the things they've announced I think address those or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job and they need analytics in that job. >> Primarily, I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. 
So I think that's how it started but it's hugely beneficial to the customers, to the users because now if you have a large amount of data in Parquet files you can leave it on S3, but then using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the micro partitioning, you get the metadata. And in a single query, you can join, you can do select from a Snowflake table union and select from an Iceberg table and you can do stored procedures, user defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. So the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking, but most organizations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, so they're using the native capabilities of Snowflake but they are at a level higher. So if you have a data lake and a cloud data warehouse and you have other like relational databases, you can run these cross platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> I think it's like the last mile, essentially. 
In other words, it's like, okay, you have folks that are very comfortable with Tableau but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, and I think part of it, this kind of plays into it is what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Now, obviously Snowflake acquired Streamlit. So we can expect that the Streamlit capabilities are going to be native. >> I want to share a little bit about the higher level thinking at Snowflake, here's a chart from Frank Slootman's keynote. It's his version of the modern data stack, if you will. Now, Snowflake of course, was built on the public cloud. If there were no AWS, there would be no Snowflake. Now, they're all about bringing data and live data and expanding the types of data, including structured, we just heard about that, unstructured, geospatial, and the list is going to continue on and on. Eventually I think it's going to bleed into the edge if we can figure out what to do with that edge data. Executing on new workloads is a big deal. They started with data sharing and they recently added security and they've essentially created a PaaS layer. We call it a SuperPaaS layer, if you will, to attract application developers. Snowflake has a developer-focused event coming up in November and they've extended the marketplace with 1300 native apps listings. And at the top, that's the holy grail, monetization. We always talk about building data products and we saw a lot of that at this event, very, very impressive and unique. Now here's the thing. There's a lot of talk in the press, on Wall Street and the broader community about consumption-based pricing and concerns over Snowflake's visibility and its forecast and how analytics may be discretionary. 
But if you're a company building apps in Snowflake and monetizing like Capital One intends to do, and you're now selling in the marketplace, that is not discretionary, unless of course your costs are greater than your revenue for that service, in which case it's going to fail anyway. But the point is we're entering a new era where data apps and data products are beginning to be built, and Snowflake is attempting to make the data cloud the de facto place where you're going to build them. In our view they're well ahead in that journey. Okay, let's talk about some of the bigger themes that we heard at the event. Bringing apps to the data instead of moving the data to the apps, this was a constant refrain and one that certainly makes sense from a physics point of view. But having a single source of data that is discoverable, sharable and governed with increasingly robust ecosystem options, it doesn't have to be moved. Sometimes it may have to be moved if you're going across regions, but that's unique and a differentiator for Snowflake in our view. I mean, I've yet to see a data ecosystem that is as rich and growing as fast as the Snowflake ecosystem. Monetization, we talked about that, industry clouds, financial services, healthcare, retail, and media, all front and center at the event. My understanding is that Frank Slootman was a major force behind this shift, this development and go-to-market focus on verticals. It's really an attempt, and he talked about this in his keynote, to align with the customer mission, ultimately align with their objectives, which, not surprisingly, are increasingly monetizing with data as a differentiating ingredient. We heard a ton about data mesh, there were numerous presentations about the topic. 
And I'll say this, if you map the seven pillars Snowflake talks about, Benoit Dageville talked about this in his keynote, but if you map those into Zhamak Dehghani's data mesh framework and the four principles, they align better than most of the data mesh washing that I've seen. The seven pillars: all data, all workloads, global architecture, self-managed, programmable, marketplace and governance. Those are the seven pillars that he talked about in his keynote. All data, well, maybe with hybrid tables that becomes more of a reality. Global architecture means the data is globally distributed. It's not necessarily physically in one place. Self-managed is key. Self-service infrastructure is one of Zhamak's four principles. And then inherent governance. Zhamak talks about computational, what I'll call automated governance, built in. And with all the talk about monetization, that aligns with the second principle, which is data as product. So while it's not a pure hit, and to its credit, by the way, Snowflake doesn't use data mesh in its messaging anymore. But by the way, its customers do, several customers talked about it. Geico, JPMC, and a number of other customers and partners are using the term and using it pretty closely to the concepts put forth by Zhamak Dehghani. But back to the point, they essentially, Snowflake that is, is building a proprietary system that substantially addresses some, if not many, of the goals of data mesh. Okay, back to the list, supercloud, that's our term. We saw lots of examples of clouds on top of clouds that are architected to span multiple clouds, not just run on individual clouds as separate services. And this includes Snowflake's data cloud itself but also a number of ecosystem partners that are headed in a very similar direction. Snowflake still talks about data sharing but now it uses the term collaboration in its high-level messaging, which is I think smart. Data sharing is kind of a geeky term. 
And also this is an attempt by Snowflake to differentiate from everyone else that's saying, hey, we do data sharing too. And finally Snowflake doesn't say data marketplace anymore. It's now marketplace, accounting for its application market. Okay, let's take a quick look at the competitive landscape via this ETR X-Y graph. The vertical axis, remember, is net score or spending momentum, and the x-axis is penetration, pervasiveness in the data set. That's what ETR calls overlap. Snowflake continues to lead on the vertical axis. They guided conservatively last quarter, remember, so even though that lofty height is well down from its earlier levels, I wouldn't be surprised if it ticks down again a bit in the July survey, which will be in the field shortly. Databricks is a key competitor, obviously, with strong spending momentum, as you can see. We didn't draw it here but we usually draw that 40% line or red line at 40%, anything above that is considered elevated. So you can see Databricks is quite elevated. But it doesn't have the market presence of Snowflake. It didn't get to IPO during the bubble and it doesn't have nearly as deep and capable go-to-market machinery. Now, they're getting better and they're getting some attention in the market, nonetheless. But it's a private company, so naturally more people are aware of Snowflake. Some analysts, Tony Baer in particular, believe Mongo and Snowflake are on a bit of a collision course long term. I actually can see his point. You know, I mean, they're both platforms, they're both about data. It's a long ways off, but you can see them sort of on a similar path. They talk about kind of similar aspirations and visions even though they're in quite different markets today, but they're definitely participating in a similar TAM. The cloud players are probably the biggest, or definitely the biggest, partners and probably the biggest competitors to Snowflake. And then there's always Oracle. 
Doesn't have the spending velocity of the others but it's got strong market presence. It owns a cloud and it knows a thing about data and it definitely is a go-to-market machine. Okay, we're going to end on some of the things that we heard in the ecosystem. 'Cause look, we've heard before how particular technologies, enterprise data warehouses, data hubs, MDM, data lakes, Hadoop, et cetera, were going to solve all of our data problems, and of course they didn't. And in fact, sometimes they create more problems that allow vendors to push more incremental technology to solve the problems that they created. Like tools and platforms to clean up the no-schema-on-write nature of data lakes or data swamps. But here are some of the things that I heard firsthand from some customers and partners. First thing is, they said to me that they're having a hard time keeping up sometimes with the pace of Snowflake. It reminds me of AWS in the 2014, 2015 timeframe. You remember that fire hose of announcements, which causes increased complexity for customers and partners. I talked to several customers that said, well, yeah, this is all well and good but I still need skilled people to understand all these tools that I'm integrating in the ecosystem, the catalogs, the machine learning, observability. A number of customers said, I just can't use one governance tool, I need multiple governance tools and a lot of other technologies as well, and they're concerned that that's going to drive up their cost and their complexity. I heard other concerns from the ecosystem that it used to be sort of clear as to where they could add value, you know, when Snowflake was just a better data warehouse. But to point number one, they're either concerned that they'll be left behind or they're concerned that they'll be subsumed. Look, I mean, just like we tell AWS customers and partners, you got to move fast, you got to keep innovating. If you don't, you're going to be left behind. 
Either, if you're a customer, you're going to be left behind by your competitor, or if you're a partner, somebody else is going to get there or AWS is going to solve the problem for you. Okay, and there were a number of skeptical practitioners, really thoughtful and experienced data pros, that suggested that they've seen this movie before. Hence the same wine, new bottle. Well, this time around I certainly hope not, given all the energy and investment that is going into this ecosystem. And the fact is Snowflake is unquestionably making it easier to put data to work. They built on AWS so you didn't have to worry about provisioning compute and storage and networking and scaling. Snowflake is optimizing its platform to take advantage of things like Graviton so you don't have to, and they're doing some of their own optimization tools. The ecosystem is building optimization tools, so that's all good. And my firm belief is the less expensive it is, the more data will get brought into the data cloud. And they're building a data platform on which their ecosystem can build and run data applications, aka data products, without having to worry about all the hard work that needs to get done to make data discoverable, shareable, and governed. And unlike the last 10 years, you don't have to be a keeper and integrate all the animals in the Hadoop zoo. Okay, that's it for today, thanks for watching. Thanks to my colleague, Stephanie Chan, who helps research "Breaking Analysis" topics. Sometimes Alex Myerson is on production and manages the podcasts. Kristin Martin and Cheryl Knight help get the word out on social and in our newsletters, and Rob Hof is our editor in chief over at SiliconANGLE, and Hailey does some wonderful editing, thanks to all. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis Podcasts. 
I publish each week on wikibon.com and siliconangle.com and you can email me at David.Vellante@siliconangle.com or DM me @DVellante. If you got something interesting, I'll respond. If you don't, I'm sorry I won't. Or comment on my LinkedIn post. Please check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time. (upbeat music)

Published Date : Jun 18 2022

SUMMARY :

bringing you data driven that the ecosystem excitement here on the show floor. and the action in the ecosystem. Didn't start out that way, you know, One of the big knocks against Snowflake the idea of bring your data of the question, he did answer. is the one that looks like, and from the ecosystem, And so a couple of the So that way, you know, from the Hadoop era is the fact the defacto place as to where

ENTITIES

Entity	Category	Confidence
Frank Slootman	PERSON	0.99+
Doug Henschen	PERSON	0.99+
Stephanie Chan	PERSON	0.99+
Christian Kleinerman	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Rob Hof	PERSON	0.99+
Benoit Dageville	PERSON	0.99+
2014	DATE	0.99+
Matt Sulkins	PERSON	0.99+
JPMC	ORGANIZATION	0.99+
2019	DATE	0.99+
Cheryl Knight	PERSON	0.99+
Palo Alto	LOCATION	0.99+
Denise Persson	PERSON	0.99+
Alex Myerson	PERSON	0.99+
Tony Bear	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Dell	ORGANIZATION	0.99+
July	DATE	0.99+
Geico	ORGANIZATION	0.99+
November	DATE	0.99+
Snowflake	TITLE	0.99+
40%	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
App Store	TITLE	0.99+
Capital One	ORGANIZATION	0.99+
second principle	QUANTITY	0.99+
Sanjeev Mohan	PERSON	0.99+
Snowflake	ORGANIZATION	0.99+
1300 native apps	QUANTITY	0.99+
David.Vellante@siliconangle.com	OTHER	0.99+
Kristin Martin	PERSON	0.99+
Mongo	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
Snowflake Summit 2022	EVENT	0.99+
First	QUANTITY	0.99+
two	DATE	0.99+
Python	TITLE	0.99+
10 different tables	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
ETR	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Snowflake	EVENT	0.98+
one place	QUANTITY	0.98+
each week	QUANTITY	0.98+
O'Reilly	ORGANIZATION	0.98+
This week	DATE	0.98+
Hadoop World	EVENT	0.98+
this week	DATE	0.98+
Pure	ORGANIZATION	0.98+
about 40 partners	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.98+
last quarter	DATE	0.98+
One	QUANTITY	0.98+
S3	TITLE	0.97+
Hadoop	LOCATION	0.97+
single	QUANTITY	0.97+
Caesars Forum	LOCATION	0.97+
Iceberg	TITLE	0.97+
single source	QUANTITY	0.97+
Silicon	ORGANIZATION	0.97+
Nearly 10,000 people	QUANTITY	0.97+
Apache Iceberg	ORGANIZATION	0.97+

theCUBE Insights with Industry Analysts | Snowflake Summit 2022

>>Okay, we're back at Caesars Forum, Snowflake Summit 2022. This is theCUBE's continuous coverage, day two of wall-to-wall coverage. We're so excited to have the analyst panel here, some of my colleagues. You've probably seen some of the power panels that we've done. Dave Menninger is here. He's the senior vice president and research director at Ventana Research. To his left is Tony Baer, principal at dbInsight, and in the co-host seat, Sanjeev Mohan of SanjMo. Guys, thanks so much for coming on. >>Thank you. >>You're very welcome. I wasn't able to attend the analyst action because I've been doing this all day, every day. But let me start with you, Dave. What have you seen that's kind of interested you? Pluses, minuses, concerns. >>Well, how about if I focus on what I think is valuable to the customers of Snowflake. Our research shows that the majority of organisations, the majority of people, do not have access to analytics. And so a couple of things they've announced, I think, help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organisations to embed analytics into different business processes. And so I think that will be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands rather than just analytical tools. Because most people in the organisation are not analysts, they're doing some line of business function. They're HR managers, they're marketing people, they're salespeople, they're finance people, right? They're not sitting there mucking around in the data. They're doing a job and they need analytics in that job. >>Tony, thank you. I've heard a lot of data mesh talk this week. It's kind of funny. >>Can't seem to get away from it. >>You can't. 
It seems to be gathering momentum. But what have you seen that's been interesting? >>What I have noticed, unfortunately, you know, because the rooms are too small, is you just can't get into the data mesh sessions, so there's a lot of interest in it. It's still very, I don't think there's very much understanding of it, but I think the idea that you can put all the data in one place, which, you know, to me, Snowflake, it seems to be kind of, sort of, in a way, it sounds almost like the enterprise data warehouse, you know, cloud-native edition, you know, bring it all in one place again. I think, for these folks, they think this might be kind of like a linchpin for that. I think there are several other things that actually have made a bigger impression on me at this event. One is basically watching their move with Unistore. And it's kind of interesting, you know, coming from MongoDB last week. And I see it's like these two companies seem to be converging towards the same place at different speeds. I think Snowflake is going to get there faster than Mongo for a number of different reasons, but I see a number of common threads here. I mean, one is that Mongo was a company that's always been geared towards developers. They need to, you know, start cultivating data people. >>These guys are going the other way. >>Exactly. Bingo. And the thing is, I think where they're converging is the idea of operational analytics and trying to serve all constituencies. The other thing, also in terms of serving, you know, multiple constituencies, is how Snowflake has laid out Snowpark, and what I'm finding is there's an interesting dichotomy. On one hand, you have this very ingrained integration of Anaconda, which I think is pretty ingenious. 
On the other hand, you speak to, let's say, the DataRobot folks and they say, you know something, our folks, the data scientists, want to work in our environment and use Snowflake in the background. So I see those as some interesting sort of cross-cutting trends. >>So, Sanjeev, I mean, Frank Slootman will talk about how there are definitely benefits to going into the walled garden. Yeah, I don't think we dispute that, but we see them making moves and adding more and more open source capabilities like Apache Iceberg. Is that a move to sort of counteract the narrative that Databricks has put out there? Is that customer driven? What's your take on that? >>Primarily I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started. But it's hugely beneficial to the customers, to the users, because now, if you have large amounts of data in Parquet files, you can leave it on S3. But then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So, for example, you get the, you know, the micro-partitioning. You get the metadata. So in a single query, you can join. You can do a select from a Snowflake table, union, and a select from an Iceberg table, and you can do stored procedures, user-defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it. But Snowflake does. >>Right. Hence the delta. And maybe that closes over time. I want to ask you, as you look around this, I mean, the ecosystem's pretty vibrant. I mean, it reminds me of, like, re:Invent in 2013, you know? 
But then I'm struck by the complexity of the last big data era and Hadoop and all the different tools. And is this different, or is it sort of the same wine, new bottle? You guys have any thoughts on that? >>I think it's different and I'll tell you why. I think it's different because it's based around SQL. So back to Tony's point, these vendors are coming at this from different angles, right? You've got data warehouse vendors and you've got data lake vendors and they're all going to meet in the middle. So in your case, you talked operational and analytical. But the same thing is true with data lake and data warehouse, and Snowflake no longer wants to be known as the data warehouse. They're a data cloud. And our research, again, I like to base everything off of that. >>I love that. >>Our research shows that two thirds of organisations have SQL skills and one third have big data skills, so, you know, they're going to meet in the middle. But it sure is a lot easier to bring along those people who know SQL already to that midpoint than it is to bring big data people. >>Remember, Amr Awadallah, one of the founders of Cloudera, said to me one time, John Furrier and me on theCUBE, that SQL is the killer app for Hadoop. >>Yeah, the difference, you know, with Snowflake is that you don't have to worry about taming the zoo animals. They really have thought out the ease of use, you know? I mean, from the get-go, they thought of two tent poles. One is ease of use, and the other is scale. And that's basically, you know, I think, what very much differentiates it. I mean, Hadoop had the scale, but it didn't have the ease of use. >>But don't I still need, like, if I have, you know, governance from this vendor or, you know, data prep from, you know, don't I still have to have expertise that's sort of distributed in those worlds, right? I mean, go ahead. Yeah. 
>>So the way I see it is Snowflake is adding more and more capabilities right into the database. So, for example, they've gone ahead and added security and privacy, so you can now create policies and do even cell-level masking, dynamic masking. But most organisations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalogue companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, they're using the native capabilities of Snowflake, they are at a level higher. So if you have a data lake and a cloud data warehouse and you have other, like, relational databases, you can run these cross-platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >>What about the Streamlit acquisition? Did you see anything here that indicated they're making strong progress there? Are you excited about that? Are you sceptical? Go ahead. >>I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are very, very comfortable with Tableau. But you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, I think part of it, this kind of plays into it, is what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Obviously Snowflake acquired Streamlit, so we can expect that the Streamlit capabilities are going to be native. >>And the other thing, too, about the Hadoop ecosystem is Cloudera had to help fund all those different projects and got really, really spread thin. 
I want to ask you guys about this supercloud. We use supercloud as this sort of metaphor for the next wave of cloud. You've got infrastructure, AWS, Azure, Google. It's not multi-cloud, but you've got that infrastructure and you're building a layer on top of it that hides the underlying complexities of the primitives and the APIs. And you're adding new value, in this case, the data cloud or super data cloud. And what we're seeing now is Snowflake putting forth the notion that they're adding a SuperPaaS layer. You can now build applications that you can monetise, which to me is kind of exciting. It makes this platform even less discretionary. We had a lot of talk on Wall Street about discretionary spending, and that's not discretionary if you're monetising it. What do you guys think about that? Is this something that's real? Is it just a figment of my imagination, or do you see a different way of coming at it? Any thoughts on that? >>So, in effect, they're trying to become a data operating system, right? And I think that's wonderful. It's ambitious. I think they'll experience some success with that. As I said, applications are important. That's a great way to deliver information. You can monetise them, so, you know, there's a good economic model around it. I think they will still struggle, however, with bringing everything together onto one platform. That's always the challenge. Can you become the platform? That's hard, hard to predict. You know, I think this is pretty exciting, right? A lot of energy, a large ecosystem. There is a network effect already. Can they succeed in being the only place where data exists? You know, I think that's going to be a challenge. >>I mean, the fact is, this is a classic best of breed versus the umbrella play. The thing is, this is nothing new. I mean, this is like, you know, the old days with enterprise applications, where basically Oracle and SAP vacuumed up all these. 
You know, all these applications in their ecosystem, whereas with Snowflake, and if you look at the cloud folks, the hyperscalers are still building out their own portfolios as well. Some, you know, some hyperscalers are more partner friendly than others. What Snowflake is saying is that we're going to give all of you folks who basically are competing against the hyperscalers in various areas like data catalogue and pipelines and all that sort of wonderful stuff, we'll make you basically, you know, all equal citizens. We will lay out the APIs. We'll allow you to basically, you know, integrate natively with us so you can provide as good an experience. But the onus is on your back. >>Should the ecosystem be concerned, as they were back at re:Invent 2014, that Amazon was going to nibble away at them, or is it different? >>I find what they're doing is different. For example, data sharing. They were the first ones out the door with data sharing at a large scale. And then everybody has jumped in and said, oh, we also do data sharing. All the hyperscalers came in. But now what Snowflake has done is they've taken it to the next level. Now they're saying it's not just data sharing. It's app sharing, and not only app sharing. You can stream the thing, you can build, test, deploy, and then monetise it. Make it discoverable through, you know, through your marketplace. >>You can monetise it. >>Yes. Yeah, so I think what they're doing is they are taking it a step further than what the hyperscalers are doing. And because it's like what Dave said, it's becoming like the data operating system. You log in and you have all of these different functionalities. You can do machine learning. Now you can do data quality. You can do data preparation and you can do monetisation. >>Who do you think is Snowflake's biggest competitor? What do you guys think? It's a hard question, isn't it? 
Because you're like, because we all get the, we separate compute from storage, we have a cloud data, and you go, okay, that's nice, >>but there's, like, a crack, I think. >>There's uniqueness. >>I mean, put it this way. In the old days, it would have been, you know, the prime household names. I think today it's the hyperscalers, and, I mean, again, this comes down to the best of breed versus, you know, get it all from one source. So where is your comfort level? So I think they're kind of, there's co-opetition with the hyperscalers. >>Okay, so it's not Databricks, because why, they're smaller? >>Well, okay, now within the best-of-breed area, yes, there is competition. The obvious is Databricks coming in from the data engineering angle, you know, and Snowflake coming from the data analyst angle. I think another potential competitor, and I think Snowflake basically, you know, admitted as such, potentially is MongoDB. >>Yeah. >>Exactly. So, I mean, yes, there are two different levels of sort >>of on a longer-term collision course. >>Exactly. Exactly. >>Sort of like ServiceNow and Salesforce. >>That was the thing, when I say that, a lot of people just laughed. It was like, no, you're kidding. There's no way. I said, excuse me. >>But then you see Mongo last week. They're adding some analytics capabilities, and they've always been about developers, as you say, and >>they trashed SQL. But yet they finally have started to write their first real SQL. >>We have MQL. Well, now we have SQL. So what >>were those numbers, Dave? >>Two thirds, one third. >>So the hyperscalers, but are you going to trust your hyperscalers to do your cross-cloud? I mean, maybe Google, maybe, I mean, Microsoft, perhaps. AWS, not there yet, right? I mean, how important is cross-cloud, multi-cloud, supercloud, whatever you want to call it? What does your data show? >>
Cloud is important if I remember correctly. Our research shows that three quarters of organisations are operating in the cloud and 52% are operating across more than one cloud. So, uh, two thirds of the organisations are in the cloud are doing multi cloud, so that's pretty significant. And now they may be operating across clouds for different reasons. Maybe one application runs in one cloud provider. Another application runs another cloud provider. But I do think organisations want that leverage over the hyper scholars right they want they want to be able to tell the hyper scale. I'm gonna move my workloads over here if you don't give us a better rate. Uh, >>I mean, I I think you know, from a database standpoint, I think you're right. I mean, they are competing against some really well funded and you look at big Query barely, you know, solid platform Red shift, for all its faults, has really done an amazing job of moving forward. But to David's point, you know those to me in any way. Those hyper skills aren't going to solve that cross cloud cloud problem, right? >>Right. No, I'm certainly >>not as quickly. No. >>Or with as much zeal, >>right? Yeah, right across cloud. But we're gonna operate better on our >>Exactly. Yes. >>Yes. Even when we talk about multi cloud, the many, many definitions, like, you know, you can mean anything. So the way snowflake does multi cloud and the way mongo db two are very different. So a snowflake says we run on all the hyper scalar, but you have to replicate your data. What Mongo DB is claiming is that one cluster can have notes in multiple different clouds. That is right, you know, quite something. >>Yeah, right. I mean, again, you hit that. We got to go. But, uh, last question, um, snowflake undervalued, overvalued or just about right >>in the stock market or in customers. Yeah. Yeah, well, but, you know, I'm not sure that's the right question. >>That's the question I'm asking. 
You know. >>I'll say the question is whether it's undervalued or overvalued for customers, right? That's really what matters. There's a different audience who cares about the investor side. Some of those are watching, but I believe that from the customer's perspective, it's probably valued about right, because >>the reason I ask it is because it was so hyped. It had a $100 billion valuation. It passed ServiceNow's value, which is crazy. Now it's obviously come back quite a bit, below its IPO price. But you guys were at the financial analyst meeting. Scarpelli laid out 2029 projections, signed up for $10 billion, 25% free cash flow, 20% operating profit. I mean, they better be worth more than they are today if they do >>that. If I see the momentum here this week, I think they are undervalued. But before this week, I probably would have thought they're at the right valuation. >>I would say they're probably more at the right valuation, because the IPO valuation is just such a false valuation. So hyped. >>Guys, I could go on for another 45 minutes. Thanks so much. David, Tony, Sanjeev, always great to have you on. We'll have you back for sure. >>Thanks for having us. >>All right. Thank you. Keep it right there. We're wrapping up day two on theCUBE, Snowflake Summit 2022. We'll be right back.

Published Date : Jun 16 2022

SUMMARY :

What have you seen? And I also think that the native applications as part of the I've heard a lot of data mesh talk this week. seem to get away from it. It seems to be gathering momentum, but But what have you seen? but I think the idea that you can put all the data in one place which, And the thing is that but they I think where they're converging is the idea of operational that the data breaks is put out there. So, for example, you get the, you know, the micro partitioning. I want to ask you as you look around this I mean the ecosystems pretty vibrant. I think it's different and I'll tell you why. But it sure is a lot easier to bring along those people who know sequel already the difference at this, you know, with with snowflake, is that you don't have to worry about taming the zoo. you know, data prep from, you know, don't I still have to have expertise? So so that way, you know, snowflakes done a great job of Did you see anything here that indicated there making strong is the fact that this all these capabilities, you know, a lot of vendors are taking it very seriously I want to ask you guys about this super cloud we Can you become the platform that's hard, hard to predict? I mean, this is like the you know, the old days with enterprise applications You can stream the thing you can build, test deploy, You can do data preparation and you can do We have a cloud data and you go, Okay, that's nice, I think I In the old days, it would have been you know, how you know the prime household names. You know, basically the snowflake coming from, you know, from the from the data analyst angle. Exactly. I was like, No, But then you see Mongo last week. But yet they finally have started to write their first real sequel. So what One third. So the hyper scale is but the hyper scale urz are you going to trust your hyper scale But I do think organisations want that leverage I mean, I I think you know, from a database standpoint, I think you're right. not as quickly. 
But we're gonna operate better on our Exactly. the hyper scalar, but you have to replicate your data. I mean, again, you hit that. but, you know, I'm not sure that's the right question. That's the question I'm asking. that the from the customer's perspective, it's probably valued about right, So But you guys are at the financial analyst meeting. But before this week, I probably would have thought there at the right evaluation, I would say they're probably more at the right valuation employed because the IPO valuation is just such Always great to have you on.

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Frank Sullivan	PERSON	0.99+
Tony	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Tony Blair	PERSON	0.99+
Tony Sanjeev	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Sandy	PERSON	0.99+
David McGregor	PERSON	0.99+
Mongo	ORGANIZATION	0.99+
20%	QUANTITY	0.99+
$100 billion	QUANTITY	0.99+
Ventana Research	ORGANIZATION	0.99+
2013	DATE	0.99+
last week	DATE	0.99+
52%	QUANTITY	0.99+
Sanjeev Mohan Sanremo	PERSON	0.99+
more than one cloud	QUANTITY	0.99+
2014	DATE	0.99+
2029 projections	QUANTITY	0.99+
two companies	QUANTITY	0.99+
45 minutes	QUANTITY	0.99+
San James Point	LOCATION	0.99+
$10 billion.25 percent	QUANTITY	0.99+
one application	QUANTITY	0.99+
Odula	PERSON	0.99+
John Kerry	PERSON	0.99+
Python	TITLE	0.99+
Summit 2022	EVENT	0.99+
Data Warehouse	ORGANIZATION	0.99+
Snowflake	EVENT	0.98+
Scarpelli	PERSON	0.98+
Data Lake	ORGANIZATION	0.98+
one platform	QUANTITY	0.98+
this week	DATE	0.98+
today	DATE	0.98+
10 different tables	QUANTITY	0.98+
three quarters	QUANTITY	0.98+
one	QUANTITY	0.97+
Apache	ORGANIZATION	0.97+
Day two	QUANTITY	0.97+
DB Inside	ORGANIZATION	0.96+
one place	QUANTITY	0.96+
one source	QUANTITY	0.96+
one third	QUANTITY	0.96+
Snowflake Summit 2022	EVENT	0.96+
One third	QUANTITY	0.95+
two thirds	QUANTITY	0.95+
Claudia	PERSON	0.94+
one time	QUANTITY	0.94+
one cloud provider	QUANTITY	0.94+
Two thirds	QUANTITY	0.93+
theCUBE	ORGANIZATION	0.93+
data lake	ORGANIZATION	0.92+
Snow Park	LOCATION	0.92+
Cloudera	ORGANIZATION	0.91+
two different levels	QUANTITY	0.91+
three	QUANTITY	0.91+
one cluster	QUANTITY	0.89+
single query	QUANTITY	0.87+
aws	ORGANIZATION	0.84+
first ones	QUANTITY	0.83+
Snowflake summit 2022	EVENT	0.83+
azure	ORGANIZATION	0.82+
mongo db	ORGANIZATION	0.82+
One	QUANTITY	0.81+
Eunice store	ORGANIZATION	0.8+
wave of	EVENT	0.78+
cloud	ORGANIZATION	0.77+
first real sequel	QUANTITY	0.77+
M c M Q.	PERSON	0.76+
Red shift	ORGANIZATION	0.74+
Anaconda	ORGANIZATION	0.73+
Snowflake	ORGANIZATION	0.72+
ASAP	ORGANIZATION	0.71+
Snow	ORGANIZATION	0.68+
snowflake	TITLE	0.66+
Park	TITLE	0.64+
Cube	COMMERCIAL_ITEM	0.63+
Apache	TITLE	0.63+
Mrr	PERSON	0.63+
senior vice president	PERSON	0.62+
Wall Street	ORGANIZATION	0.6+

Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments has led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is many, if not most customers are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't abandon their data warehouse estate. As such we see a battle royale is brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platform's power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade offs of various approaches and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight. 
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences, there's several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and the data science. So they were adding SQL extensions, the likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introduce Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle, SAP jumped on this lakehouse idea last year, supporting both the lake and warehouse, single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon. 
And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in that it has been an evolution just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data it was initially seen as a hardware platform play. In the end, it was basically all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory to all the data stewards, who are basically concerned about the sprawl and the lack of control in governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counter action. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight. 
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud then going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouses have become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses have each had to extend themselves. 
To believe the Databricks hype, it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine, and they had to essentially pretty much abandon Spark SQL because really, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with basically the same data targets, say cloud storage, cloud object storage, that you might not apportion the task to different compute engines. 
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform basically the types of analytics, the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time for another part of the query, which might involve, let's say some deep learning, just for example, you might go out to let's say the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, you could generalize that with all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses have still a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but there are trade-offs. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock in. They want to transform their data into any number of use cases, especially data science, machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses, they provide you an excellent user experience. That is the main reason why Snowflake took off. 
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12 different plus data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot of the cogent analysis from Sanjeev there, the vendors coming out of the database tradition, they excel at the SQL. 
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse. When it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines, good for ad hoc, minimize data movement. But really when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit, Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, or how do you do your upserts basically? So different focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg in your view something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but open for access. I don't hear about Delta Lakes in any worlds but Databricks. Hearing a lot of buzz about Apache Iceberg. 
End users want an open performance standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR, which was the most closed. You had Hortonworks, the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, but still the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform. It's kind of not really relevant to this discussion totally. But in a sense it is, because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. 
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So it really comes down to that type of perception. The fact is that, traditionally with analytics, it was very SQL oriented and basically the quants were kind of off in their corner, where they're using SAS or where they're using Teradata. It's really a great leveler today, which is that, I mean, basically Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. 
So the SQL centric companies and shops are going to do more data science on their database centric platform. The data science driven companies might be doing more BI on their lakes with those vendors, and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev. 'Cause Snowflake and Databricks are such great examples, 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a bandaid? What are your thoughts on that, Sanjeev? >> I think semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with data ops, for instance, versioning of the way we write business logic. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps toolchain. So data is catching up to the way applications are. >> We also have the translytical databases; that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed, to Doug's earlier point. 
You can't be best of breed at everything. Wouldn't data mesh advocate, data lakes, do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine. And that's why I want to go back to something that Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're in kind of the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort in that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's gone nowhere in the market, it has no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. 
And I think basically, Doug and Sanjeev were kind of referring to this before, about basically kind of like the operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to. Of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. 
What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other, Trino let's say, or Dremio. So every business unit is working on the same data set, see that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been a sort of a victim of its own success in some ways, they made it so easy to spin up single-node instances, multi-node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi-node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi-node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances. 
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there're way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around and Tony, I know you, you talk to them quite frequently. I think they have had quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done most successfully has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as being the most price friendly in terms of that we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system. 
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. The operational database, nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used let's say Teradata basically to do some machine learning or let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be under the microscope for Cloudera and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month, did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center. 
And finally they capitulated, I mean, Frank Slootman is famous for saying to be very focused and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS at the last moment at one of their re:Invents said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. 
And props to Andy Jassy who once, many times actually told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp, Databricks customer had 32, their top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said, they need to keep working that. Snowflake asked for their biggest data science customer, they cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering job to be probably SQL centric, ETL style transformation work. So I want to see the real use of the Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things in what Doug is saying, but I think sort of like, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be a case with basically, putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. 
>> So I'll be at MongoDB World, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both sides of the HTAP use case, online transactional and analytical. I'm also seeing that these databases started in, let's say in case of MySQL HeatWave, as relational or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed, versus all in one. And it's likely Mongo's path or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022


ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Tony	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Frank	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Tony Baer	PERSON	0.99+
Mars	LOCATION	0.99+
Doug Henschen	PERSON	0.99+
2020	DATE	0.99+
AWS	ORGANIZATION	0.99+
Venus	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
2012	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Holger Mueller	PERSON	0.99+
Andy Jassy	PERSON	0.99+
last year	DATE	0.99+
$5 billion	QUANTITY	0.99+
$10,000	QUANTITY	0.99+
14 vendors	QUANTITY	0.99+
Last year	DATE	0.99+
last week	DATE	0.99+
San Francisco	LOCATION	0.99+
SanjMo	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
8,500 users	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
32 concurrent users	QUANTITY	0.99+
two	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Ahana	ORGANIZATION	0.99+
DaaS	ORGANIZATION	0.99+
EMR	ORGANIZATION	0.99+
32	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
Delta	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
each	QUANTITY	0.99+
Athena	ORGANIZATION	0.99+
next week	DATE	0.99+

Breaking Analysis: Technology & Architectural Considerations for Data Mesh

>> From theCUBE Studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The introduction and socialization of data mesh has caused practitioners, business technology executives, and technologists to pause, and ask some probing questions about the organization of their data teams, their data strategies, future investments, and their current architectural approaches. Some in the technology community have embraced the concept, others have twisted the definition, while still others remain oblivious to the momentum building around data mesh. Here we are in the early days of data mesh adoption. Organizations that have taken the plunge will tell you that aligning stakeholders is a non-trivial effort, but necessary to break through the limitations that monolithic data architectures and highly specialized teams have imposed on frustrated business and domain leaders. However, practical data mesh examples often lie in the eyes of the implementer, and may not strictly adhere to the principles of data mesh. Now, part of the problem is lack of open technologies and standards that can accelerate adoption and reduce friction, and that's what we're going to talk about today. Some of the key technology and architecture questions around data mesh. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR, and in this Breaking Analysis, we welcome back the founder of data mesh and director of Emerging Technologies at Thoughtworks, Zhamak Dehghani. Hello, Zhamak. Thanks for being here today. >> Hi Dave, thank you for having me back. It's always a delight to connect and have a conversation. Thank you. >> Great, looking forward to it. Okay, so before we get into the technology details, I just want to quickly share some data from our friends at ETR. 
You know, despite the importance of data initiatives since the pandemic, CIOs and IT organizations have had to juggle of course, a few other priorities. This is why in the survey data, cyber and cloud computing are rated as the two most important priorities. Analytics and machine learning, and AI, which are kind of data topics, still make the top of the list, well ahead of many other categories. And look, a sound data architecture and strategy is fundamental to digital transformations, and much of the past two years, as we've often said, has been like a forced march into digital. So while organizations are moving forward, they really have to think hard about the data architecture decisions that they make, because it's going to impact them, Zhamak, for years to come, isn't it? >> Yes, absolutely. I mean, we are moving really from, slowly moving from reason-based, logical, algorithmic computation to model-based computation and decision making, where we exploit the patterns and signals within the data. So data becomes a very important ingredient, of not only decision making, and analytics and discovering trends, but also the features and applications that we build for the future. So we can't really ignore it, and as we see, the existing challenge around getting value from data is no longer access to computation, it's actually access to trustworthy, reliable data at scale. >> Yeah, and you see these domains coming together with the cloud and obviously it has to be secure and trusted, and that's why we're here today talking about data mesh. So let's get into it. Zhamak, first, your new book is out, 'Data Mesh: Delivering Data-Driven Value at Scale' just recently published, so congratulations on getting that done, awesome. Now in a recent presentation, you pulled excerpts from the book and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, four principles of data mesh. 
Domain driven ownership, data as a product, self-serve data platform and federated computational governance. So I want to start with self-serve platform and some of the data that you shared recently. You say that, "Data mesh serves autonomous domain oriented teams versus existing platforms, which serve a centralized team." Can you elaborate? >> Sure. I mean the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologists that are building the applications, to really lower the cognitive load for them, to be able to work with data. Whether they are building analytics, automated decision making, intelligent modeling. They need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment is to empower and enable these teams. Data mesh by definition is a scale out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So at its core it requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve this similar outcome, right? Lower the cognitive load, give the tools to data practitioners, to manage data at scale because today centralized teams, really their job, the centralized data teams, their job isn't really directly aligned with one or two or different, you know, business units and business outcomes in terms of getting value from data. Their job is to manage the data and make the data available for then those cross-functional teams or business units to use the data. So the platforms they've been given are really centralized around or tuned to work with this structure of a centralized team. Although on the surface, it seems that why not? Why can't I use my, you know, cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those underlying platforms. 
As an example, some cloud providers simply have hard limits on the number of, like, storage accounts that you can have. Because they never envisaged you have hundreds of lakes. They envisage one or two, maybe 10 lakes, right. They envisage really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous independent teams versus a centralized team. >> So just a follow up if I may, we could be here for a while. But so this assumes that you've sorted out the organizational considerations? That you've defined all the, what a data product is and a sub product. And people will say, of course we use the term monolithic as a pejorative, let's face it. But the data warehouse crowd will say, "Well, that's what data marts did. So we got that covered." But your... The premise of data mesh, if I understand it is whether it's a data mart or a data warehouse, or a data lake or whatever, a Snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology, let the technology serve the organization is that-- >> That's a perfect way of putting it, exactly. I mean, for a very long time, when we look at decomposition of complexity, we've looked at decomposition of complexity around technology, right? So we have technology and that's maybe a good segue to actually the next item on that list that we looked at. Oh, I need to decompose based on whether I want to have access to raw data and put it on the lake. Whether I want to have access to modeled data and put it on the warehouse. You know I need to have a team in the middle to move the data around. And then try to fit the organization into that model. So data mesh really inverts that, and as you said, is look at the organizational structure first. Then scale boundaries around which your organization and operation can scale. And then the second layer look at the technology and how you decompose it. >> Okay. 
So let's go to that next point and talk about how you serve and manage autonomous interoperable data products. Where code, data, and policy, you say, are treated as one unit. Whereas your contention is existing platforms of course have independent management and dashboards for catalogs or storage, et cetera. Maybe we double click on that a bit. >> Yeah. So if you think about that functional, or technical decomposition, right? Of concerns, that's one way, that's a very valid way of decomposing complexity and concerns. And then build solutions, independent solutions to address them. That's what we see in the technology landscape today. We will see technologies that are taking care of your management of data, bring your data under some sort of a control and modeling. You'll see technology that moves that data around, will perform various transformations and computations on it. And then you see technology that tries to overlay some level of meaning. Metadata, understandability, discovery, and in the end policy, right? So that's where your data processing kind of pipeline technologies versus data warehouse, storage, lake technologies, and then the governance come to play. And over time, we decomposed and we compose, right? Deconstruct and reconstruct back this together. But, right now that's where we stand. I think for data mesh really to become a reality, as in independent sources of data and teams can responsibly share data in a way that can be understood right then and there, can impose policies right then when the data gets accessed in that source, and in a resilient manner, like in a way that changes to the structure of the data or changes to the schema of the data don't have those downstream downtimes. We've got to think about this new nucleus or new units of data sharing. And we need to really bring back transformation and governing data and the data itself together around these decentralized nodes on the mesh. 
So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology to formulate ourselves around the domains. And again the data and the logic of the data itself, the meaning of the data itself. >> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational, analytical technologies are constructed. You've got an app dev stack, you've got a data stack. You've made the point many times actually that we've contextualized our operational systems, but not our data systems, they remain separate. Maybe you could elaborate on this point. >> Yes. I think this is, again, has a historical background and beginning. For a really long time, applications have dealt with features and the logic of running the business and encapsulating the data and the state that they need to run that feature or run that business function. And then we had for anything analytical driven, which required access to data across these applications and across the longer dimension of time around different subjects within the organization. This analytical data, we had made a decision that, "Okay, let's leave those applications aside. Let's leave those databases aside. We'll extract the data out and we'll load it, or we'll transform it and put it under the analytical kind of a data stack and then downstream from it, we will have analytical data users, the data analysts, the data scientists and the, you know, the portfolio of users that are growing use that data stack." And that led to this really separation of dual stack with point to point integration. So applications went down the path of transactional databases or even document stores, but using APIs for communicating and then we've gone to, you know, lake storage or data warehouse on the other side. If we are moving and that again, enforces the silo of data versus app, right? 
So if we are moving to the world where our ambitions are around making applications more intelligent, making them data driven, these two worlds need to come closer. As in ML and analytics get embedded into those applications themselves. And the data sharing, as a very essential ingredient of that, gets embedded and gets closer, becomes closer to those applications. So, if you are looking at this now cross-functional, app data, based team, right? Business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embed models back into those applications. And that continuum of experience requires well integrated technologies. I'll give you an example, which actually in some sense, we are somewhat moving to that direction. But if we are talking about data sharing or data modeling and applications use one set of APIs, you know, HTTP compliant, GraphQL or REST APIs. And on the other hand, you have proprietary SQL, like connect to my database and run SQL. Like those are two very different models of representing and accessing data. So we kind of have to harmonize or integrate those two worlds a bit more closely to achieve that domain oriented cross-functional teams. >> Yeah. We are going to talk about some of the gaps later and actually you look at them as opportunities, more than barriers. But they are barriers, but they're opportunities for more innovation. Let's go on to the fourth one. The next point, it deals with the roles that the platform serves. Data mesh proposes that domain experts own the data and take responsibility for it end to end and are served by the technology. Kind of, we referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper specialists a lot. I love that term. 
And the generalists are kind of passive bystanders waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about the, again, the intention behind data mesh was creating a responsible data sharing model that scales out. And I challenge any organization that has scaled ambitions around data or usage of data that relies on small pockets of very expensive specialist resources, right? So we have no choice, but upskilling, cross-skilling the majority population of our technologists, we often call them generalists, right? That's a short hand for people that can really move from one technology to another technology. Sometimes we call them paint-drip people, sometimes we call them T-shaped people. But regardless, like we need to have the ability to really mobilize our generalists. And we had to do that at Thoughtworks. We serve a lot of our clients and like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists, really conveying and translating the knowledge to generalists and bring them forward. And of course, platform is a big enabler of that. Like what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no code, low code. It doesn't mean we have to throw away good engineering practices. And I think good software engineering practices remain. Of course, they get adapted to the world of data to build resilient you know, sustainable solutions, but specialty, especially around kind of proprietary technology is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, your point about scale out in the examples, the practical examples of companies that have implemented data mesh that I've talked to. 
I think in all cases, you know, there's only a handful that I've really gone deep with, but it was their Hadoop instances, their clusters wouldn't scale, they couldn't scale the business around it. So that's really a key point of a common pattern that we've seen now. I think in all cases, they went to like the data lake model in AWS. And so that maybe has some violation of the principles, but we'll come back to that. But so let me go on to the next one. Of course, data mesh leans heavily toward this concept of decentralization, to support domain ownership over the centralized approaches. And we certainly see this, the public cloud players, database companies as key actors here with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point where you have decentralized technologies ruling the roost? >> I think if you look at the history of places, in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity with, you know, across different components of technology. And I think right now you are right. The way we get value from data relies on collection. At the end of the day, collection of data. Whether you have a deep learning or machine learning model that you're training, or you have, you know, reports to generate. Regardless, the model is bring your data to a place that you can collect it, so that we can use it. And that naturally leads to a set of technologies that try to operate as a full stack, integrated, proprietary, with no intention of, you know, opening data for sharing. Now, conversely, if you think about internet itself, web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on openness and sharing, right. API sharing. 
We don't talk about, in the API world, like we don't say, you know, "I will build a platform to manage your logical applications." Maybe to a degree but we actually moved away from that. We say, "I'll build a platform that opens around applications to manage your APIs, manage your interfaces." Right? Give you access to API. So I think the shift needs to... That definition of decentralized there means really composable, open pieces of the technology that can play nicely with each other, rather than a full stack that says, "I'll have control of your data," yet being somewhat decentralized within the boundary of my platform. That's just simply not going to scale if data needs to come from different platforms, different locations, different geographical locations, it needs a rethink. >> Okay, thank you. And then the final point is, is data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle cause it's nuanced and I'm kind of a 100 level student of your work. But you have said for example, that the data teams lack context of the domain and so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, we want to take... Data mesh tries to give autonomy and decision making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and naturally the data that that domain needs, or naturally the data that that domain shares. So if the intention of the platform is really to give the power to people with most relevant and timely context, the platform itself naturally becomes, as a shared component, becomes domain agnostic to a large degree. Of course those domains can still... The platform is a (chuckles) fairly overloaded word. 
As in, if you think about it as a set of technology that abstracts complexity and allows building the next level solutions on top, those domains may have their own set of platforms that are very much domain aware. But as a generalized shareable set of technologies or tools that allows us to share data. So that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually becomes domain agnostic. >> Got it. Okay. Makes sense. All right. Let's shift gears here. Talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. So what I'm trying to depict here is that if we imagine a world where data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh will encapsulate internally a fair few pieces. It's not just the boundary of that node on the mesh, it's the data itself that it's controlling and updating and maintaining. It's of course the computation and the code that's responsible for that data. And then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift that focus from implementation details, that we can leave that for later, what becomes really important is the seam or the APIs and interfaces that this node exposes. And I think that's where the work that needs to be done and the standards that are missing. And we want the seam and those interfaces to be open because that allows, you know, different organizations with different boundaries of trust to share data. Not only to share data to kind of move that data to yet another location, but to share the data in a way that distributed workloads, distributed analytics, distributed machine learning models can happen on the data where it is. 
So if you follow that line of thinking around the decentralization and connection of data versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is how do we share data responsibly and sustainably, right? In a way that is not brittle. If you think about it today, the ways we share data, one of the very common ways is, I'll give you a JDBC endpoint, or I give you an endpoint to your, you know, database of choice. And now as a technology, or as a user actually, you can have access to the schema of the underlying data and then run various queries or SQL queries on it. That's very simple and easy to get started with. That's why SQL is an evergreen, you know, standard or semi standard, pseudo standard that we all use. But it's also very brittle, because we are dependent on an underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think that the data sharing APIs of the future really need to think about removing these brittle dependencies, think about sharing not only the data, but what we call metadata, I suppose. An additional set of characteristics that is always shared along with data to make the data usage, I suppose, ethical and also friendly for the users. And also, I think we have to... That data sharing API, the other element of it, is to allow kind of computation to run where the data exists. So if you think about SQL again, as a simple primitive example of computation, when we select and when we filter and when we join, the computation is happening on that data. So maybe there is a next level of articulating distributed computation on data that simply trains models, right? Your language primitives change in a way to allow sophisticated analytical workloads to run on the data more responsibly, with policies and access control enforced.
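A minimal sketch of the brittleness described here, not something shown in the conversation: it assumes a hypothetical `orders` table in an in-memory SQLite database, a consumer query written straight against the storage schema, and a hypothetical `OrdersOutputPort` class standing in for an owner-maintained sharing interface that ships metadata with the data. All names, fields, and the unit are assumptions made for illustration.

```python
import sqlite3

# Hypothetical illustration only: table, column and class names are assumptions.

def make_db(amount_column):
    """Create an in-memory orders table whose amount column has the given physical name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE orders (id INTEGER, {amount_column} REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])
    return conn

def brittle_total(conn):
    """Consumer query coupled to the storage schema: it hard-codes the column name."""
    return conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

class OrdersOutputPort:
    """Owner-maintained interface: a stable operation mapped onto the physical column."""

    def __init__(self, conn, amount_column):
        self._conn = conn
        self._amount_column = amount_column  # known only to the data owner

    def total_amount(self):
        query = f"SELECT SUM({self._amount_column}) FROM orders"
        return self._conn.execute(query).fetchone()[0]

    def metadata(self):
        # Characteristics shared along with the data (values are assumed).
        return {"operation": "total_amount", "unit": "USD", "owner": "orders domain"}

if __name__ == "__main__":
    renamed = make_db("amount_usd")  # the owner renames the physical column
    print(OrdersOutputPort(renamed, "amount_usd").total_amount())
    try:
        brittle_total(renamed)
    except sqlite3.OperationalError as err:
        print("schema-coupled consumer broke:", err)
```

In this sketch the consumer that queries the raw schema fails as soon as the owner renames a column, while the consumer of the interface is unaffected, because the mapping to the physical layout stays with the code that owns the data.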
So I think that output port that I mentioned simply is about next generation data sharing, responsible data sharing APIs, suitable for decentralized analytical workloads. >> So I'm not trying to bait you here, but I have a follow up as well. So schema, for all its good, creates constraints. No schema on write, that didn't work, 'cause it was just a free-for-all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake for example, you know, enabling data sharing. But it is within its proprietary environment. Certainly Databricks is doing something, you know, trying to come at it from its angle, bringing some of the best of data warehouse together with data science. Is your contention that those remain sort of proprietary and de facto standards? And then what we need is more open standards? Maybe you could comment. >> Sure. I think there are two points. One is, as you mentioned, open standards that allow... that actually make the underlying platform invisible. I mean, my litmus test for a technology provider to say, "I'm a data mesh," (laughs) kind of compliant is, "Is your platform invisible?" As in, can I replace it with another and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards, that are not really proprietary. The other angle for kind of sharing data across different platforms, so that, you know, we don't get stuck with one technology or another, is around APIs. It is around code that is protecting that internal schema. So where we are on the curve of evolution of technology, right now we are exposing the internal structure of the data, which is designed to optimize certain modes of access. We're exposing that to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads.
Hence you will deal with a columnar storage of the file, versus this other API that is optimized for a very different, report type access, relational access, and is optimized around rows. I think that should become irrelevant in the API sharing of the future. Because as a user, I shouldn't care how this data is internally optimized, right? The language primitive that I'm using should be really agnostic to the machine optimization underneath that. And if we did that, perhaps this war between warehouse or lake or the other will become actually irrelevant. So we're optimizing for the best human experience, as opposed to the best machine experience. We still have to do that, but we have to make that invisible. Make that an implementation concern. So that's another angle of what should... If we daydream together about the best and most resilient experience in terms of data usage, then these APIs would be agnostic to the internal storage structure. >> Great, thank you for that. We've wet our ankles now on the controversy, so we might as well wade all the way in. I can't let you go without addressing some of this. Which you've catalyzed, which, by the way, I see as a sign of progress. So this gentleman, Paul Andrew, is an architect and he gave a presentation I think last night. And he teased it as quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice. And some practices are experimental. Some are more baked, and data mesh really avoids, by design, the specificity of vendor or technology. Perhaps you intend to frame your post as a technology or vendor specific implementation. So touché, that was excellent. (Zhamak laughs) Now you don't need me to defend you, but I will anyway.
You spent 14 plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say, some of this tension is of your own making, because you purposefully don't talk about technologies and vendors. Sometimes doing so is instructive for us neophytes. So, why don't you ever like use specific examples of technology for frames of reference? >> Yes. My role is to push us to the next level. So, you know, everybody picks their fights, picks their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day to day basis, actually I work with clients and existing technology, and I think at Thoughtworks we gave a case study talk with a colleague of mine, and I intentionally got him to talk about (indistinct), to talk about the technology that we use to implement data mesh. And the reason I haven't really embraced, in my conversations, the specific technology... One is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step. No matter what, we have to be pragmatic, of course, and practical, I suppose. And use the existing vendors that exist, and I wholeheartedly embrace that, but that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices-like architectures with technology that wasn't ready for it. Big application servers, web application servers that were designed to run these giant monolithic applications. And now we were trying to run little microservices on them. And the tail was wagging the dog. The environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to the business logic, the business value.
And that's where we are today. The complexity of integrating existing technologies is really overwhelming, capturing a lot of our attention and cost and effort, money and effort, as opposed to really focusing on the data products themselves. So it's just that that's the role I have, but it doesn't mean that, you know, we have to rebuild the world. We've got to make do with what we have in this transitional phase until the new generation of technologies, I guess, comes around and reshapes our landscape of tools. >> Well, impressive public discipline. Your point about microservices is interesting, because a lot of those early microservices weren't so micro. And for the naysayers: look, past is not prologue, but Thoughtworks was really early on in the whole concept of microservices. So I'll be very excited to see how this plays out. But now there were some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational. And that's how my colleague Sanjeev Mohan frames data mesh versus data fabric. You know, I'm not sure. I think we've sort of scratched the surface today that data mesh is more. And I still think data fabric is what NetApp defined as software defined storage infrastructure that can serve on-prem and public cloud workloads, back, whatever, 2016. But the point you make in the thread that we're showing you here is that you're warning, and you referenced this earlier, that segregating different modes of access will lead to fragmentation. And we don't want to repeat the mistakes of the past. >> Yes, there are comments around that. Again, going back to that original conversation, that we have got this at a macro level. We've got this tendency to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch or you do a stream and we are different." They create these bifurcations in our decisions based on the technology, where I do events and you do tables, right?
So that sort of segregation of modes of access causes accidental complexity that we keep dealing with. Because every time in this tree you create a new branch, you create a new set of tools that then somehow need to be point-to-point integrated. You create new specialization around that. So have the fewest branches that we can, and think really about the continuum of experiences that we need to create, and technologies that simplify that continuum of experience. So one of the things, for example, to give you a past experience: I was really excited about the papers and the work that came out around Apache Beam, and generally flow based programming and stream processing. Because basically they were saying, whether you are doing batch or whether you're doing streaming, it's all one stream. And sometimes the window of time over which you're computing narrows, and sometimes it widens, and at the end of the day, you are just doing stream processing. So it is those sorts of notions, that simplify and create a continuum of experience, that I think resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support end users, right? The persona of end users. >> Okay. So the last topic I want to hit: this whole discussion, the topic of data mesh, it's highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. And we talked about lake houses, and there's three buckets. And of course, the gentleman from LinkedIn with Azure, Microsoft has a data mesh community. So you're going to have to enlist some serious army of enforcers to adjudicate. And I wrote some of the stuff down. I mean, it's interesting. Monte Carlo has a data mesh calculator. Starburst is leaning in. ChaosSearch sees themselves as an enabler. Oracle and Snowflake both use the term data mesh.
And then of course you've got big practitioners. JPMC, we've talked to Intuit, Zalando, HelloFresh has been on, Netflix has this event based sort of streaming implementation. So my question is, how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's a yes. That's why I practice then. This is why I should practice things. 'Cause I think it's going to be hard. What I'm hopeful for is that the socio-technical... Labeling data mesh as a socio-technical concern or solution, not just a technology solution, hopefully always brings us back to, you know, the reality that vendors try to sell you snake oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problems down the track. So we'll see, time will tell, Dave, and I count on you as one of those members of, (laughs) you know, folks that will continue to share their platform to go back to the roots, to the why in the first place. I mean, I dedicated a whole part of the book to 'Why?' Because we get, as you said, we get carried away with vendors and technology solutions trying to ride a wave. And in that story, we forget the reason for which we are even making this change and are going to spend all of these resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought, and as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right.
And I want to thank my colleague, Stephanie Chan, who helps research topics for us. Alex Myerson is on production and Kristen Martin, Cheryl Knight and Rob Hoff on editorial. Remember all these episodes are available as podcasts, wherever you listen. And all you got to do is search Breaking Analysis Podcast. Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com, siliconangle.com. You can reach me by email david.vellante@siliconangle.com or DM me @dvellante. Hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)

Published Date : Apr 20 2022

Analyst Predictions 2022: The Future of Data Management

[Music] >> In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases, and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, and data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special CUBE presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner analyst and principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is a well-known research vice president with IDC. Dave Menninger is senior vice president and research director at Ventana Research. Brad Shimmin is chief analyst, AI platforms, analytics and data management at Omdia. And Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the
format we're going to use. I, as moderator, am going to call on each analyst separately, who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, with data oceans, data lakes, lake houses, data fabric, meshes, the common glue is metadata. If we don't understand what data we have and we are governing it, there is no way we can manage it. So we saw Informatica went public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations, like Spark jobs, Python, even Airflow. We're going to see more of streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, with very detailed lineage, impact analysis, and then even expand into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we
already see that trajectory. Right now, if you look at BI tools, I would say there are 100 users of a BI tool to one of a data catalog, and I see that evening out over a period of time. And at some point data catalogs will really become, you know, the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments? Maybe Doug, a lot of things to weigh in on there, maybe you could comment. >> Yeah, Sanjeev, I think you're spot on on a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years. It's like God, motherhood, apple pie: everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines. These are environmental, social, and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data, it's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex. The big cloud platform players seem to be upping the ante and starting to address governance.
>> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs, but to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the '90s, and that didn't happen quite the way we anticipated. And to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do we put this, fade out into this nebulous nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or like data governance and cybersecurity, and instead we have some tooling that can actually be adaptive to gather metadata to create something I know is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that, you know, data governance, like many other initiatives, did not succeed. Even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come into the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was like, you know, pocket change for them, instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated with CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements.
Data residency is becoming really important, and I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here, but there's some real meat on the bone here. Sanjeev, I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though there's some skepticism there. That's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and speaking, you know, coming off of governance, I mean, wow, you know, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there. But take it away, Tony. >> We'll put it this way: data mesh, you know, the idea, at least as proposed by Thoughtworks, you know, basically was unleashed a couple of years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem when we had enterprise data warehouses, it was a problem when we had our Hadoop data clusters, it's even more of a problem now that the data's
out in the cloud, where the data is not only your data lake, is not only S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea that we need to debate, you know, who are the folks that really know best about governance? It's the domain experts. So basically data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality. Because if you do a Google search, basically the published work, the articles on data mesh, have been largely, you know, pretty uncritical so far, basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this. Brad, you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level, and now we have microservices. So there's this thought of, oh, if we manage apps in cloud native through microservices, why don't we think of data in the same way? My sense this year, and this has been a very active search if you look at Google search trends, is that now enterprises are going to look at this seriously, and as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. The reason why I think that you'll start to see basically the cold, hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old, and there's still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance, we're trying to figure out
like, how can we basically strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, and yet the groups that understand the data, you know, how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full life cycle, from selecting the data, from building the pipelines, from determining your access control, looking at quality, looking at basically whether data is fresh or whether or not it's trending off course. So my prediction is that it will really receive the first harsh scrutiny this year. You are going to see some enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data-mesh-wash their products. Anybody in the data management space, whether it's basically a pipelining tool, whether it's basically ELT, whether it's a catalog or a federated query tool, they're all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody is going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata that Sanjeev was talking about and the catalogs that he was talking about, which is that there's going to be a renewed focus on metadata, and I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at a data fabric,
because we all, at the end of the day, need to read from the same sheet of music. >> So thank you, Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold, hard light of day, and there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric... excuse me for a second, sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally, you know, intended and specified, but that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen, they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here, but this notion that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is, when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double. 68% of organizations, I'm sorry, 79% of organizations that were using
virtualized access express satisfaction with their access to the data lake only 39 percent expressed satisfaction if they weren't using virtualized access so thank you uh dave uh sanjeev we just got about a couple minutes on this topic but i know you're speaking or maybe you've spoken already on a panel with zhamak dehghani who sort of invented the concept governance obviously is a big sticking point but what are your thoughts on this you are mute so my message to zhamak and uh and to the community is uh as opposed to what dave said let's not define it we spent the whole year defining it there are four principles domain product data infrastructure and governance let's take it to the next level i get a lot of questions on what is the difference between data fabric and data mesh and i'm like i can't compare the two because data mesh is a business concept data fabric is a data integration pattern how do you define how do you compare the two you have to bring data mesh a level down so to tony's point i'm on a warpath in 2022 to take it down to what does a data product look like how do we handle shared data across domains and govern it and i think we are going to see more of that in 2022 is operationalization of data mesh i think we could have a whole hour on this topic couldn't we uh maybe we should do that uh but let's go to let's move to carl so carl you're a database guy you've been around that that block for a while now you want to talk about graph databases bring it on oh yeah okay thanks so i regard graph database as basically the next truly revolutionary database management technology i'm looking forward to for the graph database market which of course we haven't defined yet so obviously i have a little wiggle room in what i'm about to say but that this market will grow by about 600 percent over the next 10 years now 10 years is a long time but over the next five years we expect to see gradual growth as people start to learn how to use it problem isn't that it's used the 
problem is not that it's not useful is that people don't know how to use it so let me explain before i go any further what a graph database is because some of the folks on the call may not may not know what it is a graph database organizes data according to a mathematical structure called a graph a graph has elements called nodes and edges so a data element drops into a node the nodes are connected by edges the edges connect one node to another node combinations of edges create structures that you can analyze to determine how things are related in some cases the nodes and edges can have properties attached to them which add additional informative material that makes it richer that's called a property graph okay there are two principal use cases for graph databases there's there's semantic proper graphs which are used to break down human language text uh into the semantic structures then you can search it organize it and and and answer complicated questions a lot of ai is aimed at semantic graphs another kind is the property graph that i just mentioned which has a dazzling number of use cases i want to just point out is as i talk about this people are probably wondering well we have relational databases isn't that good enough okay so a relational database defines it uses um it supports what i call definitional relationships that means you define the relationships in a fixed structure the database drops into that structure there's a value foreign key value that relates one table to another and that value is fixed you don't change it if you change it the database becomes unstable it's not clear what you're looking at in a graph database the system is designed to handle change so that it can reflect the true state of the things that it's being used to track so um let me just give you some examples of use cases for this um they include uh entity resolution data lineage uh um social media analysis customer 360 fraud prevention there's cyber security there's strong supply 
chain is a big one actually there's explainable ai and this is going to become important too because a lot of people are adopting ai but they want a system after the fact to say how did the ai system come to that conclusion how did it make that recommendation right now we don't have really good ways of tracking that okay machine machine learning in general um social network i already mentioned that and then we've got oh gosh we've got data governance data compliance risk management we've got recommendation we've got personalization anti-money laundering that's another big one identity and access management network and i.t operations is already becoming a key one where you actually have mapped out your operation your your you know whatever it is your data center and you you can track what's going on as things happen there root cause analysis fraud detection is a huge one a number of major credit card companies use graph databases for fraud detection risk analysis tracking and tracing churn analysis next best action what-if analysis impact analysis entity resolution and i would add one other thing or just a few other things to this list metadata management so sanjeev here you go this is your engine okay because i was in metadata management for quite a while in my past life and one of the things i found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it but graphs can okay graphs can do things like say this term in this context means this but in that context it means that okay things like that and in fact uh logistics management supply chain it also because it handles recursive relationships by recursive relationships i mean objects that own other objects that are of the same type you can do things like bill of materials you know so like parts explosion you can do an hr analysis who reports to whom how many levels up the chain and that kind of thing you 
can do that with relational databases but yes it takes a lot of programming in fact you can do almost any of these things with relational databases but the problem is you have to program it it's not it's not supported in the database and whenever you have to program something that means you can't trace it you can't define it you can't publish it in terms of its functionality and it's really really hard to maintain over time so carl thank you i wonder if we could bring brad in i mean brad i'm sitting there wondering okay is this incremental to the market is it disruptive and replaceable what are your thoughts on this space it's already disrupted the market i mean like carl said go to any bank and ask them are you using graph databases to do to get fraud detection under control and they'll say absolutely that's the only way to solve this problem and it is frankly um and it's the only way to solve a lot of the problems that carl mentioned and that is i think its achilles heel in some ways because you know it's like finding the best way to cross the seven bridges of konigsberg you know it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique uh it it still unfortunately kind of stands apart from the rest of the community that's building let's say ai outcomes as the great great example here the graph databases and ai as carl mentioned are like chocolate and peanut butter but technologically they don't know how to talk to one another they're completely different um and you know it's you can't just stand up sql and query them you've got to to learn um yeah what is that carl sparql or uh sparql uh uh yeah thank you uh to actually get to the data in there and if you're gonna scale that data that graph database especially a property graph if you're gonna do something really complex like try to understand uh you know all of the metadata in your organization you might just end 
up with you know a graph database winter like we had the ai winter simply because you run out of performance to make the thing happen so i i think it's already disrupted but we we need to like treat it like a first-class citizen in in the data analytics and ai community we need to bring it into the fold we need to equip it with the tools it needs to do that the magic it does and to do it not just for specialized use cases but for everything because i i'm with carl i i think it's absolutely revolutionary so i had also identified the principal achilles heel of the technology which is scaling now when these when these things get large and complex enough that they spill over what a single server can handle you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down so that's still a problem to be solved sanjeev any quick thoughts on this i mean i think metadata on the on the on the word cloud is going to be the the largest font uh but what are your thoughts here i want to like step away so people don't you know associate me with only meta data so i want to talk about something a little bit slightly different uh db-engines.com has done an amazing job i think almost everyone knows that they chronicle all the major databases that are in use today in january of 2022 there are 381 databases on its list of ranked list of databases the largest category is rdbms the second largest category is actually divided into two property graphs and rdf graphs these two together make up the second largest number of databases so talking about achilles heels here this is a problem the problem is that there's so many graph databases to choose from they come in different shapes and forms uh to brad's point there's so many query languages in rdbms it's sql end of the story here we've got cypher we've got gremlin we've got gql and then your proprietary languages so i think there's a 
lot of disparity in this space but excellent all excellent points sanji i must say and that is a problem the languages need to be sorted and standardized and it needs people need to have a road map as to what they can do with it because as you say you can do so many things and so many of those things are unrelated that you sort of say well what do we use this for i'm reminded of the saying i learned a bunch of years ago when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose all right guys we gotta we gotta move on to dave uh meninger uh we've heard about streaming uh your prediction is in that realm so please take it away sure so i like to say that historical databases are to become a thing of the past but i don't mean that they're going to go away that's not my point i mean we need historical databases but streaming data is going to become the default way in which we operate with data so in the next say three to five years i would expect the data platforms and and we're using the term data platforms to represent the evolution of databases and data lakes that the data platforms will incorporate these streaming capabilities we're going to process data as it streams into an organization and then it's going to roll off into historical databases so historical databases don't go away but they become a thing of the past they store the data that occurred previously and as data is occurring we're going to be processing it we're going to be analyzing we're going to be acting on it i mean we we only ever ended up with historical databases because we were limited by the technology that was available to us data doesn't occur in batches but we processed it in batches because that was the best we could do and it wasn't bad and we've continued to improve and we've improved and we've improved but streaming data today is still the exception it's not the rule right there's there are projects within organizations that deal 
with streaming data but it's not the default way in which we deal with data yet and so that that's my prediction is that this is going to change we're going to have um streaming data be the default way in which we deal with data and and how you label it what you call it you know maybe these databases and data platforms just evolve to be able to handle it but we're going to deal with data in a different way and our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data you know another third are planning to use streaming technologies so that gets us to about eight out of ten organizations need to use this technology that doesn't mean they have to use it throughout the whole organization but but it's pretty widespread in its use today and has continued to grow if you think about the consumerization of i.t we've all been conditioned to expect immediate access to information immediate responsiveness you know we want to know if an uh item is on the shelf at our local retail store and we can go in and pick it up right now you know that's the world we live in and that's spilling over into the enterprise i.t world where we have to provide those same types of capabilities um so that's my prediction historical database has become a thing of the past streaming data becomes the default way in which we we operate with data all right thank you david well so what what say you uh carl a guy who's followed historical databases for a long time well one thing actually every database is historical because as soon as you put data in it it's now history it's no longer it no longer reflects the present state of things but even if that history is only a millisecond old it's still history but um i would say i mean i know you're trying to be a little bit provocative in saying this dave because you know as well as i do that people still need to do their taxes they still need to do accounting they still need to run 
general ledger programs and things like that that all involves historical data that's not going to go away unless you want to go to jail so you're going to have to deal with that but as far as the leading edge functionality i'm totally with you on that and i'm just you know i'm just kind of wondering um if this chain if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way m applications work um saying that uh an application should respond instantly as soon as the state of things changes what do you say about that i i think that's true i think we do have to think about things differently that's you know it's not the way we design systems in the past uh we're seeing more and more systems designed that way but again it's not the default and and agree 100 with you that we do need historical databases you know that that's clear and even some of those historical databases will be used in conjunction with the streaming data right so absolutely i mean you know let's take the data warehouse example where you're using the data warehouse as context and the streaming data as the present you're saying here's a sequence of things that's happening right now have we seen that sequence before and where what what does that pattern look like in past situations and can we learn from that so tony bear i wonder if you could comment i mean if you when you think about you know real-time inferencing at the edge for instance which is something that a lot of people talk about um a lot of what we're discussing here in this segment looks like it's got great potential what are your thoughts yeah well i mean i think you nailed it right you know you hit it right on the head there which is that i think a key what i'm seeing is that essentially and basically i'm going to split this one down the middle is i don't see that basically streaming is the default what i see is streaming and basically and transaction databases um and 
analytics data you know data warehouses data lakes whatever are converging and what allows us technically to converge is cloud native architecture where you can basically distribute things so you could have you can have a note here that's doing the real-time processing that's also doing it and this is what your leads in we're maybe doing some of that real-time predictive analytics to take a look at well look we're looking at this customer journey what's happening with you know you know with with what the customer is doing right now and this is correlated with what other customers are doing so what i so the thing is that in the cloud you can basically partition this and because of basically you know the speed of the infrastructure um that you can basically bring these together and or and so and kind of orchestrate them sort of loosely coupled manner the other part is that the use cases are demanding and this is part that goes back to what dave is saying is that you know when you look at customer 360 when you look at let's say smart you know smart utility grids when you look at any type of operational problem it has a real-time component and it has a historical component and having predictives and so like you know you know my sense here is that there that technically we can bring this together through the cloud and i think the use case is that is that we we can apply some some real-time sort of you know predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction we have this real time you know input sanjeev did you have a comment yeah i was just going to say that to this point you know we have to think of streaming very different because in the historical databases we used to bring the data and store the data and then we used to run rules on top uh aggregations and all but in case of streaming the mindset changes because the rules normally the inference all of that is 
fixed but the data is constantly changing so it's a completely reverse way of thinking of uh and building applications on top of that so dave menninger there seemed to be some disagreement about the default or now what kind of time frame are you are you thinking about is this end of decade it becomes the default what would you pin i i think around you know between between five to ten years i think this becomes the reality um i think you know it'll be more and more common between now and then but it becomes the default and i also want sanjeev at some point maybe in one of our subsequent conversations we need to talk about governing streaming data because that's a whole other set of challenges we've also talked about it rather in two dimensions historical and streaming and there's lots of low latency micro batch sub second that's not quite streaming but in many cases it's fast enough and we're seeing a lot of adoption of near real time not quite real time as uh good enough for most for many applications because nobody's really taking the hardware dimension of this information like how do we that'll just happen carl so near real time maybe before you lose the customer however you define that right okay um let's move on to brad brad you want to talk about automation ai uh the the the pipeline people feel like hey we can just automate everything what's your prediction yeah uh i'm i'm an ai aficionado so apologies in advance for that but uh you know um i i think that um we've been seeing automation at play within ai for some time now and it's helped us do do a lot of things for especially for practitioners that are building ai outcomes in the enterprise uh it's it's helped them to fill skills gaps it's helped them to speed development and it's helped them to to actually make ai better uh because it you know in some ways provides some swim lanes and and for example with technologies like automl can auto-document and create that sort of transparency that that 
we talked about a little bit earlier um but i i think it's there's an interesting kind of conversion happening with this idea of automation um and and that is that uh we've had the automation that started happening for practitioners it's it's trying to move outside of the traditional bounds of things like i'm just trying to get my features i'm just trying to pick the right algorithm i'm just trying to build the right model uh and it's expanding across that full life cycle of building an ai outcome to start at the very beginning of data and to then continue on to the end which is this continuous delivery and continuous uh automation of of that outcome to make sure it's right and it hasn't drifted and stuff like that and because of that because it's become kind of powerful we're starting to to actually see this weird thing happen where the practitioners are starting to converge with the users and that is to say that okay if i'm in tableau right now i can stand up salesforce einstein discovery and it will automatically create a nice predictive algorithm for me um given the data that i that i pull in um but what's starting to happen and we're seeing this from the the the companies that create business software so salesforce oracle sap and others is that they're starting to actually use these same ideals and a lot of deep learning to to basically stand up these out of the box flip a switch and you've got an ai outcome at the ready for business users and um i i'm very much you know i think that that's that's the way that it's going to go and what it means is that ai is is slowly disappearing uh and i don't think that's a bad thing i think if anything what we're going to see in 2022 and maybe into 2023 is this sort of rush to to put this idea of disappearing ai into practice and have as many of these solutions in the enterprise as possible you can see like for example sap is going to roll out this quarter this thing called adaptive recommendation services which which 
basically is a cold start ai outcome that can work across a whole bunch of different vertical markets and use cases it's just a recommendation engine for whatever you need it to do in the line of business so basically you're you're an sap user you look up to turn on your software one day and you're a sales professional let's say and suddenly you have a recommendation for customer churn it's going that's great well i i don't know i i think that's terrifying in some ways i think it is the future that ai is going to disappear like that but i am absolutely terrified of it because um i i think that what it what it really does is it calls attention to a lot of the issues that we already see around ai um specific to this idea of what what we like to call at omdia responsible ai which is you know how do you build an ai outcome that is free of bias that is inclusive that is fair that is safe that is secure that is auditable etc etc etc etc that takes some a lot of work to do and so if you imagine a customer that that's just a salesforce customer let's say and they're turning on einstein discovery within their sales software you need some guidance to make sure that when you flip that switch that the outcome you're going to get is correct and that's that's going to take some work and so i think we're going to see this let's roll this out and suddenly there's going to be a lot of a lot of problems a lot of pushback uh that we're going to see and some of that's going to come from gdpr and others that sanjeev was mentioning earlier a lot of it's going to come from internal csr requirements within companies that are saying hey hey whoa hold up we can't do this all at once let's take the slow route let's make ai automated in a smart way and that's going to take time yeah so a couple predictions there that i heard i mean ai essentially you disappear it becomes invisible maybe if i can restate that and then if if i understand it correctly brad you're saying there's a backlash in 
the near term people can say oh slow down let's automate what we can those attributes that you talked about are non trivial to achieve is that why you're a bit of a skeptic yeah i think that we don't have any sort of standards that companies can look to and understand and we certainly within these companies especially those that haven't already stood up in internal data science team they don't have the knowledge to understand what that when they flip that switch for an automated ai outcome that it's it's gonna do what they think it's gonna do and so we need some sort of standard standard methodology and practice best practices that every company that's going to consume this invisible ai can make use of and one of the things that you know is sort of started that google kicked off a few years back that's picking up some momentum and the companies i just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing you know so like for the sap example we know for example that it's convolutional neural network with a long short-term memory model that it's using we know that it only works on roman english uh and therefore me as a consumer can say oh well i know that i need to do this internationally so i should not just turn this on today great thank you carl can you add anything any context here yeah we've talked about some of the things brad mentioned here at idc in the our future of intelligence group regarding in particular the moral and legal implications of having a fully automated you know ai uh driven system uh because we already know and we've seen that ai systems are biased by the data that they get right so if if they get data that pushes them in a certain direction i think there was a story last week about an hr system that was uh that was recommending promotions for white people over black people because in the past um you know white people were promoted and and more productive than 
black people but not it had no context as to why which is you know because they were being historically discriminated black people being historically discriminated against but the system doesn't know that so you know you have to be aware of that and i think that at the very least there should be controls when a decision has either a moral or a legal implication when when you want when you really need a human judgment it could lay out the options for you but a person actually needs to authorize that that action and i also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases and to some extent they always will so we'll always be chasing after them that's that's absolutely carl yeah i think that what you have to bear in mind as a as a consumer of ai is that it is a reflection of us and we are a very flawed species uh and so if you look at all the really fantastic magical looking supermodels we see like gpt three and four that's coming out they're xenophobic and hateful uh because the people the data that's built upon them and the algorithms and the people that build them are us so ai is a reflection of us we need to keep that in mind yeah the ai is biased because humans are biased all right great okay let's move on doug henschen you know a lot of people that said that data lake that term's not not going to not going to live on but it appears to have some legs here uh you want to talk about lake house bring it on yes i do my prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering i say offering that doesn't mean it's going to be the dominant thing that organizations have out there but it's going to be the predominant vendor offering in 2022. 
now heading into 2021 we already had cloudera databricks microsoft snowflake as proponents in 2021 sap oracle and several of these fabric virtualization mesh vendors joined the bandwagon the promise is that you have one platform that manages your structured unstructured and semi-structured information and it addresses both the bi and analytics needs and the data science needs the real promise there is simplicity and lower cost but i think end users have to answer a few questions the first is does your organization really have a center of data gravity or is it is the data highly distributed multiple data warehouses multiple data lakes on-premises cloud if it if it's very distributed and you you know you have difficulty consolidating and that's not really a goal for you then maybe that single platform is unrealistic and not likely to add value to you um you know also the fabric and virtualization vendors the the mesh idea that's where if you have this highly distributed situation that might be a better path forward the second question if you are looking at one of these lake house offerings you are looking at consolidating simplifying bringing together to a single platform you have to make sure that it meets both the warehouse need and the data lake need so you have vendors like databricks microsoft with azure synapse new really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements can meet the user and query concurrency requirements meet those tight slas and then on the other hand you have the or the oracle sap snowflake the data warehouse uh folks coming into the data science world and they have to prove that they can manage the unstructured information and meet the needs of the data scientists i'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows and some of these vendors snowflake in particular is 
really relying on partners for the data science needs so you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement well thank you doug well tony if those two worlds are going to come together as doug was saying the analytics and the data science world does there need to be some kind of semantic layer in between i don't know weigh in on this topic if you would oh didn't we talk about data fabrics before common metadata layer um actually i'm almost tempted to say let's declare victory and go home in that this is actually been going on for a while i actually agree with uh you know much what doug is saying there which is that i mean we i remembered as far back as i think it was like 2014 i was doing a a study you know it was still at ovum predecessor of omdia um looking at all these specialized databases that were coming up and seeing that you know there's overlap with the edges but yet there was still going to be a reason at the time that you would have let's say a document database for json you'd have a relational database for tran you know for transactions and for data warehouse and you had you know and you had basically something at that time that that resembled hadoop for what we're considering a data lake fast forward and the thing is what i was saying at the time is that you're seeing basically blur you know sort of blending at the edges that i was saying like about five or six years ago um that's all and the the lake house is essentially you know the amount of the the current manifestation of that idea there is a dichotomy in terms of you know it's the old argument do we centralize this all you know you know in in in in in a single place or do we or do we virtualize and i think it's always going to be a yin and yang there's never going to be a single single silver silver bullet i do see um that they're also going to be questions and these are things that points that doug raised they're you know what your 
what do you need of of of your of you know for your performance there or for your you know price-performance characteristics do you need for instance high concurrency do you need the ability to do some very sophisticated joins or is your requirement more to be able to distribute and you know distribute our processing is you know as far as possible to get you know to essentially do a kind of brute force approach all these approaches are valid based on you know based on the use case um i just see that essentially that the lake house is the culmination of it's nothing it's just it's a relatively new term introduced by databricks a couple years ago this is the culmination of basically what's been a long time trend and what we see in the cloud is that as we start seeing data warehouses as a checkbox item say hey we can basically source data in cloud and cloud storage and s3 azure blob store you know whatever um as long as it's in certain formats like you know like you know parquet or csv or something like that you know i see that as becoming kind of you know a check box item so to that extent i think that the lake house depending on how you define it is already reality um and in some in some cases maybe new terminology but not a whole heck of a lot new under the sun yeah and dave menninger i mean a lot of this thank you tony but a lot of this is going to come down to you know vendor marketing right some people try to co-opt the term we talked about data mesh washing what are your thoughts on this yeah so um i used the term data platform earlier and and part of the reason i use that term is that it's more vendor neutral uh we've we've tried to uh sort of stay out of the the vendor uh terminology patenting world right whether whether the term lake house is what sticks or not the concept is certainly going to stick and we have some data to back it up about a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it so they 
consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent, and the technology is going to continue to converge. I like to talk about it this way: you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in a server and ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people, data warehouse people, over here. Database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities, so it's obvious that they're going to meet in the middle. I think it's like Tony says: we should declare victory and go home. >> So just a follow-up on that. Are you saying the specialized lake and the specialized warehouse go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live as just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of centrifugal force, or tug of war, between do we centralize the data or do we virtualize it. And the fact is, I don't think there's ever going to be any single answer. In terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh on a data warehouse. The difference is that we use the same physical data
store, but everybody's logically governing it differently. A data mesh is not a technology; it's a process, a governance process. So essentially, as I was saying before, this is the culmination of a long-time trend. We're seeing a lot of blurring, but there are going to be cases where, for instance, if I need high concurrency or something like that, there are certain things that I'm not going to be able to get efficiently out of a data lake, where I'm running a system that's really just brute-forcing very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing a confluence of requirements: we need to have the abilities of both a data lake and a data warehouse, and these need to come together. >> So I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric vendors, the fabric virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, hey, we'll handle all the edge cases, the stuff that isn't in that center of data gravity, that is off distributed in a cloud or at a remote location. So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> Bingo. As they basically said, people are happy when they virtualize data. >> I think yes at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data, so now we are literally splitting hairs. Now what Databricks is saying is, aha, but it's
easier to go from data lake to data warehouse than it is from data warehouse to data lake. So I think we're getting into semantics, but we've already seen these two converge. >> So take something like AWS, who's got, what, 15 data stores. Are they going to have 15 converged data stores? That's going to be interesting to watch. All right, guys, I'm going to go down the list and do one word each, and each of the analysts, if you would, just add a very brief sort of course correction for me. So Sanjeev, governance is going to be, maybe it's the dog that wags the tail now. I mean, it's coming to the fore with all this ransomware stuff, and we really didn't talk much about security. But what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream, okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check. You want to add to that? >> The reality check is, I hope that no vendor jumps the shark and calls their offering a data mesh project. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, graph databases. Thank you for sharing some high-growth metrics. I know it's early days, but magic is what I took away from that. It's the magic database. >> Yeah, I've said this to people too: I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. It's definitely the case that if you're managing things that are in a fixed schematic relationship, a relational database is probably a better choice. There are times when a document database is a better choice. It can handle those things, but it may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. >> Thank you.
And Dave Menninger, thank you, by the way, for bringing the data in. I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say think fast, right? That's the world we live in. You've got to think fast. >> Fast, love it. And Brad Shimmin, I love it. I mean, on the one hand I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> Yeah, I would say, going with Dave, think fast and also think slow, to reference the book that everyone talks about. I would say really that this is all about trust: trust in the idea of automation and of a transparent, invisible AI across the enterprise. But verify. Verify before you do anything. >> And then Doug Henschen, I think the trend is your friend here on this prediction, with lakehouse really becoming dominant. I liked the way you set up that notion of the data warehouse folks coming at it from the analytics perspective, but then you've got the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing. But your final thoughts, we'll give you the last word. >> Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And on that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay, folks, hey, that's all the
time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it, thank you. >> Now, in addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante. Be well, and we'll see you next time. (music)

Published Date : Jan 8 2022

ENTITIES

Entity	Category	Confidence
381 databases	QUANTITY	0.99+
2014	DATE	0.99+
2022	DATE	0.99+
2021	DATE	0.99+
january of 2022	DATE	0.99+
100 users	QUANTITY	0.99+
jamal dagani	PERSON	0.99+
last week	DATE	0.99+
dave meninger	PERSON	0.99+
sanji	PERSON	0.99+
second question	QUANTITY	0.99+
15 converged data stores	QUANTITY	0.99+
dave vellante	PERSON	0.99+
microsoft	ORGANIZATION	0.99+
three	QUANTITY	0.99+
sanjeev	PERSON	0.99+
2023	DATE	0.99+
15 data stores	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
last year	DATE	0.99+
sanjeev mohan	PERSON	0.99+
six	QUANTITY	0.99+
two	QUANTITY	0.99+
carl	PERSON	0.99+
tony	PERSON	0.99+
carl olufsen	PERSON	0.99+
six years	QUANTITY	0.99+
david	PERSON	0.99+
carlos specter	PERSON	0.98+
both sides	QUANTITY	0.98+
2010s	DATE	0.98+
first backlash	QUANTITY	0.98+
five years	QUANTITY	0.98+
today	DATE	0.98+
dave	PERSON	0.98+
each	QUANTITY	0.98+
three quarters	QUANTITY	0.98+
first	QUANTITY	0.98+
single platform	QUANTITY	0.98+
lake house	ORGANIZATION	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
doug	PERSON	0.97+
one word	QUANTITY	0.97+
this year	DATE	0.97+
wikibon.com	OTHER	0.97+
one platform	QUANTITY	0.97+
39	QUANTITY	0.97+
about 600 percent	QUANTITY	0.97+
two analysts	QUANTITY	0.97+
ten years	QUANTITY	0.97+
single platform	QUANTITY	0.96+
five	QUANTITY	0.96+
one	QUANTITY	0.96+
three quarters	QUANTITY	0.96+
california	LOCATION	0.96+
google	ORGANIZATION	0.96+
single	QUANTITY	0.95+

Search Results for Mohan: