Breaking Analysis: Databricks faces critical strategic decisions…here’s why


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Spark became a top-level Apache project in 2014 and shortly thereafter burst onto the big data scene. Spark, along with the cloud, transformed and in many ways disrupted the big data market. Databricks optimized its tech stack for Spark and took advantage of the cloud to cleverly deliver a managed service that has become a leading AI and data platform among data scientists and data engineers. However, emerging customer data requirements are shifting in a direction that we think will force modern data platform players generally, and Databricks specifically, to make some key directional decisions and perhaps even reinvent themselves. Hello and welcome to this week's Wikibon theCUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do a deep dive into Databricks. We'll explore its impressive current market momentum using some ETR survey data, and then we'll lay out how customer data requirements are changing and what the ideal data platform will look like in the midterm future. We'll then evaluate core elements of the Databricks portfolio against that vision, and we'll close with some strategic decisions that we think the company faces. To do so, we welcome in our good friend George Gilbert, former equities analyst, market analyst, and current Principal at TechAlpha Partners. George, good to see you. Thanks for coming on. >> Good to see you, Dave. >> All right, let me set this up. We're going to start by taking a look at where Databricks sits in the market in terms of how customers perceive the company and what its momentum looks like. The chart we're showing here is data from ETS, the Emerging Technology Survey of private companies. The N is 1,421.
What we did is cut the data on three sectors: analytics, database/data warehouse, and AI/ML. The vertical axis is a measure of customer sentiment, which evaluates an IT decision maker's awareness of the firm and the likelihood of engaging and/or purchase intent. The horizontal axis shows mindshare in the dataset, and we've highlighted Databricks, which has been a consistently high performer in this survey over the last several quarters. By the way, just as an aside, as we previously reported, OpenAI, which burst onto the scene this past quarter, leads all names, but Databricks is still prominent. You can see that ETR shows some open source tools for reference, but as far as firms go, Databricks is very impressively positioned. Now let's see how they stack up against some mainstream cohorts in the data space: bigger companies and, in some cases, public companies. This chart shows net score, a measure of spending momentum, on the vertical axis, and pervasiveness in the data set on the horizontal axis. The chart insert in the upper right shows how the dots are plotted: net score against shared N. The red dotted line at 40% indicates a highly elevated net score; anything above that we think is really, really impressive. Here we're just comparing Databricks with Snowflake, Cloudera, and Oracle. The squiggly line leading to Databricks shows its path by quarter since 2021. You can see it's performing extremely well, holding its net score in an elevated range. It's now comparable to Snowflake on the vertical axis, and it's consistently moving to the right and gaining share. Now, why did we choose to show Cloudera and Oracle? The reason is that Cloudera got the whole big data era started and was disrupted by Spark, and of course by the cloud and Databricks. And Oracle, in many ways, was the target of early big data players like Cloudera. Take a listen to Mike Olson, Cloudera's CEO at the time.
This is back in 2010, the first year of theCUBE. Play the clip. >> Look, back in the day, if you had a data problem, if you needed to run business analytics, you wrote the biggest check you could to Sun Microsystems, and you bought a great big, single box, central server, and any money that was left over, you handed to Oracle for database licenses, and you installed that database on that box, and that was where you went for data. That was your temple of information. >> Okay? So Mike Olson implied that the monolithic model was too expensive and inflexible, and Cloudera set out to fix that. But the best-laid plans, as they say. George, what do you make of the data that we just shared? >> Where Databricks really came up out of Cloudera's tailpipe was that they took big data processing, made it coherent, and made it a managed service so it could run in the cloud. That relieved customers of the operational burden. Where they're really strong, their traditional meat and potatoes or bread and butter, is predictive and prescriptive analytics: building, training, and serving machine learning models. They've tried to move into traditional business intelligence, the more traditional descriptive and diagnostic analytics, but they're less mature there. So the reason you see Databricks and Snowflake kind of side by side is that there are many, many accounts that have both: Snowflake for business intelligence, Databricks for AI and machine learning. Where Databricks also did really well was in core data engineering, refining the data: the old ETL process, which kind of turned into ELT, where you load the data into the analytic repository in raw form and then refine it. So people have really used both, and each is trying to get into the other's territory. >> Yeah, absolutely. We've reported on this quite a bit: Snowflake moving into the domain of Databricks and vice versa.
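For readers new to the ETR chart discussed above: net score is commonly described as the percentage of respondents adopting a platform new or increasing spend, minus the percentage decreasing spend or replacing it, with flat spenders counting toward shared N but not toward the score. The sketch below assumes that definition; the class name, field names, and response counts are invented for illustration and are not ETR data or ETR's code.

```python
from dataclasses import dataclass

# Assumed net score definition (as ETR commonly describes it); all numbers are made up.
ELEVATED_THRESHOLD = 40.0  # the red dotted line on the chart, in percent


@dataclass
class SpendResponses:
    """Hypothetical survey tallies for one vendor."""
    adopting_new: int
    increasing: int
    flat: int
    decreasing: int
    replacing: int

    @property
    def shared_n(self) -> int:
        # Every respondent citing the vendor counts toward N, including flat spenders.
        return (self.adopting_new + self.increasing + self.flat
                + self.decreasing + self.replacing)

    def net_score(self) -> float:
        """Percent positive (new + increasing) minus percent negative (decreasing + replacing)."""
        n = self.shared_n
        if n == 0:
            raise ValueError("no responses for this vendor")
        positive = self.adopting_new + self.increasing
        negative = self.decreasing + self.replacing
        return 100.0 * (positive - negative) / n


vendor = SpendResponses(adopting_new=30, increasing=90, flat=60, decreasing=12, replacing=8)
score = vendor.net_score()
print(f"N={vendor.shared_n}, net score={score:.1f}%, elevated={score > ELEVATED_THRESHOLD}")
# prints: N=200, net score=50.0%, elevated=True
```

Note that a large flat-spend group dilutes the score toward zero without ever pulling it negative.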
The last bit of ETR evidence that we want to share in terms of the company's momentum comes from ETR's Round Tables. They're run by Erik Bradley and Daren Brabham, a former Gartner analyst and, George, your colleague back at Gartner. What we're going to show here is some direct quotes from IT pros in those Round Tables, including a data science head and a CIO. Just to make a few callouts, and we won't spend too much time on it, starting at the top: like all of us, you can't talk about Databricks without mentioning Snowflake. Those two get us excited. The second comment zeros in on the flexibility and the robustness of Databricks from a data warehouse perspective. And the last point is that, despite competition from cloud players, Databricks has reinvented itself a couple of times over the years. George, we're going to lay out today a scenario that perhaps calls for Databricks to do that once again. >> Their big opportunity and their big challenge, as for every tech company, is managing a technology transition. The transition that we're talking about is something that's been bubbling up, but it's really epochal. For the first time in 60 years, we're moving from an application-centric view of the world to a data-centric view, because decisions are becoming more important than automating processes. So I'll let you develop that. >> Yeah, so let's talk about that here. We're going to put up some bullets on precisely that point and the changing customer environment. IT stacks are shifting, as George just said, from application-centric silos to data-centric stacks, where the priority is shifting from automating processes to automating decisions. Look at RPA: there's still a lot of automation going on, but the focus on application centricity, with the data locked into those apps, is changing.
Data has historically been on the outskirts, in silos, but organizations like Amazon, Uber, and Airbnb are putting data at the core, and logic is increasingly being embedded in the data instead of the reverse. In other words, today the data's locked inside the app, which is why you need to extract that data and stick it in a data warehouse. The point, George, is we're putting forth this new vision for how data is going to be used, and you've used this Uber example to underscore the future state. Please explain. >> Okay, so this is hopefully an example everyone can relate to. The idea is, first, you're automating things that are happening in the real world, and the decisions that make those things happen autonomously, without humans in the loop all the time. To use the Uber example: on your phone, you call a car, you call a driver. Automatically, the Uber app then looks at which drivers are in the vicinity and which are free, matches one, calculates an ETA to you, calculates a price, calculates an ETA to your destination, and then directs the driver once they're there. The point is that this cannot happen very easily in an application-centric world, because all these little apps, for the drivers, the riders, the routes, the fares, call on data locked up in many different apps, but they have to sit on a layer that makes it all coherent. >> But George, if Uber's doing this, doesn't this tech already exist? Isn't there a tech platform that does this already? >> Yes, and the mission of the entire tech industry is to build services that make it possible to compose and operate similar platforms and tools, but with the skills of mainstream developers in mainstream corporations, not the rocket scientists at Uber and Amazon. >> Okay, so we're talking about scaling horizontally across the industry and giving a lot more organizations access to this technology.
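To make George's sequence of automated decisions concrete, here is a toy sketch: filter to free drivers nearby, match the closest, then compute pickup ETA, trip ETA, and a price. It quietly assumes what George says is hard in an application-centric world, namely that driver positions, availability, and fare rules already sit in one coherent place. Every name here (`Driver`, `match_ride`), the flat-grid distance, the 30 km/h speed, and the fare constants are hypothetical assumptions for illustration, not Uber's actual logic.

```python
import math
from dataclasses import dataclass

# Toy model of the ride-matching decision. Grid coordinates, speed, and fare
# constants are hypothetical assumptions, not Uber's actual logic.
AVG_SPEED_KMH = 30.0
BASE_FARE = 2.50
PER_KM = 1.20


@dataclass
class Driver:
    driver_id: str
    x_km: float  # position on a flat city grid (simplification)
    y_km: float
    is_free: bool


def distance_km(ax: float, ay: float, bx: float, by: float) -> float:
    return math.hypot(bx - ax, by - ay)


def match_ride(drivers, rider_xy, dest_xy, radius_km=5.0):
    """Return (driver, pickup ETA min, trip ETA min, price) or None if nobody is nearby."""
    # Decision 1: which free drivers are in the vicinity?
    nearby = [
        (distance_km(d.x_km, d.y_km, *rider_xy), d)
        for d in drivers
        if d.is_free
    ]
    nearby = [(dist, d) for dist, d in nearby if dist <= radius_km]
    if not nearby:
        return None
    # Decision 2: match the closest one.
    pickup_km, driver = min(nearby, key=lambda pair: pair[0])
    # Decisions 3 to 5: ETA to the rider, ETA to the destination, and a price.
    trip_km = distance_km(*rider_xy, *dest_xy)
    pickup_eta_min = 60.0 * pickup_km / AVG_SPEED_KMH
    trip_eta_min = 60.0 * trip_km / AVG_SPEED_KMH
    price = round(BASE_FARE + PER_KM * trip_km, 2)
    return driver, pickup_eta_min, trip_eta_min, price


drivers = [
    Driver("d1", 1.0, 1.0, True),
    Driver("d2", 0.5, 0.0, False),  # closer, but busy
    Driver("d3", 3.0, 4.0, True),
]
driver, pickup_eta, trip_eta, price = match_ride(drivers, (0.0, 0.0), (6.0, 8.0))
print(driver.driver_id, round(pickup_eta, 1), trip_eta, price)
```

In a real system each of those inputs (availability, location, pricing) would come from a different service, which is exactly the coherence problem the data layer is meant to solve.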
So by way of review, let's summarize the trend that's going on today in the modern data stack that is propelling the likes of Databricks and Snowflake, which we just showed you in the ETR data, and which is really a tailwind for them. The trend is toward a common repository for analytic data. That could be multiple virtual data warehouses inside of Snowflake, where you're still in that Snowflake environment, or Lakehouses from Databricks, or multiple data lakes. We've talked about what JPMorgan Chase is doing with the data mesh and gluing data lakes together, and you've got various public clouds playing in this game. Then the data is annotated to have a common meaning. In other words, there's a semantic layer that enables applications to talk to the data elements and know that they have a common and coherent meaning. So George, the good news is this approach is more effective than the legacy monolithic models that Mike Olson was talking about. So what's the problem with this, in your view? >> Today's data platforms added immense value 'cause they connected the data that was previously locked up in these monolithic apps or in all these different microservices, and that supported traditional BI and AI/ML use cases. But now we want to build apps like Uber or Amazon.com, where they've got essentially an autonomously running supply chain and e-commerce app that humans only care for and feed, while the system figures out what to buy, when to buy, where to deploy it, and when to ship it. For that, we need a semantic layer on top of the data, so that, as you were saying, the data coming from all those different apps is integrated, not just connected, and means the same thing. And the issue is that whenever you add a new layer to a stack to support new applications, there are implications for the existing layers: can they support the new layer and its use cases?
For instance, if you add a semantic layer that embeds app logic with the data rather than vice versa, as has been the case for 60 years, then the new data layer faces challenges: the way you manage that data and the way you analyze that data are not supported by today's tools. >> Okay, so actually, Alex, bring me up that last slide if you would. You're basically saying at the bottom here that today's repositories don't really do joins at scale. In the future, you're talking about hundreds or thousands or millions of data connections, and with today's systems we're talking about, I don't know, 6, 8, 10 joins. And that's the fundamental problem you're pointing to: a new data era is coming, and existing systems won't be able to handle it? >> Yeah, one way of thinking about it is that even though we call them relational databases, when we actually want to do lots of joins, or when we want to analyze data from lots of different tables, we created a whole new industry of analytic databases where you munge the data together into fewer tables, so you don't have to do as many joins, because joins are difficult and slow. And when you're going to arbitrarily join thousands, hundreds of thousands, or millions of elements, you need a new type of database. We have them; they're called graph databases. But to query them, you go back to the prerelational era in terms of usability. >> Okay, so we're going to come back to that and talk about how you get around that problem. But let's first lay out what we think the ideal data platform of the future looks like. And again, we're going to come back to this Uber example. In this graphic that George put together, we've got three layers. The application layer is where the data products reside; the examples here are drivers, rides, maps, routes, ETA, et cetera. It's the digital version of what we were talking about in the previous slide: people, places, and things.
The next layer is the data layer, which breaks down the silos and connects the data elements through semantics, so everything is coherent. And at the bottom, the legacy operational systems feed that data layer. George, explain what's different here: the graph database element, the relational query capabilities you talk about, and why can't I just throw memory at this problem? >> Some of the graph databases do throw memory at the problem, and, without naming names, some of them live entirely in memory. What you're dealing with then is a prerelational in-memory database system where you navigate between elements. The issue with that is we've had SQL for 50 years, so we don't have to navigate; we can say what we want without saying how to get it. That's the core of the problem. >> Okay. If I may, I just want to drill into this a little bit. You're talking about the expressiveness of a graph. Alex, if you'd bring that back up: the fourth bullet, the expressiveness of a graph database with the relational ease of query. Can you explain what you mean by that? >> Yeah, graphs are great because you can describe anything with a graph; that's why they're becoming so popular. Expressive means you can represent anything easily. They're conducive to a world where we now want something like the metaverse, a 3D world. And I don't mean the Facebook metaverse, I mean the business metaverse, where we want to capture data about everything, but we want it in context; we want to build a set of digital twins that represent everything going on in the world. Uber is a tiny example of that. Uber built a graph to represent all the drivers and riders and maps and routes. But what you need out of a database isn't just a way to store stuff and update stuff. You need to be able to ask questions of it; you need to be able to query it. And if you go back to prerelational days, you had to know how to find your way to the data.
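George's "say what you want, not how to get it" distinction can be made concrete with a toy rider/trip/driver dataset. The sketch below, with invented table names and data, answers the same question twice: once declaratively in SQL (SQLite here), where you state the result and the engine plans the joins, and once in a navigational style where the code spells out each hop from record to record. It illustrates the contrast only; it is not how any particular graph database works.

```python
import sqlite3

# Declarative (relational): state WHAT you want; the SQL engine works out HOW.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE riders  (rider_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE trips   (trip_id INTEGER PRIMARY KEY, rider_id INTEGER, driver_id INTEGER);
    INSERT INTO riders  VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO drivers VALUES (10, 'Cy'), (11, 'Di');
    INSERT INTO trips   VALUES (100, 1, 10), (101, 1, 11), (102, 2, 10);
""")
declarative = sorted(row[0] for row in conn.execute(
    """
    SELECT DISTINCT d.name
    FROM riders r
    JOIN trips t   ON t.rider_id = r.rider_id
    JOIN drivers d ON d.driver_id = t.driver_id
    WHERE r.name = ?
    """,
    ("Ann",),
))

# Navigational (pre-relational style): the code spells out the turn-by-turn
# path, following links from one record to the next.
links = {
    "rider:Ann": ["trip:100", "trip:101"],
    "rider:Bo": ["trip:102"],
    "trip:100": ["driver:Cy"],
    "trip:101": ["driver:Di"],
    "trip:102": ["driver:Cy"],
}


def drivers_for_rider(rider_name):
    found = set()
    for trip in links.get(f"rider:{rider_name}", []):  # hop 1: rider -> trips
        for node in links.get(trip, []):               # hop 2: trip -> driver
            found.add(node.split(":", 1)[1])
    return sorted(found)


navigational = drivers_for_rider("Ann")
print(declarative, navigational)  # ['Cy', 'Di'] ['Cy', 'Di']
```

Adding a third hop (say, driver to vehicle) costs one more JOIN line in the SQL, but another hand-written nested loop in the navigational version.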
It's like giving directions to someone who doesn't have a GPS and mapping system: you have to give them turn-by-turn directions. Whereas when you have a GPS and a mapping system, which is like the relational approach, you just say where you want to go, and it spits out the turn-by-turn directions, which, let's say, the car or whoever you're directing would follow. The point is, it's much easier in a relational database to say, "I just want to get these results. You figure out how to get them." Graph databases have not taken over the world because in some ways, they're a 50-year leap backwards. >> All right, got it. Okay. Let's take a look at how the current Databricks offerings map to the ideal state that we just laid out. To do that, we put together this chart that looks at the key elements of the Databricks portfolio: the core capability, the weakness, and the threat that may loom. Start with Delta Lake, the storage layer, which is great for files and tables. It's got true separation of compute and storage as independent elements, and I want you to double-click on that, George, but it's weaker for the type of low-latency ingest that we see coming in the future. Some of the threats are highlighted here: AWS could add transactional tables to S3, and Iceberg adoption is picking up and could accelerate. That could disrupt Databricks. George, add some color here, please. >> Okay, so this is a classic competitive forces analysis, where you want to look at what customers are demanding, what the competitive pressure is, what the substitutes are, and even what your suppliers might be pushing. Here, Delta Lake is, at its core, a set of transactional tables that sit on an object store. Think of it this way: in a database system, this is the storage engine. And since S3 has been getting stronger for 15 years, you could see a scenario where they add transactional tables.
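George's description of Delta Lake as "transactional tables that sit on an object store" can be illustrated with a toy sketch: an ordered log of numbered JSON commit files, an atomic create-if-absent write so two writers can't both claim the same version, and a replay step that reconstructs the table's current (or an older) set of data files. This is only loosely modeled on the idea behind Delta's `_delta_log`; `ToyObjectStore` and `put_if_absent` are hypothetical stand-ins, not S3 or Delta APIs, and the real protocol has more action types, checkpoints, and store-specific commit rules.

```python
from __future__ import annotations

import json

# Toy transactional table on an object store, loosely modeled on the idea behind
# Delta Lake's transaction log. ToyObjectStore and put_if_absent are hypothetical
# stand-ins; the real Delta protocol has more action types, checkpoints, and
# store-specific commit rules.
LOG_PREFIX = "_delta_log/"


class ToyObjectStore:
    """In-memory stand-in for an object store such as S3."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_if_absent(self, key: str, data: bytes) -> bool:
        # An atomic "create only if missing" write is what makes optimistic commits safe.
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_prefix(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def log_key(version: int) -> str:
    # Zero-padded names make lexical order match commit order.
    return f"{LOG_PREFIX}{version:020d}.json"


def current_version(store: ToyObjectStore) -> int:
    return len(store.list_prefix(LOG_PREFIX)) - 1  # -1 means an empty table


def commit(store: ToyObjectStore, read_version: int, actions: list[dict]) -> int:
    """Write the next log entry; fail if another writer already claimed that version."""
    new_version = read_version + 1
    payload = "\n".join(json.dumps(a) for a in actions).encode()
    if not store.put_if_absent(log_key(new_version), payload):
        raise RuntimeError(f"conflict: version {new_version} already committed")
    return new_version


def snapshot(store: ToyObjectStore, as_of: int | None = None) -> set[str]:
    """Replay the log to list the data files in the table, optionally at an older version."""
    files: set[str] = set()
    for key in store.list_prefix(LOG_PREFIX):
        version = int(key[len(LOG_PREFIX):-len(".json")])
        if as_of is not None and version > as_of:
            break
        for line in store.get(key).decode().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


store = ToyObjectStore()
v0 = commit(store, current_version(store), [{"add": {"path": "part-0.parquet"}}])
v1 = commit(store, v0, [{"remove": {"path": "part-0.parquet"}},
                        {"add": {"path": "part-1.parquet"}}])
print(snapshot(store), snapshot(store, as_of=v0))
```

A writer that read version 0 and then tries to commit after someone else has already written version 1 hits the conflict and must re-read and retry; in this design, that retry loop is where concurrency control lives, which is why the object store's atomic write primitive matters so much.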
We have an open source alternative in Iceberg, which Snowflake and others support. But at the same time, Databricks has built an ecosystem of tools, their own and others', that read and write to Delta tables; that's what makes Delta Lake an ecosystem. So they have a catalog, and the whole machine learning tool chain talks directly to the data here. That was their great advantage, because in the past with Snowflake, you had to pull all the data out of the database before the machine learning tools could work with it, which was a major shortcoming. They fixed that. But the point here is that even before we get to the semantic layer, the core foundation is under threat. >> Yep. Got it. Okay, we've got a lot of ground to cover, so we're going to take a look at the Spark Execution Engine next. Think of it as the refinery that runs really efficient batch processing. That's kind of what disrupted Hadoop in a large way, but it's not Python-friendly, and that's an issue because the data science and data engineering crowd are moving in that direction, and/or they're using DBT. George, we had Tristan Handy on at Supercloud, a really interesting discussion that you and I did. Explain why this is an issue for Databricks. >> Once the data lake was in place, what people did was refine their data in batch, and Spark has always had streaming support, and it's gotten better. The underlying storage, as we've talked about, is an issue. But basically they took raw data, then they refined it into tables like customers and products and partners. And then they refined that again into what were like gold artifacts, which might be business intelligence metrics or dashboards, which were collections of metrics. But they were running it on the Spark Execution Engine, which is a Java-based engine, or rather it runs on a Java-based virtual machine, which means all the data scientists and data engineers who want to work with Python are really mixing oil and water.
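The raw, refined, and "gold" flow George describes is often called the medallion (bronze/silver/gold) architecture. Below is a minimal pure-Python sketch of the two refinement steps, using invented sample orders: bronze-to-silver deduplicates, normalizes, casts types, and drops rows that fail validation; silver-to-gold aggregates into a business metric. In practice each step would be a Spark job or, increasingly, a SQL or Python model in DBT; the function names here are illustrative only.

```python
from collections import defaultdict

# Raw ("bronze") records as they might land from source apps: messy strings, duplicates.
raw_orders = [
    {"order_id": "1", "customer": " acme ", "amount": "120.00"},
    {"order_id": "2", "customer": "Globex", "amount": "80.5"},
    {"order_id": "2", "customer": "Globex", "amount": "80.5"},        # duplicate delivery
    {"order_id": "3", "customer": "ACME", "amount": "not-a-number"},  # fails validation
]


def refine(raw):
    """Bronze -> silver: dedupe, normalize names, cast types, drop invalid rows."""
    seen, clean = set(), []
    for row in raw:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine the row rather than drop it silently
        seen.add(row["order_id"])
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": amount,
        })
    return clean


def to_gold(refined):
    """Silver -> gold: aggregate into a business metric (revenue per customer)."""
    revenue = defaultdict(float)
    for row in refined:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


print(to_gold(refine(raw_orders)))  # {'Acme': 120.0, 'Globex': 80.5}
```

Keeping each step a small, composable function is the same property that makes DBT attractive: each stage can be tested and swapped independently of the engine underneath.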
If you get an error in Python, you can't tell whether the problem is in Python or in Spark. There's just an impedance mismatch between the two. At the same time, the whole world is now gravitating toward DBT, because it's a very nice and simple way to compose these data processing pipelines, and people are using either SQL in DBT or Python in DBT, and that is kind of a substitute for doing it all in Spark. So it's under threat even before we get to the semantic layer. It so happens that DBT itself is becoming the authoring environment for the semantic layer, with business intelligence metrics. But again, this is the second element that's under direct substitution and competitive threat. >> Okay, let's now move down to the third element, which is Photon. Photon is Databricks' BI Lakehouse engine, which integrates with the Databricks tooling, which is very rich, but it's newer. And it's also not well suited for high-concurrency and low-latency use cases, which we think are increasingly going to become the norm over time. George, the threat called out here is that customers want to connect everything to a semantic layer. Explain your thinking here and why this is a potential threat to Databricks. >> Okay, so two issues here. What you were touching on is high concurrency and low latency: when people are running thousands of dashboards and data is streaming in, that's a problem, because a SQL data warehouse query engine takes something like five to 10 years to mature. It's like the line Andy Jassy uses, and he's really talking about Azure there, that there's no compression algorithm for experience. The Snowflake guys started more than five years earlier, and for a bunch of reasons, that lead is not something that Databricks can shrink. They'll always be behind. So that's why Snowflake has transactional tables now, and we can get into that in another show.
But the key point is that, near term, it's struggling to keep up with the use cases that are core to business intelligence, which are highly concurrent: lots of users doing interactive queries. And then when you get to a semantic layer, that's when you need to be able to query data that might involve thousands or tens of thousands or hundreds of thousands of joins. A traditional SQL query engine is just not built for that. That's the core problem of traditional relational databases. >> Now, a quick aside. We always talk about Snowflake and Databricks in the same context. We're not necessarily saying that Snowflake is in a position to tackle all these problems; we'll deal with that separately, so we don't mean to imply that. We're just laying out some of the things that Databricks customers, we think, need to be thinking about and having conversations with Databricks about, and we hope to have those conversations as well. We'll come back to that in terms of strategic options. But finally, coming back to the table, we have Databricks' AI/ML Tool Chain, which has been an awesome capability for the data science crowd. It's comprehensive, a one-stop-shop solution, but the kicker here is that it's optimized for supervised model building. The concern is that foundation models like GPT could cannibalize the current Databricks tooling. But George, can't Databricks, like other software companies, integrate foundation model capabilities into its platform? >> Okay, so the sound bite answer to that is sure, IBM 3270 terminals could call out to a graphical user interface when they're running on the XT terminal, but they're not exactly good citizens in that world. The core issue is that Databricks has this wonderful end-to-end tool chain for training, deploying, monitoring, and running inference on supervised models.
But the paradigm there is that the customer builds, trains, and deploys each model for each feature or application. In a world of foundation models, which are pre-trained and unsupervised, the entire tool chain is different. And it's not like Databricks can junk everything they've done and start over with all their engineers. They have to keep maintaining what they've done in the old world, but they have to build something new that's optimized for the new world. It's a classic technology transition, and their mentality appears to be, "Oh, we'll support the new stuff from our old stuff," which is suboptimal. And as we'll talk about, their biggest patron and the company that put them on the map, Microsoft, really stopped working on its old stuff three years ago so that it could build a new tool chain optimized for this new world. >> Yeah, so let's close with what we think the options are and the decisions that Databricks faces for its future architecture. They're smart people. We've had Ali Ghodsi on many times; he's super impressive. I think they've got to be keenly aware of the limitations and of what's going on with foundation models. But at any rate, in this chart we lay out three scenarios. One is to re-architect the platform by incrementally adopting new technologies; an example might be to layer a graph query engine on top of its stack. They could license key technologies like a graph database. Or they could get aggressive on M&A and buy in relational knowledge graphs, semantic technologies, and vector database technologies. George, as David Floyer always says, "A lot of ways to skin a cat." Think about how EMC maintained its relevance through M&A for many, many years. George, give us your thoughts on each of these strategic options. >> Okay, I find this question the most challenging, 'cause remember, I used to be an equity research analyst.
I worked for Frank Quattrone; we were one of the top tech shops in the banking industry, although this was 20 years ago. The M&A team was the top team in the industry, and everyone wanted them on their side. I remember going to meetings with these CEOs where Frank and the bankers would say, "You want us for your M&A work because we can do better." And they really could do better. But software is not like EMC in hardware, because with hardware, it's easier to connect different boxes. With software, the whole point of a software company is to integrate and architect the components so they fit together and reinforce each other, and that makes M&A harder. You can do it, but it takes a long time to fit the pieces together. Let me give you examples. Say they put a graph query engine, something like TinkerPop, on top of Delta Lake; I don't even know if that's possible. Then you have this graph query engine talking to their storage layer, Delta Lake. But if you want to do analysis, you've got to put the data in Photon, which is not really ideal for highly connected data. If you license a graph database, then most of your data is in Delta Lake, so how do you sync it with the graph database? And if you do sync it, you've got data in two places, which kind of defeats the purpose of having a unified repository. I find the semantic layer option, number three, actually more promising, because that's something you can layer on top of the storage layer you already have. You just have to figure out how to have your query engines talk to it. What I'm trying to highlight is that it's easy as an analyst to say, "You can buy this company or license that technology." But the really hard work is making it all work together, and that's where the challenge is. >> Yeah, well look, thank you for laying that out. We've certainly seen it with Microsoft and Oracle.
I guess you might argue that, well, Microsoft had a monopoly in desktop software and was able to throw off cash for a decade-plus while its stock was going sideways. Oracle had won the database wars and had amazing margins and cash flow to be able to do that. Databricks hasn't even gone public yet. But I want to close with some of the players to watch. Alex, if you'd bring that back up, number four here. AWS: we talked about some of their options with S3, and it's not just AWS, it's blob storage and object storage generally. Microsoft, as you alluded to, was an early go-to-market channel for Databricks. We didn't really address that, so maybe we can in the closing comments. Google, obviously. Snowflake, of course; we're going to dissect their options in a future Breaking Analysis. dbt Labs: where do they fit? Bob Muglia's company, Relational.ai. Why are these players to watch, George, in your opinion? >> Everyone is trying to assemble and integrate the pieces that would make building data applications and data products easy. And the critical part isn't just assembling a bunch of pieces, which is traditionally what AWS did. That's a Unix ethos: we give you the tools, you put them together, 'cause then you have maximum choice and maximum power. What the hyperscalers are doing is taking their key-value stores, in the case of AWS it's DynamoDB, in the case of Azure it's Cosmos DB, and each is putting a graph query engine on top of those. So they'd have unified storage plus a graph database engine: all the data would be collected in the key-value store, with a graph database on top, and that's how they're going to present a foundation for building these data apps. dbt Labs is putting a semantic layer on top of data lakes and data warehouses, and, as I'm sure we'll talk about in the future, that makes it easier to swap out the underlying data platform or swap in new ones for specialized use cases.
Snowflake is so strong in data management, and with their transactional tables, what they're trying to do is take in the operational data that used to be the province of operational data stores like MongoDB and say, "If you manage that data with us, it'll be connected to your analytic data without having to send it through a pipeline." And that's hugely valuable. Relational.ai is the wildcard, 'cause what they're trying to do is almost a holy grail: take the expressiveness of connecting all your data in a graph, but make it as easy to query as you've always had it in a SQL database, or I should say, in a relational database. And if they do that, it'll be as easy to program these data apps as a spreadsheet was compared to procedural languages like BASIC or Pascal. Those are the implications of Relational.ai. >> Yeah, and again, we talked before about why you can't just throw this all in memory. In that example, we're really getting down to differences in how you lay the data out on disk: really, a new database architecture, correct? >> Yes. And that's why it's not clear that you could take a data lake, or even Snowflake, and put a relational knowledge graph on top. You could potentially put a graph database on them, but it'll be compromised, because to really do what Relational.ai has done, which is the ease of relational on top of the power of graph, you actually need to change how you store your data on disk or even in memory. In other words, it's not like you can just add graph support to Snowflake or to your data lake, 'cause if you did that, you'd have to change how the data is physically laid out, and that would break all the tools that currently talk to it. >> In your estimation, what is the timeframe in which this becomes critical for Databricks, and potentially Snowflake and others?
I mentioned the midterm earlier; are we talking three to five years here? Are we talking end of decade? What does your radar say? >> I think something surprising is going on that's going to come up the tailpipe and take everyone by storm. It's all the hype around business intelligence metrics: what we used to put in our dashboards, like bookings, billings, revenue, and customers. Those were the key artifacts whose definitions used to live in your BI tools, and DBT has basically created a standard for defining them so they're defined in your data pipeline and executed in the data warehouse or data lake in a shared way, so that all tools can use them. This sounds like a digression, but it's not. What all this talk about data mesh and data fabric is really saying is that we need a semantic layer, and business intelligence metrics define common semantics for your data. I think we're going to find by the end of this year that metrics are how we annotate all our analytic data to start adding common semantics to it. And we're going to find that this semantic layer is not three to five years off; it's going to be staring us in the face by the end of this year. >> Interesting. And of course, SVB was shut down today. We're seeing serious tech headwinds, and oftentimes in these sorts of downturns or flat periods, and it feels like this one could go on for a while, we emerge with a lot of new players and a lot of new technology. George, we've got to leave it there. Thank you to George Gilbert for excellent insights and input for today's episode. I want to thank Alex Myerson, who's on production and manages the podcast, and of course Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at Siliconangle.com; he does some great editing. Remember, all these episodes are available as podcasts.
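George's point is that metric definitions move out of individual BI tools into a shared layer that every tool queries, so a dashboard and a notebook can't disagree about what "bookings" means. A minimal sketch of that idea, assuming a hypothetical in-code registry (`METRICS` and `compile_metric` are illustrative names, not dbt's YAML metric spec or API) with SQLite standing in for the warehouse:

```python
import sqlite3
from typing import Optional

# Hypothetical shared metric registry: each business metric is defined once, and
# every tool asks the registry for SQL instead of keeping its own definition.
# This is NOT dbt's metric syntax; it only illustrates the idea.
METRICS = {
    "bookings": {"table": "orders", "expression": "SUM(amount)"},
    "customers": {"table": "orders", "expression": "COUNT(DISTINCT customer)"},
}


def compile_metric(name: str, group_by: Optional[str] = None) -> str:
    """Render a metric definition as SQL for the warehouse to execute.

    Identifiers are interpolated directly, so group_by must come from trusted config.
    """
    metric = METRICS[name]
    select = f"{metric['expression']} AS {name}"
    if group_by:
        return (f"SELECT {group_by}, {select} FROM {metric['table']} "
                f"GROUP BY {group_by} ORDER BY {group_by}")
    return f"SELECT {select} FROM {metric['table']}"


# SQLite stands in for the data warehouse or data lake.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('east', 'a', 100), ('east', 'b', 50), ('west', 'a', 25);
""")

# A dashboard and a notebook both pull "bookings" from the same definition.
dashboard_total = conn.execute(compile_metric("bookings")).fetchone()[0]
notebook_by_region = conn.execute(compile_metric("bookings", group_by="region")).fetchall()
customers = conn.execute(compile_metric("customers")).fetchone()[0]
print(dashboard_total, notebook_by_region, customers)
```

Changing the definition of "bookings" in one place changes it for every consumer, which is the shared-semantics property George is pointing to.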
Wherever you listen, all you got to do is search Breaking Analysis Podcast, we publish each week on wikibon.com and siliconangle.com, or you can email me at David.Vellante@siliconangle.com, or DM me @DVellante. Comment on our LinkedIn post, and please do check out ETR.ai, great survey data, enterprise tech focus, phenomenal. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.
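George's point about metrics as a shared semantic layer can be sketched as follows: each business metric is defined once, as data, and every consuming tool compiles its query from that single definition instead of re-implementing it. This is a hypothetical illustration in plain Python with `sqlite3`; the `Metric` class, `compile_metric` function, and table names are invented for the example and are not dbt's metrics syntax or API.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """A single shared definition of a business metric (hypothetical schema)."""
    name: str
    expression: str        # SQL aggregate, e.g. "SUM(amount)"
    table: str
    where: str = "1 = 1"   # optional filter; default keeps every row

# The one place where "bookings" and "revenue" are defined.
METRICS = {
    "bookings": Metric("bookings", "SUM(amount)", "orders"),
    "revenue": Metric("revenue", "SUM(amount)", "orders", "status = 'paid'"),
}

def compile_metric(name: str, group_by: str) -> str:
    """Turn a shared definition into SQL any tool can run (trusted definitions only)."""
    m = METRICS[name]
    return (f"SELECT {group_by}, {m.expression} AS {m.name} FROM {m.table} "
            f"WHERE {m.where} GROUP BY {group_by} ORDER BY {group_by}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("east", 100, "paid"), ("east", 50, "open"), ("west", 70, "paid")])

# A dashboard and a notebook get identical numbers because they share one definition.
print(conn.execute(compile_metric("revenue", "region")).fetchall())  # [('east', 100.0), ('west', 70.0)]
```

The design choice being illustrated is that "revenue" means the same thing everywhere because no consuming tool owns its definition.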

Published Date : Mar 10 2023

ENTITIES

Entity | Category | Confidence
Alex Myerson | PERSON | 0.99+
David Floyer | PERSON | 0.99+
Mike Olson | PERSON | 0.99+
2014 | DATE | 0.99+
George Gilbert | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
George | PERSON | 0.99+
Cheryl Knight | PERSON | 0.99+
Ken Schiffman | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Erik Bradley | PERSON | 0.99+
Dave | PERSON | 0.99+
Uber | ORGANIZATION | 0.99+
thousands | QUANTITY | 0.99+
Sun Microsystems | ORGANIZATION | 0.99+
50 years | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
Bob Muglia | PERSON | 0.99+
Gartner | ORGANIZATION | 0.99+
Airbnb | ORGANIZATION | 0.99+
60 years | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
Ali Ghodsi | PERSON | 0.99+
2010 | DATE | 0.99+
Databricks | ORGANIZATION | 0.99+
Kristin Martin | PERSON | 0.99+
Rob Hof | PERSON | 0.99+
three | QUANTITY | 0.99+
15 years | QUANTITY | 0.99+
Databricks' | ORGANIZATION | 0.99+
two places | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
Tristan Handy | PERSON | 0.99+
M&A | ORGANIZATION | 0.99+
Frank Quattrone | PERSON | 0.99+
second element | QUANTITY | 0.99+
Daren Brabham | PERSON | 0.99+
TechAlpha Partners | ORGANIZATION | 0.99+
third element | QUANTITY | 0.99+
Snowflake | ORGANIZATION | 0.99+
50 year | QUANTITY | 0.99+
40% | QUANTITY | 0.99+
Cloudera | ORGANIZATION | 0.99+
Palo Alto | LOCATION | 0.99+
five years | QUANTITY | 0.99+

Robert Nishihara, Anyscale | AWS Startup Showcase S3 E1


 

(upbeat music) >> Hello everyone. Welcome to theCUBE's presentation of the "AWS Startup Showcase." The topic this episode is AI and machine learning: top startups building foundational model infrastructure. This is season three, episode one of the ongoing series covering exciting startups from the AWS ecosystem. And this time we're talking about AI and machine learning. I'm your host, John Furrier. I'm excited to be joined today by Robert Nishihara, who's the co-founder and CEO of a hot startup called Anyscale. He's here to talk about Ray, the open source project, and Anyscale's infrastructure for foundation models as well. Robert, thank you for joining us today. >> Yeah, thanks so much as well. >> I've been following your company since its founding, pre-pandemic, and you guys really had a great vision, scaled up, and are in a perfect position for this big wave that we all see with ChatGPT and OpenAI going mainstream. Finally, AI has broken out through the ropes and gone mainstream, so I think you guys are really well positioned. I'm looking forward to talking with you today. But before we get into it, introduce the core mission for Anyscale. Why do you guys exist? What is the North Star for Anyscale? >> Yeah, like you mentioned, there's a tremendous amount of excitement about AI right now. You know, I think a lot of us believe that AI can transform just about every industry. So one of the things that was clear to us when we started this company was that the amount of compute needed to do AI was just exploding. To actually succeed with AI, companies like OpenAI or Google, you know, these companies getting a lot of value from AI, were not just running these machine learning models on their laptops or on a single machine. They were scaling these applications across hundreds or thousands or more machines and GPUs and other resources in the Cloud.
And so to actually succeed with AI, and this has been one of the biggest trends in computing, maybe the biggest trend in computing in recent history, the amount of compute has been exploding. And so to actually succeed with that AI, to actually build these scalable applications and scale the AI applications, there's a tremendous software engineering lift to build the infrastructure to run them. And that's very hard to do. So one of the reasons many AI projects and initiatives fail, or don't make it to production, is the need for this scale, the infrastructure lift to actually make it happen. So our goal here with Anyscale and Ray is to make that easy, to make scalable computing easy. So that as a developer or as a business, if you want to do AI, if you want to get value out of AI, all you need to know is how to program on your laptop. Like, all you need to know is how to program in Python. And if you can do that, then you're good to go. Then you can do what companies like OpenAI or Google do and get value out of machine learning. >> That programming example of how easy it is with Python reminds me of the early days of Cloud, when infrastructure as code was talked about: just code the infrastructure, make it programmable. That's super important. That's what AI people want, to program AI first. That's the new trend. And I want to understand, if you don't mind explaining, the relationship that Anyscale has to these foundational models, and in particular the large language models, also called LLMs, that we've seen with OpenAI and ChatGPT. Before you get into the relationship that you have with them, can you explain the hype around foundational models? Why are people going crazy over foundational models? What are they, and why are they so important?
>> Yeah, so foundational models, or foundation models, are incredibly important because they enable businesses and developers to get value out of machine learning, to use machine learning off the shelf with these large models that have been trained on tons of data and that are useful out of the box. And then, of course, as a business or as a developer, you can take those foundational models and repurpose them or fine tune them or adapt them to your specific use case and what you want to achieve. But it's much easier to do that than to train them from scratch. And I think for people to actually use foundation models, there are three main types of workloads or problems that need to be solved. One is training these foundation models in the first place, actually creating them. The second is fine tuning them and adapting them to your use case. And the third is serving them and actually deploying them. Okay, so Ray and Anyscale are used for all three of these workloads. Companies like OpenAI or Cohere train large language models on top of Ray, and open source versions like GPT-J were built on top of Ray too. There are many startups and other businesses that don't want to train the large underlying foundation models but do want to fine tune them, adapt them to their purposes, build products around them, and serve them, and those are also using Ray and Anyscale for that fine tuning and that serving. And so the reason that Ray and Anyscale are important here is that building and using foundation models requires a huge scale. It requires a lot of data. It requires a lot of compute: GPUs, TPUs, other resources. And to actually take advantage of that and build these scalable applications, there's a lot of infrastructure that needs to happen under the hood. And so you can either use Ray and Anyscale to take care of that, manage the infrastructure, and solve those infrastructure problems.
Or you can build and manage the infrastructure yourself, which you can do, but it's going to slow your team down. Many of the businesses we work with simply don't want to be in the business of managing and building infrastructure. They want to focus on product development and move faster. >> I know you got a keynote presentation we're going to go to in a second, but I think you hit on something that is the real tipping point: doing it yourself is hard to do. That's where the opportunities are, and the Cloud did that with data centers. It took a data center and made it an API. The heavy lifting went away and went to the Cloud, so people could be more creative and build their product. In this case, build their creativity. Is that the big deal happening here, that you guys are taking the learnings and making them available so people don't have to do that? >> That's exactly right. So today, if you want to succeed with AI, if you want to use AI in your business, infrastructure work is on the critical path for doing that. To do AI, you have to build infrastructure. You have to figure out how to scale your applications. That's going to change. With Ray and Anyscale, we're going to get to the point where we remove the infrastructure from the critical path, so that as a developer or as a business, all you need to focus on is your application logic: what you want the program to do, what you want your application to do, how you want the AI to actually interface with the rest of your product. Now, the way that will happen is that the infrastructure work will still happen. It'll just be under the hood, taken care of by Ray and Anyscale. And so I think something like this is really necessary for AI to reach its potential. For AI to have the impact and the reach that we think it will, you have to make it easier to do.
>> And just for clarification, if you don't mind, explain the relationship of Ray and Anyscale real quick before we get into the presentation. >> So Ray is an open source project. We created it. We were at Berkeley doing machine learning, and we started Ray in order to provide a simple open source tool for building and running scalable applications. And Anyscale is the managed version of Ray. Basically, we will run Ray for you in the Cloud, provide a lot of tools around the developer experience and managing the infrastructure, and provide more performance and superior infrastructure. >> Awesome. I know you got a presentation on Ray and Anyscale, and you guys are positioning it as the infrastructure for foundational models. So I'll let you take it away, and then when you're done presenting, we'll come back, I'll probably grill you with a few questions, and then we'll close it out. So take it away. >> Robert: Sounds great. So I'll say a little bit about how companies are using Ray and Anyscale for foundation models. The first thing I want to mention is just why we're doing this in the first place. The underlying observation, the underlying trend here, and this is a plot from OpenAI, is that the amount of compute needed to do machine learning has been exploding. It's been growing at something like 35 times every 18 months. This is absolutely enormous. Other people have written papers measuring this trend and you get different numbers, but the point is, no matter how you slice and dice it, it's an astronomical rate. Now if you compare that to something we're all familiar with, like Moore's Law, which says that processor performance doubles roughly every 18 months, you can see that there's just a tremendous gap between the compute needs of machine learning applications and what you can do with a single chip, right.
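Robert's comparison can be checked with quick arithmetic. Taking his figures at face value (machine learning compute growing about 35x every 18 months, versus processor performance doubling every 18 months), the gap between the two compounds by 35 / 2 = 17.5 per period. The sketch below simply projects both curves; the growth rates are the ones quoted in the talk, not independently verified, and `projected_growth` is a name chosen for illustration.

```python
PERIOD_MONTHS = 18  # both quoted trends are stated per 18-month period

def projected_growth(years: float) -> tuple[float, float, float]:
    """Return (ML compute demand, single-chip performance, gap) multipliers after `years`."""
    periods = years * 12 / PERIOD_MONTHS
    demand = 35.0 ** periods  # ~35x per period, as quoted
    chip = 2.0 ** periods     # Moore's Law as quoted: ~2x per period
    return demand, chip, demand / chip

for years in (1.5, 3, 6):
    demand, chip, gap = projected_growth(years)
    print(f"{years:>4} yrs: demand x{demand:,.0f}  chip x{chip:,.0f}  gap x{gap:,.0f}")
```

After three years the gap is already about 306x (17.5 squared), which is the quantitative form of the claim that scaling out is a requirement rather than a nice-to-have.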
So even if Moore's Law were continuing strong and doing what it used to do, there would still be a tremendous gap between what you can do with a chip and what you need in order to do machine learning. And so given this graph, what we've seen, and what has been clear to us since we started this company, is that doing AI requires scaling. There's no way around it. It's not a nice-to-have; it's really a requirement. And so that led us to start Ray, the open source project, to make it easy to build these scalable Python applications and scalable machine learning applications. And since we started the project, it's been adopted by a tremendous number of companies. Companies like OpenAI, which use Ray to train their large models like ChatGPT; companies like Uber, which run all of their deep learning and classical machine learning on top of Ray; companies like Shopify or Spotify or Instacart or Lyft or Netflix or ByteDance, which use Ray for their machine learning infrastructure. Companies like Ant Group, which makes Alipay, use Ray across the board: for fraud detection, for online learning, for detecting money laundering, for graph processing, for stream processing. Companies like Amazon run Ray at a tremendous scale, processing petabytes of data every single day. And so the project has seen just enormous adoption over the past few years. And one of the most exciting use cases is providing the infrastructure for building, training, fine tuning, and serving foundation models. So here are some examples of companies using Ray for foundation models. Cohere trains large language models. OpenAI also trains large language models. The workloads required there include things like supervised pre-training and also reinforcement learning from human feedback.
So this is not only regular supervised learning, but actually more complex reinforcement learning workloads that take human input about which response to a particular question is better than another response, and incorporate that into the learning. There are open source versions as well, like GPT-J, also built on top of Ray, as well as projects like Alpa coming out of UC Berkeley. So these are some examples of exciting projects and organizations training, creating, and serving these large language models using Ray. Okay, so what actually is Ray? Well, there are two layers to Ray. At the lowest level, there's the core Ray system. This is essentially low-level primitives for building scalable Python applications: things like taking a Python function or a Python class and executing it in a cluster setting. So Ray core is extremely flexible, and you can build arbitrary scalable applications on top of it. On top of the core system, what really gives Ray a lot of its power is an ecosystem of scalable libraries: libraries for ingesting and pre-processing data, for training your models, for fine tuning those models, for hyperparameter tuning, for batch processing and batch inference, and for model serving and deployment, right. And a lot of Ray users like Ray because they want to run multiple workloads. They want to train and serve their models, right. They want to load their data and feed that into training. And Ray provides common infrastructure for all of these different workloads. So this is a little overview of the different components of Ray. So why do people choose to go with Ray? I think there are three main reasons. The first is the unified nature: the fact that it is common infrastructure for scaling arbitrary workloads, from data ingest to pre-processing to training to inference and serving, right.
This also includes the fact that it's future proof. AI is incredibly fast moving, and many companies that have built their own machine learning infrastructure and standardized on particular workflows for doing machine learning have found that their workflows are too rigid to enable new capabilities. If they want to do reinforcement learning, if they want to use graph neural networks, they don't have a way of doing that with their standard tooling. And so Ray, being future proof, flexible, and general, gives them that ability. Another reason people choose Ray and Anyscale is scalability. This is really our bread and butter. This is the whole point of Ray: making it easy to go from your laptop to running on thousands of GPUs, making it easy to scale your development workloads and run them in production, making it easy to scale training, data ingest, pre-processing, and so on. So scalability and performance are critical for doing machine learning, and that is something that Ray provides out of the box. And lastly, Ray is an open ecosystem. You can run it anywhere. You can run it on any Cloud provider: Google Cloud, AWS, Azure. You can run it on your Kubernetes cluster. You can run it on your laptop. It's extremely portable. And not only that, it's framework agnostic. You can use Ray to scale arbitrary Python workloads, and it integrates with libraries like TensorFlow or PyTorch or JAX or XGBoost or Hugging Face or PyTorch Lightning, right, or Scikit-learn, or just your own arbitrary Python code. It's open source. And in addition to integrating with these machine learning frameworks, you can use Ray along with all of the other tooling in the machine learning ecosystem. That's things like Weights & Biases or MLflow, right.
Or, you know, different data platforms like Databricks, Delta Lake, or Snowflake, or tools for model monitoring or feature stores; all of these integrate with Ray. Ray provides that kind of flexibility so that you can integrate it into the rest of your workflow. And then Anyscale is the scalable compute platform that's built on top of Ray. So Anyscale is a managed Ray service that runs in the Cloud, and what Anyscale offers is the best way to run Ray. And if you think about what you get with Anyscale, there are fundamentally two things. One is about moving faster, accelerating the time to market. You get that by having the managed service, so that as a developer you don't have to worry about managing or configuring infrastructure. It also provides optimized developer workflows: things like easily moving from development to production, and having the observability tooling, the debuggability, to easily diagnose what's going wrong in a distributed application. So things like the dashboards and the other kinds of tooling for collaboration, for monitoring, and so on. So that's the first bucket: developer productivity, moving faster, faster experimentation and iteration. The second reason that people choose Anyscale is superior infrastructure. So this is things like cost efficiency, being able to easily take advantage of spot instances, being able to get higher GPU utilization, things like faster cluster startup times and autoscaling, and just overall better performance and faster scheduling. And so these are the kinds of things that Anyscale provides on top of Ray. It's the managed infrastructure. It's the developer productivity and velocity as well as performance. So this is what I wanted to share about Ray and Anyscale. >> John: Awesome. >> Provide that context.
But John, I'm curious what you think. >> I love it. I love the, so first of all, it's a platform, because that's the platform architecture right there. So just to clarify, this is an Anyscale platform, not- >> That's right. >> Tools. So you got tools in the platform. Okay, that's key. Love that managed service. Just curious, you mentioned Python multiple times. Is that because of PyTorch and TensorFlow, or because Python's the most friendly with machine learning, or because it's very common amongst all developers? >> That's a great question. Python is the language that people are using to do machine learning, so it's the natural starting point. Now, of course, Ray is actually designed in a language-agnostic way, and there are companies out there that use Ray to build scalable Java applications. But for the most part, right now we're focused on Python and on being the best way to build these scalable Python and machine learning applications. But, of course, down the road there always is that potential. >> So if you're slinging Python code out there and you're watching this video, get on the Anyscale bus quickly. Also, while you were giving the presentation, I couldn't help it, since you mentioned OpenAI, which by the way, congratulations, 'cause they've had great scale. I've noticed their rapid growth, 'cause they got to that number of users faster than anyone in the history of the computer industry, so a major success, OpenAI and ChatGPT. Huge fan. I'm not a skeptic at all. I think it's just the beginning, so congratulations. But I actually typed into ChatGPT, "What are the top three benefits of Anyscale?" and it came up with scalability, flexibility, and ease of use. Obviously, scalability is what you guys are called. >> That's pretty good. >> So that's what they came up with. So they nailed it. Did you have some inside prompt training in there? Only kidding. (Robert laughs) >> Yeah, we hard coded that one.
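Robert describes Ray core as taking an ordinary Python function or class and executing it across a cluster. In Ray itself this is done with the `@ray.remote` decorator, `.remote()` calls that return object references (futures), and `ray.get()` to collect results. The block below is not Ray: it is a hypothetical single-machine stand-in built on the standard library's `concurrent.futures`, written only to show the shape of that programming model. The `remote` decorator and `get` helper here are illustrative names of our own.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_POOL = ThreadPoolExecutor(max_workers=4)  # stands in for a cluster's workers

class _RemoteFunction:
    """Wraps a plain function so calls are submitted asynchronously (toy analogue of a task)."""
    def __init__(self, fn):
        self._fn = fn

    def remote(self, *args, **kwargs) -> Future:
        # Returns immediately with a future instead of the result.
        return _POOL.submit(self._fn, *args, **kwargs)

def remote(fn):
    """Hypothetical decorator mimicking the shape of @ray.remote on one machine."""
    return _RemoteFunction(fn)

def get(futures):
    """Block until results are ready; accepts one future or a list of them."""
    if isinstance(futures, Future):
        return futures.result()
    return [f.result() for f in futures]

@remote
def score_batch(batch: list[int]) -> int:
    # Placeholder for real work such as preprocessing or batch inference.
    return sum(x * x for x in batch)

batches = [[1, 2], [3, 4], [5, 6]]
refs = [score_batch.remote(b) for b in batches]  # fan out without waiting
print(get(refs))  # [5, 25, 61]
```

In real Ray the work runs in other processes or on other machines and the references can be passed between tasks; threads here share one Python process, so this toy shows the API shape, not the scaling.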
>> But that's the kind of thing that came up really, really quickly. If I asked it to write a sales document, it probably would, but this is the future interface. This is why people are getting excited about the foundational models and the large language models: they allow the interface with the user, the consumer, to be more human, more natural. And this will clearly be in every application in the future. >> Absolutely. This is how people are going to interface with software, how they're going to interface with products in the future. It's not just a chat bot that you talk to. This is going to be how you get things done, right. How you use your web browser, or how you use Photoshop, or how you use other products. You're not going to spend hours learning all the APIs and how to use them. You're going to talk to it and tell it what you want it to do. And of course, if it doesn't understand, it's going to ask clarifying questions. You're going to have a conversation, and then it'll figure it out. >> This is going to be one of those things where we're going to look back at this time, Robert, and say, "Yeah, that company was the beginning of that wave." And just like AWS and Cloud Computing, the folks who got in early were really in position when, say, the pandemic came. So getting in early is a good thing, and that's what everyone's talking about: getting in early and playing around, maybe replatforming or even picking one or a few apps to refactor with some staff and managed services. So people are definitely jumping in. So I have to ask you the ROI cost question. You mentioned some of those, Moore's Law versus what's going on in the industry. When you look at that kind of scale, the first thing that jumps out at people is, "Okay, I love it. Let's go play around." But what's it going to cost me? Am I going to be tied to certain GPUs?
What does the landscape look like from an operational standpoint, from the customer's side? Are they locked in? And since the benefit was flexibility, are you flexible enough to handle any Cloud? What are customers looking at? Basically, that's my question. What's the customer looking at? >> Cost is super important here. Companies are spending a huge amount on their Cloud computing, on AWS, and on doing AI, right. And I think a lot of the advantage of Anyscale, what we can provide here, is not only better performance but cost efficiency. Because if we can run something faster and more efficiently, it can also use fewer resources, and you can lower your Cloud spending, right. We've seen companies go from 20% GPU utilization with their current setup and the current tools they're using to running on Anyscale and getting more like 95, you know, 100% GPU utilization. That's something like a five x improvement right there. So depending on the kind of application you're running, it's a significant cost savings. We've seen companies processing petabytes of data every single day with Ray get order-of-magnitude cost savings by switching from what they were previously doing to running their application on Ray. And when you have applications that are potentially spending $100 million a year, getting a 10x cost savings is just absolutely enormous. So these are some of the kinds of- >> Data infrastructure is super important. Again, if you're a prospect for this and thinking about going in here, just like the Cloud, you got infrastructure, you got the platform, you got SaaS; the same kind of thing's going to go on in AI. So I want to get into that ROI discussion and some of the impact with your customers that are leveraging the platform. But first, I hear you got a demo. >> Robert: Yeah, so let me show you, let me give you a quick run through here.
So what I have open here is the Anyscale UI. I've started a little Anyscale Workspace. Workspaces are the Anyscale concept for interactive development, right. So here, imagine you want to have a familiar experience, like you're developing on your laptop. And here I have a terminal. It's not on my laptop. It's actually in the cloud, running on Anyscale. And I'm just going to kick this off. This is going to train a large language model, OPT, and it's doing this on 32 GPUs. We've got a cluster here with a bunch of CPU cores and a bunch of memory. And by the way, if I wanted to run this on 64 or 128 GPUs instead of 32, that's just a one-line change when I launch the Workspace. And as that's running, what I can do is pull up VS Code, right. Remember, this is the interactive development experience. I can look at the actual code. Here it's using Ray Train to train the PyTorch model. We've got the training loop, and we're saying that each worker gets access to one GPU and four CPU cores. And, of course, as I make the model larger (this is using DeepSpeed), I could increase the number of GPUs that each worker gets access to, right, and how that is distributed across the cluster. And if I wanted to run on CPUs instead of GPUs, or a different accelerator type, again, this is just a one-line change. And here we're using Ray Train to train the models, just taking my vanilla PyTorch model using Hugging Face and then scaling that across a bunch of GPUs. And, of course, if I want to look at the dashboard, I can go to the Ray dashboard. There are a bunch of different visualizations I can look at. I can look at the GPU utilization. I can look at the CPU utilization here, where I think we're currently loading the model and running that actual application to start the training. And some of the things that are really convenient here about Anyscale: I can get that interactive development experience with VS Code.
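In the demo, each training worker gets one GPU and four CPU cores, and going from 32 to 64 or 128 GPUs is described as a one-line change. The arithmetic behind that is that the cluster must supply the per-worker bundle times the number of workers. The sketch below is a hypothetical illustration with invented names (`WorkerSpec`, `cluster_requirements`) and does not use Ray; in Ray Train the corresponding knobs live on a `ScalingConfig` (for example `num_workers`, `use_gpu`, and `resources_per_worker`).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerSpec:
    """Resources each training worker reserves (demo values: 1 GPU, 4 CPU cores)."""
    gpus: int = 1
    cpus: int = 4

def cluster_requirements(num_workers: int, spec: WorkerSpec = WorkerSpec()) -> dict[str, int]:
    """Total resources the cluster must provide for data-parallel training."""
    return {"GPU": num_workers * spec.gpus, "CPU": num_workers * spec.cpus}

for n in (32, 64, 128):  # scaling up changes only this one number
    print(n, cluster_requirements(n))

# A larger model might need more GPUs per worker, as mentioned in the demo:
print(cluster_requirements(16, WorkerSpec(gpus=2, cpus=8)))
```

Keeping the per-worker bundle separate from the worker count is what makes scaling a single-parameter change.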
You know, I can look at the dashboards. I can monitor what's going on. I have a terminal; it feels like my laptop, but it's actually running on a large cluster, with however many GPUs or other resources I want. And so it's really trying to combine the best of both: the familiar experience of programming on your laptop, with the benefit of being able to take advantage of all the resources in the Cloud to scale. And since you were talking about cost efficiency: one of the biggest reasons, one of the silly reasons, that people waste money is just forgetting to turn off their GPUs. Of course, things here will auto-terminate if they're idle. But imagine you go to sleep and I have this big cluster. You can shut off the cluster, come back tomorrow, restart the Workspace, and your big cluster is back up and all of your code changes are still there, all of your local file edits. It's like you just closed your laptop and came back and opened it up again. And so this is the kind of experience we want to provide for our users. So that's what I wanted to share with you. >> Well, I think that whole, couple of things: the single-line-of-code change, that's game changing. And then the cost thing. I mean, human error is a big deal. People pass out at their computer. They've been coding all night, or they just forget about it. And then it's just like leaving the lights on or your water running in your house. At the scale that it is, the numbers will add up. That's a huge deal. So I think, you know, compute back in the old days, there's no compute. Okay, it's just compute sitting there idle. But you know, data cranking the models is doing, that's a big point.
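Robert makes two cost arguments in this conversation: higher GPU utilization (20% versus roughly 95 to 100%, plus a 10x savings on a hypothetical $100 million annual spend) and auto-terminating idle clusters. The sketch below works through both with the quoted figures. It assumes useful work is fixed and cost scales linearly with GPU-hours; the 30-minute idle timeout, the $2 per GPU-hour price, and all function names are hypothetical, not Anyscale's pricing or termination policy.

```python
from datetime import datetime, timedelta

def utilization_gain(before_pct: float, after_pct: float) -> float:
    """Same useful work at higher utilization needs proportionally fewer GPU-hours."""
    return after_pct / before_pct

def annual_savings(spend: float, reduction_factor: float) -> float:
    """Dollars saved per year when spend is divided by `reduction_factor`."""
    return spend - spend / reduction_factor

IDLE_TIMEOUT = timedelta(minutes=30)  # hypothetical auto-termination threshold

def should_terminate(last_activity: datetime, now: datetime,
                     timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """True once a cluster has seen no activity for at least `timeout`."""
    return now - last_activity >= timeout

def idle_cost(idle_hours: float, gpus: int, price_per_gpu_hour: float) -> float:
    """What a forgotten, idle cluster costs over `idle_hours`."""
    return idle_hours * gpus * price_per_gpu_hour

print(utilization_gain(20, 95), utilization_gain(20, 100))  # 4.75 5.0
print(f"${annual_savings(100e6, 10):,.0f} saved per year at 10x")  # $90,000,000 saved per year at 10x
last = datetime(2024, 1, 1, 22, 0)  # example timestamp
print(should_terminate(last, last + timedelta(minutes=45)))  # True
print(idle_cost(10, 32, 2.0))  # 640.0, a 32-GPU cluster left on for 10 hours
```

So "something like a five x improvement" works out to 4.75x to 5x, and the overnight example shows why an idle timeout matters even at modest prices.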
>> Another thing I want to add there about cost efficiency is that if you're running on Anyscale, we make it really easy to use spot instances, these preemptible instances that can be significantly cheaper than the on-demand instances. And so we see our customers go from not using these spot instances, 'cause they don't have the infrastructure around it, the fault tolerance to handle the preemption and things like that, to being able to just check a box, use spot instances, and save a bunch of money. >> You know, this was my whole feature article at re:Invent last year when I met with Adam Selipsky: this next-gen Cloud is here. I mean, it's not auto scale, it's infrastructure scale. It's agility. It's flexibility. I think this is where the world needs to go. Almost what DevOps did for Cloud. And what you were showing me in that demo had this whole SRE vibe. Remember, Google had site reliability engineers to manage all those servers. This is kind of like an SRE vibe for data at scale, a similar kind of order of magnitude. I mean, I might be a little bit off base there, but how would you explain it? >> It's a nice analogy. I mean, what we are trying to do here is get to the point where developers don't think about infrastructure, where developers only think about their application logic, and where businesses can do AI, can succeed with AI, and build these scalable applications without having to build an infrastructure team. They don't have to develop that expertise. They don't have to invest years in building their internal machine learning infrastructure. They can just focus on the Python code, on their application logic, and run the stuff out of the box. >> Awesome. Well, I appreciate the time. Before we wrap up here, give a plug for the company. I know you got a couple of websites. Ray's got its own website. You got Anyscale.
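Robert's point about spot instances hinges on fault tolerance: a preempted worker has to resume from saved state rather than start over. The sketch below shows the generic checkpoint-and-resume pattern in plain Python. The JSON checkpoint format, the `Preempted` exception used to simulate a reclaimed instance, and the function names are hypothetical, and this is not how Ray or Anyscale implement it.

```python
import json
import os
import tempfile
from typing import Optional

class Preempted(Exception):
    """Simulates a spot instance being reclaimed mid-run."""

def train(total_steps: int, ckpt_path: str, preempt_at: Optional[int] = None) -> int:
    """Run (or resume) a toy training loop, checkpointing after every step."""
    state = {"step": 0, "loss_sum": 0.0}
    if os.path.exists(ckpt_path):                      # resume instead of restarting
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        if state["step"] == preempt_at:
            raise Preempted(f"reclaimed at step {state['step']}")
        state["loss_sum"] += 1.0 / (state["step"] + 1)  # placeholder for real work
        state["step"] += 1
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:                       # write-then-rename keeps the
            json.dump(state, f)                         # last good checkpoint intact
        os.replace(tmp, ckpt_path)
    return state["step"]

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    train(10, ckpt, preempt_at=6)                       # first spot instance is reclaimed
except Preempted as e:
    print("preempted:", e)
print("finished at step", train(10, ckpt))             # replacement resumes from step 6
```

Checkpoint frequency is the main trade-off: more frequent checkpoints lose less work on preemption but spend more time writing state.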
You got an event coming up. Give a plug for the company looking to hire. Put a plug in for the company. >> Yeah, absolutely. Thank you. So first of all, you know, we think AI is really going to transform every industry and the opportunity is there, right. We can be the infrastructure that enables all of that to happen, that makes it easy for companies to succeed with AI, and get value out of AI. Now we have, if you're interested in learning more about Ray, Ray has been emerging as the standard way to build scalable applications. Our adoption has been exploding. I mentioned companies like OpenAI using Ray to train their models. But really across the board companies like Netflix and Cruise and Instacart and Lyft and Uber, you know, just among tech companies. It's across every industry. You know, gaming companies, agriculture, you know, farming, robotics, drug discovery, you know, FinTech, we see it across the board. And all of these companies can get value out of AI, can really use AI to improve their businesses. So if you're interested in learning more about Ray and Anyscale, we have our Ray Summit coming up in September. This is going to highlight a lot of the most impressive use cases and stories across the industry. And if your business, if you want to use LLMs, you want to train these LLMs, these large language models, you want to fine tune them with your data, you want to deploy them, serve them, and build applications and products around them, give us a call, talk to us. You know, we can really take the infrastructure piece, you know, off the critical path and make that easy for you. So that's what I would say. And, you know, like you mentioned, we're hiring across the board, you know, engineering, product, go-to-market, and it's an exciting time. >> Robert Nishihara, co-founder and CEO of Anyscale, congratulations on a great company you've built and continuing to iterate on and you got growth ahead of you, you got a tailwind. I mean, the AI wave is here. 
I think OpenAI and ChatGPT, a customer of yours, have really opened up mainstream visibility into this new generation of applications, user interface, role of data, large scale, and how to make that programmable, so we're going to need that infrastructure. So thanks for coming on this season three, episode one of the ongoing series of hot startups. In this case, this episode is the top startups building foundational model infrastructure for AI and ML. I'm John Furrier, your host. Thanks for watching. (upbeat music)

Published Date : Mar 9 2023



ENTITIES

Entity | Category | Confidence
Robert Nishihara | PERSON | 0.99+
John | PERSON | 0.99+
Robert | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Netflix | ORGANIZATION | 0.99+
35 times | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
$100 million | QUANTITY | 0.99+
Uber | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
100% | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
Ant Group | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
Python | TITLE | 0.99+
20% | QUANTITY | 0.99+
32 GPUs | QUANTITY | 0.99+
Lyft | ORGANIZATION | 0.99+
hundreds | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
Anyscale | ORGANIZATION | 0.99+
three | QUANTITY | 0.99+
128 | QUANTITY | 0.99+
September | DATE | 0.99+
today | DATE | 0.99+
Moore's Law | TITLE | 0.99+
Adam Selipsky | PERSON | 0.99+
PyTorch | TITLE | 0.99+
Ray | ORGANIZATION | 0.99+
second reason | QUANTITY | 0.99+
64 | QUANTITY | 0.99+
each worker | QUANTITY | 0.99+
Photoshop | TITLE | 0.99+
UC Berkeley | ORGANIZATION | 0.99+
Java | TITLE | 0.99+
Shopify | ORGANIZATION | 0.99+
OpenAI | ORGANIZATION | 0.99+
Anyscale | PERSON | 0.99+
third | QUANTITY | 0.99+
two things | QUANTITY | 0.99+
ByteDance | ORGANIZATION | 0.99+
Spotify | ORGANIZATION | 0.99+
One | QUANTITY | 0.99+
95 | QUANTITY | 0.99+
Asure | ORGANIZATION | 0.98+
one line | QUANTITY | 0.98+
one GPU | QUANTITY | 0.98+
ChatGPT | TITLE | 0.98+
TensorFlow | TITLE | 0.98+
last year | DATE | 0.98+
first bucket | QUANTITY | 0.98+
both | QUANTITY | 0.98+
two layers | QUANTITY | 0.98+
Cohere | ORGANIZATION | 0.98+
Alipay | ORGANIZATION | 0.98+
Ray | PERSON | 0.97+
one | QUANTITY | 0.97+
Instacart | ORGANIZATION | 0.97+
0.97+

Daren Brabham & Erik Bradley | What the Spending Data Tells Us About Supercloud


 

(gentle synth music) (music ends) >> Welcome back to Supercloud 2, an open industry collaboration between technologists, consultants, analysts, and of course practitioners to help shape the future of cloud. At this event, one of the key areas we're exploring is the intersection of cloud and data, and how building value on top of hyperscale clouds and across clouds is evolving, a concept of course we call "Supercloud". And we're pleased to welcome our friends from Enterprise Technology Research, Erik Bradley and Daren Brabham. Guys, thanks for joining us, great to see you. We love to bring the data into these conversations. >> Thank you for having us, Dave, I appreciate it. >> Yeah, thanks. >> You bet. And so, let me do the setup on what Supercloud is. It's a concept that we floated before re:Invent 2021, based on the idea that cloud infrastructure is becoming ubiquitous and incredibly powerful, but there's a lack of standards across the big three clouds. That creates friction. So we defined, over the better part of a year, a set of essential elements and deployment models for so-called supercloud, which create a common experience for specific cloud services that, of course, span multiple clouds and even on-premises data. So Erik, with that as background, I wonder if you could add your general thoughts on the term supercloud, maybe play proxy for the CIO community, 'cause you do these round tables, you talk to these guys all the time, and you gather a lot of amazing information from senior IT DMs that complements your survey. So what are your thoughts on the term and the concept? >> Yeah, sure. I'll even go back to last year when you and I did our predictions panel, right? And we threw it out there. And to your point, you know, there are some haters. Anytime you throw out a new term: "Is it marketing buzz? Is it worth it? Why are you even doing it?"
But you know, from my own perspective, and also from speaking to the IT DMs that we interview on a regular basis, this is just a natural evolution. It's something that's inevitable in enterprise tech, right? The internet was not built for what it has become. It was never intended to be the underlying infrastructure of our daily lives and work. The cloud also was not built to be what it's become. But where we're at now is, we have to figure out what the cloud is and what it needs to be to be scalable, resilient, secure, and have governance wrapped around it. And to me that's what supercloud is. It's a way to define operationally what the next generation, the continued iteration and evolution of the cloud, needs to be. That's what supercloud means to me. Whether you want to call it metacloud or supercloud doesn't matter. The point is that we're trying to define the next layer, the next future of work, which is inevitable in enterprise tech. Now, from the IT DM perspective, I have two interesting call-outs. One is from a senior developer in IT architecture and DevSecOps who says he uses the term all the time. And the reason he uses the term is that multi-cloud has a stigma attached to it when he is talking to his business executives. (David chuckles) The stigma is that it's complex and it's expensive. So he switched to supercloud to better explain to his business executives, his CFO, and his CIO what he's trying to do. And we can get into more later about what it means to him. But the inverse of that, of course, is a good CSO friend of mine at a very large enterprise who says the concern with Supercloud is the reduction of complexity. And I'll explain: he believes anything that takes the requirement of specific expertise out of the equation, even a little bit, worries him as a CSO.
So as you said, David, there are always two sides to the coin, but I do believe supercloud is a relevant term, and it is necessary because the cloud is continuing to be defined. >> You know, that's really interesting too, 'cause, you know, Daren, we use Snowflake a lot as an example of sort of early supercloud, and you think from a security standpoint, we've always pushed Amazon: "Are you ever going to kind of abstract the complexity away from all these primitives?" And their position has always been, "Look, if we produce these primitives and offer these primitives, we can move as the market moves. When you abstract, then it becomes harder to peel the layers." But Daren, from a data standpoint, like I say, we use Snowflake a lot. I think of Tim Berners-Lee when Web 2.0 came out; he said, "Well, this is what the internet was always supposed to be." So in a way, you know, supercloud is maybe what multi-cloud was supposed to be. But I mean, you think about data sharing across clouds, Daren, it's always been a challenge. Snowflake, obviously, is trying to solve that problem, as are others. But what are your thoughts on the concept? >> Yeah, I think the concept fits, right? It reflects a paradigm shift, right? The pendulum has swung back and forth between needing to piece together a bunch of different tools that have specific, unique use cases and are best of breed in what they do, and then focusing on the duct tape that holds 'em all together and all the engineering complexity and skill. It shifted from that end of the pendulum all the way back to, "Let's streamline this, let's simplify it. Maybe we have budget crunches and we need to consolidate tools or eliminate tools." And so you kind of see this back and forth over time. And with data and analytics, for instance, a lot of organizations were trying to bring the data closer to the business. That's where we saw self-service analytics coming in.
And tools like Snowflake, what they did was help point to different databases, help unify data, and organize it in a single place that was, you know, in a sense neutral, away from a single cloud vendor or a single database, and allowed the business to be more flexible in how it brought stuff together and provided it out to the business units. So Snowflake was an example of one of those times where we pulled back from the granular, multiple points of the spear, back to a simple way to do things. And I think Snowflake has continued to keep that mantle to a degree, and we see other tools trying to do that, but that's all it is. It's a paradigm shift back to this kind of meta abstraction layer that simplifies the reality that you need a complex, multi-use-case, multi-region way of doing business. And it sort of reflects the reality of that. >> And you know, to me it's a spectrum. As part of Supercloud 2, we're talking to a number of practitioners: Ionis Pharmaceuticals, US West, we've got Walmart. And it's a spectrum, right? In some cases the practitioner is saying, "You know, the way I solve multi-cloud complexity is mono-cloud, I just do one cloud." (laughs) Others like Walmart are saying, "Hey, you know, we actually are building an abstraction layer ourselves, take advantage of it." So my general question to both of you is: is the lack of standards across clouds, you know, really a problem, or is supercloud a solution looking for a problem? Or do you hear from practitioners, "No, this is really an issue, we have to bring together a set of standards to sort of unify our cloud estates"? >> Allow me to answer that at a higher level, and then I'm going to hand it over to Dr. Brabham because he is a little bit more detailed on the real-time streaming analytics use cases, which I think is where we're going to get to.
But to answer that question, it really depends on the size and the complexity of your business. At the very large enterprise, Dave, yes, a hundred percent. This needs to happen. There is complexity, and not only complexity in the compute and actually deploying the applications, but in the governance and the security around them. But for lower-end, you know, business use cases, and for smaller businesses, it's a little less necessary. You certainly don't need to have all of these. Some of the things that come to mind from the interviews that Daren and I have done are, you know, financial services: if you're doing real-time trading, anything that has real-time data metrics involved in your transactions, it's going to be necessary. And another use case that we hear about is online travel agencies. So I think it is very relevant, the complexity does need to be solved, and I'll allow Daren to explain a little bit more about how that's used from an analytics perspective. >> Yeah, go for it. >> Yeah, exactly. I mean, I think any modern, you know, multinational company is going to have a footprint in the US and Europe and China, or works in different areas like manufacturing, where you're probably going to have on-prem instances that will stay on-prem forever, for various performance reasons. You have these complicated governance, security, and regulatory issues. So inherently, I think, large multinational companies and/or companies in certain areas like finance or, you know, online e-commerce, or things that need real-time data, are going to have a very complex environment that's going to need to be managed in some kind of cleaner way. You know, they're looking for one door to open, one pane of glass to look at, one thing to do to manage these multiple points. And streaming's a good example of that.
I mean, not every organization has a real-time streaming use case, and may never, but a lot of organizations do, a lot of industries do. And so there's this need to use, you know, they want to use open-source tools, they want to use Apache Kafka for instance. They want to use different megacloud vendors' offerings, like Google Pub/Sub or, you know, Amazon Kinesis Firehose. They have all these different pieces they want to use for different use cases at different stages of maturity or proof of concept, you name it. They're going to have to have this complexity. And I think that's why we're seeing this need to have this supercloud concept, to juggle all this, to wrangle all of it. 'Cause the reality is, it's complex and you have to simplify it somehow. >> Great, thank you guys. All right, let's bring up the graphic and take a look. Anybody who follows Breaking Analysis, which is co-branded as theCUBE Insights powered by ETR, knows we like to bring data to the table. ETR does amazing survey work every quarter, with 1,200 to 1,500-plus practitioners that answer a number of questions. The vertical axis here is net score, which is ETR's proprietary methodology, a measure of spending momentum, spending velocity. And the horizontal axis here is overlap, which is presence or pervasiveness in the dataset. The Ns, in that table insert on the bottom right, show you how the dots are plotted: the net score and then the Ns in the survey. And what we've done is plot a bunch of the so-called supercloud suspects. Let's start in the upper right, with the cloud platforms. Without these hyperscale clouds, you can't have a supercloud. And as always, Azure and AWS are up and to the right. It's amazing; we're talking about, you know, an 80-plus billion dollar company in AWS. Azure's business, if you just look at the IaaS, is in the 50 billion range. I mean, the net scores here are just amazing to me. Anything above 40% we consider highly elevated.
And you've got Azure, and you've got Snowflake, Databricks, HashiCorp; we'll get to them. And you've got AWS, you know, right up there; at that size, it's quite amazing. With really big Ns as well, you know, 700-plus Ns in the survey. So, you know, about half the survey actually has these platforms. So my question to you guys is, what are you seeing in terms of cloud adoption within the big three cloud players? I wonder if you could comment; maybe Erik, you could start. >> Yeah, sure. Now we're talking data, now I'm happy. So yeah, we'll get into some of it. Right now, the January 2023 TSIS is approaching 1,500 survey respondents. One caveat: it's not closed yet, it will close on Friday, but with an N that big we are well past statistical significance. We also recently did a cloud survey, and there are a couple of key points on that I want to get into before we get into individual vendors. What we're seeing here is that annual spend on cloud infrastructure is expected to grow at almost a 70% CAGR over the next three years. The percentage of workloads on cloud infrastructure is expected to grow over 70% over the next three years as well. And as you mentioned, Azure and AWS are still dominant. However, we're seeing some share shift spreading around a little bit. Now, to get into the individual vendors you mentioned: yes, Azure is still number one, AWS is number two. What we're seeing, which is incredibly interesting, is that Cloudflare is number three. It's actually beating GCP. That's the first time we've seen it. What I do want to state is that this is on net score only, which is our measure of spending intentions. When you talk about actual pervasion in the enterprise, it's not even close. But from a spending velocity intention point of view, Cloudflare is now number three above GCP, and even Salesforce is creeping up to be at GCP's level.
So what we're seeing here is continued domination by Azure and AWS, but some of these other players that might fit into your Supercloud moniker are starting to creep up, Dave. And I definitely want to talk about Cloudflare more in a bit, but I'm going to stop there. >> Yeah, I just want to clarify. So as you also know, we track IaaS and PaaS revenue and we try to extract it. AWS reports, in its quarterly earnings, you know, just IaaS and PaaS; they don't have a SaaS play, maybe a little bit, whereas Microsoft and Google include their applications, so we extract those out, and if you do that, AWS is bigger. But in the surveys, you know, customers see SaaS as cloud. So that's one of the reasons why you see, you know, Microsoft as larger in pervasion. If you bring up that survey again, Alex, the survey results, you see them further to the right and they have higher spending momentum, which is consistent with what you see in the earnings calls. Now, Cloudflare is interesting because the CEO of Cloudflare, and Cloudflare itself, actually use the term supercloud, basically saying, "Hey, we're building a new type of internet." So what are your thoughts? Do you have additional information on Cloudflare, Erik, that you want to share? I mean, you've seen them pop up. This is a really interesting company that is pretty forward-thinking and vocal about how it's disrupting the industry. >> Sure, we've been tracking 'em for a long time, even from the disruption of just traditional CDN, where they took down Akamai, and what they're doing. But for me, a true supercloud provider can't just be one instance. You have to have multiple. So it's not just the cloud; it's the networking aspect on top of it, and it's also security. And to me, Cloudflare is the only one that has all of it.
They actually have the ability to offer all of those things. Whereas if you look at some of the other names, they're still piggybacking on the infrastructure or platform as a service of the hyperscalers. Cloudflare does not need to; they actually have the cloud, the networking, and the security all themselves. So to me that lends credibility to their own internal usage of the Supercloud moniker. And also, again, just what we're seeing right here, that their net score is now creeping above GCP, really does state it. And then just one last thing: one of the other things we do in our surveys is track adoption and replacement reasoning. And when you look at Cloudflare's adoption rate, which is extremely high, it's based on technical capabilities and the breadth of their feature set, and also on what we call the ability to avoid stack alignment. So those are, again, really supporting reasons that make Cloudflare a top candidate for your supercloud moniker. >> And they've also announced an object store (chuckles) and a database. So, you know, it takes a while, as you well know, to get database adoption going, but, you know, they're ambitious and going for it. All right, let's bring the chart back up, and Daren, I want to focus in on the ecosystem now. We've identified Snowflake and Databricks, it's always fun to talk about those guys, and there are a number of other, you know, data platforms out there, but we use those two really as proxies for leaders. We've got a bunch of the backup guys, the data protection folks: Rubrik, Cohesity, and Veeam. They're sort of in a cluster, although Rubrik is, you know, ahead of those guys in terms of spending momentum. And then VMware Tanzu and Red Hat as sort of the cross-cloud platforms. But I want to focus, Daren, on the data piece of it. We're seeing a lot of activity around data sharing, governed data sharing.
Databricks is using Delta Sharing as their sort of play; Snowflake is sort of this walled garden, like the app store. What are your thoughts, you know, in the context of Supercloud, on cross-cloud capabilities for the data platforms? >> Yeah, good question. You know, I think Databricks is an interesting player because they've made some interesting moves with their Data Lakehouse technology. So they're trying to, not complicate, they're trying to take away the complications of, you know, the downsides of data warehousing and data lakes, and trying to find that middle ground, where you have the benefits of a managed, governed, you know, data warehouse environment, but you have the lower-cost, you know, capability of a data lake. And so, you know, Databricks has become really attractive, especially to data scientists, right? We've been tracking them in the AI and machine learning sector for quite some time here at ETR. It's attractive to data scientists because it looks and acts like a lake, but can have some managed capabilities like a warehouse. So it's kind of the best of both worlds. So in some ways I think you've seen a data science driver for the adoption of Databricks that has now become a little bit more mainstream across the business. Snowflake maybe went the other direction: you know, it's a cloud data warehouse that is starting to expand its capabilities and add on new things; Streamlit is a good example in the analytics space, with apps. So you see these tools starting to branch and creep out a bit, but they offer that sort of neutrality, right? One IT decision maker we recently interviewed referred to Snowflake and Databricks as, quote unquote, the Switzerland of what they do. And so there's this desire among organizations to find tools that can solve the complex, multi-headed use case of data and analytics, which every business unit needs in different ways.
And to figure out a way to do that, an elegant way that's governed and centrally managed, that federated, best-of-both-worlds approach you get by bringing the data close to the business while having a central governed instance. So these tools are incredibly powerful, and I think there's only going to be room for growth, for those two especially. I think they're going to expand and do different things and maybe, you know, join forces with others, and a lot of the power of what they do well is defining these connections and finding these partnerships with other vendors, and trying to be seen as the nice add-on to your existing environment that plays nicely with everyone. So I think that's where those two tools are going, but they certainly fit this label of, you know, trying to be that supercloud-neutral, you know, layer that unites everything. >> Yeah, and if you bring the graphic back up, please, there are obviously big data plays in each of the cloud platforms, you know: Microsoft, a big database player; AWS has, you know, 11, 12, 15 data stores. And of course, you know, BigQuery and other data platforms within Google. But, you know, I'm not sure the big cloud guys are going to go hard after so-called supercloud, cross-cloud services. Although we see Oracle getting in bed with Microsoft and Azure, with a database service that is cross-cloud, certainly Google with Anthos, and, you know, you never say never with AWS. I guess what I would say, guys, and I'll leave you with this, is that, you know, just like all players today are cloud players, I feel like most companies are going to be so-called supercloud players. In other words, they're going to have a cross-cloud strategy, and they're going to try to build connections if they're coming from on-prem, like a Dell or an HPE, you know, or Pure or, you know, many of these other companies; Cohesity is another one.
They're going to try to connect to their on-premises estates, of course, and create a consistent experience. It's natural that they're going to have some consistency across clouds. You know, the big question is, what does that spectrum look like? I think on the one hand you're going to have, you know, maybe some rudimentary instances of supercloud, or maybe they just run on the individual clouds, versus where Snowflake and others, and even beyond that, are trying to go with a single global instance, basically building out what I would think of as their own cloud, and importantly their own ecosystem. I'll give you guys the last thought. Maybe you could each give us, you know, closing thoughts. Maybe Daren, you could start, and Erik, you could bring us home on this entire topic, the future of cloud and data. >> Yeah, I mean I think, you know, there are two points to make on that. The first is the question of what we'll call legacy on-prem players: these mega vendors that have been around a long time and have big on-prem footprints, and a lot of people have them for that reason. I think it's foolish to assume that a company, especially a large, mature, multinational company that's been around a long time, can just uproot and leave on-premises entirely at full scale. There will almost always be an on-prem footprint at any company that was not, you know, natively born in the cloud after 2010, right? I just don't think that's reasonable anytime soon. I think there are some industries that need on-prem, things like, you know, industrial manufacturing and so on. So I don't think on-prem is going away, and I think vendors that go very cloud-forward, very big on the cloud, if they neglect having at least decent connectors to on-prem legacy vendors, they're going to miss out.
So I think that's something these players need to keep in mind: that they continue to reach back to some of these players that have big footprints on-prem, and make sure those integrations are seamless and work well, or else their customers will always have a multi-cloud or hybrid experience. And then a second point about the future is, you know, we talk about the three big cloud providers, Google, Microsoft, and AWS, as sort of the opposite of, or different from, this new supercloud paradigm that's emerging. But I want to point out that they will always try to make a play to become that, and I think, you know, we'll certainly see someone like Microsoft trying to expand their licensing and expand how they play in order to become that supercloud provider for folks. So I also don't want to downplay them. I think you're going to see those three big players continue to move, take over what players like Cloudflare are doing, and try to, you know, cut them off before they get too big. So keep an eye on them as well. >> Great points. I mean, I think you're right on the first point: if you're Dell, HPE, Cisco, or IBM, your strategy should be to make your on-premises estate as cloud-like as possible and, you know, make those differences as minimal as possible. And, you know, if you're a customer, then the business case for you to move off of that is going to be weak. And I think you're right: if this is a real problem, the cloud guys are going to play in there, and they're going to make some money at it. Erik, bring us home please. >> Yeah, I'm going to revert back to our data on the macro side. So to kind of support this concept of a supercloud right now, you know, Dave, you and I know we track overall spending, and what we're seeing right now is that total-year spend growth is expected to be only 4.6%. We ended 2022 at 5%, even though it began at almost eight and a half.
So this is clearly declining, and in that environment, we're seeing that the top two strategies to reduce spend are, first, vendor consolidation, with 36% of our respondents saying they're actively seeking a way to reduce their number of vendors and consolidate into one. That obviously supports a supercloud type of play. Number two is reducing excess cloud resources. So when I look at both of those combined with the drop in overall spending, I think you're on the right thread here, Dave. You know, the overall macro view that we're seeing in the data supports this happening. And if I can, real quick, there are a couple of names we did not touch on that I do think deserve to be in this conversation. One is HashiCorp. HashiCorp is the number one player in our infrastructure sector, with a 56% net score. It does multiple things within infrastructure and it is completely agnostic to your environment. And if we're also speaking about something that's just a singular feature, we would look at Rubrik for data backup, storage, and recovery. They're not going to offer you your full cloud or your networking, of course, but if you are looking for backup, recovery, and storage, Rubrik is also number one in that sector, with a 53% net score. Those are two other names that deserve to be in this conversation as we watch it move and evolve. >> Great, thank you for bringing that up. Yeah, we had both of those guys in the chart and I failed to focus in on HashiCorp, which is clearly a Supercloud enabler. All right guys, we've got to go. Thank you so much for joining us, appreciate it. Let's keep this conversation going. >> Always enjoy talking to you, Dave, thanks. >> Yeah, thanks for having us. >> All right, keep it right there for more content from Supercloud 2. This is Dave Vellante for John Furrier and the entire Cube team. We'll be right back. (gentle synth music) (music fades)

Published Date : Feb 17 2023


SENTIMENT ANALYSIS :


Why Should Customers Care About SuperCloud


 

Hello and welcome back to Supercloud 2, where we examine the intersection of cloud and data in the 2020s. My name is Dave Vellante. Our Supercloud panel, our power panel, is back. Maribel Lopez is the founder and principal analyst at Lopez Research. Sanjeev Mohan is a former Gartner analyst and principal at SanjMo. And Keith Townsend is The CTO Advisor. Folks, welcome back and thanks for your participation today. Good to see you. >> Okay, great. >> Great to see you. >> Thanks. Let me start, Maribel, with you. Bob Muglia, we had a conversation as part of Supercloud the other day. And he said, "Dave, I like the work, you got to simplify this a little bit." So he said, quote, "A Supercloud is a platform." He said, "Think of it as a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." And then Nelu Mihai said, "Well, wait a minute. This is just going to create more stovepipes. We need more standards and an architecture," which is kind of what the Berkeley Sky Computing initiative is all about. So there's sort of a debate going on. Is supercloud an architecture, a platform? Or maybe it's just another buzzword. Maribel, do you have a thought on this? >> Well, the easy answer would be to say it's just a buzzword. And then we could just kill the conversation and be done with it. But I think the term is more than that, right? The term actually isn't new. You can go back to at least 2016 and find references to supercloud from Cornell University and in other documents. So, having said this, I think we've been talking about Supercloud for a while, so I assume it's more than just a fancy buzzword. But I think it really speaks to that undeniable trend of moving towards an abstraction layer to deal with the chaos of what we consider managing multiple public and private clouds today, right?
So one definition of the technology platform speaks to a set of services that allows companies to build and run that technology smoothly without worrying about the underlying infrastructure, which really gets back to something that Bob said. And some of the question is where that lives. And you could call that an abstraction layer. You could call it cross-cloud services, hybrid cloud management. So I see momentum there, like legitimate momentum with enterprise IT buyers that are trying to deal with the fact that they have multiple clouds now. So where I think we're moving is trying to define the specific attributes and frameworks of that layer that would make it consistent across clouds. What is that layer? And maybe that's what the supercloud is. But one of the things I struggle with with supercloud is: what are we really trying to do here? Are we trying to create differentiated services in the supercloud layer? Is a supercloud just another variant of what AWS, GCP, or others do? You've spoken to Walmart about its cloud native platform, and that's an example of somebody deciding to do it themselves because they need to deal with this today and not wait for some big standards thing to happen. So whatever it is, I do think it's something. I think trying to create an architecture out of it would be a better way of saying it, so that it does get to that set of principles, but it also needs to be edge aware. I think whenever we talk about supercloud, we're always talking about the big centralized cloud. And I think we need to think about all the distributed clouds that we're looking at in edge as well. So that might be one of the ways that supercloud evolves. >> So thank you, Maribel. Keith, Brian Gracely, Gracely's law, things kind of repeat themselves. We've seen it all before. And so what Muglia brought to the forefront is this idea of a platform where the platform provider is really responsible for the architecture.
Of course, the drawback is then you get a bunch of stovepipe architectures. But practically speaking, that's kind of the way the industry has always evolved, right? >> So if we look at this from the practitioner's perspective and we talk about platforms, traditionally vendors have provided the platforms for us, whether it's distributions of Linux managed or provided by Red Hat, Windows Server, .NET, or databases like Oracle. We think of those as platforms, things that are fundamental that we can build on top of. Supercloud isn't that today. It is a framework or idea, kind of a visionary goal to get to a point where we can have a platform or a framework. But what we're seeing repeated throughout the industry in customers, whether it's the Walmarts that have kind of supersized the idea of supercloud, or regular end user organizations that are coming out with platform groups, groups who normalize cloud native infrastructure, AWS multi-cloud, and VMware resources to look like one thing internally to their developers, is a desire for a platform that provides the capabilities of a supercloud. >> Thank you for that. Sanjeev, we often use Snowflake as a supercloud example, and that would presumably be a platform with an architecture that's determined by the vendor. Maybe Databricks is pushing for a more open architecture, maybe more of that nirvana that we were talking about before to solve for supercloud. But regardless, the practitioner discussions show that, at least currently, there's not a lot of cross-cloud data sharing. I think it could be a killer use case, but egress charges are a barrier. But how do you see it? Will that change? Will we hide that underlying complexity and start sharing data across clouds? Is that something that you think Snowflake or others will be able to achieve? >> So I think we are already starting to see some of that happen. Snowflake is definitely one example that gets cited a lot.
But even though we don't talk about MongoDB in this light, you could have a MongoDB cluster, for instance, with nodes sitting in different cloud providers. So there are companies that are starting to do it. The advantage that these companies have, let's take Snowflake as an example, is that it's a centralized proprietary platform. And they are building the capabilities that are needed for supercloud. So they're building things like pushing down your data transformations. They have the entire security and privacy suite. Data ops, they're adding those capabilities. And if I'm not mistaken, very soon we will see them offer data observability. So it all works great as long as you are in one platform. And if you want resilience, then Snowflake is a great supercloud example. But if your primary goal is to choose the most cost-effective service irrespective of which cloud it sits in, then things start falling sideways. For example, I may be a very big Snowflake user. And I like Snowflake's resilience. I can move from one cloud to another cloud. Snowflake does it for me. But what if I want to train a very large model? Maybe Databricks is a better platform for that. So how do I move my workload from one platform to another platform? That tooling does not exist. So we need a hybrid, cross-cloud data ops platform. Walmart has done a great job, but they built it by themselves. Not every company is Walmart. Like Maribel and Keith said, we need standards, we need reference architectures, we need some sort of cost control. I was just reading recently that Accenture has been public about their AWS bill. Every time they get the bill, it's tens of millions of lines, tens of millions, 'cause there are over a thousand teams using AWS. We have not been able to corral the usage of a single cloud, and now we're talking about supercloud, with multiple clouds, hybrid, on-prem, and edge.
So until we've got some cross-platform tooling in place, I think it will still take quite some time for this to take shape. >> It's interesting. Maribel, Walmart would tell you that their on-prem infrastructure is cheaper to run than the stuff in the cloud. But at the same time, they want the flexibility and the resiliency of their three-legged stool model. So that's the point Sanjeev was making about hybrid. It's an interesting balance, isn't it, between getting your lowest cost and at the same time having best of breed and scale? >> It's basically what you're trying to optimize for, as you said, right? And by the way, to the earlier point, not everybody is at Walmart's scale, so not everybody has the purchasing power that makes it cheaper to run on-prem than in the cloud. But I think what you see almost every company, large or small, moving towards is this concept of: where do I find the agility? And is the agility in building the infrastructure for me? And typically, the thing that gives you outsized advantage as an organization is not how you constructed your cloud computing infrastructure. It might be how you structured your data analytics, as an example, which cloud is related to. But how do you marry those two things? And getting back to Sanjeev's point, we're in a real struggle now where on one hand we want to have best of breed services and on the other hand we want it to be really easy to manage, secure, and do data governance. And those two things are really at odds with each other right now. So if you want all the knobs and switches of a service like geospatial analytics in BigQuery, you're going to have to use Google tools, right? Whereas if you want visibility across all the clouds for your application estate and to understand the security and governance of that, you're kind of looking for something that's more cross-cloud tooling at that point.
But whenever you talk to somebody about cross-cloud tooling, they look at you like that's not really possible. So it's a very interesting time in the market. Now, we're kind of layering this concept of supercloud on it. And some people think supercloud's basically about multi-cloud tooling, and some people think it's about a whole new architectural stack. So we're just not there yet. But it's not all about cost. I mean, cloud has not been about cost for a very, very long time. Cloud has been about how you really make the most of your data. And this gets back to cross-cloud services like Snowflake. Why did they even exist? They existed because we had data everywhere, but we need to treat data as a unified object so that we can analyze it and get insight from it. And so that's where some of the benefit of these cross-cloud services is moving today. Still a long way to go, though, Dave. >> Keith, I reached out to my friends at ETR given the macro headwinds. And you're right, Maribel, cloud hasn't really been just about cost savings. But I asked the ETR guys, what does your data show in terms of how customers are dealing with the economic headwinds? And they said, by far, their number one strategy to cut cost is consolidating redundant vendors. And a distant second, but still notable, was optimizing cloud costs, maybe using reserved instances, or using more volume buying. Repatriation was nowhere in there. And I asked them, "Could you go look and see if you can find it? Do we see repatriation?" And you hear this a lot. You hear people whispering to analysts, "You better look into that repatriation trend. It's pretty big." You can't find it. But some of the Walmarts of the world, maybe they're not even repatriating, but they may have a better cost structure on-prem. Keith, what are you seeing from the practitioners that you talk to in terms of how they're dealing with these headwinds?
>> Yeah, I just got into a conversation about this just this morning with (indistinct), who is an analyst over at GigaOm. He's reading the same headlines: repatriation is happening at large scale. I think we have these quiet terms now. We have quiet quitting, we have quiet hiring. I think we have quiet repatriation. Most people haven't done away with their data centers. They're still there. Whether they're completely on-premises data centers where they own the assets, or partnerships with QTS, Equinix, et cetera, they have these private cloud resources. What I'm seeing practically is a rebalancing of workloads. Do I really need to pay AWS for this instance of SAP that's on 24 hours a day versus just having it on-prem, moving it back to my data center? I've talked to quite a few customers who were early on to moving their static SAP workloads onto the public cloud, and they simply moved them back. Surprisingly, at VMware Explore, and we can talk about this a little bit later on, there were net new customers, not a lot, that were born in the cloud. And they get to this point where their workloads are static. And they look at something like Kubernetes, or OpenShift, or VMware Tanzu. And they ask the question, "Do I need the scalability of cloud?" I might consider being a net new VMware customer to deliver this base capability. So are we seeing repatriation as the number one reason? No, I think internal IT operations have just naturally come to this realization: hey, I have these resources on premises. The private cloud technologies have moved along far enough that I can just simply move this workload back. I'm not calling it repatriation, I'm calling it rightsizing for the operating model that I have. >> Makes sense. Yeah. >> Go ahead. >> If I may add something, Dave, while we are on this topic of repatriation: I'm actually surprised that we are talking about repatriation as a very big thing.
I think repatriation is happening, no doubt, but it's such a small percentage of cloud migration that to me it's a rounding error. I think there's a bigger problem. The problem is that people don't know where the cost is. If they knew where the cost was being wasted in the cloud, they could do something about it. But if you don't know, then the easy answer is that cloud costs a lot, so move it back to on-premises. I mean, take Capital One as an example. They got rid of all their data centers. Where are they going to repatriate to? They're all in the cloud at this point. So my point is that one of the reasons data observability has seen a lot of traction is cost. When data observability first came into existence, it was all about data quality. Then it was all about data pipeline reliability. And now, the number one killer use case is FinOps. >> Maribel, you had a comment? >> Yeah, I'm kind of in violent agreement with both Sanjeev and Keith. So what are we seeing here? The first thing that we see is that many people wildly overspent in the big public cloud. They had stranded cloud credits, so to speak. The second thing is, some of them still had infrastructure that was useful. So why not use it if you find the right workloads, to what Keith was talking about, if they were more static workloads, if it was already there? So there is a balancing that's going on. And then I think fundamentally, from a trend standpoint, these things aren't binary. For a while, everything was going to go to the public cloud, and then people were like, "Oh, it's kind of expensive." Then they're like, "Oh no, we're going to bring it all on-prem 'cause it's really expensive." And it's like, "Well, that doesn't necessarily get me some of the new features and functionalities I might want for some of my new workloads." So I'm going to put the workloads that have a certain set of characteristics that require cloud in the cloud.
And if I have enough capability on-prem and enough IT resources to manage certain things on site, then I'm going to do that there 'cause that's a more cost-effective thing for me to do. It's not binary. That's why we went to hybrid. And then we went to multi just to describe the fact that people added multiple public clouds. And now we're talking about super, right? So I don't look at it as a one-size-fits-all for any of this. >> A number of practitioners leading up to Supercloud2 have told us that they're solving their cloud complexity by going monocloud. So they're putting on the blinders, even though across the organization there are other groups using other clouds. They're like, "In my group, we use AWS," or "In my group, we use Azure. And those guys over there, they use Google. We just kind of keep it separate." Are you guys hearing this? In your view, is that risky? Are they missing out on some potential to tap best of breed? What do you guys think about that? >> Everybody thinks they're monocloud. Is anybody really monocloud? It's like a group is monocloud, right? >> Right. >> This genie is out of the bottle. We're not putting the genie back in the bottle. You might think you're monocloud and you go like three doors down and figure out the guy or gal is on a fundamentally different cloud, running some analytics workload that you didn't know about. So, to Sanjeev's earlier point, they don't even know where their cloud spend is. So I think the concept of monocloud, as it's actually realized by practitioners, is primary and secondary clouds. So they have a primary cloud that they run most of their stuff on, and that they try to optimize. And we still have forked workloads. Somebody decides, "Okay, this SAP runs really well on this, or these analytics workloads run really well on that cloud." And maybe that's how they parse it.
But if you really peeked under the hood and did an analysis, there are very few companies where you could find an actual monocloud structure. They just want to pull it back in and make it more manageable. And I respect that. You want to do what you can to try to streamline the complexity of that. >> Yeah, we're- >> Sorry, go ahead, Keith. >> Yeah, we're doing this thing where we review an AWS service every day. Just in your inbox, a cursory look at a new AWS service. There are 238 AWS products just on the AWS cloud itself. Some of them are redundant, but you get the idea. So on the concept of monocloud, I'm in violent agreement with Maribel that, yes, a group might say I want a primary cloud. And that primary cloud may be AWS. But have you tried to license Oracle Database on AWS? It is really tempting to license Oracle on Oracle Cloud, Microsoft on Microsoft. And I can't get RDS anywhere but Amazon. So while I'm driven to desire the simplicity, the reality is, whether it be M&A, licensing, or data sovereignty, I am forced into a multi-cloud management style. But I do agree most people kind of do this primary cloud, secondary cloud. And I guarantee you're going to have a third cloud or a fourth cloud whether you want to or not, via shadow IT, latency, technical reasons, et cetera. >> Thank you. Sanjeev, you had a comment? >> Yeah, so I just wanted to mention, as an organization, I'm in complete agreement: no organization is monocloud, at least if it's a large organization. Large organizations use all kinds of combinations of cloud providers. But when you talk about a single workload, that's where the problem arises. As Keith said, there are 238 services in AWS. How in the world am I going to be an expert in AWS, but then say let me bring GCP or Azure into a single workload?
And that's where I think we probably will still see monocloud as being predominant, because the team has developed its expertise on a particular cloud provider, and they just don't have the time of day to go learn yet another stack. However, there are some interesting things that are happening. For example, if you look at a multi-cloud example where Oracle and Microsoft Azure have that interconnect, that's a beautiful thing that they've done, because now in the newest iteration, it's literally a few clicks. And then behind the scenes, your .NET application and your Oracle database in OCI will be configured, and the identities in Active Directory are federated. And you can just start using a database in one cloud, which is OCI, and an application, your .NET, in Azure. So until we see this kind of solution coming out of the providers, I think it is unrealistic to expect end users to be able to figure out multiple clouds. >> Well, I have to share with you... I can't remember if he said this on camera or if it was off camera, so I'll hold off. I won't tell you who it is, but this individual was sort of complaining a little bit, saying, "With AWS, I can take their best AI tools like SageMaker and I can run them on my Snowflake." He said, "I can't do that in Google. Google forces me to go to BigQuery if I want their excellent AI tools." So he was sort of pushing, kind of tweaking a little bit some of the vendor talk that, "Oh yeah, we're so customer-focused." Not to pick on Google, but I mean everybody will say that. And then you say, "If you're so customer-focused, why wouldn't you do ABC?" So it's going to be interesting to see who leads that integration and how broadly it's applied. But I digress. Keith, our first supercloud event was on August 9th. And it was only a few months after Broadcom announced the VMware acquisition. A lot of people, myself included, said, "All right, cuts are coming."
Generally, Tanzu is probably going to be under the radar. But at Supercloud22, and presumably VMware Explore, the company really... Well, certainly in the US it touted its Tanzu capabilities. I wasn't at VMware Explore Europe, but I bet you heard similar things. Hock Tan has been blogging and very vocal about cross-cloud services and multi-cloud, which doesn't happen without Tanzu. So what did you hear, Keith, in Europe? What's your latest thinking on VMware's prospects in cross-cloud services/supercloud? >> So I think our friend and longtime Cube host will be even more offended at this statement than he was when I said it on theCUBE. This was maybe five years ago. There's no company better suited to help industries or companies cross the cloud chasm than VMware. That's not a compliment. That's a reality of the industry. This is a very difficult, almost intractable problem. What I heard at VMware Explore Europe was customers serious about this problem, even more so than in the US. Data sovereignty is a real problem in the EU. Try being a company in Switzerland and having the Swiss data sovereignty issues. And there's no local cloud presence there large enough to accommodate your data needs. They had very serious questions about this. I talked to open source project leaders. Open source project leaders were asking me, why should I use the public cloud to host Kubernetes-based workloads, my projects that are building around Kubernetes and the CNCF infrastructure? Why should I use AWS, Google, or even Azure to host these projects when that's undifferentiated? I know how to run Kubernetes, so why not run it on-premises? I don't want to deal with the hardware problems. So again, really great questions. And then there was always the specter of the problem, I think, we all had with the potential acquisition of VMware by Broadcom. 4.5 billion in increased profitability in three years is an unbelievable amount of money when you look at the size of the problem.
So a lot of the conversation in Europe was about the industry at large. How do we do what regulators are asking us to do in a practical way, from a true technology sense? Is VMware cross-cloud great? >> Yeah. So, VMware, obviously, to your point. OpenStack is another. Actually, OpenStack uptake is still alive and well, especially in those regions where there may not be a public cloud, or there's public policy dictating that. Walmart's using OpenStack. As you know, in IT, some things never die. Question for Sanjeev. And it relates to this new breed of data apps. Bob Muglia and Tristan Handy from dbt Labs, who are participating in this program, really got us thinking about this. You've got data that resides in different clouds, maybe even on-prem. And the machine pulls data from different systems, e-commerce, ERP, et cetera, with no humans involved. It creates a plan and outcomes. No human involvement. Today, you're on a CRM system, you're inputting, you're doing forms, you're automating processes. We're talking about a new breed of apps. What are your thoughts on this? Is it real? Is it just way off in the distance? How does machine intelligence fit in? And how does supercloud fit? >> So great point. In fact, the data apps that you're talking about, I call them data products. Data products first came into the limelight in the last couple of years when Zhamak Dehghani started talking about data mesh. I am taking data products out of the data mesh concept because data mesh, whether data mesh happens or not, is orthogonal to data products. Data products, basically, are taking a product management view of bringing data from different sources based on what the consumer needs. We were talking earlier today about maybe it's my vacation rentals, or it may be a retail data product, it may be an investment data product. So it's a pre-packaged extraction of data from different sources. But now I have a product that has a whole lifecycle. I can version it.
I have new features that get added. And it's very business-data-consumer centric. It uses machine learning. For instance, I may be able to tell whether this data product has stale data. Who is using that data? Based on the usage of the data, I may have new data products that get allocated. I may even have the ability to take existing data products and mash them up into something that I need. So if I'm going to have that kind of power to create a data product, then having a common substrate underneath can be very useful. And that could be supercloud, where I am making API calls. I don't care where the ERP, the CRM, the survey data, or the pricing engine sit. For me, there's a logical abstraction. And then I'm building my data product on top of that. So I see a new breed of data products coming out. To answer your question, how early are we, or is this even possible? My prediction is that in 2023, we will start seeing more data products. And then it'll take maybe two to three years for data products to become mainstream. But it's starting this year. >> Subprime mortgages were a data product; there were definitely humans involved. All right, let's talk about some of the supercloud, multi-cloud players and what their future looks like. You can kind of pick your favorites. VMware, Snowflake, Databricks, Red Hat, Cisco, Dell, HPE, HashiCorp, IBM, CloudFlare. There are many others: Cohesity, Rubrik. Keith, I wanted to start with CloudFlare because they actually use the term supercloud. And just simplifying what they said, they look at it as taking serverless to the max. You write your code and then you can deploy it in seconds worldwide, of course, across the CloudFlare infrastructure. You don't have to spin up containers, you don't have to provision instances. CloudFlare worries about all that infrastructure. What are your thoughts on CloudFlare's approach and their chances to disrupt the current cloud landscape?
>> As Larry Ellison said famously once before, the network is the computer, right? I thought that was Scott McNealy. >> It wasn't Scott McNealy. I knew it was an Oracle line. >> Oracle owns that now, owns that line. >> By purpose or acquisition. >> They should have just called it cloud. >> Yeah, they should have just called it cloud. >> Easier. >> Go ahead. >> But if you think about the CloudFlare capability, CloudFlare in its own right is becoming a decent sized cloud provider. If you have compute out at the edge, when we talk about edge in the sense of CloudFlare and points of presence literally across the globe, you have all of this excess compute. What do you do with it? First offering: let's disrupt data in the cloud. We can start the conversation by talking about data. When they say we're going to give you object storage in the cloud without egress charges, that's disruptive. Now we can start to think about the supercloud capability of having EC2 compute run in AWS, pushing and pulling data from CloudFlare. And now I've disrupted this roach motel data structure, and I'm basically giving away bandwidth freely. Well, the next layer is not that much more difficult. And I think part of CloudFlare's serverless or supercloud approach is so that they don't have to commit to a certain type of compute. It is advantageous. It is a feature for me to be able to go to EC2 and pick a memory heavy model, or a compute heavy model, or a network heavy model. CloudFlare has taken away those knobs, and I'm just giving it code and allowing that to run. CloudFlare has a massive network. If I can put the code, using CloudFlare Workers, closest to where the data is residing, that's a super compelling observation. The question is, does it scale? I don't get the 238 services. While serverless is great, I have to know what I'm going to build.
I don't have a Cognito, or RDS, or all these other services that make AWS, GCP, and Azure appealing from a builder's perspective. So it is a very interesting nascent start. It's great because now they can hide compute. If they don't have the capacity, they can outsource that, maybe at a cost, to one of the other cloud providers, but hiding the compute behind the serverless architecture is a really unique approach. >> Yeah. And they're dipping their toe in the water. And they've announced an object store and a database platform and more to come. We've got to wrap. So I wonder, Sanjeev and Maribel, if you could maybe pick some of your favorites from a competitive standpoint. Sanjeev, I felt like just watching Snowflake, I said, okay, in my opinion, they had the right strategy, which was to run on all the clouds, and then try to create that abstraction layer and data sharing across clouds. Even though, let's face it, most of it might be happening across regions, if it's happening, but certainly outside of an individual account. But I felt like, just observing them, for anybody who's a traditional on-prem player moving into the clouds or anybody who's cloud native, it just makes total sense to write to the various clouds. And to the extent that you can simplify that for users, it seems to be a logical strategy. Maybe, as I said before, what multi-cloud should have been. But are there companies that you're watching that you think are ahead in the game, or ones that you think are a good model for the future? >> Yes, Snowflake, definitely. In fact, one of the things we have not touched upon very much, and Keith mentioned a little bit, was data sovereignty. Data residency rules can require that certain data be written into a certain region of a certain cloud. And if my cloud provider or my database provider can abstract that, then that's perfect for me. So right now, I see Snowflake as way ahead of this pack. I would not put MongoDB too far behind.
They don't really talk about this much. They are in a different space, but now they have a lakehouse, and they've got SQL access and all of these other new capabilities that they're announcing. So I think they would be quite good at that. Oracle is always a dark forest. Oracle seems to have revived its cloud mojo to some extent, and it's doing some interesting stuff. Databricks is the other one. I have not seen Databricks do this yet. They've been very focused on the lakehouse, Unity Catalog, and some of those pieces. But they would be the obvious challenger. And if they come into this supercloud space, they may bring some open source technologies that others can rely on, like Delta Lake as a table format. >> Yeah. And the infrastructure players, Dell, HPE, Cisco, even IBM: if I were one of them, I would be making my infrastructure as programmable and cloud friendly as possible. That seems like table stakes. But Maribel, any companies that stand out to you that we should be paying attention to? >> Well, we already mentioned a bunch of them, so maybe I'll go a slightly different route. I'm watching two established companies pretty closely to see what kind of traction they get. One we already talked about, which is VMware. The thing that's interesting about VMware is they're everywhere. And they also have the benefit of having a foot in both camps. If you want to do it the old way, the way you've always done it with VMware, they've got all that going on. If you want to do something more cross-cloud, multi-cloud, cloud-native, they're really trying to build tools for that. So I think they have really good access to buyers, and that's one of the reasons I'm interested in seeing how they progress. The other one, I think, could be a dark horse, oddly enough: Google Cloud. They've spent a lot of work and time on Anthos. They really need to create a certain set of differentiators.
Well, it's not necessarily in their best interest to be the best multi-cloud player. But if they decide that they want to differentiate at a different layer of the stack, let's say they want to be the provider that is really transformative, and they do talk about a transformation cloud with analytics workloads, then maybe they spend a good deal of time helping people abstract all of the other underlying infrastructure and making sure that they get the sexiest, most meaningful workloads into their cloud. So those are two picks you might not have expected from me, but I think it's interesting to look not just at startups or more established independent companies, but at how some of the traditional providers are trying to reinvent themselves as well. >> I'm glad you brought that up, because think about what Google's done with Kubernetes. I mean, would Google even be relevant in the cloud without Kubernetes? I could argue both sides of that. But it was quite a gift to the industry. And there's a motivation there to do something unique and different from the other cloud providers. And I'd throw in Red Hat as well. They're obviously a key player in Kubernetes. And HashiCorp, with Terraform, seems to be becoming the standard for application deployment across clouds, and there are many, many others. I know we're leaving lots out, but we're out of time. Folks, I've got to thank you so much for your insights and your participation in Supercloud2. Really appreciate it. >> Thank you. >> Thank you. >> Thank you. >> This is Dave Vellante for John Furrier and the entire Cube community. Keep it right there for more content from Supercloud2.

Published Date : Jan 10 2023


Breaking Analysis: Cyber Firms Revert to the Mean


 

(upbeat music) >> From theCube Studios in Palo Alto and Boston, bringing you data-driven insights from theCube and ETR. This is Breaking Analysis with Dave Vellante. >> While by no means a safe haven, the cybersecurity sector has outpaced the broader tech market by a meaningful margin, at least until very recently. Cybersecurity remains the number one technology priority for the C-suite, but as we've previously reported, the CISO's budget has constraints just like other technology investments. Recent trends show that economic headwinds have elongated sales cycles and pushed deals into future quarters, and, just like other tech initiatives, buyers are pacing cybersecurity investments and breaking them into smaller chunks. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis we explain how cybersecurity trends are reverting to the mean and tracking more closely with other technology investments. We'll make a couple of valuation comparisons to show the magnitude of the challenge and which cyber firms are feeling the heat and which aren't. There are some exceptions. We'll then show the latest survey data from ETR to quantify the contraction in spending momentum, and close with a glimpse of the landscape of emerging cybersecurity companies, the private companies that could be ripe for acquisition or consolidation, or could disrupt the broader market. First, let's look at the recent patterns for cyber stocks relative to the broader tech market as a benchmark. Here's a year-to-date comparison of the BUG ETF, which comprises a basket of cybersecurity names, against the tech-heavy NASDAQ Composite. Notice that on April 13th of this year the cyber ETF was actually in positive territory while the NASDAQ was down nearly 14%.
By August 16th, the green had turned red for cyber stocks, but they still meaningfully outpaced the broader tech market, by more than 950 basis points. As of December 2nd, that delta had contracted. As you can see, the cyber ETF is now down nearly 25% year to date, while the NASDAQ is down 27% and change. Now take a look at just how far a few of the high-profile cybersecurity names have fallen. Here are six security firms that we've been tracking closely since before the pandemic. We track dozens, but let's just look at this subset. For reference, we show the S&P 500 and the NASDAQ, which are both up relative to their levels right before the pandemic. We use February 2020 as the pre-pandemic baseline. During the pandemic the S&P shot up more than 40% relative to that level, and the NASDAQ peaked at around 65% above it. They now sit at 85% and 71% of their pandemic highs, respectively. Compare that to these six companies. Splunk, which was and still is working through a transition, is well below its pre-pandemic market value, at 44% of its pre-pandemic high as of last Friday. Palo Alto Networks is the most interesting here, in that it had been facing challenges prior to the pandemic related to a pivot to the cloud, which we reported on at the time. But as we said then, we believed the company would sort out its cloud transition, its go-to-market challenges, and its sales compensation issues, which it did, as you can see. Its valuation jumped from $24 billion prior to Covid to $56 billion, and it's holding 93% of its peak value. Its revenue run rate is now over $6 billion, with a healthy growth rate of 24% expected for the next quarter.
Similarly, Fortinet has done relatively well, holding 71% of its peak Covid value, with a healthy 34% revenue guide for the coming quarter. Now, Okta, a darling of the pandemic, has been the biggest disappointment. Okta's communication snafu around what was actually a pretty benign hack, combined with difficulty absorbing its $7 billion Auth0 acquisition, knocked the company off track. Its valuation has dropped by $35 billion since its pandemic peak, and that's after a nice beat-and-bounce-back quarter just announced by Okta. In our view Okta remains a viable long-term leader in identity. However, its recent fiscal 2024 revenue guide was exceedingly conservative, at around 16% growth. So either the company is sandbagging, or it has such poor visibility that it wants to be super cautious, or maybe it's actually seeing a dramatic slowdown in its business momentum. After all, this is a company that not long ago was putting up revenue growth rates of 50% plus. So it bears close watching. CrowdStrike is another big name that we've been talking about on Breaking Analysis for quite some time. Like Okta, it has led the industry in a key ETR performance indicator that measures customer spending momentum. Just last week, CrowdStrike announced that revenue increased more than 50%, but new ARR was soft and the company guided conservatively. Not surprisingly, the stock got absolutely crushed, as CrowdStrike blamed tepid demand from small and midsize firms. Many analysts believe that competition from Microsoft was one factor, along with cautious spending among those midsize and smaller customers. Notably, large customers remain active. So we'll see if this is a longer-term trend or an anomaly. Zscaler is another company in the space that we've reported as having great customer spending momentum in the ETR data. But even though the company beat expectations for its recent quarter, like other companies its outlook was conservative.
So other than Palo Alto, and to a lesser extent Fortinet, these companies and others that we're not showing here are feeling the economic pinch, and it shows in the compression of value. CrowdStrike, for example, had a $70 billion valuation at one point during the pandemic; Zscaler topped $50 billion, and Okta $45 billion. Having said that, Palo Alto Networks, Fortinet, CrowdStrike, and Zscaler are all still trading well above the pre-pandemic levels that we tracked back in February of 2020. All right, let's go back to ETR's January survey and look at how much things have changed since the beginning of the year. Remember, this is obviously pre-Ukraine, and before all the concerns about economic headwinds. Here's an X-Y graph that shows net score, or spending momentum, on the y-axis, and market presence on the x-axis. The red dotted line at 40% on the vertical indicates a highly elevated net score; anything above that we consider super elevated. We filtered the data here to show only those companies with more than 50 responses in the ETR survey. It's still really crowded. Note that there were around 20 companies above that red 40% mark, which is a very high number. It's a crowded market, but lots of companies with positive momentum. Now let's jump ahead to the most recent October survey and look at what's happening. It's the same graphic plotting spending momentum against market presence, and look at how the number of companies above that red line has been squashed. It's still a crowded market, with plenty of green, but the number of companies above that key 40% mark has gone from around 20 firms down to about five or six. That speaks to the compression in IT spending, and of course to the elongated sales cycles pushing deals out and breaking them into smaller chunks.
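The chart screen described above (keep only vendors with more than 50 responses, then count how many sit above the 40% net score line) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ETR's methodology: the `Vendor` type, the `elevated_vendors` function, and the sample vendors are hypothetical, and net score is treated as a precomputed input rather than derived from survey answers.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    net_score: float  # spending momentum, in percent (y-axis)
    n: int            # survey responses, a proxy for market presence (x-axis)


def elevated_vendors(vendors, min_n=50, threshold=40.0):
    """Names of vendors with more than `min_n` responses whose net score
    sits above the `threshold` line (the red dotted 40% line)."""
    return [v.name for v in vendors if v.n > min_n and v.net_score > threshold]


# Hypothetical sample data, for illustration only.
sample = [
    Vendor("Vendor A", net_score=45.0, n=60),
    Vendor("Vendor B", net_score=38.0, n=395),  # below the 40% line
    Vendor("Vendor C", net_score=55.0, n=40),   # dropped: too few responses
]
print(elevated_vendors(sample))  # ['Vendor A']
```

Running the same screen against two survey snapshots and comparing the lengths of the resulting lists is the kind of January-versus-October comparison the text describes.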
I can't tell you how many conversations I had with customers last week at re:Invent underscoring this exact same trend. Buyers are getting pressure from their CFOs to slow things down, do more with less, and prioritize the projects that are absolutely critical to driving revenue or cutting costs. And that's rippling through all sectors, including cyber. Now, let's do a bit more playing around with the ETR data and look at those companies with at least a hundred citations in the survey this quarter, so N greater than or equal to 100. Followers of Breaking Analysis know that each quarter we look at what we call the four-star security firms, that is, those that hit the top 10 for both spending momentum (net score) and N, the number of mentions in the survey, which measures presence or pervasiveness. That's what we show here. The left-most chart is sorted by spending momentum, or net score, and the right-hand chart by Shared N, the number of mentions in the survey, that pervasiveness metric. The solid red line denotes the cutoff point at the top 10. You'll note we've actually cut it off at 11 to account for Auth0, which is now part of Okta and is going through a go-to-market transition as the company restructures sales to take advantage of it. So starting on the left with spending momentum, or net score, Microsoft leads all vendors. That's typical of Microsoft, which is very prominent, although it hasn't always led: for a while CrowdStrike and Okta were taking the top spot. Now it's Microsoft. CrowdStrike is still near the top, but note that CyberArk and Cloudflare have cracked the top five, and Okta, which as I just said was consistently at the top, has dropped well off its previous highs.
You'll notice that Palo Alto Networks, with a 38% net score just below that magic 40% number, is healthy, especially as you look over to the right-hand chart. Palo Alto has an N of 395. It is the largest of the independent pure-play security firms and has a very healthy net score, although one caution is that its net score has dropped considerably since the beginning of the year, which is the case for most of the top 10 names. The only exception is Fortinet, the only one that saw an increase in spending momentum, as ETR measures it, since January. This brings us to the four-star security firms, that is, those that hit the top 10 in both net score on the left-hand side and market presence on the right-hand side. It's Microsoft, Palo Alto, CrowdStrike, and Okta, which is still there even without accounting for Auth0; if you add in Auth0, Okta is even stronger. Then add in Fortinet and Zscaler. So Microsoft, Palo Alto, CrowdStrike, Okta, Fortinet, and Zscaler. And as we've mentioned, only Fortinet has shown an increase in net score since the January survey. Again, this speaks to the compression in spending. Now, one of the big themes we hear constantly in cybersecurity is that the market is overcrowded. Everybody talks about that, me included. The implication is that there's a lot of room for consolidation, and that consolidation can come in the form of M&A, or in the form of customers consolidating onto a single platform and retiring duplicate vendors. We're hearing that as a big theme as well. As we saw in the previous chart, this is a very crowded market, and we've seen lots of consolidation in 2022 in the form of M&A: literally hundreds of deals, with some of the largest companies going private.
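Read literally, the four-star screen is a set intersection: among vendors with N of at least 100, take the top 10 by net score and the top 10 by Shared N, and keep the names that appear in both. The sketch below assumes that reading and is not ETR's or Breaking Analysis's actual tooling; the `four_star` function and the sample data are hypothetical, and `top_k` is a parameter so the cutoff can be widened to 11, as was done here to account for Auth0.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    net_score: float  # spending momentum, in percent
    n: int            # Shared N: number of mentions in the survey


def four_star(vendors, top_k=10, min_n=100):
    """Vendors in the top `top_k` by both net score and Shared N,
    returned in net-score order. Ties keep their input order (stable sort)."""
    eligible = [v for v in vendors if v.n >= min_n]
    by_score = sorted(eligible, key=lambda v: v.net_score, reverse=True)[:top_k]
    by_presence = sorted(eligible, key=lambda v: v.n, reverse=True)[:top_k]
    present = {v.name for v in by_presence}
    return [v.name for v in by_score if v.name in present]


# Hypothetical sample data with a top-2 cutoff, for illustration only.
sample = [
    Vendor("A", net_score=50.0, n=400),
    Vendor("B", net_score=45.0, n=120),
    Vendor("C", net_score=30.0, n=500),
    Vendor("D", net_score=60.0, n=90),  # excluded: N below 100
]
print(four_star(sample, top_k=2))  # ['A']
```

Because both top-k lists only grow as `top_k` grows, widening the cutoff from 10 to 11 can add a four-star name but never removes one that already qualified.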
SailPoint, KnowBe4, Barracuda, Mandiant, Fedora: these are billion-dollar-plus acquisitions, many of them multi-billion, and there have been hundreds more acquisitions in the cyber space. Now, lest you think the pond is overfished, here's a chart from ETR of emerging tech companies in the cybersecurity industry. This data comes from ETR's Emerging Technology Survey, ETS, which is a diamond in the rough that I found a couple of quarters ago, and it's rife with companies that are candidates for M&A. Many of these companies would have liked to get to the public markets during the pandemic, but they couldn't get there; they weren't ready. The graph is similar to the previous one, but different: it shows net sentiment on the vertical axis, which is a measure of intent to adopt, against mindshare on the x-axis, which measures awareness of the vendor in the community. This is specifically a survey that ETR fields only to track emerging tech companies that are private. Some of the standouts in mindshare are OneTrust; BeyondTrust; Tanium in endpoint; Netskope, which we've talked about in previous Breaking Analysis episodes; 1Password in identity, which has been acquisitive on its own; and the managed security service provider Arctic Wolf Networks, a company we've also covered, and we've had their CEO on. We've talked about MSSPs as a real trend, particularly in small and medium-sized businesses, and we'll come back to that. Then there's Snyk, a kind of high flyer in both app security and containers, and you can see that the number of companies in the space is huge, and it just keeps growing. Now, to make it a bit easier on the eyes, we filtered the data and isolated only those companies with more than a hundred responses within the survey. That's what we show here.
Some of the names we just mentioned are a bit easier to see here, but these are the ones that really stand out in ETR's ETS survey of private companies: OneTrust, BeyondTrust, Tanium, Netskope in cloud, 1Password, Arctic Wolf, Snyk, BitSight, SecurityScorecard, HackerOne, Code42, and Exabeam in SIEM. All of these had more than a hundred responses from IT practitioners in the ETS survey. Okay, so these firms may do some M&A on their own. We've seen that with Snyk and, as I said, 1Password has been acquisitive, as have others. These private companies with the larger footprints will likely be candidates both for buying companies and for eventually going public when the markets settle down a bit. So again, there's no shortage of players to effect consolidation, both buyers and sellers. Okay, let's finish with some key questions that we're watching. CrowdStrike in particular cited softness from smaller buyers on its earnings call. Is that because these smaller buyers have stopped adopting? If so, are they more at risk, or are they tactically moving toward the easy button, aka Microsoft's good-enough approach? What does that mean for the market if smaller company cohorts continue to soften? How about MSSPs? Will companies continue to outsource, or pause on that as well to try to free up some budget? Adam Selipsky said at re:Invent last week, "If you want to save money, the cloud's the best place to do it." Is the cloud the best place to save money in cyber? It would seem that way from the standpoint of controlling budgets with lots of optionality: you can dial services up and down. Or does the cloud add another layer of complexity that has to be understood and managed by devs, for example?
Now, consolidation should favor the likes of Palo Alto and CrowdStrike, because they're platform players, and some of the larger players as well, like Cisco, IBM, and of course Microsoft. Will that happen? And how will economic uncertainty impact the risk equation? A particular concern is increased attacks on vulnerable sectors of the population, like the elderly. How will companies and governments protect them from scams? And finally, how many cybersecurity companies can actually remain independent in the slingshot economy? In so many ways the market is still strong; it's just that expectations got ahead of themselves, and now, as earnings forecasts come down to earth, it's going to come down to who can execute, generate cash, and keep enough runway to get through the knothole. And the one certainty is that nobody really knows how tight that knothole is. All right, let's call it a wrap. Next week we dive deeper into Palo Alto Networks, and look at how and why that company has held up so well and what to expect at Ignite, Palo Alto's big user conference coming up later this month in Las Vegas. We'll be there with theCube. Okay, many thanks to Alex Myerson on production, who manages the podcast, and to Ken Schiffman as well, our newest addition to our Boston studio. Great to have you, Ken. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at SiliconANGLE. He does some great editing for us. Thank you to all. Remember, these episodes are all available as podcasts. Wherever you listen, just search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com, or you can email me directly at David.vellante@siliconangle.com or DM me @DVellante, or comment on our LinkedIn posts. Please do check out etr.ai; they've got the best survey data in the enterprise tech business. This is Dave Vellante for theCube Insights powered by ETR.
Thanks for watching, and we'll see you next time on Breaking Analysis. (upbeat music)

Published Date : Dec 5 2022


Satish Iyer, Dell Technologies | SuperComputing 22


 

>>We're back at Super Computing, 22 in Dallas, winding down the final day here. A big show floor behind me. Lots of excitement out there, wouldn't you say, Dave? Just >>Oh, it's crazy. I mean, any, any time you have NASA presentations going on and, and steampunk iterations of cooling systems that the, you know, it's, it's >>The greatest. I've been to hundreds of trade shows. I don't think I've ever seen NASA exhibiting at one like they are here. Dave Nicholson, my co-host. I'm Paul Gell, in which with us is Satish Ier. He is the vice president of emerging services at Dell Technologies and Satit, thanks for joining us on the cube. >>Thank you. Paul, >>What are emerging services? >>Emerging services are actually the growth areas for Dell. So it's telecom, it's cloud, it's edge. So we, we especially focus on all the growth vectors for, for the companies. >>And, and one of the key areas that comes under your jurisdiction is called apex. Now I'm sure there are people who don't know what Apex is. Can you just give us a quick definition? >>Absolutely. So Apex is actually Dells for a into cloud, and I manage the Apex services business. So this is our way of actually bringing cloud experience to our customers, OnPrem and in color. >>But, but it's not a cloud. I mean, you don't, you don't have a Dell cloud, right? It's, it's of infrastructure as >>A service. It's infrastructure and platform and solutions as a service. Yes, we don't have our own e of a public cloud, but we want to, you know, this is a multi-cloud world, so technically customers want to consume where they want to consume. So this is Dell's way of actually, you know, supporting a multi-cloud strategy for our customers. >>You, you mentioned something just ahead of us going on air. A great way to describe Apex, to contrast Apex with CapEx. There's no c there's no cash up front necessary. Yeah, I thought that was great. Explain that, explain that a little more. 
Well, >>I mean, you know, one, one of the main things about cloud is the consumption model, right? So customers would like to pay for what they consume, they would like to pay in a subscription. They would like to not prepay CapEx ahead of time. They want that economic option, right? So I think that's one of the key tenets for anything in cloud. So I think it's important for us to recognize that and think Apex is basically a way by which customers pay for what they consume, right? So that's a absolutely a key tenant for how, how we want to design Apex. So it's absolutely right. >>And, and among those services are high performance computing services. Now I was not familiar with that as an offering in the Apex line. What constitutes a high performance computing Apex service? >>Yeah, I mean, you know, I mean, this conference is great, like you said, you know, I, there's so many HPC and high performance computing folks here, but one of the things is, you know, fundamentally, if you look at high performance computing ecosystem, it is quite complex, right? And when you call it as an Apex HPC or Apex offering offer, it brings a lot of the cloud economics and cloud, you know, experience to the HPC offer. So fundamentally, it's about our ability for customers to pay for what they consume. It's where Dell takes a lot of the day to day management of the infrastructure on our own so that customers don't need to do the grunge work of managing it, and they can really focus on the actual workload, which actually they run on the CHPC ecosystem. So it, it is, it is high performance computing offer, but instead of them buying the infrastructure, running all of that by themself, we make it super easy for customers to consume and manage it across, you know, proven designs, which Dell always implements across these verticals. >>So what, what makes the high performance computing offering as opposed to, to a rack of powered servers? What do you add in to make it >>Hpc? 
Ah, that's a great question. So, I mean, you know, so this is a platform, right? So we are not just selling infrastructure by the drink. So we actually are fundamentally, it's based on, you know, we, we, we launch two validated designs, one for life science sales, one for manufacturing. So we actually know how these PPO work together, how they actually are validated design tested solution. And we also, it's a platform. So we actually integrate the softwares on the top. So it's just not the infrastructure. So we actually integrate a cluster manager, we integrate a job scheduler, we integrate a contained orchestration layer. So a lot of these things, customers have to do it by themself, right? If they're buy the infrastructure. So by basically we are actually giving a platform or an ecosystem for our customers to run their workloads. So make it easy for them to actually consume those. >>That's Now is this, is this available on premises for customer? >>Yeah, so we, we, we make it available customers both ways. So we make it available OnPrem for customers who want to, you know, kind of, they want to take that, take that economics. We also make it available in a colo environment if the customers want to actually, you know, extend colo as that OnPrem environment. So we do both. >>What are, what are the requirements for a customer before you roll that equipment in? How do they sort of have to set the groundwork for, >>For Well, I think, you know, fundamentally it starts off with what the actual use case is, right? So, so if you really look at, you know, the two validated designs we talked about, you know, one for, you know, healthcare life sciences, and one other one for manufacturing, they do have fundamentally different requirements in terms of what you need from those infrastructure systems. 
So, you know, the customers initially figure out, okay, how do they actually require something which is going to require a lot of memory intensive loads, or do they actually require something which has got a lot of compute power. So, you know, it all depends on what they would require in terms of the workloads to be, and then we do havet sizing. So we do have small, medium, large, we have, you know, multiple infrastructure options, CPU core options. Sometimes the customer would also wanna say, you know what, as long as the regular CPUs, I also want some GPU power on top of that. So those are determinations typically a customer makes as part of the ecosystem, right? And so those are things which would, they would talk to us about to say, okay, what is my best option in terms of, you know, kind of workloads I wanna run? And then they can make a determination in terms of how, how they would actually going. >>So this, this is probably a particularly interesting time to be looking at something like HPC via Apex with, with this season of Rolling Thunder from various partners that you have, you know? Yep. We're, we're all expecting that Intel is gonna be rolling out new CPU sets from a powered perspective. You have your 16th generation of PowerEdge servers coming out, P C I E, gen five, and all of the components from partners like Invidia and Broadcom, et cetera, plugging into them. Yep. What, what does that, what does that look like from your, from your perch in terms of talking to customers who maybe, maybe they're doing things traditionally and they're likely to be not, not fif not 15 G, not generation 15 servers. Yeah. But probably more like 14. Yeah, you're offering a pretty huge uplift. Yep. What, what do those conversations look >>Like? I mean, customers, so talking about partners, right? 
I mean, of course at Dell, you know, we don't bring any solutions to the market without really working with all of our partners, whether that's at the infrastructure level, like you talked about, you know, Intel, AMD, Broadcom, right? All the chip vendors, all the way to the software layer, right? We have cluster managers, we have Kubernetes orchestrators. Usually what we do is bring together the best in class, whether it's a software player or a hardware player, right? And we bring it together as a solution. So we do give customers a choice, and customers always want to pick what they know is actually awesome, right? So we actually do that. And, you know, one of the main aspects, especially when you talk about bringing these things as a service, right? >>We take a lot of guesswork away from our customers, right? You know, a good example in HPC is capacity, right? These are, I would say, very intensive systems, very complex systems, right? So customers would like to buy a certain amount of capacity, then grow and, you know, come back, right? So giving them the flexibility to consume more if they want, giving them the buffer, and coming down. All of those things are very important as we design these things, right? Customers are given a choice, and they don't need to worry about, oh, you know, what happens if I have a spike, right? There's already buffer capacity built in. So those are awesome things when we talk about things as a service. >>When customers are doing their ROI analysis, buying CapEx on-prem versus using Apex, is there typically a crossover point at which it's probably a better deal for them to go on-prem? >>Yeah, I mean, specifically talking about HPC, right?
I mean, you know, a lot of customers consume high performance compute in the public cloud, right? That's not gonna go away, right? But there are certain reasons why they would look at on-prem or, for example, a colo environment, right? One of the main reasons has to do with cost, right? These are pretty expensive systems, right? There is a lot of ingress and egress, a lot of data going back and forth, right? In the public cloud, you know, it costs money to put data in or pull data back out, right? The second one is data residency and security requirements, right? A lot of this is probably a proprietary set of information. We talked about life sciences; there's a lot of research, right? >>In manufacturing, a lot of these things are just-in-time decision making, right? You are on a factory floor, you've gotta be able to do that. So there is a latency requirement. So I think a lot of things play into this outside of just cost: data residency requirements, ingress, and egress are big things. And when you're talking about massive amounts of data you want to put in and pull back out, they would like to keep it close, keep it local, and, you know, get a price point. >>Nevertheless, I mean, we were just talking to Ian Coley from AWS, and he was talking about how customers need to move workloads back and forth between the cloud and on-prem. That's something they're addressing with Outposts. You are very much in the on-prem world. Do you have, or will you have, facilities for customers to move workloads back and forth? >>Yeah, I wouldn't necessarily say... you know, Dell's cloud strategy is multi-cloud, right? So it kind of falls into three. Some customers, some workloads, are always suited for public cloud. It's easier to consume, right?
There are, you know, customers who also consume on-prem, and customers who are also consuming colo. And we also have Dell's amazing pieces of software, like storage software. You know, we make some of these things available for customers to consume as software IP on their public cloud, right? So this is our multi-cloud strategy. We announced Project Alpine. So, you know, if you look at those, basically customers are saying: I love your Dell IP on this product, on the storage; can you make it available in this public environment, whether, you know, it's any of the hyperscale players. So we do all of that, right? I think it shows that, you know, it's not always tied to an infrastructure, right? Customers want to consume the best of them, and if it needs to be consumed in a hyperscaler, we can make it available. >>Do you support containers? >>Yeah, we do support containers on HPC. We have two container orchestrators we support. We have Singularity, and we also have container options for customers. Both options. >>What kind of customers are you signing up for the HPC offerings? Are they university research centers, or does it tend to be smaller companies? >>You know, the last three days at this conference have been great. We probably had, you know, many, many customers talking to us. But in HPC, somewhere in the range of 40, 50 customers, I would probably say: a lot of interest from educational institutions, universities, research, to your point; a lot of interest from manufacturing, factory floor automation. A lot of customers want to do dynamic simulations on the factory floor. There is also quite a bit of interest from life sciences and pharma because, you know, like I said, we have two designs, one for life sciences, one for manufacturing, both with different dynamics on the infrastructure.
So yeah, quite a lot of interest, definitely from academia, life sciences, and manufacturing. We also have a lot of financials, big banks, you know, who want to simulate a lot of brokerage, a lot of financial data, because we have some, you know, really optimized hardware we announced at Dell, especially for financial services. So there's quite a bit of interest from financial services as well. >>That was great. We often think of Dell as the organization that eventually democratizes all things in IT. And in that context, you know, this is Supercomputing 22, and HPC is like the little sibling trailing behind the supercomputing trend. But we have definitely seen this move out of purely academia into the business world. Dell is clearly a leader in that space. How has Apex overall been doing since you rolled out that strategy? It's been a couple of years now, hasn't it? >>Yeah, it's been less than two years. >>How are mainstream Dell customers embracing Apex versus the traditional, you know, maybe 18-month to three-year CapEx upgrade cycle? >>Yeah, I mean, look, I think there is absolutely strong momentum for Apex. Like Paul pointed out earlier, we started with, you know, making the infrastructure and the platforms available to customers to consume as a service, right? We have options where Dell can fully manage everything end to end, taking a lot of the pain points away, like we talked about, because, you know, that means managing a cloud-scale environment for the customers. We also have options where customers would say, you know what, I actually have a pretty sophisticated IT organization. I want Dell to manage the infrastructure up to this layer, up to the guest operating system, and I'll take care of the rest, right?
So we are seeing customers come to us with various requirements, saying: I can do up to here, but you take all of this pain away from me; or, you do everything for me. >>It all depends on the customer. So we do have wide interest. I would say our products and the portfolio set in Apex are expanding, and we are also learning, right? We are getting a lot of feedback from customers about what they would like to see in some of these offers. Like the example we just talked about, making some of the software IP available on a public cloud, where they'll look at Dell as a software player, right? That is also absolutely critical. So I think we are giving customers a lot of choices. I would say the choice factor, and, you know, we are democratizing, like you said, expanding the customer choices. >>I think we're almost out of time, but I do want to be sure we get to Dell Validated Designs, which you've mentioned a couple of times. What's the purpose of these designs? How specific are they? >>Well, you know, again, we look at these industries, right? And we look at understanding exactly how... I mean, we have a huge embedded base of customers utilizing HPC across our ecosystem at Dell, right? A lot of them are CapEx customers. We actually do have an active customer profile. So these validated designs take into account a lot of customer feedback, a lot of partner feedback, about how they utilize this. And when you build these solutions, which are kind of end to end and integrated, you need to start anchoring on something, right? And a lot of these things have different characteristics. So these validated designs basically give customers a very good jump-off point. That's the way I look at it, right?
So a lot of them don't come to the table with a blank sheet of paper and say, oh, you know what, these are the characteristics of what I want. Instead they say, I think this is a great point for me to start from, right? So I think it gives them that, plus it's the power of validation, really, right? We test, validate, integrate, so they know it works, right? So all of those are hypercritical when you talk to... >>And you mentioned healthcare, you mentioned manufacturing. Other designs? >>We just announced a validated design for financial services as well, I think a couple of days ago at the event. So yep, we are expanding all those DVDs so that we can give our customers a choice. >>We're out of time. Satish Iyer, thank you so much for joining us. Thank you. You are at the center of the move to subscription, to everything as a service, where everything is on a subscription basis. You really are on the leading edge of where your industry is going. Thanks for joining us. >>Thank you, Paul. Thank you, Dave. >>Paul Gillin with Dave Nicholson here from Supercomputing 22 in Dallas, wrapping up the show this afternoon. Stay with us; there'll be more soon.

Published Date : Nov 17 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Terry | PERSON | 0.99+
Dave Nicholson | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Ian Coley | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Terry Ramos | PERSON | 0.99+
Dave | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
Europe | LOCATION | 0.99+
Paul Gell | PERSON | 0.99+
David | PERSON | 0.99+
Paul Gillum | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
190 days | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
Paul | PERSON | 0.99+
European Space Agency | ORGANIZATION | 0.99+
Max Peterson | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
CIA | ORGANIZATION | 0.99+
Africa | LOCATION | 0.99+
one | QUANTITY | 0.99+
Arcus Global | ORGANIZATION | 0.99+
four | QUANTITY | 0.99+
Bahrain | LOCATION | 0.99+
D.C. | LOCATION | 0.99+
Everee | ORGANIZATION | 0.99+
Accenture | ORGANIZATION | 0.99+
John | PERSON | 0.99+
UK | LOCATION | 0.99+
four hours | QUANTITY | 0.99+
US | LOCATION | 0.99+
Dallas | LOCATION | 0.99+
Stu Miniman | PERSON | 0.99+
Zero Days | TITLE | 0.99+
NASA | ORGANIZATION | 0.99+
Washington | LOCATION | 0.99+
Palo Alto Networks | ORGANIZATION | 0.99+
Capgemini | ORGANIZATION | 0.99+
Department for Wealth and Pensions | ORGANIZATION | 0.99+
Ireland | LOCATION | 0.99+
Washington, DC | LOCATION | 0.99+
an hour | QUANTITY | 0.99+
Paris | LOCATION | 0.99+
five weeks | QUANTITY | 0.99+
1.8 billion | QUANTITY | 0.99+
thousands | QUANTITY | 0.99+
Germany | LOCATION | 0.99+
450 applications | QUANTITY | 0.99+
Department of Defense | ORGANIZATION | 0.99+
Asia | LOCATION | 0.99+
John Walls | PERSON | 0.99+
Satish Iyer | PERSON | 0.99+
London | LOCATION | 0.99+
GDPR | TITLE | 0.99+
Middle East | LOCATION | 0.99+
42% | QUANTITY | 0.99+
Jet Propulsion Lab | ORGANIZATION | 0.99+

Kirk Haslbeck, Collibra, Data Citizens 22


 

(atmospheric music) >> Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality at Collibra. Kirk, good to see you, welcome. >> Thanks for having me, Dave. Excited to be here. >> You bet. Okay, we're going to discuss data quality and observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >> Yeah, absolutely. It's definitely exciting times for data quality, which, you're right, has been around for a long time. So why now? And why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, the variety has changed, and the volume has grown. And while I think that remains true, there are a couple of other hidden factors at play that explain why everyone's so interested and why this is becoming so important now. You could break this down simply. Dave, think about it: if you and I were going to build a new healthcare application to monitor the heartbeat of individuals, imagine if we got that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50-day crosses the 10-day average. And imagine if the data underlying the inputs to that is incorrect. We will probably have major financial ramifications. So it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there are two other things at play.
I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment I might have failed him, 'cause I didn't know. I used to ask those types of questions, about anti-lock brakes and cylinders, and whether it's manual or automatic. And I realized I now just buy a car that I hope works. It's so complicated with all the computer chips; I really don't know that much about it. And that's what's happening with data. We're just loading so much of it, and it's so complex, that the way companies consume it in the IT function is they bring in a lot of data and then syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company may not actually know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested. It's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct. >> You know, the other thing about data quality: for years we did the MIT CDOIQ event. We didn't do it last year; COVID messed everything up. But the observation I would make there, and I'd like your thoughts, is that data quality used to be information quality. It used to be this back-office function, and then it became sort of front office with financial services, government, and healthcare, these highly regulated industries. And then the whole chief data officer thing happened, and people sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're going to talk about observability. So the whole quality issue has really become front and center, because data's so fundamental, hasn't it? >> Yeah, absolutely.
I mean, let's imagine we pull up our phones right now, I go to my favorite stock ticker app, and I check out the Nasdaq market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. With the scale we've achieved, though, going back to the early days even before Collibra, what's been so exciting is that we have these types of observation techniques, these data monitors that can actually track the past performance of every field at scale. Why is that so interesting, and why do I think CDOs are listening so intently to this topic nowadays? Because maybe we could surface all of these problems with the right data observability solution at the right scale, and then just be alerted on breaking trends. So we're shifting away from this world where you must write a condition, and when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? Is it possible to do that with less human intervention? I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >> So how does data observability relate to data quality? Are they two sides of the same coin? Are they cousins? What's your perspective on that? >> Yeah, it's super interesting. It's an emerging market, so the language is changing, and a lot of the topics and areas are changing. The way I like to break it down, because the lingo is constantly moving, is that the target in this space is really break records versus breaking trends.
I could write a condition: when this thing happens it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. Everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there is no condition you could write that would show you all the goods and the bads. That was the traditional data quality approach: break records. But the modern-day approach asks whether you lost a significant portion of your data, or whether it failed to arrive in time to make that decision accurately. And that's a hidden concern. Some people call this freshness; we call it stale data. But it all points to the same idea: the thing you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play at every company out there, more than a couple of these happen every day. >> So what's the Collibra angle on all this stuff? You made the acquisition, and you've got data quality and observability coming together. You guys have a lot of expertise in this area, but you hear about provenance of data. You just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >> Well, I think where we're fortunate is our background. My team and I lived this problem for a long time in our Wall Street days, about a decade ago. And we saw it from many different angles. What we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there when most people evaluate our solution. It's more advanced than some of the observation techniques that currently exist.
But we've also always covered data quality, and we believe that people want to know more; they need more insights. And they want to see break records and breaking trends together, so they can correlate the root cause. We hear that all the time: "I have so many things going wrong, just show me the big picture. Help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis and business impact, connecting it with lineage and catalog metadata. And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >> Well, you mentioned financial services a couple of times in some examples. Remember the flash crash in 2010? Nobody had any idea what that was. They would just say, "Oh, it's a glitch." So they didn't understand the root cause of it. So this is a really interesting topic to me. We know that at Data Citizens 22 you're announcing... you've got to announce new products, right? It's your yearly event. What's new? Give us a sense of what products are coming out, specifically around data quality and observability. >> Absolutely. There's always a next thing on the forefront, and the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery, Databricks Delta Lake, and SQL pushdown. Ultimately, what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook into these databases. We've always worked with the same databases in the past, and they're supported today, but now we're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why is that so interesting and powerful now?
It's because everyone's concerned with something called egress. Did my data, which I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? And with these native integrations that we're building and about to unveil here, as kind of a sneak peek for next week at Data Citizens, we're now doing all compute and data operations in databases like Snowflake. What that means is that with no install and no configuration, you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your go-forward, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage, and compute, what people are realizing is that it's extremely efficient to do it the way we're about to release next week. >> So this is interesting, because in what you just described, you mentioned Snowflake, you mentioned Google, and actually, yeah, you mentioned Databricks. You know, Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool. But then Google's got the open data cloud, if you heard Google Next. And now Databricks doesn't call it the data cloud, but they have, like, the open source data cloud. So you have all these different approaches, and up until now, I'm hearing, there's really no way to understand the relationships between all of those and have confidence across them. It's like Zhamak Dehghani says: you should just be a node on the mesh. I don't care if it's a data warehouse or a data lake, or where it comes from, but it's a point on that mesh, and I need tooling to be able to have confidence that my data is governed and has the proper lineage and provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >> Yeah, that's right.
And for us, it's not that we haven't been working with those great cloud databases; it's the fact that we can now send them the instructions, the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that gives you is basically zero network cost, zero egress cost, zero latency. So when you log into BigQuery tomorrow using our tool, or, say, Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage, and access and privacy controls, things of that nature that just become less onerous. What we're seeing is that there's so much technology out there, like all of the major brands you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think this positions us to be the leader there. >> I love this example, because we've talked about how the cloud guys are going to own the world. And of course now we're seeing that the ecosystem is finding so much white space to add value by connecting across clouds. Sometimes we call it supercloud, or inter-clouding. All right, Kirk, give us your final thoughts on the trends we've talked about and Data Citizens 22. >> Absolutely. Well, I think one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they want to know where everything is, where their sensitive data is, whether it's redundant: tell me everything inside of three to five seconds. And with that, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're going to see more one-click solutions, more SaaS-based solutions, and solutions that hopefully provide faster time to value on all of these modern cloud platforms. >> Excellent.
All right, Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens 22. Appreciate it. >> Thanks for having me, Dave. >> You're welcome. All right. And thank you for watching. Keep it right there for more coverage from theCUBE. (atmospheric music)

Published Date : Nov 2 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Dave | PERSON | 0.99+
Collibra | ORGANIZATION | 0.99+
2010 | DATE | 0.99+
Kirk Haslbeck | PERSON | 0.99+
one | QUANTITY | 0.99+
OwlDQ | ORGANIZATION | 0.99+
Kirk | PERSON | 0.99+
50 day | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
10 day | QUANTITY | 0.99+
Databricks | ORGANIZATION | 0.99+
two sides | QUANTITY | 0.99+
last year | DATE | 0.99+
Collibra Data Quality | ORGANIZATION | 0.99+
next week | DATE | 0.99+
Data Citizens | ORGANIZATION | 0.99+
tomorrow | DATE | 0.98+
two other things | QUANTITY | 0.98+
BigQuery | TITLE | 0.98+
five seconds | QUANTITY | 0.98+
one click | QUANTITY | 0.97+
today | DATE | 0.97+
Collibra | TITLE | 0.96+
Wall Street | LOCATION | 0.96+
SQL Pushdown | TITLE | 0.94+
Data Citizens 22 | ORGANIZATION | 0.93+
COVID | ORGANIZATION | 0.93+
Snowflake | TITLE | 0.91+
Nasdaq | ORGANIZATION | 0.9+
Data Citizens 22 | ORGANIZATION | 0.89+
Delta Lake | TITLE | 0.89+
Egress | ORGANIZATION | 0.89+
MIT | EVENT | 0.89+
more than a couple | QUANTITY | 0.87+
a decade ago | DATE | 0.85+
zero | QUANTITY | 0.84+
Citizens | ORGANIZATION | 0.83+
Data Citizens 2022 Collibra | EVENT | 0.83+
years | DATE | 0.81+
thousands of data | QUANTITY | 0.8+
Data Citizens 22 | TITLE | 0.78+
two domain experts | QUANTITY | 0.77+
Snowflake | ORGANIZATION | 0.76+
IQ | EVENT | 0.76+
couple | QUANTITY | 0.75+
Collibra | PERSON | 0.75+
theCUBE | ORGANIZATION | 0.71+
many numbers | QUANTITY | 0.7+
Vice President | PERSON | 0.68+
Lineage | ORGANIZATION | 0.66+
Databricks | TITLE | 0.64+
too long ago | DATE | 0.62+
three | QUANTITY | 0.6+
Data | ORGANIZATION | 0.57+
CDO | EVENT | 0.53+
minute | QUANTITY | 0.53+
CDO | TITLE | 0.53+
number | QUANTITY | 0.51+
AMI | ORGANIZATION | 0.44+
Quality | PERSON | 0.43+
0.43+

Collibra Data Citizens 22


 

>>Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back-office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from primarily a compliance-driven issue to a linchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. >>And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum behind rethinking monolithic data architectures. You hear about initiatives like data mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business unit users. You hear a lot about data democratization. So these decentralization efforts around data are great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now, the cloud is definitely helping with that, but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important.
>>In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust in data as it moves like water through an organization. No one is going to use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and to automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cybersecurity. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. >>Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante, and I'm one of the hosts of our program, which is running in parallel to Data Citizens. At theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, the co-founder and CEO of Collibra, will join us, along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm also going to sit down with Laura Sellers, the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig further into data quality with Kirk Haslbeck. >>He's the vice president of data quality at Collibra.
He's an amazingly smart dude who founded OwlDQ, a company he sold to Collibra last year. Many companies didn't make it through the Hadoop era; they missed the industry waves and became driftwood. Collibra, on the other hand, has evolved its business. It has leveraged the cloud, expanded its product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching, and I hope you enjoy the program. >>Last year, theCUBE covered Data Citizens, Collibra's customer event. The premise we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, starting with the Hadoop movement, data lakes, Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera, businesses still find it too difficult to get more value from their data initiatives. And we said at the time, maybe it's time to rethink data innovation. While a lot of the effort has been focused on more efficiently storing and processing data, perhaps more energy needs to go into the people and process side of the equation, meaning making it easier for domain experts to gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back, and we're pleased to have Felix Van de Maele, the founder and CEO of Collibra, on theCUBE. We're excited to have you, Felix. Good to see you again. >>Likewise, Dave. Thanks for having me again. >>You bet.
All right, we're going to get the update from Felix on the current data landscape, how he sees it, and why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today. Businesses and consumers are struggling with things like supply chains and uncertain economic trends, and we're not just snapping back to the 2010s. That's clear, and it's really true as well in the world of data. So what's different, in your mind, about the data landscape of the 2020s compared with the previous decade, and what challenges does that bring for your customers? >>Yeah, absolutely. I think you said it well, Dave, in the intro: the rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation and complexity, and the question of how to find data we can trust and know we can use, has only gotten more difficult. So that trend is continuing. What is changing is that the trend has become much more acute. The other thing we've seen over the last couple of years is the level of scrutiny organizations are under with respect to data. As data becomes more mission critical, more impactful and important, the scrutiny with respect to privacy, security, and regulatory compliance is only increasing as well, which again is really difficult in this environment of continuous innovation, continuous change, and continuously growing complexity and fragmentation. >>So it's become much more acute. And to your earlier point, we do live in a different world. In the past couple of years we could probably just kind of brute-force it, right? We could focus on the top line.
There was enough investment to be had. Nowadays organizations are in a very different environment, with much more focus on cost control, productivity, and efficiency: how do we truly get value from that data? So I think it's just another incentive for organizations to truly look at data and to scale data, not just from a technology and infrastructure perspective, but from an organizational perspective, right? You said it: the people and process. How do we do that at scale? That's only becoming more important. And we do believe that the economic environment we find ourselves in today is going to be a catalyst for organizations to dig in more seriously, if you will, than they maybe have in the past. >>You know, I don't know if, when you founded Collibra, you had a sense of how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >>Yeah, absolutely. We started Collibra in 2008, so in some sense during the last financial crisis, and that was really the start of Collibra, where we found product-market fit working with large financial institutions to help them cope with the increasing compliance requirements they faced because of the financial crisis. And here we are again in a very different environment, of course, almost 15 years later. But data is only becoming more important, and our mission to deliver trusted data for every user, every use case, and across every source has frankly only become more important.
So while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to provide everyone with trusted data, and that's why we call it Data Citizens. We truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important and more relevant. We definitely have a lot more work ahead of us, because we are still relatively early in that journey. >>Well, that's interesting, because in my observation it takes seven to 10 years to actually build a company, so the fact that you're still in the early days is kind of interesting. Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >>Yeah, absolutely. Again, there's a lot of tailwind: organizations are only now maturing their data practices, and we've seen that influence a lot of our business growth and drive broader adoption of the platform. We work with some of the largest organizations in the world, like Adobe, Heineken, Bank of America, and many more. We now have over 600 enterprise customers, all industry leaders in every single vertical. So it's really exciting to see that and to continue to partner with those organizations. On the partnership side, there's a lot of momentum in the market with cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? Those new modern data infrastructures and modern data architectures are definitely all moving to the cloud, which is a great opportunity for us, our partners, and of course our customers, to help them transition to the cloud even faster.
>>And so we see a lot of excitement and momentum there. We made an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course, data quality isn't new, but there are a lot of reasons why we're so excited about quality and observability now. One is leveraging AI and machine learning to drive more automation. The second is that the data pipelines now being created in the cloud, in these modern data architectures, have become mission critical. They've become real time. So monitoring and observing those data pipelines continuously has become absolutely critical, and we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance we've always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations. So that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >>Yeah, a couple of things there. The acquisition of OwlDQ, Kirk Haslbeck and his team, is interesting. Data quality used to be a back-office function, really confined to highly regulated industries. It's come to the front office; it's top of mind for chief data officers. Data mesh: you mentioned you're a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems, and you're developing your own ecosystem. So let's chat a little bit about the products.
We're going to go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations, whether it's under the covers in security, making data more accessible for people, or dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >>Yeah, absolutely. We're super excited; there's a ton of innovation. If we think about the big theme, like I said, we're still relatively early in this journey toward the mission of data intelligence, a really bold and compelling mission. For customers that are just starting on that journey, we want to make it as easy as possible for their organizations to get started, because we know that's important. And for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to expand the platform further, and again to make it easier to accomplish that mission and vision of the data citizen, where everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation we're driving. >>A lot of ease of adoption and ease of use, but also making sure that Collibra becomes a mission-critical enterprise platform from a security, performance, architecture, scale, and supportability standpoint, so that we're truly able to deliver that kind of enterprise, mission-critical platform. So that's the big theme from an innovation perspective. From a product perspective, there's a lot of new innovation we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction: how do we make it easy?
How do we make available a true shopping experience, so that anybody in your organization can, in a very easy, search-first way, find the right data product or the right dataset and then consume that data? Usage analytics: how do we help organizations drive adoption, and show them where things are working really well and where they have opportunities? Homepages, again to make things easy for anyone in your organization to get started with Collibra. You mentioned workflow designer; again, we have a very powerful enterprise platform. >>One of our key differentiators is the ability to drive a lot of automation through workflows, and now we've provided a new low-code, no-code workflow designer experience, so customers can really take it to the next level. There's a lot more new product around Collibra Protect, built in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how we make access governance easier. As you move to the cloud, how do we make sure that things like access management and masking of sensitive data and PII are managed in a much more effective way? We're really excited about that product. There's more around data quality, again: how do we get it deployed as easily, quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy. >>So we're launching our data quality cloud product, as well as making use of the native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. We're delivering a capability we call pushdown: actually pushing the data quality compute and monitoring down into the underlying platform, which, from a scale, performance, and ease-of-use perspective, is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem.
Again, integrations: we talk about being able to connect to every source, so integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been working really hard, and we are really, really excited about what we're bringing to market. >>Yeah, a lot going on there. I wonder if you could give us your closing thoughts. You talked about the marketplace. You think about data mesh, you think of data as product, one of the key principles, and you think about monetization. This is really different from what we've been used to in data, where just getting the technology to work has been so hard. So how do you see the future? Give us your closing thoughts, please. >>Yeah, absolutely. I think we're really at a pivotal moment, and I think you said it well. We all know the constraints and the challenges with data: how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that getting a faster database is important, but it's not going to fully solve the challenges and truly deliver on the opportunity. That's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early, and making it as easy as we can is our mission. So I'm really excited to see how the market is going to evolve over the next few quarters and years. I think the trend is clearly there. When we talk about data mesh, this federated approach focused on data products is just another signal, we believe, that a lot of organizations are now at that point. >>They understand the need to go beyond just the technology.
They need to really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment we are in, with much more focus on cost control and much more focus on productivity and efficiency. We need to look beyond just the technology and infrastructure to think about how to scale data and how to manage data at scale. >>Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much, and good luck in San Diego. I know you're going to crush it out there. >>Thank you, Dave. >>Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com, and you can of course catch theCUBE coverage at thecube.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. >>Hi, I'm Jay from Collibra's Data Office. Today I want to talk to you about Collibra's Data Intelligence Cloud. We often say Collibra is a single system of engagement for all of your data. Now, when I say data, I mean data in the broadest sense of the word, including reference data and metadata. Think of metrics, reports, APIs, systems, policies, and even business processes that produce or consume data. The beauty of this platform is that it ensures all of your users have an easy way to find, understand, trust, and access data. But how do you get started? Well, here are seven steps to help you get going. One, start with the data. What's data intelligence without data? Leverage the Collibra data catalog to automatically profile and classify your enterprise data wherever that data lives, in databases, data lakes, or data warehouses, whether in the cloud or on premises. >>Two, you'll then want to organize the data, and you'll do that with data communities.
This can be by department, business unit, or functional team, however your organization organizes work and accountability. For that, you'll establish community owners. Communities make it easy for people to navigate through the platform and find the data, and they help create a sense of belonging for users. An important and related side note here: we find it's typical in many organizations that data is thought of as just an asset, and IT and data offices are viewed as its owners, the central teams performing analytics as a service provider to the enterprise. We believe data is more than an asset; it's a true product that can be converted to value. That also means establishing business ownership of data, where strategy and ROI come together with subject matter expertise. >>Okay, three. Next, back in those communities, the data owners should explain and define their data: not just the tables and columns, but also the related business terms, metrics, and KPIs. These objects, which we call assets, are typically organized into business glossaries and data dictionaries. I definitely recommend starting with the topics that are most important to the business. Four, the next steps enable you and your users to have some fun with it. Linking everything together builds your knowledge graph, also known as a metadata graph. Linking or relating these assets together, for example a data set to a KPI to a report, enables your users to see what we call the lineage diagram, which visualizes where the data in your dashboards actually came from, what the data means, and who's responsible for it. Speaking of which, here's five. Leverage the Collibra trusted business reporting solution on the marketplace, which comes with workflows for those owners to certify their reports, KPIs, and data sets. >>This helps them reinforce trust in their data.
Six, easy-to-navigate dashboards or landing pages right in your platform for your company's business processes are the most effective way for everyone to better understand and take action on data. Here's a pro tip: use the dashboard design kit on the marketplace to help you build compelling dashboards. Finally, seven, promote the value of this to your users, and be sure to schedule enablement office hours and new employee onboarding sessions to get folks excited about what you've built and implemented. Better yet, invite all of those community and data owners to these sessions so that they can show off the value they've created. Those are my seven tips to get going with Collibra. I hope these have been useful. For more information, be sure to visit collibra.com. >>Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the vice president of data quality at Collibra. Kirk, good to see you. Welcome. >>Thanks for having me, Dave. Excited to be here. >>You bet. Okay, we're going to discuss data quality and observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >>Yeah, absolutely. It's definitely exciting times for data quality, which, you're right, has been around for a long time. So why now, and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, the variety has changed, and the volume has grown. And while I think that remains true, there are a couple of other hidden factors at play that explain why this is becoming so important now and why everyone's so interested.
I guess you could break this down simply. Dave, if you and I were going to build a new healthcare application to monitor the heartbeat of individuals, imagine if we got that wrong: what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50-day crosses the 10-day average. >>And imagine if the data underlying the inputs to that is incorrect. We would probably have major financial ramifications. So it starts there: everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there are two other things at play. I bought a car not too long ago, and my dad called and asked, "How many cylinders does it have?" And I realized in that moment that I might have failed him, because I didn't know. I used to ask those types of questions, about anti-lock brakes and cylinders and whether it's manual or automatic, and I realized I now just buy a car that I hope works. It's so complicated with all the computer chips that I really don't know that much about it. >>And that's what's happening with data. We're just loading so much of it, and it's so complex, that the way companies consume it is that the IT function brings in a lot of data and then syndicates it out to the business. And it turns out that the individuals loading and consuming all of this data for the company may not actually know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested.
It's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct. >>You know, the other thing about data quality: for years we did the MIT CDOIQ event. We didn't do it last year; COVID messed everything up. But the observation I would make, and I'd love your thoughts, is that data quality, which used to be called information quality, used to be a back-office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened, and people sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're going to talk about observability. So the whole quality issue has really become front and center, because data's so fundamental, hasn't it? >>Yeah, absolutely. Let's imagine we pull up our phones right now and I go to my favorite stock ticker app and check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. What's been so exciting, with the scale we achieved in the early days, even before Collibra, is that we have these types of observation techniques, these data monitors that can actually track the past performance of every field at scale.
And why that's so interesting, and why I think the CDO is listening intently to this topic nowadays, is that maybe we could surface all of these problems with the right data observability solution and the right scale, and then just be alerted on breaking trends. So we're shifting away from this world where you must write a condition, and when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? Is it possible to do that with less human intervention? I think most people are seeing now that it's going to have to be a software tool and a computer system. It's never going to be based on one or two domain experts anymore. >>So how does data observability relate to data quality? Are they two sides of the same coin? Are they cousins? What's your perspective on that? >>Yeah, it's super interesting. It's an emerging market, so the language is changing, and a lot of the topics and areas are changing. The way I like to break it down, because the lingo is constantly moving, is breaking records versus breaking trends. I could write a condition: when this thing happens, it's wrong, and when it doesn't, it's correct. Or I could look for a trend. I'll give you a good example. Everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there is no condition you could write that would show you all the good and the bad. That was kind of your traditional approach to data quality: break records. But the modern-day approach asks whether you lost a significant portion of your data, or whether it did not arrive in time to make that decision accurately and on time. And that's a hidden concern.
Some people call this freshness; we call it stale data. But it all points to the same idea: the thing you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play at every company out there, there's more than a couple of these happening every day. >>So what's the Collibra angle on all this? You made the acquisition, you've got data quality and observability coming together, and you have a lot of expertise in this area. But you hear about provenance of data, you just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >>Well, I think where we're fortunate is our background. My team and I lived this problem for a long time, back in the Wall Street days about a decade ago, and we saw it from many different angles. What we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there. When most people evaluate our solution, it's more advanced than some of the observation techniques that currently exist. But we've also always covered data quality, and we believe that people want to know more. They need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. We hear that all the time: "I have so many things going wrong. Just show me the big picture. Help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis and business impact, connecting it with lineage and catalog metadata.
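Kirk's distinction between breaking records and breaking trends can be illustrated with a short Python sketch. This is not Collibra's implementation; the table, the row-count history, the z-score threshold of 3, the 24-hour freshness window, and the function names `breaking_records`, `breaking_trend`, and `is_stale` are all hypothetical. What it shows is the point made above: a partial load can pass every row-level rule while still standing out sharply against the table's own history.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Hypothetical history: daily row counts for one table over the past week.
history = [10_120, 9_980, 10_050, 10_210, 9_940, 10_090, 10_030]

# Hypothetical sample of today's load: every row that arrived is valid,
# but only about 4,150 rows arrived instead of the usual ~10,000.
today_rows = [{"order_id": 1, "amount": 25.0}, {"order_id": 2, "amount": 40.0}]
today_count = 4_150


def breaking_records(rows):
    """Rule-based check ("break records"): rows violating a hand-written condition."""
    return [r for r in rows if r["amount"] is None or r["amount"] < 0]


def breaking_trend(history, observed, z_threshold=3.0):
    """Trend-based check ("breaking trends"): flag a value far outside its own history."""
    mu, sigma = mean(history), stdev(history)
    z = (observed - mu) / sigma if sigma else 0.0
    return abs(z) > z_threshold, z


def is_stale(last_loaded, now, max_age=timedelta(hours=24)):
    """Freshness check: data is stale if the latest load is older than max_age."""
    return now - last_loaded > max_age


print(breaking_records(today_rows))  # prints [] because no rule is violated
flagged, z = breaking_trend(history, today_count)
print(flagged, round(z, 1))  # True, with a large negative z-score
print(is_stale(datetime(2022, 10, 1, 6), datetime(2022, 10, 2, 12)))  # 30 hours -> True
```

The record check and the trend check are deliberately kept separate, echoing the idea of seeing both side by side to correlate a root cause; a production system would learn thresholds per field rather than hard-code them.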
And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >>Well, you mentioned financial services a couple of times, and some examples. Remember the flash crash in 2010? Nobody had any idea what that was. They just said, "Oh, it's a glitch," so they didn't understand the root cause of it. So this is a really interesting topic to me. We know that at Data Citizens 22 you're announcing, you've got to announce new products, right? It's your yearly event. What's new? Give us a sense as to what products are coming out, specifically around data quality and observability. >>Absolutely. There's always a next thing on the forefront, and the one right now is these hyperscalers in the cloud. You have databases like Snowflake and BigQuery, Databricks' Delta Lake, and SQL pushdown. Ultimately what that means is that a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook into these databases. While we've always worked with the same databases in the past, and they're supported today, we're now doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is that everyone's concerned with something called egress. Did my data, which I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? >>And with these native integrations that we're building and about to unveil, here's a sneak peek for next week at Data Citizens: we're now doing all compute and data operations in databases like Snowflake.
And what that means is, with no install and no configuration, you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your team's go-forward, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >>So this is interesting because of what you just described. You know, you mentioned Snowflake, you mentioned Google, oh, and actually you mentioned, yeah, Databricks. You know, Snowflake has the Data Cloud. If you put everything in the Data Cloud, okay, you're cool. But then Google's got the open data cloud, if you heard, you know, Google Next. And Databricks doesn't call it the data cloud, but they have, like, the open source data cloud. So you have all these different approaches, and there's really been no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across them. You know, it's like Zhamak Dehghani says: it should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh, and I need tooling to be able to have confidence that my data is governed and has the proper lineage and provenance. And that's what you're bringing to the table, is that right? Did I get that right? >>Yeah, that's right. And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now. We can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing is basically zero network cost, zero egress cost, zero latency of time.
And so when you were to log into BigQuery tomorrow using our tool, or, say, Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage and access privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >>I love this example because, you know, Barry talks about, wow, the cloud guys are gonna own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value and connect across clouds. Sometimes we call it supercloud, or intercloud. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens 22. >>Absolutely. Well, I think, you know, one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they wanna know where everything is, where their sensitive data is, if it's redundant. Tell me everything inside of three to five seconds. And with that, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're gonna see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >>Excellent. All right, Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens 22. Appreciate it. >>Thanks for having me, Dave. >>You're welcome. All right, and thank you for watching. Keep it right there for more coverage from theCUBE. Welcome to theCUBE's virtual coverage of Data Citizens 2022.
My name is Dave Vellante and I'm here with Laura Sellers, who's the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >>Thank you. Nice to be here. >>Yeah, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now when I think about it, historically, fast access to the right data at the right time, in a form that's really easily consumable, has been kind of challenging, especially for business users. Can you explain to our audience why this matters so much and what's actually different today in the data ecosystem to make this a reality? >>Yeah, definitely. So I think what we really need, and what I hear from customers every single day, is a new approach to data management. What inspired me to come to Collibra a little over a year ago was really the fact that our product teams are very focused on bringing trusted data to more users across more sources for more use cases. And so as we look at what we're announcing with these innovations in ease of use and scale, it's really about making teams more productive in getting started, and the ability to manage data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers the performance, scale and security that our users and teams need and demand. So as we look at, oh, go ahead. >>I was gonna say, you know, when I look back at the last 10 years, it was all about getting the technology to work, and it was just so complicated. But please carry on. I'd love to hear more about this. >>Yeah, really, you know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. So what we're announcing on the ease-of-use side of the world is, first, our data marketplace.
This is the ability for all users to discover and access data quickly and easily, to shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then there are two more areas on the ease-of-use side of the world. The first is usage analytics. One of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create, and also to help with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform: understanding what's working, who's accessing it, what's not. And then finally, we're also introducing what's called Workflow Designer. We love our workflows at Collibra; being able to automate business processes is a big differentiator. The designer is really a way for more people to be able to create those workflows and collaborate on those workflows, as well as for people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use, to make it easier for all users to find data. >>Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this metaphor, or analogy? I always get those confused; let's go with analogy. Why is it so important to data consumers? >>I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like Amazon, if you will.
And so having a consumer-grade experience where users can quickly go in and find the data, trust that data, understand where the data's coming from, and then be able to quickly access it, that's the idea of being able to shop for it: just making it as simple as possible and really speeding the time to value for any of the business analysts and data analysts out there. >>Yeah, I think you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data, and of course that's awesome. I love that. But of course then you have to have self-service infrastructure and you have to have governance, and those are really challenging. And I think so many organizations are facing adoption challenges, you know, when it comes to enabling teams generally, and especially domain experts, to adopt new data technologies. You know, the tech comes fast and furious. You've got all these open source projects and it gets really confusing. Of course it risks security, governance and all that good stuff. You've got all this jargon. So where do you see, you know, the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >>You're dead on. There's so much technology and there's so much to stay on top of, which is part of the friction, right? It's just being able to stay ahead of and understand all the technologies that are coming. You also see that there are so many more sources of data, and people are migrating data to the cloud and to new sources. Where the friction comes in is really the ability to understand where the data came from and where it's moving to, and then also to be able to put the access controls on top of it, so people are only getting access to the data that they should be getting access to.
So one of the other things we're announcing, with all of the innovations that are coming, is what we're doing around performance and scale. With all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. >>It's something that Collibra has been working on for the past year and a half, but we're launching the ability to have data quality in the cloud. It's currently an on-premises offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. Again, one of those challenges is making sure that the data that you have is high quality as you move forward. And so really, we're just reducing friction. You already have Snowflake stood up. It's not another machine for you to manage; it's just pushdown capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. This is the ability for users to ingest metadata, understand where the PII data is, and then set policies up on top of it, so you can very quickly set policies and have them enforced at the data level, and anybody in the organization is only getting access to the data they should have access to. >>The topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back office function, you know, really confined to highly regulated industries like financial services, healthcare and government. You look back over a decade ago, you didn't have this worry about personal information; GDPR and, you know, the California Consumer Privacy Act have made it all so much more important.
The cloud has really changed things in terms of performance and scale, and of course partnering with Snowflake, it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on, and of course attracting them as an investor as well was very strong validation. What can you tell us about the nature of the relationship with Snowflake? I'm specifically interested in joint engineering and product innovation efforts, you know, beyond the standard go-to-market stuff. >>Definitely. So you mentioned they were a strategic investor in Collibra about a year ago, a little less than that I guess. We've been working with them, though, for over a year, really tightly with their product and engineering teams, to make sure that Collibra is adding real value. Pieces of our unified platform are touching all pieces of Snowflake. And when I say that, what I mean is, first, we're able to ingest data with Snowflake, which has always existed. We're able to profile and classify that data. We're announcing with Collibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly, as far as time to value, with our policies for all business users to be able to create. >>We're also announcing Snowflake Lineage 2.0. This is the ability to take stored procedures in Snowflake and understand the lineage: where did the data come from, and how was it transformed within Snowflake, as well as the data quality pushdown, as I mentioned. Data quality, you brought it up, is a big industry push, and, you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality.
So this pushdown capability for Snowflake really is, again, a big ease-of-use push for us at Collibra: that ability to push it into Snowflake, take advantage of the data source and the engine that already live there, and make sure you have the right quality. >>I mean, the nice thing about Snowflake is if you play in the Snowflake sandbox, you can get, you know, a high degree of confidence that the data sharing can be done in a safe way. Bringing Collibra into the story allows me to have that data quality and that governance that I need. You know, we've said many times on theCUBE that one of the notable differences in cloud this decade versus last decade, I mean, there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. It's a key factor for innovating, accelerating product delivery, and filling gaps in the hyperscale offerings, because you've got more mature stack capabilities, and, you know, it creates this flywheel momentum, as we often say. So my question is, how do you work with the hyperscalers, whether it's AWS or Google, whomever, and what do you see as your role and what's the Collibra sweet spot? >>Yeah, definitely. So, you know, one of the things I mentioned early on is that the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So, similar to what you've seen with our strategic moves around Snowflake, really covering the broad ecosystem of what Collibra can do on top of that data source, we're extending that to the world of Google as well and the world of Dataplex.
We also have great partners in the SIs. Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's, and they're really important to help people with their whole data strategy and driving that data-driven culture, with Collibra being the core of it. >>Hey Laura, we're gonna end it there, but I wonder if you could kind of put a bow on, you know, this year, the event, your perspectives. So just give us your closing thoughts. >>Yeah, definitely. So I wanna say this is one of the biggest releases Collibra's ever had, definitely the biggest one since I've been with the company, a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and companies everywhere. And so it's all about everybody being able to easily find, understand, and trust and get access to that data going forward. >>Well, congratulations on all the progress. It was great to have you on theCUBE, for the first time I believe, and I really appreciate you taking the time with us. >>Yes, thank you for your time. >>You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. >>So data modernization oftentimes means moving some of your storage and compute to the cloud, where you get the benefit of scale and security and so on. But ultimately it doesn't take away the silos that you have. We have more locations, more tools and more processes with which we try to get value from this data. To do that at scale in an organization, the people involved in this process have to understand each other. So you need to unite those people across those tools, processes, and systems with a shared language. When I say customer, do you understand the same thing as when you hear customer?
Are we counting them in the same way? That shared language unites us, and that gives the organization as a whole the opportunity to get the maximum value out of its data assets, and then it can democratize data so everyone can properly use that shared language to find, understand, and trust the data assets that are available. >>And that's where Collibra comes in. We provide a centralized system of engagement that works across all of those locations and combines all of those different user types across the whole business. At Collibra, we say united by data, and that also means that we're united by data with our customers. So here is some data about some of our customers. There was the case of an online do-it-yourself platform that grew its revenue almost three times from a marketing campaign that put the right product in the hands of the right people. Another case that comes to mind is a financial services organization that saved over 800K every year because they were able to reuse the same data in different kinds of reports. Before, it was spread out over different tools and processes and silos, and now the platform brought them together, so they realized, oh, we're actually using the same data, let's find a way to make this more efficient. And the last example that comes to mind is that of a large home mortgage loan provider, where they have a very complex landscape, a very complex architecture: legacy, in the cloud, et cetera. And they're using our software, our platform, to unite all the people and those processes and tools to get a common view of data to manage their compliance at scale. >>Hey everyone, I'm Lisa Martin, covering Data Citizens 22, brought to you by Collibra. This next conversation is gonna focus on the importance of data culture. One of our CUBE alumni is back: Stan Christiaens is Collibra's co-founder and its Chief Data Citizen. Stan, it's great to have you back on theCUBE.
>>Hey Lisa, nice to be here. >>So we're gonna be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, you know, it's so much more than technology innovation. It also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >>Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations, you have a lot of people; most of the employees in an organization are somehow gonna be data citizens, right? So you need to make sure that these people are aware of it. You need people to have the skills and competencies to do with data what's necessary, and so on, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss that we need to make this decision, your boss is also open to and able to interpret, you know, the data presented in the dashboard, to actually make that decision and take that action, right? >>And once you have that widespread across the organization, that's when you have a good data culture. Now, that's a continuous effort for most organizations, because they're always moving somehow; they're hiring new people. And it has to be a continuous effort because we've seen that, on the one hand, organizations continue to change their data sources and where all the data is flowing, right? Which in itself creates a lot of risk. But also, on the other side of the equation, you have the benefit. You know, you might look at regulatory drivers, like, we have to do this, right?
But it's much better right now to consider the competitive drivers, for example. We did an IDC study earlier this year, quite interesting; I can recommend it to anyone. And one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is that the ones who are higher in maturity. >>So the organizations that really look at data as an asset, look at data as a product and actively try to be better at it, do have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this, you know, data culture for everyone, waking them up as data citizens: I'm doing this for competitive reasons, I'm doing this for regulatory reasons. You're trying to bring both of those together, and the ones that get data intelligence right are successful and competitive. And that's what we're seeing out there in the market. >>Absolutely. We know that, just generally, Stan, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, in theory, more competitive and more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally. >>Of course it's difficult for an organization to adapt, but it's also necessary, as you just said. Imagine that, you know, you're a modern-day organization with laptops, what have you, and you're not using those, right? Or, you know, you're delivering them throughout the organization but not enabling your colleagues to actually do something with that asset. The same thing is true with data today, right? If you're not properly using the data asset and competitors are, they're gonna get more advantage.
So as to how you get this done and establish this, there are angles to look at, Lisa. One angle is obviously the leadership, whereby whoever is the boss of data in the organization, and you typically have multiple bosses there, like chief data officers. Sometimes there are multiple, but they may have a different title, right? So I'm just gonna summarize it as a data leader for a second. >>So whoever that is, they need to make sure that there's a clear vision, a clear strategy for data. And that strategy needs to include the monetization aspect: how are you going to get value from data? Yes. Now, that's one part, because then you can connect leadership in the organization with the business value. And that's important, because those people, their job in essence really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that right: it's not enough to just have that leadership out there; you also have to get the hearts and minds of the data champions across the organization. You really have to win them over. And if you have those two combined, and obviously a good technology to, you know, connect those people and have them execute on their responsibilities, such as a data intelligence platform like ours, then everything is in place to really start upgrading that culture inch by inch, if you will. >>Yes, I like that, the recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how, before we went live, we were talking about how Collibra is drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra and what maybe some of the specific projects are that Collibra's data office is working on. >>Yes, and it is indeed Data Citizens. There are a ton of speakers here; we're very excited.
You know, we have Barb from MIT speaking about data monetization. We have Dilla at the last minute. So a really exciting agenda. Can't wait to get back out there, essentially. So over the years, we've been doing this since 2008, so a good number of years, and I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we. And myself, you know, when you start a company, we were four people, if you will, so everybody's wearing all sorts of hats at the same time. But over the years I've run, you know, presales, sales, partnerships, product, et cetera. And as our company got a little bit biggish, we're now something like 1,200 people in the company. >>I believe systems and processes become a lot more important. So we said, you know, Collibra isn't the size of our customers yet, but we're getting there in terms of organization structure, process, systems, et cetera. So we said it's really time for us to put our money where our mouth is and set up our own data office, which is what we were seeing at customers' organizations worldwide. Organizations have HR units, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. So we said, okay, let's try to set an example that other people can take away from, right? So we set up a data strategy, we started building data products, took care of the data infrastructure, that sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our product and our own practices, and from that use, learn how we can make the product better, learn how we can make the practice better, and share that learning with everyone. On Monday mornings, we sometimes refer to that as eating our own dog food; on Friday evenings, we refer to it as drinking our own champagne. I like it. So we had the driver to do this. You know, there's a clear business reason.
So we included that in the data strategy, and that's a little bit of our origin. Now, how do we organize this? We have three pillars, and by no means is this a template that everyone should follow; this is just the organization that works at our company, but it can serve as an inspiration. So we have a pillar which is data science: the data product builders, if you will, or the people who help the business build data products. We have the data engineers, who help keep the lights on for that data platform, to make sure that the data products can run, the data can flow and, you know, the quality can be checked. >>And then we have data intelligence or data governance builders, where we have those data governance, data intelligence stakeholders who help the business as a sort of data partner to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is: well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? From those use cases, we then started to build a map and started executing, use case by use case. And the important ones are very simple. We see them with our customers as well: people talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the privacy people, so they have their process registry and they can see how the data flows. >>So that's a starting place, and that turns into a marketplace, so that if new analysts and data citizens join Collibra, they immediately have a place to go to, to look and see: okay, what data is out there for me as an analyst or a data scientist or whatever to do my job, right? So they can immediately get access to data. And another one that we have is around trusted business reporting.
We're seeing that since, you know, self-service BI allowed everyone to make beautiful dashboards, you know, pie charts. My pet peeve is the pie chart, because I love pie, but you shouldn't always be using pie charts. But essentially there's been a proliferation of those reports. And now executives don't really know: okay, should I trust this report or that report? They're reporting on the same thing, but the numbers seem different, right? So that's why we have trusted business reporting. So we know that if a dashboard, a data product essentially, is built, all the right steps are being followed, and whoever is consuming it can be quite confident in the result. Right. And that silver browser, right? Absolutely >>Decay. >>Exactly. Yes, >>Absolutely. Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >>KPIs and measuring are a big topic in the chief data officer profession, I would say, and again, it always varies according to your organization, but there are a few that we use that might be of interest. We use those pillars, right? And we have metrics across those pillars. So for example, a pillar on the data engineering side is gonna be more related to uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, and especially if you're in the cloud and if consumption's a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around data science and products. Are people using them? Are they getting value from it? >>Can we calculate that value in ay perspective, right? Yeah.
So that we can continue to say to the rest of the business: we're tracking all those numbers, and those numbers indicate that value is generated, and how much value is estimated, in that region. And then you have some data intelligence and data governance metrics, which are, for example: you have a number of domains in a data mesh. People talk about being the owner of a data domain, for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? How well are these owners organized and executing on their responsibilities? How many tickets are open or closed? How many data products are built according to process? And so on and so forth. So these are a set of examples of KPIs. There are a lot more, but hopefully those can already inspire the audience. >>Absolutely. So we've talked about the rise of chief data offices; it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see in terms of the maturation of data offices over the next decade? >>So we've indeed seen the role sort of grow up. I think in 2010 there may have been like 10 chief data officers or something. Gartner has exact numbers on them, but then they grew, you know, across industries, and the number is estimated to be about 20,000 right now. Wow. And they evolved in a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven; offensive data strategy; support for the digital program; and now it's all about data products, right? So as a data leader, you now need all of those competencies and need to include them in your strategy. >>How is that going to evolve for the next couple of years? I wish I had one of those crystal balls, right?
But essentially I think for the next couple of years there's gonna be a lot of people, you know, still moving along those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer role. So you'll see over the years that's gonna evolve toward more digital and more data products. So for the next years, my prediction is it's all about data products, because it's an immediate link between data and the (indistinct), essentially, right? Right. So that's gonna be important, and quite likely some new things will be added on, which nobody can predict yet. But we'll see those pop up in a few years. I think there's gonna be a continued challenge for the chief data officer role to become a real executive role as opposed to, you know, somebody who claims that they're an executive, but then they're not, right? So the real reporting level into the board, into the CEO, for example, will continue to be a challenging point. But the ones who do get that done will be the ones that are successful, and the ones who get that done will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that value clear to all the data citizens in the organization, right? And in that sense, they'll need to have both, you know, technical audiences and non-technical audiences aligned, of course. And they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you're waking up data citizens across the organization and you make everyone in the organization think about data as an asset. >> Absolutely. Because there's so much value that can be extracted if organizations really strategically build that data office and democratize access across all those data citizens. Stan, this is an exciting arena. We're definitely gonna keep our eyes on this. Sounds like a lot of evolution and maturation coming from the data office perspective, from the data citizen perspective.
And as the data show that you mentioned in that IDC study, and you mentioned Gartner as well, organizations have so much more likelihood of being successful and being competitive. So we're gonna watch this space. Stan, thank you so much for joining me on theCUBE at Data Citizens '22. We appreciate it. >> Thanks for having me over. >> From Data Citizens '22, I'm Lisa Martin, you're watching theCUBE, the leader in live tech coverage. >> Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra. Remember, all these videos are available on demand at thecube.net. And don't forget to check out siliconangle.com for all the news and wikibon.com for our weekly Breaking Analysis series, where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research. If you want more information on the products announced at Data Citizens, go to collibra.com. There are tons of resources there. You'll find analyst reports and product demos. It's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. We'll see you soon.

Published Date : Nov 2 2022

SUMMARY :

largely about getting the technology to work. Now the cloud is definitely helping with that, but also how do you automate governance? So you can see how data governance has evolved into to say we extract the signal from the noise, and over the next couple of days, we're gonna feature some of the So it's a really interesting story that we're thrilled to be sharing And we said at the time, you know, maybe it's time to rethink data innovation. 2020s from the previous decade, and what challenges does that bring for your customers? as data becomes more impactful and important, the level of scrutiny with respect to privacy, So again, I think it's just another incentive for organizations to now truly look at data You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated the last kind of financial crisis, and that was really the start of Collibra, where we found product-market Well, that's interesting because, you know, in my observation it takes seven to 10 years to actually build a again, a lot of momentum in the org, in the markets with some of the cloud partners And the second is that those data pipelines that are now being created in the cloud, I mean, the acquisition of OwlDQ, you know, So that's really the theme of a lot of the innovation that we're driving. And so that's the big theme from an innovation perspective, One of our key differentiators is the ability to really drive a lot of automation through workflows. So actually pushing down the compute and data quality, one of the key principles you think about monetization. And I think we're really at this pivotal moment, and I think you said it well. We need to look beyond just the I know you're gonna crush it out there.
This is Dave Vellante for theCUBE, your leader in enterprise and Without data leverage the Collibra data catalog to automatically And for that you'll establish community owners, a data set to a KPI to a report now enables your users to see what Finally, seven, promote the value of this to your users and Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. And now you lead data quality at Collibra. imagine if we get that wrong, you know, what the ramifications could be, And I realized in that moment, you know, I might have failed him because I didn't know. And it's so complex that the way companies consume them in the IT function is And so it's really become front and center, just the whole quality issue because data's so fundamental, nowadays to this topic is, so maybe we could surface all of these problems with So the language is changing a you know, stale data, you know, the whole trend toward real time. we sort of lived this problem for a long time, you know, in the Wall Street days about a decade you know, they just said, Oh, it's a glitch, you know, so they didn't understand the root cause of it. And the one right now is these hyperscalers in the cloud. And I think if you look at the whole So this is interesting because what you just described, you know, you mentioned Snowflake, And so when you were to log into BigQuery tomorrow using our I love this example because, you know, everybody talks about, wow, the cloud guys are gonna own the world and, Seeing that across the board, people used to know it was a zip code and nowadays Appreciate it. Right, and thank you for watching. Nice to be here. Can you explain to our audience why the ability to manage data across the entire organization.
I was gonna say, you know, when I look back at like the last 10 years, it was all about getting the technology to work and it And one of the big pushes and passions we have at Collibra is to help with I, you know, you mentioned this idea of, and really speeding the time to value for any of the business analysts, So where do you see, you know, the friction in adopting new data technologies? So one of the other things we're announcing with all of the innovations that are coming is So anybody in the organization is only getting access to the data they should have access to. So it was kind of smart that you guys were early on and We're able to profile and classify that data we're announcing with Collibra Protect this week that and get the right and make sure you have the right quality. I mean, the nice thing about Snowflake, if you play in the Snowflake sandbox, you can get sort of a, We also are doing more with Google around, you know, GCP and Collibra Protect there, you know, this year, the event your, your perspectives. And so it's all about everybody being able to easily It was great to have you on theCUBE, first time I believe, theCUBE, your leader in enterprise and emerging tech coverage. the cloud where you get the benefit of scale and security and so on. And the last example that comes to mind is that of a large home loan, home mortgage, Stan, it's great to have you back on theCUBE. Talk to us about what you mean by data citizenship and the And we believe that today's organizations, you have a lot of people, And one of the conclusions they found as they So you can say, ok, I'm doing this, you know, data culture for everyone, awakening them But the IDC study that you just mentioned demonstrates they're three times So as to how you get this done, establish this.
part of the equation of getting that right, is it's not enough to just have that leadership out Talk to us about how you are building a data culture within Collibra and But over the years I've run, you know, So we said the data products can run, the data can flow and you know, the quality can be checked. The catalog for the data scientists to know what's in their data lake, and data citizens join Collibra, they immediately have a place to go to, Yes, success of the data office. So for example, a pillar on the data engineering side is gonna be more related So how many of those domains do you have covered? to look into a crystal ball, what do you see in terms of the maturation industries and the number is estimated to be about 20,000 right now. How is that going to evolve for the next couple of years? And in that sense, they'll need to have both, you know, technical audiences and non-technical audiences And as the data show that you mentioned in that IDC study, the leader in live tech coverage. Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Laura | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Dave | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Heineken | ORGANIZATION | 0.99+
Dave Valante | PERSON | 0.99+
Laura Sellers | PERSON | 0.99+
2008 | DATE | 0.99+
Collibra | ORGANIZATION | 0.99+
Adobe | ORGANIZATION | 0.99+
Felix Von Dala | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Felix Van Dema | PERSON | 0.99+
seven | QUANTITY | 0.99+
Stan Christians | PERSON | 0.99+
2010 | DATE | 0.99+
Lisa | PERSON | 0.99+
San Diego | LOCATION | 0.99+
Jay | PERSON | 0.99+
50 day | QUANTITY | 0.99+
Felix | PERSON | 0.99+
one | QUANTITY | 0.99+
Kurt Hasselbeck | PERSON | 0.99+
Bank of America | ORGANIZATION | 0.99+
10 year | QUANTITY | 0.99+
California Consumer Privacy Act | TITLE | 0.99+
10 day | QUANTITY | 0.99+
Six | QUANTITY | 0.99+
Snowflake | ORGANIZATION | 0.99+
Dave Ante | PERSON | 0.99+
Last year | DATE | 0.99+
demand@thecube.net | OTHER | 0.99+
ETR Enterprise Technology Research | ORGANIZATION | 0.99+
Barry | PERSON | 0.99+
Gartner | ORGANIZATION | 0.99+
one part | QUANTITY | 0.99+
Python | TITLE | 0.99+
2010s | DATE | 0.99+
2020s | DATE | 0.99+
Calibra | LOCATION | 0.99+
last year | DATE | 0.99+
two | QUANTITY | 0.99+
Calibra | ORGANIZATION | 0.99+
K Bear Protect | ORGANIZATION | 0.99+
two sides | QUANTITY | 0.99+
Kirk Hasselbeck | PERSON | 0.99+
12 months | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
Barb | PERSON | 0.99+
Stan | PERSON | 0.99+
Data Citizens | ORGANIZATION | 0.99+

Kirk Haslbeck, Collibra | Data Citizens '22


 

(bright upbeat music) >> Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality at Collibra. Kirk, good to see you. Welcome. >> Thanks for having me, Dave. Excited to be here. >> You bet. Okay, we're going to discuss data quality and observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations! And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >> Yeah, absolutely. It's definitely exciting times for data quality, which, you're right, has been around for a long time. So why now, and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, the variety has changed, and the volume has grown. And while I think that remains true, there are a couple of other hidden factors at play that explain why this is becoming so important now. And I guess you could kind of break this down simply. Think about it: if, Dave, you and I were going to build, you know, a new healthcare application and monitor the heartbeat of individuals, imagine if we get that wrong. What could the ramifications be? What would those incidents look like? Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50-day average crosses the 10-day average. And imagine if the data underlying the inputs to that is incorrect. We'll probably have major financial ramifications in that sense. So it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there are two other things at play.
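The crossover strategy Kirk mentions can be sketched in a few lines to show why bad inputs matter. This is a minimal illustration, assuming simple moving averages over daily closing prices; the function names and the 10/50 window defaults are hypothetical choices for this example, not anything Collibra ships.

```python
from typing import List, Optional


def sma(prices: List[float], window: int, end: int) -> float:
    """Simple moving average of the `window` prices ending at index `end` (inclusive)."""
    segment = prices[end - window + 1 : end + 1]
    return sum(segment) / window


def crossover_signal(prices: List[float], fast: int = 10, slow: int = 50) -> Optional[str]:
    """Compare the fast and slow averages on the last two days.

    Returns "buy" if the fast average crossed above the slow one,
    "sell" if it crossed below, and None otherwise.
    """
    if len(prices) < slow + 1:
        return None  # need a full slow window on both of the last two days
    today = len(prices) - 1
    prev_gap = sma(prices, fast, today - 1) - sma(prices, slow, today - 1)
    curr_gap = sma(prices, fast, today) - sma(prices, slow, today)
    if prev_gap <= 0 < curr_gap:
        return "buy"
    if prev_gap >= 0 > curr_gap:
        return "sell"
    return None


flat = [100.0] * 51
bad_tick = [100.0] * 50 + [1000.0]  # one fat-fingered price in the feed
print(crossover_signal(flat))       # None
print(crossover_signal(bad_tick))   # buy
```

Nothing happened in the market in the second series, yet a single corrupted price produces a trade signal; that is the kind of financial ramification Kirk is pointing to.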
I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment, I might have failed him, because I didn't know. And I used to ask those types of questions, about anti-lock brakes and cylinders and if it's manual or automatic, and I realized I now just buy a car that I hope works. And it's so complicated with all the computer chips. I really don't know that much about it. And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the stage for this observability play and why everybody's so interested: it's because we're becoming less close to the intricacies of the data and we just expect it to always be there and be correct. >> You know, the other thing too about data quality: for years we did the MIT CDOIQ event. We didn't do it last year; COVID messed everything up. But the observation I would make there, and I'd love your thoughts, is that data quality, it used to be called information quality, used to be this back-office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened and people were realizing, well, they sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're going to talk about observability. And so it's really become front and center, just the whole quality issue, because data's so fundamental, hasn't it? >> Yeah, absolutely.
I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app and I check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we've achieved, even in the early days before Collibra, what's been so exciting is that we have these types of observation techniques, these data monitors that can actually track the past performance of every field at scale. And why that's so interesting, and why I think the CDO is listening so intently nowadays to this topic, is that maybe we could surface all of these problems with the right solution of data observability and with the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world where you must write a condition, and when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >> So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >> Yeah, it's super interesting. It's an emerging market. So the language is changing, and a lot of the topics and areas are changing. The way that I like to say it or break it down, because the lingo is constantly moving as a target in this space, is really breaking records versus breaking trends.
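Kirk doesn't describe how Collibra's monitors learn a field's past performance, so the following is only a generic stand-in for the idea of alerting on a breaking trend: a z-score check that flags today's value of a metric (say, a column's daily row count or null rate) when it drifts too far from its own history. The function name and the default threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def breaks_trend(history: Sequence[float], today: float, threshold: float = 3.0) -> bool:
    """Return True when `today` sits more than `threshold` standard deviations
    away from the mean of the metric's own history (a generic z-score test)."""
    if len(history) < 2:
        return False  # stdev needs at least two points; no baseline learned yet
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # perfectly flat history: any change is a break
    return abs(today - mu) / sigma > threshold


daily_rows = [1000, 1010, 990, 1005, 995]  # hypothetical row counts for one table
print(breaks_trend(daily_rows, 1002))  # False: within normal variation
print(breaks_trend(daily_rows, 400))   # True: the trend broke
```

No one wrote a rule saying what the row count should be; the baseline comes from the metric's history, which is the shift away from hand-written conditions that Kirk describes.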
I could write a condition: when this thing happens, it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. Everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there will not be a condition that you could write that would show you all the good and the bad. That was kind of your traditional approach of data quality break records. But your modern-day approach is: you lost a significant portion of your data, or it did not arrive in time to make that decision accurately. And that's a hidden concern. Some people call this freshness, we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >> So what's the Collibra angle on all this stuff? You made the acquisition, you've got data quality and observability coming together, you guys have a lot of expertise in this area, but you hear provenance of data, you just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem and what's unique about your approach? >> Well, I think where we're fortunate is with our background. Myself and the team, we sort of lived this problem for a long time in the Wall Street days, about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there when most people evaluate our solution. It's more advanced than some of the observation techniques that currently exist.
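The difference between breaking records and breaking trends can be made concrete with a hedged sketch. `break_records` is a rule written ahead of time against individual rows, while `is_stale` and `is_incomplete` look at the pipeline rather than at any single row. All three helpers, their thresholds, and the sample data are hypothetical, chosen only to illustrate the distinction.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def break_records(rows: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Rule-based check: return the rows whose `field` is missing or negative."""
    return [r for r in rows if r.get(field) is None or r[field] < 0]


def is_stale(last_arrival: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness check: the data is stale if the latest batch is older than `max_age`."""
    return now - last_arrival > max_age


def is_incomplete(row_count: int, expected_rows: int, min_ratio: float = 0.9) -> bool:
    """Volume check: flag a load that delivered less than `min_ratio` of the expected rows."""
    return row_count < expected_rows * min_ratio


now = datetime(2022, 10, 24, 12, 0, tzinfo=timezone.utc)
partial_load = [{"price": 10.0}] * 100  # every row is individually valid...
print(break_records(partial_load, "price"))                         # []
print(is_incomplete(len(partial_load), expected_rows=1000))         # True
print(is_stale(now - timedelta(hours=30), now, timedelta(hours=24)))  # True
```

A load that delivers only 100 of an expected 1,000 perfectly valid rows passes the record-level rule, yet fails the volume check: the hidden concern Kirk describes, where the problem is a pipeline breakdown rather than a bad value.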
But we've also always covered data quality, and we believe that people want to know more, they need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time: "I have so many things going wrong, just show me the big picture. Help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis and business impact, connecting it with lineage and catalog metadata. And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >> Well, you mentioned financial services a couple of times and some examples. Remember the flash crash in 2010? Nobody had any idea what that was; they just said, "Oh, it's a glitch." So they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens '22 that you're announcing, you've got to announce new products, right? It's your yearly event. What's new? Give us a sense as to what products are coming out, but specifically around data quality and observability. >> Absolutely. There's always a next thing on the forefront. And the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery and Databricks Delta Lake, and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook in to these databases. And while we've always worked with the same databases in the past, and they're supported today, we're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is everyone's concerned with something called egress.
Did my data that I've spent all this time and money with my security team securing ever leave my hands? Did it ever leave my secure VPC, as they call it? And with these native integrations that we're building and about to unveil here, as kind of a sneak peek for next week at Data Citizens, we're now doing all compute and data operations in databases like Snowflake. And what that means is, with no install and no configuration, you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your go-forward, team-selected, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >> So this is interesting because of what you just described. You mentioned Snowflake, you mentioned Google, oh, actually you mentioned, yeah, Databricks. Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool, but then Google's got the open data cloud, if you heard Google Next, and now Databricks doesn't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches, and there's really no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across them. It's like (indistinct) you should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh and I need tooling to be able to have confidence that my data is governed and has the proper lineage and provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >> Yeah, that's right.
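Native database pushdown can be illustrated in miniature with Python's built-in `sqlite3` module standing in for a cloud warehouse such as Snowflake or BigQuery. This is an assumption-laden sketch, not Collibra's implementation, and the `trades` table and `price` column are hypothetical. Both functions compute the same column profile, but `profile_pushdown` sends a query into the database and gets back a single summary row, while `profile_pull` fetches every value to the client first, which is the egress path Kirk wants to avoid.

```python
import sqlite3
from typing import Any, Dict


def profile_pushdown(conn: sqlite3.Connection, table: str, column: str) -> Dict[str, Any]:
    """Compute the profile inside the database; only one summary row leaves it."""
    # Identifiers cannot be bound as SQL parameters, so they are interpolated here;
    # in real use they must come from a trusted allow-list, never from user input.
    sql = (
        f"SELECT COUNT(*), COUNT({column}), MIN({column}), MAX({column}), AVG({column}) "
        f"FROM {table}"
    )
    total, non_null, lo, hi, avg = conn.execute(sql).fetchone()
    return {"rows": total, "nulls": total - non_null, "min": lo, "max": hi, "avg": avg}


def profile_pull(conn: sqlite3.Connection, table: str, column: str) -> Dict[str, Any]:
    """Same profile, but every value is fetched to the client first (the egress path)."""
    values = [row[0] for row in conn.execute(f"SELECT {column} FROM {table}")]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "avg": sum(present) / len(present) if present else None,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (price REAL)")  # hypothetical table
conn.executemany("INSERT INTO trades (price) VALUES (?)", [(10.0,), (20.0,), (None,), (30.0,)])
print(profile_pushdown(conn, "trades", "price"))
# {'rows': 4, 'nulls': 1, 'min': 10.0, 'max': 30.0, 'avg': 20.0}
```

The answers are identical; what differs is how much data crosses the network. With four rows that is trivial, but on a warehouse table with billions of rows, pushing the aggregation down is what keeps network and egress costs near zero.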
And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now. We can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network cost, zero egress cost, zero latency of time. And so when you were to log into BigQuery tomorrow using our tool, or, say, Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage, and access and privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >> I love this example because everybody talks about, wow, the cloud guys are going to own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value, connect across clouds. Sometimes we call it supercloud, or inter-clouding. Alright, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens '22. >> Absolutely. Well, I think one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they want to know where everything is, where their sensitive data is. If it's redundant, tell me everything inside of three to five seconds. And with that comes, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're going to see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >> Excellent, all right.
Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens '22. Appreciate it. >> Thanks for having me, Dave. >> You're welcome. All right, and thank you for watching. Keep it right there for more coverage from theCUBE.

Published Date : Oct 24 2022

SUMMARY :

Kirk, good to see you. Excited to be here. and it was acquired by Collibra last year. And it's so complex that the And now, as we say, we're going and I check out the NASDAQ market cap. and areas changing the and what's unique about your approach? of the curve there when most and some examples, remember and data activity happens in the database. and has the proper lineage, provenance. and get the answers. and on the trends that we've talked about and solutions that hopefully and previewing Data Citizens '22. All right, and thank you for watching.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Dave | PERSON | 0.99+
Collibra | ORGANIZATION | 0.99+
Kurt Hasselbeck | PERSON | 0.99+
2010 | DATE | 0.99+
one | QUANTITY | 0.99+
Kirk Hasselbeck | PERSON | 0.99+
50 day | QUANTITY | 0.99+
Kirk | PERSON | 0.99+
10 day | QUANTITY | 0.99+
OwlDQ | ORGANIZATION | 0.99+
Kirk Haslbeck | PERSON | 0.99+
next week | DATE | 0.99+
Google | ORGANIZATION | 0.99+
last year | DATE | 0.99+
two sides | QUANTITY | 0.99+
thousands | QUANTITY | 0.99+
NASDAQ | ORGANIZATION | 0.99+
Snowflake | TITLE | 0.99+
Data Citizens | ORGANIZATION | 0.99+
Data Bricks | ORGANIZATION | 0.99+
two other things | QUANTITY | 0.98+
one click | QUANTITY | 0.98+
tomorrow | DATE | 0.98+
today | DATE | 0.98+
five seconds | QUANTITY | 0.97+
two domain | QUANTITY | 0.94+
Collibra Data Quality | TITLE | 0.92+
MIT CDOIQ | EVENT | 0.9+
Data Citizens '22 | TITLE | 0.9+
Egress | ORGANIZATION | 0.89+
Delta Lake | TITLE | 0.89+
three | QUANTITY | 0.86+
zero | QUANTITY | 0.85+
Big Query | TITLE | 0.85+
about a decade ago | DATE | 0.85+
SQL Pushdown | TITLE | 0.83+
Data Citizens 2022 Collibra | EVENT | 0.82+
Big BigQuery | TITLE | 0.81+
more than a couple | QUANTITY | 0.79+
couple | QUANTITY | 0.78+
one big | QUANTITY | 0.77+
Collibra Data Quality | ORGANIZATION | 0.75+
Collibra | OTHER | 0.75+
Google Nest | ORGANIZATION | 0.75+
Data Citizens '22 | ORGANIZATION | 0.74+
zero latency | QUANTITY | 0.72+
SAS | ORGANIZATION | 0.71+
Snowflake | ORGANIZATION | 0.69+
COVID | ORGANIZATION | 0.69+
years ago | DATE | 0.68+
Wall Street | LOCATION | 0.66+
theCUBE | ORGANIZATION | 0.66+
many numbers | QUANTITY | 0.63+
Collibra | PERSON | 0.63+
times | QUANTITY | 0.61+
Data | ORGANIZATION | 0.61+
too long | DATE | 0.6+
Vice President | PERSON | 0.57+
data | QUANTITY | 0.56+
CDO | TITLE | 0.52+
Bricks | TITLE | 0.48+

Breaking Analysis: CEO Nuggets from Microsoft Ignite & Google Cloud Next


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> This past week we saw two of the Big 3 cloud providers present the latest update on their respective cloud visions, their business progress, their announcements and innovations. The content at these events had many overlapping themes, including modern cloud infrastructure at global scale, applying advanced machine intelligence (AKA AI), end-to-end data platforms and collaboration software. They talked a lot about the future of work and automation. And each company gave us a little taste of the metaverse, Web 3.0 and much more. Despite these striking similarities, the differences between these two cloud platforms and that of AWS remain significant, with Microsoft leveraging its massive application software footprint to dominate virtually all markets, and Google doing everything in its power to keep up with the frenetic pace of today's cloud innovation, which was set into motion a decade and a half ago by AWS. Hello and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we unpack the immense amount of content presented by the CEOs of Microsoft and Google Cloud at Microsoft Ignite and Google Cloud Next. We'll also quantify with ETR survey data the relative position of these two cloud giants in four key sectors: cloud IaaS, BI analytics, data platforms and collaboration software. Now one thing was clear this past week: hybrid events are the thing. Google Cloud Next took place live over a 24-hour period in six cities around the world, with the main gathering in New York City. Microsoft Ignite, which normally is attended by 30,000 people, had a smaller in-person event in Seattle with a virtual audience around the world. AWS re:Invent, of course, is much different.
Yes, there's a virtual component at re:Invent, but it's all about a big live audience gathering the week after Thanksgiving, in the first week of December, in Las Vegas. Regardless, Satya Nadella's keynote address was prerecorded. It was highly produced and substantive. It was visionary and energetic, with a strong message that Azure was a platform to allow customers to build their digital businesses. Doing more with less was a key theme of his. Nadella covered a lot of ground, starting with infrastructure from the compute, highlighting a collaboration with Arm-based Ampere processors, new block storage, 60 regions, and 175,000 miles of fiber cables around the world. He presented a meaningful multi-cloud message with Azure Arc to support on-prem and edge workloads, as well as, of course, the public cloud. And he talked about confidential computing at the infrastructure level, a theme we hear from all cloud vendors. He then went deeper into the end-to-end data platform that Microsoft is building, from the core data stores to analytics, to governance and the myriad tooling Microsoft offers. AI was next, with a big focus on automation, AI and training models. He showed demos of machines coding and fixing code, machines automatically creating designs for creative workers, and how Power Automate, Microsoft's RPA tooling, would combine with Microsoft Syntex to understand documents and provide standard ways for organizations to communicate with those documents. There was of course a big focus on Azure as a developer cloud platform, with GitHub Copilot as a linchpin, using AI to assist coders, and low-code and no-code innovations that are coming down the pipe. And another giant theme was workforce transformation and how Microsoft is using its heritage in collaboration and productivity software to move beyond what Nadella called productivity paranoia, i.e., are remote workers doing their jobs?
In a world where collaboration is built into intelligent workflows, he even showed a glimpse of the future with AI-powered avatars and partnerships with Meta and, of all firms, Cisco, with Teams. And finally, security, with a bevy of tools from identity, endpoint, governance, et cetera, stressing a suite of tools from a single provider, i.e., Microsoft. So a couple points here. One, Microsoft is following in the footsteps of AWS with silicon advancements but didn't really emphasize that trend much except for the Ampere announcement. But it's building out cloud infrastructure at a massive scale; there is no debate about that. Its plan on data is to try and provide a somewhat more abstracted and simplified solution, which differs a little bit from AWS's approach of the right database tool, for example, for the right job. Microsoft's automation play appears to provide simple individual productivity tools, kind of a ground-up approach, and make it really easy for users to drive these bottoms-up initiatives. We heard from UiPath at Forward 5 last month a little bit of a different approach: horizontal automation, end-to-end across platforms. So quite a different play there. Microsoft's angle on workforce transformation is visionary and will continue to solidify, in our view, its dominant position with Teams and Microsoft 365, and it will drive cloud infrastructure consumption by default. On security, as a cloud player, it has to have world-class security, and Azure does. There's not a lot of debate about that, but the knock on Microsoft is Patch Tuesday becomes Hack Wednesday, because Microsoft releases so many patches. It's got so much Swiss cheese in its legacy estate, and because it patches so frequently, the patches become a roadmap and a trigger for hackers: hey, Patch Tuesday, these are all the exploits that you can go after, so you can act before the patches are implemented. And so it's really become a problem for users.
As well, Microsoft is competing with many of the best-of-breed platforms like CrowdStrike and Okta, which have market momentum and appear to be more attractive horizontal plays for customers outside of just the Microsoft cloud. But again, it's Microsoft. They make it easy and very inexpensive to adopt. Now, despite the outstanding presentation by Satya Nadella, there are a couple of statements that should raise eyebrows. Here are two of them. First, he said Azure is the only cloud that supports all organizations and all workloads, from enterprises to startups to highly regulated industries. I had a conversation with Sarbjeet Johal about this to make sure I wasn't just missing something, and we were both somewhat surprised by this claim. I mean, AWS most certainly supports more certifications, for example, and we would think it has a reasonable case to dispute that claim. The other statement Nadella made: Azure is the only cloud provider enabling highly regulated industries to bring their most sensitive applications to the cloud. Now, reasonable people can debate whether AWS is there yet, but very clearly Oracle and IBM would have something to say about that statement. Maybe Microsoft would say, "Oh, they're not real clouds, you know, they're just hosting in the cloud, if you will." But still, when it comes to mission-critical applications, you would think Oracle is really the leader there. Oh, and Satya also claimed that the Microsoft Edge browser, no questions asked, he said, is the best browser for business. And we could see some people having some questions about that. Like, isn't Edge based on Chromium, the open-source project behind Chrome? Anyway, we just had to question these statements and challenge Microsoft to defend them, because to us they're a little bit of BS, and it makes one wonder what else in such an awesome keynote (and it was awesome) was hyperbole. Okay, moving on to Google Cloud Next.
The keynote started with Sundar Pichai doing a virtual session (he was remote), stressing the importance of Google Cloud. He mentioned that Google Cloud, based on its Q2 earnings, was on a $25-billion annual run rate. What he didn't mention is that it's also on a $3.6 billion annual operating loss run rate based on its first-half performance. Just saying. And we'll dig into that issue a little bit more later in this episode. He also stressed that the investments Google has made to support its core business and search, like its global network of 22 subsea cables to support things like YouTube video (great performance, obviously, that we all rely on), innovations in BigQuery to support its search business, the threat analysis it's always had, and its AI (it's always been an AI-first company, he stressed), are all leveraged by the Google Cloud Platform, GCP. This is all true, by the way. Google has absolutely awesome tech, and the talks, Pichai's but also Kurian's, were forward thinking and laid out a vision of the future. But in our view, and I talked to Sarbjeet Johal about this as well, they didn't address today's challenges to the degree that Microsoft did and that we expect AWS will at re:Invent this year. It was more out there, more forward thinking about what's possible in the future, and somewhat less about today's problems, so I think it resonates less with today's enterprise players. Thomas Kurian then took over from Sundar Pichai and did a really good job of highlighting customers, and I think he has to, right? He has to say, "Look, we are in this game. We have customers: 9 out of the top 10 media firms use Google Cloud. 8 out of the top 10 manufacturers. 9 out of the top 10 retailers. Same for telecom, same for healthcare. 8 out of the top 10 retail banks."
He and Sundar specifically referenced a number of companies, customers, including Avery Dennison, Groupe Renault, H&M, Johns Hopkins, Prudential, Minna Bank out of Japan, ANZ Bank and many, many others during the session. So, you know, they had some proof points, and you got to give 'em props for that. Now, like Microsoft, Google talked about infrastructure. They referenced training processors and regions and compute optionality and storage, and how new workloads were emerging, particularly data-driven workloads in AI, that required new infrastructure. He explicitly highlighted partnerships with Nvidia and Intel. I didn't see anything on Arm, which somewhat surprised me, 'cause I believe Google's working on that, or at least is following AWS's suit, if you will. But maybe that's why they're not mentioning it, or maybe I've got to do more research there, but let's park that for a minute. Again, as we've extensively discussed in Breaking Analysis, in our view, when it comes to compute, AWS, via its Annapurna acquisition, is well ahead of the pack. Arm is making its way into the enterprise, but all three companies are heavily investing in infrastructure, which is great news for customers and the ecosystem. We'll come back to that. Data and AI go hand in hand, and there was no shortage of data talk. Google didn't mention Snowflake or Databricks specifically (it did, by the way, mention Mongo a couple of times), but it did mention Google's, quote, Open Data Cloud. Now maybe Google has used that term before, but Snowflake has been marketing the data cloud concept for a couple of years now. So that struck us as a shot across the bow of one of its partners and, obviously, competitors, Snowflake. BigQuery is the centerpiece of Google's data strategy. Kurian talked about how they can take any data from any source in any format from any cloud provider with BigQuery Omni, and aggregate and understand it.
And they talked a lot about support for Apache Iceberg, with Delta and Hudi coming in the future, and about the Open Data Cloud Alliance. So without specifically mentioning Snowflake or Databricks, Kurian co-opted a lot of messaging from these two players, such as lakehouse tech. Kurian also talked about Google Workspace and how it's now at 8 million users, up from 6 million just two years ago. There was a lot of discussion of developer optionality, several details on tools supported, and the open mantra of Google. And finally, on security, Google brought out Kevin Mandia, a CUBE alum and extremely impressive individual who's CEO of Mandiant, a leading security service provider and consultancy that Google recently acquired for around $5.4 billion. They talked about moving from a shared responsibility model to a shared fate model, which again is kind of a shot across the bow of AWS's shared responsibility model. It's unclear whether Google will actually share the penalty if a customer doesn't live up to its portion of the responsibility, and we can probably assume the customer is still going to bear the brunt of the pain nonetheless. Mandiant is really interesting because it's a services play, and Google has stated that it is not a services company; it's going to give partners in the channel plenty of room to play. So we'll see what it does with Mandiant. But Mandiant is a very strong enterprise capability in the single most important area, security. So it's an interesting acquisition by Google. Also, unlike Microsoft, Google is not competing with security leaders like Okta and CrowdStrike. Rather, it's partnering aggressively with those firms and prominently putting them forward. All right. Let's get into the ETR survey data and see how Microsoft and Google are positioned in four key markets that we've mentioned before: IaaS, BI/analytics, database/data platforms and collaboration software. First, let's look at the IaaS cloud.
ETR is just about to release its October survey, so I can't share that data yet. I can only show July data, but we're going to give you some directional hints throughout this conversation. This chart shows net score, or spending momentum, on the vertical axis and overlap, or presence in the data, i.e., how pervasive the platform is, on the horizontal axis. And we've inserted the Wikibon estimates of IaaS revenue for the Big 3, actually the Big 4, since we included Alibaba. So a couple of points on this somewhat busy data chart. First, Microsoft and AWS, as always, are dominant on both axes. The red dotted line at 40% on the vertical axis represents a highly elevated spending velocity, and all of the Big 3 are above the line. At the same time, GCP is well behind the two leaders on the horizontal axis, and you can see that in our revenue estimates in the table insert as well. Now why is Azure bigger in the ETR survey when AWS is larger according to the Wikibon revenue estimates? The answer is that survey respondents often consider Microsoft products like 365 and Teams to be cloud, so they fit into that ETR category. But in the insert data we're stripping out applications and SaaS from Microsoft and Google and isolating only IaaS. The other point is that when you look at the early October returns, you see downward pressure, as signified by those dotted arrows, on every name. The only exceptions were Dell and IBM, which are showing slightly improved momentum. So the survey data generally confirms what we know: AWS and Azure have a massive lead and strong momentum in the marketplace. But the real story is below the line. Unlike Google Cloud, which is on pace to lose well over $3 billion on an operating basis this year, AWS's operating profit is around $20 billion annually. Microsoft's Intelligent Cloud generated more than $30 billion in operating income last fiscal year.
Let that sink in for a moment. Now again, that's not to say Google doesn't have traction. It does, and Kurian gave some nice proof points and customer examples in his keynote presentation, but the data underscores the lead that Microsoft and AWS have on Google in cloud. And here's a breakdown of ETR's proprietary net score methodology, the vertical axis we showed you in the previous chart. It asks customers: Are you adopting the platform new? That's the lime green. Are you increasing spending by 6% or more? That's the forest green. Is your spending flat? That's the gray. Is your spending down 6% or worse? That's the pink. Or are you replacing the platform, defecting? That's the bright red. You subtract the reds from the greens and you get a net score. Now, one caveat here, which is actually really favorable for Microsoft: the Microsoft data we're showing is across the entire Microsoft portfolio. The other point is that this is July data; we'll have an update for you once ETR releases its October results. But we're talking about meaningful samples here, the Ns: 620 for AWS, over a thousand for Microsoft, and more than 450 respondents in the survey for Google. So the real tell is replacements, that bright red. There is virtually no churn for AWS and Microsoft, but Google's churn is 5x that of those two in the survey. Now 5% churn is not high, but you'd like to see four things for Google given its smaller size. One is less churn. Two is much, much higher adoption rates in the lime green. Three is a higher percentage of those spending more, the forest green. And four is a lower percentage of those spending less. None of these conditions really applies here for Google. GCP is still not growing fast enough, in our opinion, and doesn't have nearly the traction of the two leaders, and that shows up in the survey data. All right, let's look at the next sector, BI analytics. Here we have that same XY dimension. Again, Microsoft is dominating the picture.
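To make the net score arithmetic concrete, here is a minimal Python sketch, assuming the net score is simply the percentage of greens (new adoption plus spending up 6% or more) minus the percentage of reds (spending down 6% or more plus replacing), as described above. ETR's actual computation may involve details not covered here; the function name and the sample percentages are hypothetical, not real survey results.

```python
def net_score(adopt_new: float, spend_up: float, flat: float,
              spend_down: float, replacing: float) -> float:
    """Hypothetical helper: greens (new adoption + spending up 6%+) minus
    reds (spending down 6%+ + replacing). Flat spend counts toward the
    sample but does not move the score."""
    shares = (adopt_new, spend_up, flat, spend_down, replacing)
    # The five answer buckets are assumed to cover every respondent.
    if abs(sum(shares) - 100.0) > 1e-6:
        raise ValueError(f"shares must sum to 100%, got {sum(shares)}")
    return (adopt_new + spend_up) - (spend_down + replacing)


# Hypothetical percentages for illustration only, not actual ETR data.
score = net_score(adopt_new=12, spend_up=43, flat=35, spend_down=7, replacing=3)
print(score)       # 45
print(score > 40)  # True: above the 40% "elevated" line
```

Note that a flat-spending respondent dilutes both sides equally, which is why a platform with a large installed base but little movement can sit near the middle of the vertical axis.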
AWS is also very strong on both axes. Tableau, of course acquired by Salesforce, is very popular and respectable, still looking pretty good on the vertical axis, and again, it has a big presence on the horizontal axis. And Google, with Looker and its other platforms, is also respectable, but again, it has some work to do. Now notice Streamlit, a recent Snowflake acquisition. It's strong on the vertical axis, and because of Snowflake's go-to-market (indistinct), it's likely going to move to the right over time. Grafana is also prominent on the Y axis, but a glimpse at the most recent survey data shows it slightly declining while Looker actually improves a bit, as does Cloudera, which will move up slightly. Again, Microsoft just blows you away, doesn't it? All right, now let's get into database and data platforms. Same XY dimensions, but now database and data warehouse. Snowflake as usual takes the top spot on the vertical axis, and it actually keeps moving to the right as well. Again, Microsoft and AWS are dominant in the market, as is Oracle on the X axis, albeit with less spending velocity, but of course it's the database king. Google is well behind on the X axis but solidly above the 40% line on the vertical axis. Note that virtually all platforms will see pressure in the next survey due to the macro environment. Microsoft might even dip below the 40% line for the first time in a while. Lastly, let's look at the collaboration and productivity software market. This is such an important area for both Microsoft and Google. And just look at Microsoft with 365 and Teams, up and to the right. I mean, just so impressive and ubiquitous. And we've highlighted Google. It's in the pack. It certainly has a nice base, with an N of 174, and I can tell you that N will rise in the next survey, which is an indication that more people are adopting.
But given the investment, the tech behind it, all the AI and Google's resources, you'd really like to see Google above the 40% line in this space, given the importance of this collaboration market to Google's success and the degree to which it emphasizes it in its pitch. And look, this brings up something that we've talked about before on Breaking Analysis. Google doesn't have a tech problem. This is a go-to-market and marketing challenge that Google faces, and it's up against two go-to-market champs in Microsoft and AWS. Google doesn't have the enterprise sales culture. It's trying, it's making progress, but it's like that racehorse that has all the potential in the world but is just missing some key ingredient to put it over the top. It's always coming in third, (chuckles) but we're watching, and Google's obviously making some investments, as we shared earlier. All right. Some final thoughts on what we learned this week and in this research: customers and partners should be thrilled that both Microsoft and Google, along with AWS, are spending so much money on innovation and building out global platforms. This is a gift to the industry, and we should be thankful, frankly, because it's good for business, it's good for competitiveness, and it's good for future innovation as a platform that can be built upon. Now we didn't talk much about multi-cloud, and we haven't even mentioned supercloud, but both Microsoft and Google have a story on cross-cloud capabilities that resonates with customers, unlike AWS at this time. But we never say never when it comes to AWS. They often surprise you. One of the other things that Sarbjeet Johal, John Furrier and I have discussed is that each of the Big 3 is positioning to its respective strengths. AWS is the best IaaS. Microsoft is building out the, quote, we-make-it-easy-for-you cloud, and Google is trying to be the open data cloud with its open-source chops and excellent tech.
And that puts added pressure on Snowflake, doesn't it? You know, Thomas Kurian made some comments, according to CRN, to the effect that we are the only company that can do the data cloud thing across clouds, which, if I'm being honest, is not really accurate. Now, I haven't clarified these statements with Google, and things often get misquoted, but there's little question that, as AWS has done in the past with Redshift, Google is taking a page out of Snowflake's playbook, and Databricks' as well. A big difference among the Big 3 is that AWS doesn't have the big emphasis on up-the-stack collaboration software that both Microsoft and Google have, which for Microsoft and Google will drive captive IaaS consumption. AWS obviously does some of that, a lot of that, in database, but one would think ISVs that compete with Microsoft and Google should have a greater affinity to AWS for competitive reasons. And we would think the same could be said in security because, as I mentioned before, Microsoft competes very directly with CrowdStrike, Okta and others. One of the big things Sarbjeet mentioned that I want to call out here, and I'd love to have your opinion: AWS specifically, but also Microsoft with Azure, have successfully created what Sarbjeet calls brand distance. AWS from Amazon retail: even though AWS constantly talks about Amazon X and Amazon Y in its product portfolio, you don't really consider it part of the retail organization, 'cause it's not. Azure, same thing, has created its own identity. And it seems that Google still struggles to do that. It's still very highly linked to the core of Google. Now, maybe that's by design, but for enterprise customers there's still some potential confusion about Google: What are its intentions? How long will it continue to lose money and invest? Is it going to pull the plug like it does on so many other tools?
So, you know, maybe some rethinking of the marketing and positioning there. Now we didn't talk much about ecosystem, but it's vital for any cloud player, and Google again has some work to do relative to the leaders. Which brings us to supercloud. The ecosystem and end customers are now in a position this decade to digitally transform. And we're talking here about building out their own clouds, not by building data centers and installing racks of servers and storage devices, no. Rather, by building value on top of the hyperscaler gift that has been presented. That is a mega trend that we're watching closely in theCUBE community. While there's debate about the supercloud name and so forth, there's little question in our minds that the next decade of cloud will not be like the last. All right, we're going to leave it there today. Many thanks to Sarbjeet Johal and my business partner, John Furrier, for their input to today's episode. Thanks to Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as well. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE, who does some wonderful editing. And check out SiliconANGLE; there's a lot of coverage of Google Cloud Next and Microsoft Ignite. Remember, all these episodes are available as podcasts wherever you listen. Just search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com. And you can always get in touch with me via email, david.vellante@siliconangle.com, or you can DM me at dvellante or comment on my LinkedIn posts. And please do check out etr.ai, the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis. (gentle music)

Published Date : Oct 15 2022


Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022


 

(upbeat music) >> Welcome back, everyone, to theCUBE's coverage here in Seattle, Washington, of AWS's Marketplace Seller Conference. The big news here: the Amazon Partner Network is combining with Marketplace to form the Amazon Partner Organization, part of a big reorg as they grow to the next level. NextGen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, your host of theCUBE. Great guests here from Databricks, both CUBE alumni. Jack Andersen, GM and VP of the Databricks partnership team for AWS, you handle that relationship, and Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >> Thanks for having us back. >> Yeah, John, great to be here. >> So I feel like we're at re:Invent 2013. Small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer on the micro, you know, people should be buying online. Self-service, cloud scale. But Amazon's got billions being sold through its marketplace. They've reorganized their partner network. You can see kind of what's going on. They've kind of figured it out, like, let's put everything together and simplify, and make Marketplace less of a website. Merge our partner organizations, have more synergy and frictionless experiences so everyone can make more money and customers are going to be happier. >> Yeah, that's right. >> I mean, you're running the relationship. You're in the middle of it. >> Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and Marketplace allow us to do: work with Amazon on these really, you know, unique use cases. >> You know, I interviewed Ali many times over the years. I remember many years ago, maybe six, seven years ago, we were talking. He's like, "We're all in on AWS."
Obviously now, with the success of Databricks, you've got multiple clouds, I see that. Customers have choice. But I remember the strategy early on. It was like, we're going to be deep. So this speaks volumes about the relationship you have. Years. Jack, take us through the relationship that Databricks has with AWS from a partner perspective, and Joel, from a product perspective. Because it's not like you guys are Johnny-come-latelies, new to the scene. >> Right. >> You've been there almost since the creation of this wave. What's the relationship, and how does it relate to what's going on today? >> So most people may not know that Databricks was born on AWS. We actually did our first $100 million of revenue on Amazon. And today we're obviously available on multiple clouds, but we're very fond of our Amazon relationship. When you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and Marketplace broadens our reach. And so we think of Marketplace in three different aspects. We've got the Marketplace private offer business, which we've been doing for a number of years. Matter of fact, we were driving well over a hundred percent year-over-year growth in private offers, and we have a nine-figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS, so they get pricing power against their private pricing. So it's really important that it goes on their Amazon bill. In May we launched our pay-as-you-go, on-demand offering, and in five short months we have well over a thousand subscribers. What this does is really reduce the barriers to entry. It's low friction. So anybody in an enterprise, startup or public sector company can start to use Databricks on AWS in a consumption-based model and have it go against their monthly bill. And so we see customers, you know, doing rapid experimentation, pilots, POCs.
They're really learning the value of that first use case, and then we see rapid use case expansion. And the third aspect is the consulting partner private offer, CPPO, which is super important in how we involve our partner ecosystem: our consulting partners and resellers that are able to work with Databricks on behalf of customers. >> So you've got the big contracts with the private offer. You've got the product-market fit, kind of people iterating with data, coming in with the buyers you get. And obviously the integration piece, all fitting in there. >> Exactly. >> Okay, so those are the offers, that's current, what's in Marketplace today. Is that the products... What are people buying? >> Yeah. >> I mean, I guess what's the... Joel, what are people buying in the marketplace, and what does it mean for them? >> So fundamentally what they're buying is the ability to take silos out of their organization. That is the problem that Databricks is out there to solve. When you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real-time streaming data. And your teams are trying to use all of this data to solve really complicated problems. As Databricks, the Lakehouse Company, what we're helping customers do is get into the new world. How do they move to a place where they can use all of that data across all of their teams? So we allow them to find, through the marketplace, those rapid-adoption use cases where they can get rid of the data warehousing and data lake silos they've had in the past. Get their unstructured and structured data onto one data platform, an open data platform that is no longer tied to any proprietary formats and standards, and something they can very easily integrate into the rest of their data environment. Apply one common data governance layer on top of that.
So that from the time they ingest that data, to the time they use that data, to the time they share that data, inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform and that common governance solution, they're able to bring all of those use cases together, across their real-time streaming, their data engineering, their BI, their AI. All of their teams working on one set of data. That lets them move really, really fast, and it also lets them solve challenges they just couldn't solve before. A good example of this: you know, one of the world's largest data streaming platforms now runs on Databricks with AWS. And if you think about what it takes to set that up: they've got all this customer data that was historically inside data warehouses, which they use to understand who their customers are. They have all this unstructured data that they've built their data science models on, so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth, between clickstream data from what customers are doing on their platform and the recommendations they want to push back out. If those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex. But by building it on Databricks, they were able to release it in record time and have grown at a record pace to now be the number one platform. >> And this product, it's impacting product development. >> Absolutely. >> I mean, this is like the difference between lagging months of product development and, like, days. >> Yes. >> Pretty much what you're getting at. >> Yes. >> So total agility. >> Mm-hmm. >> I got that.
Okay, now, I'm a customer, I want to buy in the marketplace, but you've got a direct sales force out there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and Mona, was, "Hey, this is a CRO, chief revenue officer, conference" conversation. Which means someone's getting compensated. So if I'm a sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict, or how do you handle it? >> Well, I'd add to what Joel just talked about with, you know, the value of the solution: our entire offering is available on AWS Marketplace. So it's not a subset, it's the entire Databricks offering. And- >> The flagship, all the top stuff. >> Everything, the flagship, the complete offering. So it's not segmented. It's not a sub-segment. >> Okay. >> It's, you know, you can use all of our different offerings. Now when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks: our sales force wants to do the right thing for the customer, if the customer wants to use Marketplace as their procurement vehicle. And that really helps customers, because if you get Databricks and four other ISVs together, and let's say you're spending a million dollars with each ISV, you have $5 million of spend. You put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we view it. >> So customers are driving, it sounds like. >> Correct. For sure. >> So they're looking at this and saying, hey, I'm going to just get purchasing power with all my relationships.
Because it's a solution architecture market, right? >> Yeah. It makes sense. Most customers will have a primary and secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, they get pricing power. >> Okay, Joel, we're going to date ourselves. At least I will. So back in the old days, (group laughter) it used to be, do a Barney deal with someone: hey, let's go to market together. You've got to get paper, you do a biz dev deal. And then you've got to say, okay, now let's coordinate our sales teams. A lot of moving parts. So what you're getting at here is that the alternative for Databricks, or any company, is to go find those partners and do deals, whereas now Amazon is the center point for the customer. So you can still do those joint deals, but this seems to be flipping the script a little bit. >> Well, it is, but we still have VARs and consulting partners that are doing implementation work, very valuable work, advisory work, and they can actually work with Marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >> So it doesn't change your business structure. It just makes it more efficient. >> That's correct. >> That's a great way to say it. >> Yeah, that's great. >> Okay. So that's it. It just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >> Yes. >> Absolutely. >> Economically. >> Economically, it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon, especially when it comes to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution. And our teams are working backwards from those use cases, you know, to collaborate and land them. >> Yeah. I want to get that out there. Go ahead, Joel.
>> So one of the other things I might add to that too, you know, is why this is advantageous for companies like Databricks to work through the marketplace: it makes it so much easier for customers to deploy a solution. It's literally one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at how to help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator to that. >> You know, it's interesting. I want to bring this up and get your reaction to it, because to me, I think this is the future of procurement. So from a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all the internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all been good. You guys have played well there in the marketplace. But now we get into more of what I call the business apps, and they brought this up on stage. A little nuanced. Most enterprises aren't yet there in integrating tech, on the business apps, into the stack. This is where I think you guys are a use case of success, where you've been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. So, like, I want to integrate. So now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. You don't just buy it. You've got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right? Or am I off? Because no one's going to be buying software like they used to. They buy software to integrate it. >> Yeah, no- >> Because everything's integrated. >> I think AWS has done a great job at creating a partner ecosystem, right? To give customers the right tools for the right jobs. And those might be with third parties.
Databricks is doing the same thing with our Partner Connect program, right? We've got customer partners like Fivetran and dbt that, you know, augment and enhance our platform. And so you're looking at multi-ISV architectures, and all of that can be procured through the AWS Marketplace. >> Yeah. It's almost like, you know, bundling and unbundling. I was talking about this with Dave Vellante about Supercloud. Which is: why wouldn't a customer want the best solution in their architecture? Period. Best in its class. If someone's got API security or an API gateway... Well, you know, I don't want to be forced to buy something because it's part of a suite. And that's where you see things get suboptimized. Where someone dominates a category and they say, oh, you've got to buy my version of this. >> Joel and I were talking; we were actually saying what's really important about Databricks is that customers control the data, right? You want to comment on that? >> Yeah. I was going to say, you know, what you're pushing on there, we think is extraordinarily, you know, the way the market is going to go. Customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so, philosophically, I think one of the really strong places where Databricks and AWS have lined up is that we both take the approach that you should have maximum flexibility on the platform. And as we think about the Lakehouse, one thing we've always been extremely committed to, as a company, is building the data platform on an open foundation. And we do that primarily through Delta Lake, making sure that, to Jack's point, with Databricks the data is always in your control, and it's always stored in a completely open format. And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there.
Because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >> When you see other solutions out there that aren't as open as you guys (you guys are very open, by the way, we love that too, we think that's a great strategy), what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, and I think open source software is the software industry, then integration is a big deal. Because software's going to be plentiful. >> Sure. >> Let's face it. It's a good time to be in the software business. Cloud's booming. So what's the downside, from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative. What should they potentially be nervous about, down the road, if they go with a more proprietary or locked-in approach? >> Yeah. >> Well, I think the challenge with proprietary ecosystems is that you become beholden to the ability of that provider to build relationships and convince other vendors that they should invest in that format. But you're also, then, beholden to the pace at which that provider is able to innovate. >> Mm-hmm. >> And I think we've seen lots of times over history where, you know, a proprietary format may run ahead, for a while, on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade. Whereas in the open formats- >> So extract rents versus innovation. (John laughs) >> Exactly. Yeah, exactly. >> I'll say it. >> But in the open world, you know, you have to continue to innovate. >> Yeah. >> And the open source world is always innovating. If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source.
And so by investing in open ecosystems, you are always going to be at the forefront of what is the latest. >> You know, again, not to date myself, but you look back at the eighties and nineties, the protocol stacks were proprietary. >> Yeah. >> You know, SNA was IBM, DECnet was Digital. You know the rest. And then TCP/IP was part of the open systems interconnect. >> Mm-hmm. >> Revolutionary (indistinct) a big part of that, as well as my school did. And so, like, you know, that was... but it didn't standardize the whole stack. It stopped at IP and TCP. >> Yeah. >> But that helped interoperate; that created a nice de facto standard. So this is a big part of this midgame. I call it the chessboard: you know, you've got the opening game and the midgame, then you get the end game. You're not at the end game yet in cloud. But cloud- >> There's always some form of lock-in, right? Andy Jassy will address it, you know, when making a decision. But if you're going to make a decision, you want to reduce... You don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, it's an AI-driven business, right? And so it has to do with: can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models? >> Yeah. >> In a very consistent manner. And so the tools of tomorrow will, to Joel's point, be open, and we want interoperability with those tools. >> And choice matters too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly too. Redshift competes directly with a lot of other stuff. But they can't play the bundling game, because customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all.
And they're going to be... they're onto it. This is the- >> To Amazon's credit, by having these solutions that may compete with native services in marketplace, they are providing customers with choice, low price- >> And access to the core value. Which is the hardware- >> Exactly. >> Which is their platform. Okay. So I want to get your thoughts on something else I see emerging. This is, again, kind of a theCUBE rumination moment. So on stage, Chris unpacked a lot of stuff. I mean, with this marketplace, they're touching a lot of hot buttons here, you know, pricing, compensation, workflows, services behind the curtain. And one of those things he mentioned was resellers, or channel partners, depending upon who you talk to. We believe, Dave and I believe on theCUBE, that the entire indirect sales channel of the industry is going to be disrupted radically. Because those players were selling hardware and software in the old days. That game is going to change. You mentioned you guys have a program; let me get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in. Which means that the old reseller channels are going to be rewritten. They're going to be refactored with these new kinds of access. Because you've got scale, you've got money and you've got product. And you've got customers coming into the marketplace. So if you're a reseller that sold computers to data centers, or software, you know, a value-added reseller or VAR business... >> You've got to evolve. >> You've got to, you've got to be here. >> Yes. >> Yeah. >> How are you guys working with those partners? Because you say you have a product in your marketplace there. How do I make money if I'm a reseller with Databricks, with Amazon? Take me through that use case. >> Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how.
When we're seeing customers do mass migrations to the cloud, or Hadoop-specific migrations, or data transformation implementations, they need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well, well, that's another aspect of their business. But I really think it is the expertise that the partners bring to help customers get outcomes. >> Joel, the channel is a big opportunity for Amazon to reimagine this. >> For sure. Yeah. And I think, you know, to your comment about how resellers take advantage of that, I think what Jack was pushing on is spot on. Which is, it's becoming more and more about the expertise you bring to the table. And not just transacting the software, but actually helping customers make the right choices. And we're seeing, you know, SIs begin to be able to resell solutions and find a lot of opportunity in that. >> Yeah. And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be how this evolves. >> At the end of the day, it's about services, right? >> For sure. Yeah. >> I mean... >> You've got a great service, you're going to have high gross profits. >> Yeah. >> The managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >> I think that's going to be a really hot button for you guys. I think the way you guys are open, with this channel partner services model coming into the fold, really kind of makes for that Supercloud-like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on within Databricks. >> For sure. >> On top of this ecosystem. How does that work? This hasn't been written up in business school case studies yet. This is new. What is this?
>> I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big, kind of, new horizons for us as we think about what drives ecosystems. It's going to be around, well, what's the data platform that I'm using? And then all the tools that have to encircle that to get my business done. And so I think there are, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data, analytics and AI. And then to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well. As customers are looking at, well, if I'm standing these Lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >> I mean, you think about ecosystem theory, we're living a whole nother dream. And I'm not kidding. What hasn't yet been written up in business school case studies is that we're now in a whole nother connective tissue, ecology thing. Where you have dependencies and value propositions. Economics, connectedness. So you have relationships in these ecosystems. >> And I think one of the great things about the relationships in these ecosystems is that there's a high degree of overlap. >> Yeah. >> So you're seeing that, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >> Joel, Jack, I love it, because you know what it means? The best ecosystem will win, if you keep it open. >> Sure, sure. >> You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really kind of what we're talking about.
>> And John, can I just add that when I was at Amazon, we had a theory that there are buyers and builders, right? There are very innovative companies that want to build things themselves. We're seeing now that builders want to buy a platform. Right? >> Yeah. >> And so there's a platform decision being made, and that ecosystem is going to evolve around the platform. >> Yeah, and I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma, with a slash through it: the integrator's dilemma. Innovation is the digital transformation. So- >> Absolutely. >> Like, that becomes cliché in a way, but it really becomes more of: are you open? Are you integrating? If APIs are connective tissue, what's automation, what do the service messages look like? I mean, a whole nother set of, kind of, thinking goes on in these new ecosystems and these new products. >> And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks, and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >> Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base, a great growing ecosystem. And again, I think a shining example of what every enterprise is going to do: build on top of something, get that operating model, drive revenue. >> Mm-hmm. >> Yeah. >> Whether you're Goldman Sachs or Capital One or XYZ Corporation. >> S&P Global, Nasdaq. >> Yeah. >> We've got, you know, the biggest verticals in the world solving tough problems with Databricks.
I think we'd be remiss, because if Ali was here, he would really want to thank Amazon for all of the investments across all of the different functions. Whether it's the relationship we have with our engineering and service teams, our marketing teams, you know, product development. And we're going to be at re:Invent. A big presence at re:Invent. We're looking forward to seeing you there again. >> Yeah. We'll see you guys there. Yeah. Again, good ecosystem. I love the ecosystem evolution that's happening. This NextGen Cloud is here. We're seeing this evolve, kind of new economics, new value propositions kind of scaling up. Producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, great to see you, and Jack. >> Thanks for having us, John. >> Okay. theCUBE coverage here. The world's changing as APN comes together with the marketplace for a new partner organization at Amazon Web Services. theCUBE's got it covered. This should be a very big, growing ecosystem as this continues. Billions of dollars being sold through the marketplace. And of course the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)

Published Date : Oct 10 2022


Thomas Stocker, UiPath & Neeraj Mathur, VMware | UiPath FORWARD5


 

>> theCUBE presents UiPath FORWARD5, brought to you by UiPath. >> Welcome back to UiPath FORWARD5. You're watching theCUBE's wall-to-wall coverage. This is day one, Dave Vellante with my co-host Dave Nicholson. We're taking RPA to intelligent automation. We're going from point tools to platforms. Neeraj Mathur is here. He's the director of Intelligent Automation at VMware. Yes, VMware. We're not going to talk about vSphere or Aria, or maybe we are, (Neeraj chuckles) but he's joined by Thomas Stocker, who's a principal product manager at UiPath. And we're going to talk about testing automation, automating the testing process. It's a new sort of big vector in the whole RPA automation space. Gentlemen, welcome to theCUBE. Good to see you. >> Neeraj: Thank you very much. >> Thomas: Thank you. >> So Neeraj, as we were saying, Dave and I, you know, VMware was half our lives for a long time, but we're going to flip it a little bit. >> Neeraj: Absolutely. >> And talk about some of the inside baseball. Talk about your role and how you're applying automation at VMware. >> Absolutely. So, as part of running the intelligent automation program at VMware, we have quite a mature CoE; for the last, you know, four to five years, we've been doing automation across the enterprise. So we have automated, you know, quite a lot of different processes and tasks across over 45 different business functions. So as part of my role, I'm really responsible for making sure that we are, you know, bringing in the best practices, making sure that we are ready to scale across the enterprise, but at the same time, how, you know, quickly we are able to deliver the value of this automation to our businesses as well. >> Thomas, as a product manager, you know the product and the market inside and out, you know the competition, you know the pricing, you know how customers are using it, you know all the features.
What's your main area of focus? >> The main area of the UiPath Test Suite... >> For your role, I mean? >> For my role, it's RPA testing, meaning testing RPA workflows themselves. And the reason is RPA has matured over the last few years. We see that, and it has adopted a lot of best practices from the software development area. So what we see is RPA now becomes business critical. It's part of the main core business processes in corporations, and testing it just makes sense. You have to continuously monitor and continuously test your automation to make sure it does not break in production. >> Okay. And you have a specific product for this? Is it a feature or is it a module? >> So RPA testing, or the UiPath Test Suite, as the name suggests, is a suite of products. It's actually part of the existing platform. So we use Orchestrator, which is the distribution engine. We use Studio, which is our IDE to create automation. And on top of that, we built a new component, which is called the UiPath Test Manager. And this is a kind of analytics and management platform where you have an oversight of what happened, what went wrong, and what is the reason for an automation to break. >> Okay. And so Neeraj, you're testing your robot code? >> Neeraj: Correct. >> Right. And you're looking for what? Governance, security, quality, efficiency? What are the things you're looking for? >> It's actually all of those, but our main goal in starting this was twofold, right? We were really looking at how we, you know, deliver at a speed, with a quality, which we can maintain and sustain for a longer period, right? So to improve both our quality of delivery and our speed of delivery. So the way we look at testing automation is not as an independent entity. We look at it as a pipeline of continuous improvement for us, right? What the industry calls a CI/CD pipeline. So testing automation is one of the key components of that.
But the way we were able to deliver on speed is to really have that end-to-end automation done for us, from developers to production, using that pipeline, and our testing is one piece of that. And the way we were able to also improve the quality of our delivery is to have an automated way of doing the code reviews and an automated way of doing the testing using this platform as well, and then, you know, go end to end for that purpose. >> Thomas, when I hear testing robots, (Thomas chuckles) I don't care if it's code or actual robots, it's terrifying. >> It's terrifying, yeah. >> It's terrifying. Okay, great. You have some test suite that says, look, yeah, we've looked at- >> Why is that terrifying? >> It's terrifying because you have to let it interact with actual live systems in some way. Yeah. The only way to know if it's going to break something is either you let it loose or you have some sort of sandbox where, I mean, what do you do? Are you taking clones of environments and running actual tests against them? I mean, I think it's- >> Like testing disaster recovery in the old days. Imagine. >> So we are actually not running any testing in the live production environment, right? The way we built this is to do the testing in a separate test environment, using very specific test data from the business, which, you know, we call a golden copy of that test data, because we want to use that data for months and years to come. >> Okay. >> Right? Yeah. So we're not touching any production environment. >> Yeah. All right. 'Cause you can imagine- >> Absolutely. >> It's like, oh yeah, we've created a robot that changes baby diapers, let's go ahead and test it on these babies. (group laughter) Yeah. >> I don't think so. No, no. But does it matter if there's a delta between the test data and the production data? How big is that delta? How do you manage that?
>> It does matter. And that's where that whole, you know, angle of how much you can test in real life comes in, right? There are cases, even in our case, where, you know, the production data might be slightly different from the test data itself. So the whole effort goes into making sure that the test data we are preparing here is as close to the production data itself as possible, right? It may not be a hundred percent close, but that's the sort of, you know, boundary or risk you may have to take. >> Okay. So you're snapshotting it, moving it over, a little vMotion? >> Neeraj: Yeah. >> Okay. So do you do this for citizen developers as well? Or is it pretty much your center of excellence writing all the bots? >> No, right now we are doing it only for the unattended, CoE-driven bots at this point in time. >> What are your thoughts on the future? Because I can see some really sloppy citizen coders. >> Yeah. Yeah. So as part of the governance we are trying to build for our citizen developers, there is a really similar consideration for that as well. But we have really not gone that far in building that sort of automation right now. >> Narrowly, if we just talk about testing, what's the business impact been? And I'm interested in the overall platform, but specifically for the testing: when did you start implementing that, and what has been the business benefit? >> So the benefit is really on the speed of delivery, which means that we are able to actually deliver more projects and more automation as well. Since we adopted that, we have seen, you know, an improvement in our speed of around 15%, right? So, you know, 15% better speed than previously.
What we have also seen is that the success rate of our transactions in the production environment has gone to 96%, which again has a direct implication for the business, in that, you know, no more manual exception handling or manual interaction is required for those failure scenarios. >> So 15% better speed at what? At implementing the bots? At actually writing code? Or... >> End to end, yes. So from building the code, to testing that code, being able to approve it, and then deploying it into the production environment after testing it, this has really improved by 15%. >> Okay. And what business processes outside of sort of testing have you attacked with the platform? Can you talk to that? >> The business processes outside of testing? >> Dave: Yeah. >> You mean the ones which we are not testing ourselves? >> Yeah, no. So just the UiPath platform: is it exclusively for testing? >> This testing is exclusively for the UiPath bots which we have built, right? So we have some 400-plus UiPath automations. So it's meant exclusively- >> But are you using UiPath in any other ways? >> No, not at this time. >> Okay, okay. Interesting. So you started with testing? >> No, we started by building the bots. So we already had roughly 400 bots in production. When we came to testing automation, that's when we started looking at it. >> Dave: Okay. >> And now building that whole testing- >> Dave: What are those other bots doing? Let me ask it that way. >> Oh, there's quite a lot. I mean, we have many bots. >> Dave: Paint a picture if you want. >> Yeah. In finance, in order management, HR, legal, IT, there are a lot of automations out there. As I'm saying, there are more than 400 automations out there. Yeah. So it's across the, you know, enterprise. >> Thomas, you know, both of you have a view on this, but Thomas's view is probably wider, across other instances.
What are the most common things that are revealed in tests that indicate something needs to be fixed? Yeah, so think of a test failure, an error. What are the most common things that happen? >> So when we started building our product, we conducted a survey among our customers. And not surprisingly, the main reason why automation breaks is change. >> David: Sure. >> And the problem here is that RPA is a controlled process, a controlled workflow, but it runs in an uncontrollable environment. So typically RPA is developed by a CoE. Those are business and automation experts, but they operate in an environment that's driven by new patches and new application changes rolled out by IT. And that's the main challenge here. You cannot control that. And so far, if you do not proactively test, what happens is you catch an issue in production when it already breaks, right? That's reactive; that leads to unplanned maintenance, actually. And that was the goal for the Test Suite right from the start: to support our customers here and move over to proactive maintenance, meaning testing beforehand and finding those issues before they hit production. >> Yeah. Yeah, yeah. So I'm still not clear on... so you just gave a perfect example, changes in the environment. >> Yeah. >> So those changes are happening in the production environment. >> Thomas: Yeah. >> The robot that was happily doing its automation stuff before? >> Thomas: Yeah. >> Everyone was happy with it. Change happens. Robot breaks. >> Thomas: Yeah. >> Okay. You're saying you test before changes are implemented, to see if those changes will break the robot? >> Thomas: Yeah. >> Okay. How do you expose the changes that are going to be in a production environment to the robot? You must have a... Is that part of the test environment? Does that mean that you have to have, what, fully running instances of, like, an ERP system? >> Thomas: Yeah.
>> You know, a clone of an environment. How do you test that without having the live robot against the production environment? >> I think there's no big difference from standard software testing. The interesting thing is that the change actually happens earlier. You are affected on the production side, but the change happens on the IT side or the DevOps side. So you will typically test in a test environment that's similar to your production environment, or probably in a pre-prod environment. And the test itself simply runs the workflow that you want to test, but mocks away any dependencies you don't want to invoke. You don't want to send a letter to a customer from a test environment, right? And then you verify that the result is what you actually expect, right? As soon as this is not the case, you will be notified; you will have a result, the fail result, and you can act before it breaks. So you can fix it, redeploy to production, and you should be good. >> But the main emphasis at VMware is testing your bots, correct? >> Neeraj: Testing our bots, yes. >> Can I apply this to testing other software code? >> Yeah. You can technically apply it to any software, for that matter, and Thomas can speak to that better than me, but we have not really explored that aspect of it. >> David: You guys have pretty good coders, good engineers at VMware. But no, seriously, Thomas, what's that market looking like? Is that taking off? Are you applying this capability, or are customers applying it, for testing software more broadly? >> Absolutely. Our goal was to test RPA and the applications it relies on, so that includes RPA testing as well as application testing. The main difference is that typical functional application testing is black-box testing: you don't know the inner implementation of that application. And it works out pretty well.
The big opportunity that we have is not isolated testing or isolated RPA; we talk about the convergence of automation. So what we offer our customers is one automation platform. You don't create automation redundantly in different departments; you create it once, probably for testing, and then you reuse it for RPA. That suddenly helps your test engineers move from a pure cost center to a value center. >> How unique is this capability in the industry relative to your competition, and what differentiators do you have from the folks that we all know you're competing with? >> The big advantage is the power of the entire platform that we have with UiPath. We didn't start from scratch. We have that great automation layer. We have that great distribution layer. We have all those AI capabilities that so far were used for RPA. We can reuse them, repurpose them for testing. And that really differentiates us from the competition. >> Thomas, I detect a hint of an accent. Is it German, or... >> It's actually Austrian. >> Austrian. Well... >> You know. Don't compare us with Germans. >> I understand. High German, is that what's spoken in Austria? >> Yes, it is. >> So... >> Point being? >> Point being, exactly, as I drift off: German is generally considered to be a very precise language with very specific words. It's very easy to be confused about the difference between two things: automation testing and automating testing. >> Thomas: Yes. >> Because in this case, what you are testing are automations. >> Thomas: Yes. >> That's what you're talking about. >> Thomas: Yes. >> You're not talking about the automation of testing. Correct? >> Well, we talk about... >> And that's got to be confusing when you go to translate that into... >> Dave: But isn't it both? >> ...50 other languages? >> Dave: It's both. >> Is it both?
>> Thomas: It actually is both. >> Okay. >> And there's something we are exploring right now which is even the next step, the next layer: autonomous testing. So far, you had an automation expert creating the automation once, and it would be rerun over and over again. What we are now exploring, together with a university, is autonomous testing, meaning a bot explores your application under test and finds issues completely autonomously. >> Dave: So autonomous testing of automation? >> It's getting more and more complicated. >> It's getting clearer by the minute. >> Sorry for that. >> All right, Neeraj, last question: where do you want to take this? What's your vision for VMware in the context of automation? >> Sure. I think the first and foremost thing for us is to really make it more mainstream for our automation developers as well, right? What I mean by that is, there is a shift now in how we engage with our business users and SMEs. As I said, previously they used to actually test manually. Now the conversation changes to: hey, can you tell us what test cases you want, what you want us to test in an automated manner? Can you give us the test data for that, so that we can keep testing in a continuous manner for the months and years to come, right? The other part that changes is: hey, it used to take eight weeks for us to build, but now it's going to take nine weeks, because we're going to spend an extra week just to automate that as well. But it's going to help you in the long run, and that's the conversation. So it's about really making it much more mainstream. And out of all these kinds of automations and bots which we are building, we are not looking to have test automation for every single bot. We need a way to choose where the value is. Is it the quarter-end processing one?
Is it the most business-critical one, or is it the one where we are expecting frequent changes, right? That's where the value of the testing is. So we want to really bring that in as part of our whole process, and then, you know... >> We're still fine too. That's great. Guys, thanks so much. This has been a really interesting conversation. I've been waiting to talk to a real-life customer about testing and automation testing. Appreciate your time. >> Thank you very much. >> Thanks for everything. >> All right. Thank you for watching. Keep it right there; Dave Nicholson and I will be back right after this short break. This is day one of theCUBE's coverage of UiPath Forward 5. Be right back.

Published Date : Sep 29 2022

SUMMARY :

brought to you by UI Path. in the whole RPA automation space. So Neeraj, as we were some of the inside baseball. for making sure that we are, you know, and the market inside and And the reason is RPA has Is it a feature or it's a module? So RPA testing or the UiPath testing your robot code? And you're looking for what? So the way we look at testing automation I don't care if it's You, you have some test suite that says of sandbox where, I mean, what do you do? recovery in the old days. in the separate test Cause you, you can imagine it on these babies. between the test data and that the test data, which we that moving it over, So do you do this for What are you, what are But for us, we have really not gone that So the benefit is really on the At, at implementing the bots? the code to test that code of testing have you sort of You mean the one which we So just the UI path platform, for the UI path bots So you started with testing? So we already had roughly And then now building that whole testing-- Let me ask it that way. I mean, we have many bots. so it's across the, you know, both of you have a the main reason why from the taste suite to changes in the environment. in the production environment. The robot that was happily doing its Thomas: Yeah. You're saying you test before Does that mean that you against the production environment? the result is what you Can I apply this to testing for that matter, but we have really not So you don't know the So the big advantage is the power a hint of an accent. Well, compare us with Germans. Is that the proper, is that about between the difference what you are testing the automation of testing. on the test and finds issues getting clearer by the minute. But it's going to help you in the long run to a real life customer Thank you for

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Thomas | PERSON | 0.99+
David | PERSON | 0.99+
Neeraj | PERSON | 0.99+
Dave Nicholson | PERSON | 0.99+
Dave | PERSON | 0.99+
Neeraj Mathur | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Thomas Stocker | PERSON | 0.99+
nine weeks | QUANTITY | 0.99+
15% | QUANTITY | 0.99+
eight weeks | QUANTITY | 0.99+
96% | QUANTITY | 0.99+
four | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Facebook | ORGANIZATION | 0.99+
VMware | ORGANIZATION | 0.99+
UiPath | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
five years | QUANTITY | 0.99+
more than 400 automations | QUANTITY | 0.98+
Excel | TITLE | 0.98+
50 other languages | QUANTITY | 0.98+
Austria | LOCATION | 0.98+
one piece | QUANTITY | 0.97+
two-front | QUANTITY | 0.97+
one | QUANTITY | 0.97+
UI Path Forward Five | TITLE | 0.97+
The Cubes | TITLE | 0.96+
around 15% | QUANTITY | 0.96+
UiPath T Suite | TITLE | 0.96+
UI Path | ORGANIZATION | 0.96+
German | OTHER | 0.96+
Austrian | OTHER | 0.95+
hundred percent | QUANTITY | 0.95+
400 plus automations | QUANTITY | 0.95+
TheCUBE | ORGANIZATION | 0.92+
400 bots | QUANTITY | 0.92+
over 45 different business functions | QUANTITY | 0.91+
Germans | OTHER | 0.91+
day one | QUANTITY | 0.91+
UiPathT | TITLE | 0.9+
RPA | TITLE | 0.9+
months | QUANTITY | 0.88+
UI | ORGANIZATION | 0.86+

Horizon3.ai Signal | Horizon3.ai Partner Program Expands Internationally


 

>> Hello, I'm John Furrier with theCUBE, and welcome to this special presentation of theCUBE and Horizon3.ai. They're announcing a global partner-first approach, expanding their successful pen testing product NodeZero. You're going to hear from leading experts on their staff, including their CEO, on positioning themselves for a successful channel distribution expansion internationally, in Europe, the Middle East, Africa, and Asia Pacific. In this CUBE special presentation, you'll hear about the expansion and the expanded partner program, which gives partners a unique opportunity to offer NodeZero to their customers. Innovation in pen testing is going international with Horizon3.ai. Enjoy the program. [Music] Welcome back, everyone, to theCUBE and Horizon3.ai special presentation. I'm John Furrier, host of theCUBE. We're here with Jennifer Lee, head of channel sales at Horizon3.ai. Jennifer, welcome to theCUBE. Thanks for coming on. >> Great, well, thank you for having me. >> So, big news around Horizon3.ai driving a channel-first commitment. You guys are expanding the channel partner program to include all kinds of new rewards, incentives, and training programs to help educate partners and really drive more recurring revenue. Certainly cloud and cloud scale have done that. You've got a great product that fits into that kind of channel model, and great services you can wrap around it. Good stuff. So let's get into it. What are you guys doing with this news? Why is this so important? >> Yeah, for sure. Like you said, we recently expanded our channel partner program. The driving force behind it was really just to align with our channel-first commitment and to create awareness around the importance of our partner ecosystem. It's really how we go to market, through the channel. >> And a great international focus. I've talked with the CEO about the solution, and he broke down all the action on why it's important on the product side. But why now on the go-to-market
change? What's the why behind this news on the channel? >> Yeah, for sure. We are doing this now really to align our business strategy, which is built on the concept of enabling our partners to create a high-value, high-margin business on top of our platform. We offer a solution called NodeZero. It provides autonomous pen testing as a service, and it allows organizations to continuously verify their security posture. Our company vision has a tagline that states that our pen testing enables organizations to see themselves through the eyes of an attacker, and we use the attacker's perspective to identify exploitable weaknesses and vulnerabilities. So we created this partner program from the partner's perspective; we've built it through the eyes of our partner, right? We're really prioritizing what the partner is looking for, and that will ensure mutual success for us. >> Yeah, the partners always want to get in front of the customers and bring new stuff to them. Pen tests have traditionally been really expensive, so bringing it down to a service level that's affordable and has flexibility to it allows a lot of capability. So I imagine people are getting excited by it. I have to ask you about the program. What specifically are you guys doing? Can you share any details around what it means for the partners, what they get, what's in it for them? Can you just break down some of the mechanics, mechanisms, or details? >> Yeah, yep. You know, we're really looking to create business alignment and, like I said, establish mutual success with our partners. So we've got two key elements that we're really focused on bringing to the partners. The opportunity for profit margin expansion is one of them, and the other is a way for our partners to really differentiate themselves and stay relevant in the market. So we've restructured our discount model, really, you know, highlighting
profitability and maximizing profitability. This includes our deal registration: we've created a deal registration program, we've increased the discount for partners who take part in our partner certification trainings, and we have some other partner incentives that we've created that are going to help out there. We've recently gone live with our partner portal. It's a consolidated experience for our partners where they can access our sales tools. We really view our partners as an extension of our sales and technical teams, so we've taken all of the training material that we use internally and made it available to our partners through our partner portal. I'm thinking now about what else is in that partner portal: we've got our partner certification information, so all the content that's delivered during that training can be found in the portal. We've got deal registration, co-branded marketing materials, and pipeline management, so this portal gives our partners a one-stop place to go to find all that information. And then, just really quickly, the second part that I mentioned is that our technology is really disruptive to the market. Like you said, autonomous pen testing is still a relatively new topic for security practitioners, and it's proven to be really disruptive. On top of that, we recently found an article from MarketsandMarkets that reports the global pen testing market is really expanding; it's expected to grow to around 2.7 billion by 2027.
So the market's there, right? The market's expanding, it's growing, and for our partners it really allows them to grow their revenue across their customer base, expand their customer base, and offer this high profit margin while getting in early to market on this disruptive technology. >> Big market, a lot of opportunities to make some money. People love to put more margin on those deals, especially when you can bring a great solution to something that everyone knows is hard to do, so I think that's going to provide a lot of value. Is there a type of partner that you guys see emerging, or that you're aligning with? You mentioned the alignment with the partners; I can see how the training and the incentives are all there, and it sounds like it's all going well. Is there a type of partner that's resonating the most, or are there categories of partners that can take advantage of this? >> Yeah, absolutely. We work with all different kinds of partners. We work with our traditional resale partners, we're working with systems integrators, and we have a really strong MSP/MSSP program. We've got consulting partners, especially the ones that offer pen test services; we act as a force multiplier for them, really offering them a profit margin expansion opportunity there. We've got some technology partners that we really work with for co-sell opportunities, and then we've got our cloud partners. You mentioned that earlier: we are in AWS Marketplace, so we work with our CPPO partners, and we're part of the ISV Accelerate program. So we're doing a lot there with our cloud partners, and of course we go to market with distribution partners as well. >> Gotta love the opportunity for more margin expansion; every kind of partner wants to put more gross profit on their deals. Is there a certification involved? I have to ask: do people get certified, or do they just get trained? Is it self-paced
training? Is it in person? How are you guys doing the whole training and certification thing, and is that a requirement? >> Yeah, absolutely. We do offer a certification program, and it's been very popular. It includes a seller's portion and an operator portion, and it is at no cost to our partners. We run it virtually, but live; it's not self-paced. We also have in-person sessions, and we can customize these for any partners that have a large group of people: we can do one in person or virtually just for that partner. >> Well, any kind of incentive opportunities and marketing opportunities... everyone loves to get the deals and leads rolling in. From what we can see in our early reporting, this looks like a hot product, price-wise and service-level-wise. What incentives are you guys thinking about, and joint marketing? You mentioned co-sell earlier, and pipeline, so I was kind of honing in on that piece. >> Sure, yes. To follow along with our partner certification program, we do incentivize our partners there: if they have a certain number certified, their discount increases. So that's part of it. We have our deal registration program that increases the discount as well, and then we have some partner incentives that are wrapped around meeting setting and moving opportunities along to proof of value. >> Gotta love the education driving value. I have to ask you: you've been around the industry, you've seen the channel relationships out there, you're seeing companies old school and new school. Horizon3.ai is kind of that new school, very cloud-specific, with a lot of leverage with, as we mentioned, AWS and all the clouds. Why is the company so hot right now? Why did you join them, and why are people attracted to this company? What's the attraction, what's the vibe? What do you see?
What did you see in this company? >> Well, like I said, it's very disruptive. It's really in high demand right now, because it's new to market and a newer technology. We can collaborate with a manual pen tester, and we can allow our customers to run their pen tests with no specialty teams. And, like I said, we allow our partners to actually build profitable businesses: they can use our product to increase their services revenue and build their business model around our services. >> What's interesting about the pen test thing is that it's very expensive and time-consuming. The people who do them are very talented people who could be working on really bigger things... >> Absolutely. >> ...for customers. So bringing this into the channel allows them... If you look at the price delta between a pen test and what you guys are offering, that's a huge margin gap between the street price of, say, today's pen test and what you guys offer. When you show people that, do they say it's too good to be true? What are some of the things people say when you show them that? Do they scratch their heads, like, come on, what's the catch here? >> Right, so the cost savings is huge for us. And then also, like I said, we work as a force multiplier with a pen testing company that offers the services. They can do their annual manual pen tests that may be required by compliance regulations, and then we can act as the continuous verification of their security, which they can run weekly. So it's just an addition to what they're offering already, and an expansion. >> Jennifer, thanks for coming on theCUBE. I really appreciate you coming on and sharing the insights on the channel. What's next? What can we expect from the
channel group? What are you thinking, what's going on? >> Right, so we're really looking to expand our channel footprint, very strategically. We've got some big plans for Horizon3.ai. >> Awesome. Well, thanks for coming on, really appreciate it. You're watching theCUBE, the leader in high tech enterprise coverage. [Music] [Music] >> Hello, and welcome to theCUBE's special presentation with Horizon3.ai, with Rainer Richter, vice president of EMEA (Europe, the Middle East, and Africa) and Asia Pacific (APAC) for Horizon3.ai. Welcome to this special CUBE presentation. Thanks for joining us. >> Thank you for the invitation. >> So, Horizon3.ai is driving global expansion: big international news with a partner-first approach. You guys are expanding internationally, so let's get into it. You guys are driving this newly expanded partner program to new heights. Tell us about it. What are you seeing in the momentum? Why the expansion? What's all the news about? >> Well, I would say internationally we have a similar situation to the one in the US. On the one hand, there is a global shortage of well-educated penetration testers; on the other, we have a rising demand for network and infrastructure security. With our approach of autonomous penetration testing, I believe we are totally on top of the game, especially as we are now also starting with an international instance. That means, for example, if a customer in Europe is using our service NodeZero, he will be connected to a NodeZero instance which is located inside the European Union, and therefore he doesn't have to worry about the conflict between the European GDPR regulations and the US CLOUD Act. I would say we have a really good package there for our partners, so they can provide differentiators to their customers. >> You know, we've had great conversations here on theCUBE with the CEO and the founder of the company around the leverage of the cloud and how successful that's
been for the company. Honestly, I can just connect the dots here, but I'd like you to weigh in more on how that translates into the go-to-market, because you've got great cloud scale with the security product you guys are having success with. Great leverage there; I've seen a lot of success there. What's the momentum on the channel partner program internationally? Why is it so important to you? Is it just the regional segmentation? Is it the economics? Why the momentum? >> Well, there are multiple issues. First of all, there is a rising demand for penetration testing. And don't forget that internationally we have a much higher number, or percentage, of SMB and mid-market customers. Most of these customers typically didn't even have a pen test done once a year; for them, pen testing was just too expensive. Now, with our offering together with our partners, we can provide different ways for customers to get autonomous pen testing done more than once a year, at even lower cost than they had with a traditional manual pen test. That is because we have our Consulting Plus package, which is typically for pen testers: they can go out and do their pen tests much faster at many customers, one after the other, so they can do more pen tests at a lower, more attractive price. On the other side, there are others, or even the same ones, who are providing NodeZero as an MSSP service, so they can go after SMB customers saying, okay, you only have a couple of hundred IP addresses, no worries, we have the perfect package for you. And then you have, let's say, the mid-market, thousands of employees and more; they might even have an annual subscription, very traditional. But for all of them it's the same: the customer or the service provider doesn't need a piece of hardware. They only need to install a small Docker container, and that's it, and that makes it so smooth to
go in and say, okay, Mr. Customer, we just put this virtual attacker into your network, and that's it; all the rest is done. Within three clicks, they can act like a pen tester with 20 years of experience. >> And that's going to be very channel-friendly and partner-friendly, I can almost imagine. So I have to ask you, and thank you for calling out that breakdown and segmentation; that was very helpful for me to understand. But I want to follow up, if you don't mind: what type of partners are you seeing the most traction with, and why? >> Well, I would say at the beginning you typically have the innovators, the early adopters, typically boutique-sized partners. They start because they are always looking for innovation, so those are the ones that start in the beginning. We have a wide range of partners, mostly even managed by the owner of the company, so they immediately understand, okay, there is the value, and they can change their offering. They're changing their offering in terms of penetration testing because they can do more pen tests, and they can then add other ones. Or we have those who offer pen test services but did not have their own pen testers, so they had to go out on the open market and source pen testing experts to get the pen test at a particular customer done. Now, with NodeZero, they're totally independent. They can go out and say, okay, Mr. Customer, here's the service, that's it, we turn it on, and within an hour you're up and running. >> Totally. Yeah, and those pen tests are usually expensive and hard to do; now it's right in line with the sales delivery. Pretty interesting for a partner. >> Absolutely. But on the other hand, we are not killing the pen testers' business. What we're providing with NodeZero is what I would call the foundational work of having ongoing penetration testing of the infrastructure, the operating
system. The pen testers themselves can concentrate in the future on things like application pen testing, for example, those services which we're not touching. So we're not killing the pen tester market; we're just taking away the ongoing, let's say, foundation work, to call it that. >> Yeah, that was one of the questions I was going to ask. There's a lot of interest in this autonomous pen testing, one, because it's expensive to do: those skills are in demand, and they're expensive. So you kind of cover the entry level and the blockers that are in there. I've had people say to me that the pen test becomes a blocker for getting things done. So there's been a lot of interest in autonomous pen testing, and in organizations having that posture, and it's an overseas issue too, because now you have that ongoing thing. Can you explain that particular benefit for an organization, of continuously verifying the organization's posture? >> Yep, certainly. I would say typically you have to do your patches; you have to bring in new versions of operating systems, of different services, of some components, and they are always bringing new vulnerabilities. The difference here is that with NodeZero, we are telling the customer or the partner which are the executable vulnerabilities. Previously, they might have had a vulnerability scanner, and this vulnerability scanner brought up hundreds or even thousands of CVEs but didn't say anything about which of them are really executable. Then you need an expert digging into one CVE after the other, finding out whether it is really executable, yes or no, and that is where you need highly paid experts, of whom we have a shortage. With NodeZero, we can now say, okay, we tell you exactly which ones you should work on, because those are the ones which are executable, and we rank them according to the risk level, how
easily they can be used. And the good thing is, in contrast to the traditional penetration test, they don't have to wait a year for the next pen test to find out if the fix was effective. They just run the next scan and say, yes, closed, the vulnerability is gone. >> The time is really valuable, and if you're doing any DevOps or cloud native, you're always pushing new things, so ongoing pen testing is actually a benefit in general, as a kind of hygiene. Really interesting solution. Bringing that global scale is going to be a new coverage area for us, for sure. I have to ask you, if you don't mind answering: what particular region are you focused on or planning to target for this next phase of growth? >> Well, at this moment we are concentrating on the countries inside the European Union plus the United Kingdom. Logically, I'm based in the Frankfurt area, which means we cover more or less the countries just around it, so the whole DACH region (Germany, Switzerland, Austria) plus the Netherlands. But we also already have partners in the Nordics, like in Finland or in Sweden, and we already have partners in the UK, and it's rapidly growing. For example, we are now starting some activities in Singapore, and also in the Middle East area, which is very important. Depending on the way business is done, we currently try to concentrate on those countries where we have at least English as an accepted business language. >> Great. Is there any particular region you're having the most success with right now? It sounds like the European Union is kind of the first wave. >> Yes, that's definitely the first wave, and now we're also getting the European instance up and running. It's clearly our commitment to the market, saying, okay, we know there are certain dedicated requirements, and we take care of this,
and we're just launching it. We're building up the instance in the AWS service center here in Frankfurt, and also with some dedicated hardware in a data center in Frankfurt, where we have, with DE-CIX by the way, the highest internet interconnection bandwidth on the planet, so we have very short latency to wherever you are on the globe. >> That's a great call-out, a great benefit too. I was going to ask that. What are some of the benefits your partners are seeing in EMEA and Asia Pacific? >> Well, I would say the benefit for them is clearly that they can talk with customers and offer them penetration testing, which they didn't even think about before, because penetration testing in the traditional way was simply too expensive for them and too complex, and the preparation time was too long. They didn't even have the capacity to support an external pen tester. Now, with this service, you can go in and say, Mr. Customer, we can do a test with you in a couple of minutes: we install the Docker container, within 10 minutes we have the pen test started, that's it, and then we just wait. And I would say we are seeing so many aha moments now, because on the partner side, when they see NodeZero working the first time, it's like, wow, that is great. And then they go out to customers and show it, typically at the beginning mostly to the friendly customers, and it's like, wow, that's great, I need that. The feedback from the partners is that this is a service where I do not have to evangelize the customer. Everybody understands penetration testing; I don't have to describe what it is. The customer understands immediately: yes, penetration testing, good, I know I should do it, but it's too complex, too expensive. Now, for example as an MSSP service provided by one of our partners, it's
getting easy.

Yeah, it's great, a great benefit there. I've got to say, I'm a huge fan of what you guys are doing. I like this continuous automation; that's a major benefit to anyone doing DevOps or any kind of modern application development. This is just a godsend for them. And like you said, the pen testers were kind of coming down from their expertise to do things that should have been automated; now they get to focus on the bigger-ticket items. That's a really big point.

So we free the pen testers for the higher-level elements of the penetration testing segment, and that is typically application testing, which is currently far from being automated.

Yeah, and that's where the most critical workloads are. I think this is the nice balance. Congratulations on the international expansion of the program, and thanks for coming on this special presentation. I really appreciate it.

Thank you.

You're welcome. Okay, this is theCUBE special presentation. Check out pen test automation, international expansion, Horizon3.ai, a really innovative solution. In our next segment, Chris Hill, sector head for strategic accounts, will discuss the power of Horizon3.ai and Splunk in action. You're watching theCUBE, the leader in high tech enterprise coverage.

[Music]

Welcome back, everyone, to theCUBE and Horizon3.ai special presentation. I'm John Furrier, host of theCUBE. We're with Chris Hill, sector head for strategic accounts and federal at Horizon3.ai, a great innovative company. Chris, great to see you. Thanks for coming on theCUBE.

Yeah, like I said, great to meet you, John. Long-time listener, first-time caller, so excited to be here with you guys.

Yeah, we were talking before camera: you were at Splunk back in 2013, and I think 2012 was our first .conf. Boy, talk about being in the right place at the right time. Now we're at another inflection point, and Splunk continues to be
relevant, continuing to have that data driving security in that interplay. And your CEO, the former CTO of Splunk as well, has been on before: a really innovative product you guys have. Don't wait for a breach to find out if you're logging the right data; that is the topic of this thread. Splunk is very much part of this new international expansion announcement with you guys. Tell us, what are some of the challenges you see where this is relevant for Splunk and Horizon3.ai as you expand NodeZero internationally?

Yeah. Well, my role within Splunk was working with our most strategic accounts. I look back to 2013 and think about the sales process working with our small customers: it was still very siloed back then. I was selling to an IT team that was using this for IT operations, and we would generally even say, yeah, although we do security, we weren't really designed for it; we're a log management tool. I'm sure you remember back then, John, we were sort of stepping into the security space, and in the public sector domain that I was in, security was 70% of what we did. When I look back at the digital transformation I was witnessing, from 2019 to today, you look at how the IT teams and the security teams have been forced to break down those barriers. They used to be siloed away and would not communicate; the security guys would be like, oh, this is my box, IT, you're not allowed in. Today you can't get away with that. Of course Splunk has been a huge leader in that space and continues to innovate across the board, but what we're seeing in the space, and I was talking with Patrick Coughlin, the SVP of security markets, about this, is that what we've
been able to do with Splunk is build a purpose-built solution that allows Splunk to eat more data. Splunk itself, you know, is an ingest engine, right? The great reason people bought it was that you could build these really fast dashboards and grab intelligence out of it, but without data it doesn't do anything. So how do you bring more data in, and most importantly, from a customer perspective, how do you bring the right data in? If you think about what NodeZero and what we're doing at Horizon3 is: sure, we do pen testing, but because we're an autonomous pen testing tool, we do it continuously. So forget this whole oh-crud moment of customers saying, oh yeah, we've got a pen test coming up, it's going to take six weeks, everyone's going to sit on their hands, call me back in two months, Chris, we'll talk to you then. Not a really efficient way to test your environment. And shoot, we saw that with Uber this week, and that's a case where we could have helped.

Oh, just real quick, could you explain the Uber thing? Because it was a contractor. Just give a quick highlight of what happened so you can connect the dots.

Yeah, no problem. I think it was one of those groups that would try and test an environment, and what the attacker did was keep calling them about MFA, saying, I need to reset my password, I need to reset my password. Eventually the customer service guy said, okay, I'm resetting it. Once he had reset it and bypassed the multi-factor authentication, he was able to get in and gain access to, I think not the domain, but a partial part of that network. He then pivoted over to what I would assume is a VMware or some virtual machine that had notes with all of the credentials for logging into various domains. So within minutes they had access, and that's the
sort of stuff that we do. Think about the cacophony of tools that are out there in a zero trust architecture, right? I'm going to get a Zscaler, or I'm going to have Okta, and I have a Splunk, I've got SolarWinds in the system, and I don't mean to name names, we have CrowdStrike or SentinelOne in there. It's just a cacophony of things that don't work together; they weren't designed to work together. So we have seen so many times in our business, through our customer support and just working with customers when we do their pen tests, that there will be 5,000 servers out there, three are misconfigured, and those three misconfigurations will create the open door. Because remember, the hacker only needs to be right once; the defender needs to be right all the time. That's the challenge, and that's what I'm really passionate about in what we're doing here at Horizon3. I see this digital transformation, migration, and security going on, where we're at the tip of the spear. It's why I joined Snehal on this journey, and I'm just super excited about where the path is going and super excited about the relationship with Splunk. I'll get into more details on some of the specifics of that.

Well, you're nailing it. We've been doing a lot of things on supercloud and this next-gen environment; we're calling it next gen. You're really seeing DevOps; obviously DevSecOps has already won. The IT role has moved to the developer; shift left is an indicator of that, one of many examples: higher-velocity code, the software supply chain. You hear these things, and that means it is now in the developers' hands. IT is replaced by the new ops, DataOps teams and security, where there's a lot of horizontal thinking. To your point about access, there's no more perimeter. Huge. You're 100% right on that: they only have to be right one time to get in there, and once you're in, you can hang out, move around, move laterally. Big problem. Okay, so we
get that. Now, the challenge for these teams as they transition organizationally is how they figure out what to do. Okay, this is the next step: they already have Splunk, so now they're kind of in transition while protecting for a 100% rate of success. How would you look at that and describe the challenge? What are the teams facing with their data, what's next, and what action do they take?

So let's use some vernacular that folks will know. If I think about DevSecOps, we both know what that means: I'm going to build security into the app. Then there's SecDevOps: how am I building security around the perimeter of what's going on inside my ecosystem, and what are they doing? If you think about what we're able to do with somebody like Splunk, we can pen test the entire environment from soup to nuts. I'm going to test the endpoints through to IT, I'm going to look for misconfigurations, I'm going to look for exposed credentials, I'm going to look for anything I can in the environment, and again, I'm going to do it at light speed. What we're doing for that SecDevOps space is asking: did you detect that we were in your environment? Did we alert Splunk or the SIEM that there's someone in the environment moving around laterally? More importantly, did they log us in their environment, and when they detected that log, did it trigger? Did they alert on us? And finally, most importantly for every CSO out there: did they stop us?

So that's how we do this. I think when you were speaking with Snehal before, you heard that we've come up with what we call find, fix, verify. What we do is act as the attacker. We act in a production environment as a passive attacker, and we go in uncredentialed, with no agents, but we have to assume
a breach, so we use an assumed breach model, which means we're going to put a Docker container in your environment and then fingerprint the environment. So we're going to go out and do an asset survey. Now, that's not something that Splunk does super well. Can Splunk see all the assets? Do the same assets marry up? We're going to log all that data and then load it into the Splunk SIEM or the Splunk logging tools, just to have it in Enterprise. That's an immediate feature add that they've got.

Then we've got the fix. Once we've completed our pen test, we're going to generate a report, and we can talk about these a little later, but the reports will show an executive summary; the assets that we found, which would be your asset discovery aspect; and a fix report. The fix report, I think, is probably the most important one: it will go down and identify what we did, how we did it, and then how to fix that. From that, the pen tester or the organization should fix those issues.

Then they go back and run another test, and they validate, like a change detection environment, to see, hey, did those fixes take place? When Snehal was the CTO of JSOC, he shared with me a number of times that there would be 15 more items on next week's punch list that they didn't know about. That has to do with how they were prioritizing the CVEs, because they would treat all CVEs as simply critical or non-critical. We are able to create context in that environment that feeds better information into Splunk, and that brings up the efficiency for Splunk, specifically for the teams out there.

By the way, the burnout thing is real. This whole "I just finished my list and I've got 15 more": the list just keeps growing. How does NodeZero specifically help Splunk teams be more efficient? That's the question I want to get at,
because this seems like a very scalable way for Splunk customers and service teams to be more efficient. So the question is, how does NodeZero help make Splunk, specifically their service teams, more efficient?

So today, in our early interactions with customers, we've seen five things. I'll start with identifying the blind spots, which is what I just talked about with you: did we detect, did we log, did we alert, did they stop NodeZero? In more layman's terms, if I were explaining it to a fifth grader, we can be the sparring partner for a Splunk Enterprise customer, a Splunk Essentials customer, someone using Splunk SOAR, or even just a Splunk customer that may be a small shop with three people and just wants to know, where am I exposed? By generating these reports, and then having the API that actually generates the dashboard, they can take all of these events that we've logged and log them in.

Number two is how we prioritize those logs: how do we create visibility into the logs that have critical impacts? Again, as I mentioned earlier, not all CVEs are high impact, and not all are low, either. If you daisy-chain a bunch of low CVEs together, boom, I've got a mission-critical problem that needs to be fixed now, such as a credential moving to an NT box that's got a text file with a bunch of passwords on it. That would be very bad.

Third would be verifying that you have all of the hosts. One of the things that Splunk's not particularly great at, and they'll say so themselves, is asset discovery: what assets do we see, and what are they logging?

Then, for every event that they are able to identify, one of the cool things we can do is create this low-code, no-code environment so that, you know, Splunk customers
can use Splunk SOAR to actually triage events and prioritize them: where they're being routed, to optimize the SOC team's time to market, or time to triage any given event, obviously reducing MTTR. And then finally, I think one of the neatest things you'll see us develop is our ability to build glass tables. Behind me you'll see one of our triage events and how we build a Lockheed Martin kill chain on it with a glass table, which is very familiar to the community. In the not-too-distant future, we're going to have the ability to let people search and observe on those IOCs. If people aren't familiar with it, an IOC is an indicator of compromise, so that's a vector we want to drill into, and of course, who's better at drilling into the data than Splunk?

Yeah, this is an awesome synergy there. I can see a Splunk customer going, man, this just gives me so much more capability, actionability, and real understanding. And this is what I want to dig into, if you don't mind: understanding that critical impact is kind of where I see this coming. You've got the data ingest; data's data, but the question is what not to log, and where are things misconfigured? These are critical questions. Can you talk about what it means to understand critical impact?

Yeah. Going back to the things I just spoke about, a lot of those CVEs are low, low, low, and then you daisy-chain them together and suddenly it's, oh, this is high now. But then there's the other impact: if you're a Splunk customer, and I had several of them, I had one customer with terabytes of McAfee data being brought in, and it was like, all right, there's a lot of other data that you probably also want to bring in. But they could only afford, or only wanted, to do certain data sets, and they didn't know how to prioritize or filter those data sets. So we provide that opportunity to say,
hey, these are the critical ones to bring in, but there are also ones you don't necessarily need to bring in, because a low CVE in this case really does mean low. Take an iLO server, or a print server where your admin credentials are on the printer: there will be credentials on that, and that's something a hacker might go in to look at. So although the CVE on it is low, if you daisy-chain it with someone who's able to get into that box, you might say, ah, that's high, and we would then potentially rank it, given our AI logic, as moderate. So we put it on the scale and prioritize those, versus all of these scanners that are just going to give you a bunch of CVEs and say good luck.

Translating that, and tell me if I'm wrong, that kind of speaks to the whole lateral movement challenge, right? The print server is a great example: it looks stupid, low end, who's going to want to deal with the print server? Oh, but it's connected into a critical system; there's a path. Is that kind of what you're getting at?

Yeah. I use "daisy chain"; I think that's from the community I came from, but it's just lateral movement. Those low-level, low-criticality lateral movements are exactly where the hackers are getting in. That's the beautiful thing about the Uber example: who would have thought? I've got my multi-factor authentication going, and a human made a mistake. We can't not expect humans to make mistakes; we're fallible. The reality is that once they were in the environment, Uber could have protected themselves by running enough pen tests to know that they had certain exposed credentials. That would have stopped the breach, and they had not done that in their environment. And I'm not poking at them.

Yeah, but it's an interesting trend. It's obvious: sometimes those low-end items are also not protected well, so they're easy to get at from a hacker standpoint,
but also the people in charge of them can be phished or spear-phished easily, because they're not paying attention, because they don't have to; no one ever told them, hey, be careful.

Yeah, for the community that I came from, John, that's exactly how they would do it. They would meet you at an international event and introduce themselves as a graduate student, and these are nation-state actors: would you mind reviewing my thesis on such and such? I was at Adobe at the time I was working on this. The target opened the PDF, and whatever was in it launched. I don't know if you remember, but back in the 2008 time frame there were a lot of issues around IP being stolen from the United States by a nation state, and that's exactly how they did it.

And John, or LinkedIn: hey, we've got a job, we want to hire you at double the salary. Oh, I'm going to click on that for sure.

Yeah, right, exactly.

The one thing I would say is, when we look at it, and I think we did 10,000 pen tests last year, probably over that now, we have these top 10 ways that we find people coming into the environment. The funniest thing is that only one of them is a CVE-related vulnerability. So it's like 2% of the attacks occurring through CVEs, yet there's all that attention spent on that, and very little attention spent on this pen testing side, which is this continuous threat monitoring space and this vulnerability space, where I think we play such an important role. I'm so excited to be part of the tip of the spear on this one.

Yeah, I'm old enough to know the movie Sneakers, which I loved: professional hackers testing, always testing the environment. I love this. I've got to ask you, as we wrap up here, Chris, if you don't mind, the
benefits to professional services from this alliance. Big news: Splunk and you guys work well together, we see that clearly. What other benefits do professional services teams see from the Splunk and Horizon3.ai alliance?

I think for both of our partners, as we bring these guys together, and many of them already are the same partner, first off, the licensing model is probably one of the key areas where we really excel. If you're an end user, you can buy for the enterprise by the number of IP addresses you're using. But if you're a partner working with this, there are solution paths you can go in on, and we'll license to MSPs with a business model built for MSPs. The unique thing that we do here, though, is the C+ license. The Consulting Plus license allows small to mid-sized and even some very large, Fortune 100 consulting firms to buy into a license where they have unlimited access to as many IPs as they want, but they can only run one test at a time. And as you can imagine, when we're hacking passwords, checking hashes, and cracking hashes, that can take a while. But for the right customer it's a perfect tool. So I'm so excited about our ability to go to market with our partners, so that we understand how not just to sell to them or sell through them, but how to sell with them as a good vendor partner. I think that's one thing we've done a really good job building and bringing into the market.

Yeah, I think also Splunk has had great success in how they've enabled partners and professional services. Absolutely, the services that layer on top of Splunk are multifold, tons of great benefits, so you guys vector right into that and ride that wave without friction.

And the cool thing is that in one of our reports, which could be
totally customized with someone else's logo, we're going to generate, you know. So, I used to work in another organization, it wasn't Splunk, where we did pen testing for customers. My pen testers would come on site, do the engagement, and leave, and then at the next release someone would be like, oh shoot, we've got another sector that was breached, and they'd call you back four weeks later. By August our entire pen testing teams would be sold out, and it would be, well, maybe in March, and they're like, no, no, I've got a breach now. And then when they do go in, they do the pen test, hand over a PDF, pat you on the back, and say, there's where your problems are, you need to fix it. The reality is that what we're going to generate, completely autonomously with no human interaction, is all the permutations of anything we found and the fix for those permutations. Then once you've fixed everything, you just go back and run another pen test. For what people pay for one pen test, they can have a tool that does that every Patch Tuesday, and then on Wednesday you triage throughout the week.

Green, yellow, red. I want to see the colors: show me green. Green is good, right, not red. What CIO doesn't want that dashboard?

That's exactly it, and I'm really excited about helping drive this with the Splunk team, because they get that. They understand that it's the green, yellow, red dashboard, and how do we help them find more green, so that the other guys are in red.

Yeah, and get in the data, do the right thing, and be efficient with how you use the data; know what to look at, because there are so many things to pay attention to. The combination of both, and then the go-to-market strategy: really brilliant. Congratulations, Chris. Thanks for coming on and sharing this news, with the detail around Splunk in
action, around the alliance. Thanks for sharing.

John, my pleasure. Thanks, look forward to seeing you soon.

All right, great. We'll follow up and do another segment on DevOps, on IT and security teams as the new ops, on supercloud, and a bunch of other stuff. Thanks for coming on. In our next segment, the CEO of Horizon3.ai will break down all the news for us here on theCUBE. You're watching theCUBE, the leader in high tech enterprise coverage.

[Music]

Yeah, the partner program for us has been fantastic. Prior to that, most organizations, most firms, most MSSPs might not necessarily have a bench at all for penetration testing. Maybe they subcontract this work out, or maybe they do it themselves, but trying to staff that kind of position can be incredibly difficult. For us this was a differentiator: a new partnership that allowed us not only to perform services for our customers but also to provide a product with which they can do it themselves. So we work with our customers in a variety of ways. Some of them want more routine testing and perform it themselves, but we're also a certified service provider for Horizon3, able to perform penetration tests, help review the data, and provide color and analysis for our customers in a broader sense: not necessarily the black and white elements of what's critical, what's high, what's medium, what's low, and what you need to fix, but whether there are systemic issues. This has allowed us to onboard new customers, and it has allowed us to migrate some penetration testing services to us from competitors in the marketplace. But ultimately this is occurring because the product and the outcome are special: they're unique, and they're effective. Our customers like what they're seeing. They like the routineness of it. Many of them, again, like doing this themselves, being able to pen test parts of their
networks. And then there are the new use cases: I'm a large organization, I have eight to ten acquisitions per year. Wouldn't it be great to have a tool to perform a penetration test, both internal and external, of that acquisition before we integrate the two companies and maybe bring on some risk? It's a very effective partnership, one that has really taken our engineers and our account executives by storm. This is a partnership that's been very valuable to us.

[Music]

A key part of the value and business model at Horizon3 is enabling partners to leverage NodeZero to make more revenue for themselves. Our goal is that 60% of our revenue this year will be originated by partners, and that 95% of our revenue next year will be originated by partners. So a key to that strategy is making us an integral part of your business models as a partner. A key quote from one of our partners is that we enable every one of their business units to generate revenue. So let's talk about that in a little more detail.

First, if you have a pen test consulting business, take Deloitte as an example: what was six weeks of human labor at Deloitte per pen test has been cut down to four days of labor, using NodeZero to conduct reconnaissance, find all the juicy, interesting areas of the enterprise that are exploitable, and assess the entire organization. Then all of those details get served up to the human to look at, understand, and determine where to probe deeper. So what you see in that pen test consulting business is that NodeZero becomes a force multiplier: those consulting teams were able to cover way more accounts, and way more IPs within those accounts, with the same or fewer consultants. That directly leads to profit margin expansion for the pen testing business itself, because NodeZero is a force multiplier.

The second business model here is if you're an MSSP. As an MSSP, you're already making
money providing defensive cybersecurity operations for a large volume of customers. What they do is license NodeZero and use us as an upsell to their MSSP business, to start delivering continuous red teaming, continuous verification, or purple teaming as a service. So in that particular business model, they've got an additional line of revenue, where they can increase the spend of their existing customers by bolting on NodeZero as a purple-team-as-a-service offering.

The third business model, or customer type, is if you're an IT services provider. As an IT services provider, you make money installing and configuring security products like Splunk or CrowdStrike or Humio; you also make money reselling those products; and you also make money generating follow-on services to continue to harden your customers' environments. What those IT service providers will do is use us to verify that they've installed Splunk correctly, prove to their customer, using our results, that Splunk or CrowdStrike was installed correctly, and then use our results to drive follow-on services and revenue.

And then finally, we've got the value-added reseller, which is just a straight-up reseller. Because of how fast our sales cycles are, these VARs are typically able to go from cold email to deal close in six to eight weeks. At Horizon3, at least, a single sales engineer is able to run 30 to 50 POCs concurrently, because our POCs are very lightweight and don't require any on-prem customization or heavy pre-sales or post-sales activity. As a result, we're able to have a small number of sellers driving a lot of revenue and volume for us. Well, the same thing applies to VARs: there isn't a lot of effort to sell the product or prove its value, so VARs are able to sell a lot more Horizon3 NodeZero product without having to build up a huge specialist sales organization.

So what I'm going to do is talk through scenario three here, as an IT service provider,
and just how powerful NodeZero can be in driving additional revenue. Think of it this way: for every one dollar of NodeZero license purchased by the IT service provider to do their business, it'll generate ten dollars of additional revenue for that partner. In this example, Kinney Group uses NodeZero to verify that they have installed and deployed Splunk correctly. Kinney Group is a Splunk partner; they sell IT services to install, configure, deploy, and maintain Splunk, and as they deploy Splunk, they're going to use NodeZero to attack the environment and make sure that the right logs, alerts, and monitoring are being handled within the Splunk deployment. So it's a way of doing QA, of verifying that Splunk has been configured correctly, and that's going to be used internally by Kinney Group to prove the quality of the services they've just delivered.

Then they're going to show and leave behind that NodeZero report with their client, and that creates an opportunity for Kinney Group to resell NodeZero to their client, because the client is seeing the reports and the results and saying, wow, this is pretty amazing. Those reports can be co-branded: it's a pen testing report branded with Kinney Group, but it says "powered by Horizon3" under it.

From there, Kinney Group is able to take the fix actions report that's automatically generated with every pen test through NodeZero and use that as the starting point for a statement of work to sell follow-on services to fix all of the problems that NodeZero identified: fixing LLMNR misconfigurations, fixing or patching VMware, updating credential policies, and so on. What happens is NodeZero has found a bunch of problems that the client often lacks the capacity to fix, and so Kinney Group can use that lack of capacity as a follow-on sales opportunity.

And finally, based on the findings from NodeZero, Kinney Group can look at that report
and say to the customer, you know, Customer, if you bought CrowdStrike, you'd be able to prevent NodeZero from attacking and succeeding in the way that it did; or if you bought Humio, or Palo Alto Networks, or some privileged access management solution, because of what NodeZero was able to do with credential harvesting and attacks. As a result, Kinney Group is able to resell other security products within their portfolio, CrowdStrike Falcon, Humio, Palo Alto Networks, Demisto, Phantom, and so on, based on the gaps that were identified by NodeZero in that pen test. What that creates is another feedback loop, where Kinney Group will then use NodeZero to verify that the CrowdStrike product has actually been installed and configured correctly. This becomes the cycle: using NodeZero to verify a deployment, and using that verification to drive a bunch of follow-on services and resell opportunities, which then further drives more usage of the product.

Now, the way that we license is a usage-based licensing model, so that the partner will grow their NodeZero Consulting Plus license as they grow their business. For example, if you're Kinney Group, in week one you're going to use NodeZero to verify your Splunk install. In week two, if you have a pen testing business, you're going to use NodeZero as a force multiplier for your pen testing client opportunity. And then if you have an MSSP business, in week three you're going to use NodeZero to execute a purple team MSSP offering for your clients. Not necessarily Kinney Group, but if you're a Deloitte or AT&T, these larger companies with multiple lines of business, or if you're Optiv, for instance, all you have to do is buy one Consulting Plus license, and you're going to be able to run as many pen tests as you want, sequentially. So now you can buy a single license and use that one license to meet your week one client commitments, and then
meet your week two, and then meet your week three. As you grow your business, you start to run multiple pen tests concurrently: in week one you've got to verify a Splunk install, run a pen test, and deliver a purple team engagement. You simply expand from one Consulting Plus license to three. As you systematically grow your business, your NodeZero capacity grows with it, which gives you predictable COGS, predictable margins, and, once again, a 10x additional revenue opportunity for your investment in the NodeZero Consulting Plus license.
My name is Snehal; I'm the co-founder and CEO here at Horizon3. Today I'm going to talk about why it's important to look at your enterprise through the eyes of an attacker. The challenge I had as a CIO in banking, as the CTO at Splunk, and while serving in the Department of Defense is that I had no idea whether I was secure until the bad guys showed up. Am I logging the right data? Am I fixing the right vulnerabilities? Are the security tools I've paid millions of dollars for actually working together to defend me? The answer is, I don't know. Does my team actually know how to respond in the middle of a breach? I don't know. I've got to wait for the bad guys to show up. So the challenge was: how do we proactively verify our security posture?
I tried a variety of techniques. The first was vulnerability scanners, and the challenge with vulnerability scanners is that being vulnerable doesn't mean you're exploitable. I might have a hundred thousand findings from my scanner, of which maybe five or ten can actually be exploited in my environment. The other big problem with scanners is that they can't chain weaknesses together from machine to machine. If you've got a thousand or more machines in your environment, a vulnerability scanner will tell you that you have a problem on machine one and, separately, a
problem on machine two. What it can't tell you is that an attacker could combine a low from machine one with a low from machine two to equal a critical in your environment. What attackers do is chain together misconfigurations, dangerous product defaults, harvested credentials, and exploitable vulnerabilities into attack paths across different machines. To address attack paths across machines, I tried layering in consulting-based pen testing. The issue is that when you've got thousands or hundreds of thousands of hosts, human-based pen testing simply doesn't scale to an infrastructure of that size. Moreover, when the consultants do execute a pen test and you get the report, you often lack the expertise within your team to quickly retest and verify that you've actually fixed the problem. So you end up with pen test reports that are incomplete snapshots, quickly going stale. To mitigate that, I tried breach and attack simulation tools, and they had three problems. One, I had to install credentialed agents everywhere. Two, I had to write my own custom attack scripts, which I didn't have much talent for and also had to maintain as my environment changed. Three, these tools were not safe to run against production systems, which made up the majority of my attack surface. That's why we went off to start Horizon3.
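The point about scanners reporting isolated lows while attackers chain them can be illustrated with a toy attack graph. This is a minimal sketch under stated assumptions, not NodeZero's algorithm: the hosts, findings, and severities are hypothetical, and plain breadth-first search stands in for whatever path analysis a real tool performs. A per-host view lists every finding as low severity, while searching the graph shows three of those lows chaining into domain admin.

```python
from collections import deque

# Hypothetical findings (not real data): (from_node, to_node, weakness, per-finding severity).
# An edge means "an attacker holding from_node can reach to_node via this weakness".
EDGES = [
    ("attacker", "host1", "LLMNR enabled: NTLM hash captured", "low"),
    ("host1", "host2", "weak password cracked and reused as local admin", "low"),
    ("host2", "domain_admin", "domain admin credentials cached on host", "low"),
    ("attacker", "host3", "service banner discloses version", "low"),
]

def scanner_view(edges):
    """Per-host findings as an isolated scanner might report them: no chaining."""
    return {(dst, weakness, sev) for _, dst, weakness, sev in edges}

def find_attack_path(edges, start, goal):
    """Breadth-first search over the attack graph; returns the shortest chain of weaknesses, or None."""
    graph = {}
    for src, dst, weakness, _ in edges:
        graph.setdefault(src, []).append((dst, weakness))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, weakness in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, weakness)]))
    return None

if __name__ == "__main__":
    print("Scanner view (every finding is low):")
    for host, weakness, sev in sorted(scanner_view(EDGES)):
        print(f"  {host}: {weakness} [{sev}]")
    print("Chained attack path to domain admin:")
    for src, dst, weakness in find_attack_path(EDGES, "attacker", "domain_admin"):
        print(f"  {src} -> {dst}: {weakness}")
```

Keeping severity on individual edges while judging impact by what a whole path reaches is the design choice that matters here: the same three lows look harmless in isolation but add up to a critical once they connect to domain admin.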
Tony and I met in Special Operations, and the challenge we wanted to solve was how to do infrastructure security testing at scale by putting the power of a 20-year pen testing veteran into the hands of an IT admin or network engineer in just three clicks. The whole idea is to enable the fixers, the blue team, to run NodeZero, our pen testing product, and quickly find problems in their environment. The blue team then fixes the issues that were found and can quickly rerun the attack to verify the fix. And we deliver this without requiring custom scripts, without requiring credentialed agents, and without requiring external third-party consulting or professional services: self-service pen testing to quickly drive find, fix, verify.
Our customers use us for three primary use cases. The first is the SOC manager, who uses us to verify that their security tools are actually effective: that they're logging the right data in Splunk or their SIEM; that their managed security services provider can quickly detect and respond to an attack, so they can hold the provider accountable for its SLAs; that the SOC itself knows how to quickly detect and respond, measured and verified; and that the variety of tools in their stack are actually working together. Most organizations have 130-plus cybersecurity tools, none of which were designed to work together. The second primary use case is proactively hardening and verifying your systems. Here the IT admin or network engineer runs self-service pen tests to verify that their Cisco environment is installed, hardened, and configured correctly, that their credential policies are set up right, or that their vCenter, WebSphere, or Kubernetes environments are actually designed to be secure. What this allows the IT
admins and network engineers to do is shift from running one or two pen tests a year to 30, 40, or more pen tests a month. You can wire those pen tests into your DevOps process, or into your detection engineering and change management processes, to automatically trigger a pen test every time there's a change in your environment. The third primary use case is for organizations lucky enough to have their own internal red team: they use NodeZero to do reconnaissance and exploitation at scale, then use the output as a starting point for the humans to step in and focus on the really hard, juicy stuff that gets them on stage at DEF CON.
Those are the three primary use cases. Now let's zoom into the find-fix-verify loop, because in my experience find-fix-verify is the future operating model for cybersecurity organizations. In the find step, using continuous pen testing, you want to enable on-demand, self-service pen tests. You want those pen tests to find attack paths at scale across your on-prem infrastructure, your cloud infrastructure, and your perimeter, because attackers don't stay in one place: they'll chain a perimeter breach with a credential from your on-prem environment to gain access to your cloud, or some other permutation. The third part of continuous pen testing is recognizing that attackers don't focus on critical vulnerabilities anymore. They know we've built vulnerability management programs to reduce those vulnerabilities, so they've adapted: they chain misconfigurations in your infrastructure, software, and applications with dangerous product defaults and exploitable vulnerabilities, collecting credentials at scale through a mix of techniques. Once you've found those problems, the next question is what to do about them. You want to prioritize fixing the problems that are actually exploitable in your environment and that
truly matter, meaning they lead to domain compromise, domain user compromise, or access to your sensitive data. The second thing you want to fix is your understanding of what risk your crown jewels data is exposed to. Where is your crown jewels data? Is it in the cloud? Is it on-prem? Has it been copied to a share drive you weren't aware of? If a domain user were compromised, could they access that data? You want to use the attacker's perspective to secure the critical data in your infrastructure. Finally, as you fix these problems, you want to quickly remediate and retest to confirm you've actually fixed the issue, and this find-fix-verify cycle becomes the accelerator that drives a purple team culture. The third part is verify. In the verify step, you want to confirm that your security tools, processes, and people can effectively detect and respond to a breach. You want to integrate that into your detection engineering processes so you know you have the right security rules and the right configurations deployed. You also want to make sure your environment adheres to best practices for systems hardening and cyber resilience. And finally, you want to be able to prove your security posture over time to your board, your leadership, and your regulators.
Now I'll zoom into each of these three steps, starting with find. Here's the first example of NodeZero and autonomous pen testing. An attacker will find a way to break through the perimeter; in this example, it's very easy to misconfigure Kubernetes in a way that gives an attacker remote code execution in your on-prem Kubernetes environment. From there, the attacker conducts network reconnaissance and finds ways to gain code execution on other machines in the environment, and as they get code execution they start to
dump credentials, collect a bunch of NTLM hashes, crack those hashes using open source and dark web data, and then reuse those credentials to log in and laterally maneuver throughout the environment. As they laterally maneuver, they can reuse those credentials and use credential spraying techniques and so on to compromise your business email or log in as admin to your cloud. This is a very common attack, and rarely is a CVE actually needed to execute it. Often it's just a Kubernetes misconfiguration plus a bad credential or password policy, combined with the bad practice of credential reuse across the organization.
Here's another example, an internal pen test from an actual customer. They had 5,000 hosts in their environment, with EDR and UBA tools installed, and they initiated an internal pen test from a single machine. From that single initial access point, NodeZero enumerated the network, conducted reconnaissance, and found that 5,000 hosts were accessible. Under the covers, NodeZero organizes all of that reconnaissance data into a knowledge graph we call the cyber terrain map, and that map becomes the key data structure we use to efficiently maneuver, attack, and compromise your environment. NodeZero then tries to find ways to get code execution, reuse credentials, and so on. This customer had Fortinet installed as their EDR, but NodeZero was still able to get code execution on a Windows machine. From there it dumped credentials, including sensitive credentials from the LSASS process on that Windows box, and then reused those credentials to log in as domain admin. Once an attacker becomes domain admin, they have the keys to the kingdom; they can do anything they want. So what happened here? It turns out Fortinet was misconfigured on three out of 5,000 machines because of bad automation. The customer had
no idea this had happened; they would have had to wait for an attacker to show up to realize it was misconfigured. Second, why didn't Fortinet stop the credential pivot and the lateral movement? It turned out the customer hadn't bought the right modules or turned on the right services within that product. We see this not only with Fortinet but with Trend Micro and all the other defensive tools: it's very easy to miss a checkbox in the configuration that does things like prevent credential dumping.
The next story is that attackers don't have to hack in; they log in. In another infrastructure pen test, we used a typical attacker technique: a man-in-the-middle attack to collect hashes. Here, an attacker leverages a tool called Responder to collect NTLM hashes being passed around the network. There are a variety of reasons these hashes get passed around, and it's a pretty common misconfiguration. As an attacker collects those hashes, they apply techniques to crack them, so they'll pass the hash, and from there they use open source intelligence, common password structures and patterns, and other techniques to crack the hashes into cleartext passwords. Here, NodeZero automatically collected hashes, automatically passed them off to be cracked, and then took the domain user IDs and passwords it had collected and tried them against different services and systems in the enterprise. In this case, NodeZero gained access to the Office 365 email environment because three employees didn't have MFA configured. Now NodeZero has placement and access in the business email system, which sets up the conditions for fraud, lateral phishing, and other techniques. What's especially insightful is that 80 percent of the hashes collected in this
pen test were cracked in 15 minutes or less. Eighty percent. Twenty-six percent of the user accounts had a password that followed a pretty obvious pattern: first initial, last initial, and four random digits. The other interesting finding is that 10 percent of service accounts had a user ID identical to the password: VMware admin/VMware admin, WebSphere admin/WebSphere admin, and so on. Attackers don't have to hack in; they just log in with credentials they've collected.
The next story is about becoming AWS admin. In this example, once again an internal pen test, NodeZero gets initial access and discovers that 2,000 hosts are network-reachable from that environment. It fingerprints and organizes all of that data into a cyber terrain map, and finds that HP iLO, the Integrated Lights-Out service, is running on a subset of hosts. Security teams often don't instrument or observe iLO, nor is it easy to patch, and attackers know this and immediately go after those types of services. In this case the iLO service was exploitable, and we were able to get code execution on it. iLO stores all of its user IDs and passwords in cleartext in a particular set of processes, so once we gained code execution we dumped all of the credentials, then laterally maneuvered to log in to the Windows box next door as admin. On that admin box, we were able to access the share drives, and we found a credentials file saved on a share drive. That file turned out to be the AWS admin credentials file, giving us full admin authority over their AWS accounts. Not a single security alert was triggered in this attack, because the customer wasn't observing the iLO service and every step after that was a valid login. So what do you do? Step one, patch the server. Step two, delete the credentials file from the share drive. Step three, get better instrumentation on privileged access users and
logins. The final story is a typical pattern we see across the board, one that combines the techniques I've described. An attacker uses open source intelligence to find all of the employees who work at your company. From there, they look up those employees in dark web breach databases and other sources, and use that as a starting point to password spray and compromise a domain user. All it takes is one employee reusing a breached password for their corporate email, or one employee with a weak, easily guessable password. All it takes is one. Once the attacker gains domain user access, in most shops the domain user is also the local admin on their laptop, and once you're local admin you can dump the SAM and get local admin NTLM hashes. You can reuse those credentials as local admin on neighboring machines, and attackers rinse and repeat. Eventually they reach a point where they can dump LSASS, by unhooking the antivirus, defeating the EDR, or finding a misconfigured EDR as we discussed earlier, and compromise the domain. What's consistent is that the fundamentals are broken at these shops: poor password policies, no least privilege, Active Directory groups so permissive that domain admin or domain user is also the local admin, AV or EDR solutions that are misconfigured or easily unhooked, and so on. Across 10,000 pen tests, we found that user behavior analytics tools never caught our lateral movement, in part because those tools require pristine logging data to work, and because it's very difficult to establish a baseline of normal versus abnormal credential logins. Another interesting insight: several marquee, brand-name MSSPs were defending our customers' environments, and it took them seven hours to detect and
respond to the pen test. Seven hours, when the pen test was over in less than two. That was an egregious violation of the MSSP's service level agreements, and the customer was able to use us to get a service credit and drive accountability for their SOC and their provider. The third interesting thing: in one case, it took us seven minutes to become domain admin in a bank. That bank had every Gucci security tool you could buy, yet in 7 minutes and 19 seconds NodeZero went from an unauthenticated member of the network to domain admin by chaining misconfigurations, moving laterally, and so on. If it's seven minutes today, we should assume it'll be less than a minute a year or two from now, making it very difficult for humans to detect and respond to that kind of blitzkrieg attack.
That's find. But it's not just about finding problems; the bulk of the effort should go into what to do about them, the fix and the verify. Take Kubernetes again as an example. We show you the path, the kill chain we took to compromise that environment. We show you the impact and the proof of exploitation, including the actual command we executed, so you could copy and paste that command and compromise that kubelet yourself if you wanted. The impact is that we got code execution, and we show you why it's rated critical: it enabled a perimeter breach. Under affected applications, we list the specific IPs where you've got the problem and how it maps to the MITRE ATT&CK framework, and then we tell you exactly how to fix it. We also show you what the problem enabled, so you can accurately judge whether it's important. The next part is accurate prioritization. The hardest part of my job as a CIO was deciding what not
to fix. Take "SMB signing not required" as an example. By default, its CVSS score is 1 out of 10. But this misconfiguration, which is not a CVE, enabled an attacker to gain access to 19 credentials, including one domain admin and two local admins, plus a ton of data. Because of that context, this is really a 10 out of 10, and you'd better fix it as soon as possible. However, of the seven occurrences we found, it's critical in only three. These are the three specific machines, and we tell you exactly how to fix them, so fix those as soon as possible. These other four machines didn't allow us to do anything of consequence. Because the hardest part is deciding what not to fix, you can justifiably leave those four for now, add them to your backlog, and surge your team onto the three. Once you've fixed those three, you don't have to rerun the entire pen test: you can select the three and, with one click, run a very narrowly scoped pen test that verifies only that specific issue. That creates a much faster cycle of finding and fixing problems. The other part of fixing is verifying that you don't have sensitive data at risk. Once we become a domain user, we use those domain user credentials to try to access databases, file shares, S3 buckets, Git repos, and so on, to help you understand what sensitive data is at risk. In this example, a green checkbox means we logged in as a valid domain user and got read/write access to the database, and this is how many records we could have accessed. We don't actually look at the values in the database, but we show you the schema so you can quickly see that PII was at risk here. We do the same for your file shares and other data sources, so you can accurately articulate the data you have at risk and prioritize cleaning it up,
especially data that could lead to a fine or a big news story. So that's the find, and that's the fix. Now let's talk about verify. The key to verify is embracing and integrating with detection engineering practices. Think about your layers of security tools: you've got lots of them, on average 130 at any given customer, but they were not designed to work together. So when you run a pen test, you want to ask: did you detect us, did you log us, did you alert on us, did you stop us? From there, look at which techniques are most commonly used to actually compromise an environment. Of our top 10 techniques (we use far more than these 10, but these are the most often executed), nine have nothing to do with CVEs. They have to do with misconfigurations, dangerous product defaults, and bad credential policies, and with how we chain those together to become domain admin or compromise a host. Every single attacker command we executed is provided to you in an attacker activity log, so you can see every command we ran, the timestamp, the hosts it executed on, and how it maps to MITRE ATT&CK tactics. Our customers put these attacker logs on one screen, then go look in Splunk, Exabeam, SentinelOne, or CrowdStrike and ask: did you detect us, log us, alert on us, or not? For example: "Splunk, what logs did you see at this time on the VMware host?", because that's when NodeZero was able to dump credentials. That lets you identify and fix your logging blind spots. To make it easier, we've built app integrations. This is an actual Splunk app on Splunkbase, and inside the Splunk console itself you can fire up the Horizon3 NodeZero app. All of the pen test results are there, so you can see all of
the results in one place without leaving the tool. As I skip forward, you'll see a pen test and the critical issues we identified. For that weak default issue, here are the exact commands we executed, and we automatically query Splunk for all events between these times on that endpoint that relate to this attack. Within Splunk itself, you can now quickly figure out whether you're missing logs or appropriately catching this issue, and that becomes incredibly important in the detection engineering cycle I mentioned earlier.
So how do our customers end up using us? They shift from running one pen test a year to 30 or 40 pen tests a month, often wiring us into their deployment automation to run pen tests automatically. As they run more pen tests they find more issues, but eventually they hit an inflection point where they're able to rapidly clean up their environment. That inflection point comes because the red and blue teams start working together in a purple team culture to proactively harden the environment. Our customers also run us from different perspectives. They'll first run an RFC 1918 scope to see what an attacker could do after gaining initial access in a part of the network with wide access. Then they'll run us within a specific network segment: could an attacker break out of that segment and gain access to another one? Then they'll run us from their work-from-home environment: could an attacker traverse the VPN and do something damaging, and once in, could they get into the cloud? Then they'll break in from the outside. All of these perspectives are available in Horizon3's NodeZero as a single SKU, and you can run as many pen tests as you want. If you run a phishing campaign and find that
an intern in the finance department had the worst phishing behavior, you can inject their credentials and show the end-to-end story of how an attacker phished an intern, gained their credentials, and used them to access sensitive financial data. So our customers end up running multiple attacks from multiple perspectives and tracking the results over time.
I'll leave you with two things. First, what is the AI in Horizon3.ai? Those knowledge graphs are the heart and soul of everything we do, and we use machine learning techniques, including reinforcement learning and Markov decision processes, to efficiently maneuver through and analyze the paths in those very large graphs. We also use context-based scoring to prioritize weaknesses, and we drive collective intelligence across all of our operations, so the more pen tests we run, the smarter we get. All of that is built on our knowledge graph analytics infrastructure. Second, here were my decision criteria when I was a buyer choosing a security testing strategy. I cared about coverage: I wanted to assess my on-prem, cloud, perimeter, and work-from-home environments, and to be safe to run in production. I wanted to do that as often as I wanted. I wanted to run pen tests in hours or days, not weeks or months, so I could accelerate the find-fix-verify loop. I wanted IT admins and network engineers with limited offensive experience to be able to run a pen test in a few clicks through a self-service experience, without installing agents or writing custom scripts. And I didn't want to get nickel-and-dimed buying different attack modules or types of attacks; I wanted a single annual subscription that let me run any type of attack as often as I wanted, so I could track my trends and direction over time. I hope you found this talk valuable. We're easy to find,
and I look forward to seeing you use the product and letting our results do the talking.
Our pen testing algorithms dynamically select how to compromise an environment based on what we've discovered, and the goal is to become domain admin, compromise a host, compromise domain users, find ways to encrypt data, steal sensitive data, and so on. When you look at the top 10 techniques we ended up using to compromise environments, the first nine have nothing to do with CVEs. That's the reality: CVEs are a vector, yes, but less than two percent of CVEs are actually used in a compromise. Often it's some form of credential collection, credential cracking, or credential pivoting, used to become an admin and then compromise the environment from there. I'll leave this up for you to read through, and the slides will be available to you. I found it very insightful that organizations, including us when I was at GE, invested heavily in standard vulnerability management programs. When I was at DoD, our CVE posture was all DISA asked us about. But attackers have adapted to not rely on CVEs to get in, because they know organizations are actively looking for and patching those CVEs. Instead, they chain credentials from one place with misconfigurations and dangerous product defaults in another to take over an environment. A concrete example: by default, vCenter backups are not encrypted, so if an attacker finds vCenter, they'll find the backup location. There are specific vCenter MDB files where the admin credentials are stored in the binary, so as an attacker you can find the right MDB file, parse the binary, and now you've got the admin credentials for the vCenter environment and can start logging in as admin. There's a bad habit among Signal officers and Signal
practitioners in the Army and elsewhere, where the VM notes section of a virtual image contains the password for the VM. Those VM notes are not stored encrypted, and attackers know this: they find the unencrypted VMs, find the notes section, pull out the passwords for those images, and then reuse those credentials across the board. I'll pause here. Patrick, I'd love to get your commentary on these techniques and other things you've seen, and in the last 10 to 15 minutes we'll roll through a bit more on what to do about it.
Yeah, I love it. I think this is pretty exhaustive. Here's what I like about what you've done: we've seen double-digit increases in the number of organizations reporting actual breaches year over year for the last three years, and in the zeitgeist we often pin that on ransomware, which of course is incredibly important and very top of mind. But what you have here reminds the audience that thinking about the attack surface and the vectors that matter has to be more comprehensive than just ransomware scenarios.
Yeah, right on. So let's build on this. In a defense-in-depth approach, you've purchased and integrated multiple security controls, which gives you redundancy if a control fails. But the reality is that these security tools aren't designed to work together. So when you run a pen test, you want to ask yourself: did you detect NodeZero, did you log NodeZero, did you alert on NodeZero, and did you stop NodeZero? To answer that, every single attacker command executed by NodeZero is available in an attacker log. At the bottom here you can see the vCenter exploit, at that time, on that IP, and how it aligns to MITRE ATT&CK. What you want to be
able to do is figure out whether your security tools caught this, and that becomes very important in using the attacker's perspective to improve your defensive security controls. The way we've tried to make this easier goes back to my Splunk background; in many ways I still bleed green. Our customers look at the attacker logs on one screen and at what Splunk saw or missed on another, and use that to find their logging blind spots. Where it gets really interesting is that we've built an integration into Splunk: there's a Splunk app you can download from Splunkbase, and you get all of the pen test results right there in the Splunk console. From that console, you can see all of the pen tests that were run and the issues that were found. You can open a particular pen test, see all of the weaknesses identified and how they're categorized, and click on any of the critical ones. This is where the punch line comes in, so I'll pause the video here: for that weakness, these are the commands that were executed on these endpoints at this time, and then we actually query Splunk for events containing that IP address, and these are the source types that surfaced any activity. What we try to do is help you identify, as quickly and efficiently as possible, the logging blind spots in your Splunk environment from the attacker's perspective. As this video plays through, Patrick, I'd love to get your thoughts, having seen so many Splunk deployments, on how effective those deployments are and how this is going to help elevate the effectiveness of all of your
Splunk customers. >> Yeah, I'm super excited about this. I think these kinds of purpose-built integrations will really move the needle for our customers. At the end of the day, when I think about the power of Splunk, I think about a product I was first introduced to 12 years ago: an on-prem piece of software that, at the time, sold on perpetual and term licenses. But what made it special was that it could eat data at a speed unlike anything else I had ever seen. You could ingest massively scalable amounts of data, it did cool things like schema-on-read, which facilitated that, there was this language called SPL that you could nerd out about, and you went to a conference once a year and talked about all the cool things you were Splunking, right? But now, as we think about the next phase of our growth, we live in a heterogeneous environment where our customers have so many different tools and data sources, and they're ever expanding. And as you look at the role of the CISO, it's mind-blowing to me how many sources, services, and apps have come into the CISO's span of, let's just call it, influence in the last three years. We're seeing things like infrastructure and service-level visibility and application performance monitoring, stuff that just never made sense for the security team to have visibility into, at least not at the size and scale we're demanding today. And that's different. This is why it's so important that we have these joint, purpose-built integrations that really provide more prescription to our customers about how they walk that journey toward maturity: what does zero to one look like, and what does one to two look like? Whereas ten years ago customers were happy with platforms, today they want integrations, they want solutions, and they want to drive outcomes. I think this is a great example of how, together, we are stepping up to the
evolving nature of the market, the ever-evolving nature of the threat landscape, and, I would say, the maturing needs of the customer in that environment. >> Yeah, for sure. I think that's especially true if we all anticipate budget pressure over the next 18 months due to the economy and elsewhere. I don't think security budgets are ever going to get cut, but they're not going to grow as fast, and there's a lot more pressure on organizations to extract more value from their existing investments, as well as more value and more impact from their existing teams. So security effectiveness, fierce prioritization, and automation become, I think, the three key themes of security over the next 18 months. So what I'll do very quickly is run through a few other use cases. Every host that we identified in the pen test, we're able to score and say: this host allowed us to do something significant, therefore it's really critical and you should be increasing your logging here. These hosts down here, we couldn't really do anything with as an attacker, so if you do have to make trade-offs, you can trade off some logging resolution at the lower end in order to increase logging resolution at the upper end. That gives you a level of justification for where to increase or adjust your logging resolution. Another example: we expose every host we've discovered as an attacker, and you can export that list, because we want to make sure that every host we found is being ingested from a Splunk standpoint. A big issue I had as a CIO and user of Splunk and other tools is that I had no idea if there were rogue Raspberry Pis on the network, or if a new box was installed and whether Splunk was installed on it or not. So now you can quickly start to correlate which hosts we saw and how that reconciles with what you're logging. Finally, or rather the second-to-last use case here on the Splunk integration side: for every single problem we've found, we give
multiple options for how to fix it. This becomes a great way to prioritize which fix actions to automate in your SOAR platform, and what we want to get to eventually is being able to automatically trigger SOAR actions to fix well-known problems, like automatically invalidating poor passwords in your credentials, among a whole bunch of other things we could go off and do. And then finally, if there is a well-known kill chain or attack path: one of the things I really wish I could have done when I was a Splunk customer was take this type of kill chain, which actually shows a path to domain admin that I'm sincerely worried about, and use it as a glass table over which I could start to layer possible indicators of compromise. Now you've got a great starting point for glass tables and IOCs for actual kill chains that we know are exploitable in your environment, and those become some super cool integrations that we've got on the roadmap between us and the Splunk security side of the house. So what I'll leave you with... actually, Patrick, before I do that, I'd love to get your comments, and then I'll leave you with one last slide on this wartime security mindset, assuming there are no other questions. >> No, I love it. I think this glass-table approach to how you visualize these workflows, and then use things like SOAR, orchestration, and automation to operationalize them, is exactly where we see all of our customers going. They're getting away from an over-engineered approach to SOAR, where it has to be super technical and heavy with Python programmers, and getting more to this visual view of workflow creation that really demystifies the power of automation and also democratizes it, so you don't need these programming languages on your resume to start really moving the needle on workflow creation, policy enforcement, and ultimately driving automation
coverage across more and more of the workflows that your team is seeing. >> Yeah, I think that between us being able to visualize the actual kill chain or attack path, and the SOAR market going toward no-code, low-code, configurable SOAR versus coded SOAR, that's really going to be a game changer in giving security teams a force multiplier. So what I'll leave you with is this: the peacetime mindset of security is no longer sustainable. We really have to get out of checking the box and then waiting for the bad guys to show up to verify whether our security tools are working. And the reason we've got to do that quickly is that over a thousand companies withdrew from the Russian economy over the past nine months due to the war in Ukraine. You should expect every one of them to be punished by the Russians for leaving, and punished from a cyber standpoint. This is no longer about financial extortion, that is, ransomware; this is about punishing and destroying companies. And you can punish any one of these companies by going after them directly, or by going after their suppliers and their distributors. So suddenly your attack surface is no longer just your own enterprise; it's how you bring your goods to market and how you get your goods created. Because while I may not be able to disrupt your ability to harvest fruit, if I can get those trucks stuck at the border, I can increase spoilage and have the same effect. What we should expect to see is this idea of cyber-enabled economic warfare: if we issue a sanction, like banning Russians from traveling, there is a cyber-enabled counterpunch, which is to corrupt and destroy the American Airlines database. That is below the threshold of war; it's not going to trigger the 82nd Airborne to be mobilized, but it's going to achieve the right effect. Ban the sale of luxury goods? Disrupt the supply chain and create shortages. Ban Russian oil
and gas? Attack refineries to cause a 10x spike in gas prices three days before the election. This is the future, and therefore I think what we have to do is shift toward a wartime mindset, which is: don't trust your security posture, verify it. See yourself through the eyes of the attacker, build that incident response muscle memory, and drive better collaboration between the red and blue teams, your suppliers and distributors, and the information-sharing organizations you have in place. What was really valuable for me as a Splunk customer: when a router crashes, at that moment you don't know whether it's due to an IT administration problem or an attacker, and what you want is different people asking different questions of the same data. You want an integrated triage process, with an IT lens on that problem and a security lens on that problem, and from there you figure out whether this is an IT workflow to execute or a security incident to execute. You want all of that as an integrated team, integrated process, and integrated technology stack. This is something I cared very deeply about as both a Splunk customer and a Splunk CTO, and something I see time and time again across the board. So, Patrick, I'll leave you with the last word for the final three minutes here. I don't see any open questions, so please take us home. >> Oh man, see, you'd think we spent hours and hours prepping for this together. That last 40 seconds of your talk track is probably one of the things I'm most passionate about in this industry right now. I think NIST has done some really interesting work here around building cyber-resilient organizations, which has really helped the industry see that incidents can come from adverse conditions, you know, stress and performance taxation in the infrastructure, service, or app layer, and they can come from malicious compromises, insider threats, and external threat actors. And the more that we look
at this from the perspective of a broader cyber resilience mission, in a wartime mindset, the better off I think we're going to be. And as you talked about, with operationally minded ISACs, information sharing and intelligence sharing become so important in these wartime situations. We know not all ISACs are created equal, but we're also seeing a lot more ad hoc information-sharing groups popping up. So look, I think you framed it really, really well. I love the concept of a wartime mindset, and I like the idea of applying a cyber resilience lens, like one more layer on top of that bottom-right cake. I think the IT lens and the security lens roll up to this concept of cyber resilience, and I think NIST has done some great work there for us. >> Yeah, you're spot on, and that is apt. I think that's going to be the next terrain that you'll see vendors try to get after, but I think Splunk is best positioned to win. >> Okay, that's a wrap for this special CUBE presentation. You heard all about the global expansion of Horizon3.ai's partner program: their partners have a unique opportunity to take advantage of the NodeZero product, international go-to-market expansion, North American channel partnerships, and overall relationships with companies like Splunk to make things more comprehensive in this disruptive cybersecurity world we live in. I hope you enjoyed this program. All the videos are available on thecube.net, and check out Horizon3.ai for their pen test automation and, ultimately, their defense system, which they use for continually testing the environment you're in. Great, innovative product. Again, I'm John Furrier, host of theCUBE. Thanks for watching.
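The blind-spot workflow described in this segment (take each command from the attacker log, look in the SIEM for activity on that IP around that time, and flag commands for which no source type surfaced anything), together with the check that every host the attacker found is actually being ingested, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Horizon3.ai's or Splunk's implementation: the `AttackerCommand` and `SiemEvent` records, their fields, the five-minute matching window, and the premise that SIEM events have already been exported into memory are all hypothetical, and no Splunk app or API is used.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AttackerCommand:
    """One entry from a pen-test attacker log (hypothetical, simplified schema)."""
    timestamp: datetime
    target_ip: str
    technique: str  # e.g. a MITRE ATT&CK technique ID such as "T1078"


@dataclass(frozen=True)
class SiemEvent:
    """One event already exported from the SIEM (hypothetical, simplified schema)."""
    timestamp: datetime
    ip: str
    source_type: str


def source_types_for(cmd, events, window):
    """Source types that logged activity on the command's IP within +/- window."""
    return {
        ev.source_type
        for ev in events
        if ev.ip == cmd.target_ip and abs(ev.timestamp - cmd.timestamp) <= window
    }


def find_logging_blind_spots(commands, events, window=timedelta(minutes=5)):
    """Attacker commands for which no source type surfaced any activity."""
    return [cmd for cmd in commands if not source_types_for(cmd, events, window)]


def find_unlogged_hosts(discovered_ips, logged_ips):
    """Hosts the attacker discovered that never appear in the SIEM, in IP order."""
    return sorted(set(discovered_ips) - set(logged_ips), key=ipaddress.ip_address)


if __name__ == "__main__":
    t0 = datetime(2022, 9, 28, 12, 0)
    commands = [
        AttackerCommand(t0, "10.0.0.5", "T1078"),
        AttackerCommand(t0 + timedelta(minutes=30), "10.0.0.9", "T1003"),
    ]
    # Illustrative sourcetype name; only the first command's host was logged.
    events = [SiemEvent(t0 + timedelta(minutes=1), "10.0.0.5", "WinEventLog:Security")]
    for cmd in find_logging_blind_spots(commands, events):
        print("blind spot:", cmd.target_ip, cmd.technique)
    print("unlogged hosts:",
          find_unlogged_hosts(["10.0.0.5", "10.0.0.9", "10.0.0.42"], ["10.0.0.5"]))
```

In practice the event lookup would be a search against the SIEM rather than a linear scan over exported events; the scan simply keeps the matching rule (same IP, close enough in time) explicit.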

Published Date : Sep 28 2022

ENTITIES

Entity | Category | Confidence
Patrick Coughlin | PERSON | 0.99+
Jennifer Lee | PERSON | 0.99+
Chris | PERSON | 0.99+
Tony | PERSON | 0.99+
2013 | DATE | 0.99+
Raina Richter | PERSON | 0.99+
Singapore | LOCATION | 0.99+
Europe | LOCATION | 0.99+
Patrick | PERSON | 0.99+
Frankfurt | LOCATION | 0.99+
John | PERSON | 0.99+
20-year | QUANTITY | 0.99+
hundreds | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
20 years | QUANTITY | 0.99+
seven minutes | QUANTITY | 0.99+
95 | QUANTITY | 0.99+
Ford | ORGANIZATION | 0.99+
2.7 billion | QUANTITY | 0.99+
March | DATE | 0.99+
Finland | LOCATION | 0.99+
seven hours | QUANTITY | 0.99+
sixty percent | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
Sweden | LOCATION | 0.99+
six weeks | QUANTITY | 0.99+
19 credentials | QUANTITY | 0.99+
ten dollars | QUANTITY | 0.99+
Jennifer | PERSON | 0.99+
5,000 hosts | QUANTITY | 0.99+
Horizon 3 | TITLE | 0.99+
Wednesday | DATE | 0.99+
30 | QUANTITY | 0.99+
eight | QUANTITY | 0.99+
Asia Pacific | LOCATION | 0.99+
American Airlines | ORGANIZATION | 0.99+
Deloitte | ORGANIZATION | 0.99+
three licenses | QUANTITY | 0.99+
two companies | QUANTITY | 0.99+
2019 | DATE | 0.99+
European Union | ORGANIZATION | 0.99+
six | QUANTITY | 0.99+
seven occurrences | QUANTITY | 0.99+
70 | QUANTITY | 0.99+
three people | QUANTITY | 0.99+
Horizon 3.ai | TITLE | 0.99+
ATT | ORGANIZATION | 0.99+
Net Zero | ORGANIZATION | 0.99+
Splunk | ORGANIZATION | 0.99+
Uber | ORGANIZATION | 0.99+
five | QUANTITY | 0.99+
less than two percent | QUANTITY | 0.99+
less than two hours | QUANTITY | 0.99+
2012 | DATE | 0.99+
UK | LOCATION | 0.99+
Adobe | ORGANIZATION | 0.99+
four issues | QUANTITY | 0.99+
Department of Defense | ORGANIZATION | 0.99+
next year | DATE | 0.99+
three steps | QUANTITY | 0.99+
node 0 | TITLE | 0.99+
15 minutes | QUANTITY | 0.99+
hundred percent | QUANTITY | 0.99+
node zero | TITLE | 0.99+
10x | QUANTITY | 0.99+
last year | DATE | 0.99+
7 minutes | QUANTITY | 0.99+
one license | QUANTITY | 0.99+
second thing | QUANTITY | 0.99+
thousands of hosts | QUANTITY | 0.99+
five thousand hosts | QUANTITY | 0.99+
next week | DATE | 0.99+

Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022


 

>>Welcome back, everyone, to theCUBE's coverage here in Seattle, Washington, of AWS's Marketplace Seller Conference. The big news is that the Amazon Partner Network is combining with Marketplace to form the Amazon Partner Organization, part of a big reorg as they grow to the next level of NextGen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, host of theCUBE, with great guests here from Databricks, both CUBE alumni: Jack Andersen, GM and VP of the Databricks partnership team for AWS, who handles that relationship, and Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >>Thanks for having us back. Yeah, John, great to be here. >>So I feel like we're at re:Invent 2013: small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer; people should be buying online, self-service, at cloud scale, and Amazon's got billions being sold through their marketplace. They've reorganized their partner network, and you can see what's going on. They've kind of figured it out: let's put everything together and simplify, make it less of a website marketplace, and merge it with our partner network to get more synergy and less friction, so everyone can make more money and customers are going to be happier. >>Yeah, that's right. >>I mean, you run the relationship. You're in the middle of it. >>Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and Marketplace allow us to do: work with Amazon on these really unique use cases. >>You know, I interviewed Ali many times over the years. I remember many years ago, maybe six or seven years ago, we were talking, and he said, we're all in on AWS. Obviously.
Now, with the success of Databricks, you've got multiple clouds, and you see that customers have choice, but I remember the strategy early on was: we're going to be deep. So this speaks volumes about the relationship you have. Jack, take us through the relationship Databricks has with AWS from a partner perspective, and Joel, from a product perspective, because it's not like you're a Johnny-come-lately new to the scene, right? You've been there almost since the creation of this wave. What's the relationship, and how does it relate to what's going on today? >>So, most people may not know that Databricks was born on AWS. We actually did our first $100 million of revenue on Amazon. Today we're obviously available on multiple clouds, but we're very fond of our Amazon relationship. When you look at what the APN allows us to do, we're able to expand our reach and co-sell with Amazon, and Marketplace broadens our reach. We think of Marketplace in three different aspects. First, we've got the Marketplace private offer business, which we've been doing for a number of years. As a matter of fact, we're driving well over a hundred percent year-over-year growth in private offers, and we have a nine-figure business, so it's a very significant business. When a customer uses a private offer, that private offer counts against their private pricing agreement with AWS, so they get pricing power against their private pricing. So it's really important: it goes on their Amazon bill. Second, in May we launched our pay-as-you-go, on-demand offering, and in five short months we have well over a thousand subscribers. What this does is really reduce the barriers to entry; it's low friction. So anybody in an enterprise, a startup, or a public sector company can start to use Databricks on AWS, pay on a consumption-based model, and have it go against their monthly bill.
And so we see customers doing rapid experimentation, pilots, and POCs; they're really learning the value of that first use case, and then we see rapid use case expansion. The third aspect is consulting partner private offers, or CPPO, which is super important in how we involve our partner ecosystem of consulting partners and resellers that are able to work with Databricks on behalf of customers. >>So you've got the big contracts with the private offer, you've got product-market fit with people iterating on data and coming in through pay-as-you-go, and obviously the integration piece, all fitting in there. >>Exactly. Exactly. >>Okay. So those are the offers that are current and in Marketplace today. Joel, what are people buying in the marketplace, and what does it mean for them? >>So fundamentally, what they're buying is the ability to take silos out of their organization. That is the problem Databricks is out there to solve. When you look across your data landscape today, you've got unstructured data, structured data, and real-time streaming data, and your teams are trying to use all of this data to solve really complicated problems. As the lakehouse company, what we're helping customers do is get into the new world: how do they move to a place where they can use all of that data across all of their teams? Through the marketplace, we allow them to begin to find those rapid-adoption use cases where they can get rid of the data warehousing and data lake silos they've had in the past, and get their unstructured and structured data onto one data platform: an open data platform that is no longer tied to any proprietary formats and standards, and something they can very easily integrate into the rest of their data environment, with one common data governance layer applied on top of it. So from the time they ingest that data, to the time they use it, to the time they share it inside and outside of their organization, they know exactly how it's flowing. They know where it came from, who's using it, who has access to it, and how it's changing. And with that common data platform and that common governance solution, they're able to bring all of those use cases together across real-time streaming, data engineering, BI, and AI, with all of their teams working on one set of data. That lets them move really, really fast, and it also lets them solve challenges they just couldn't solve before. A good example of this: one of the world's now-largest data streaming platforms runs on Databricks with AWS. And if you think about what it takes to set that up: they've got all this customer data that historically sat inside data warehouses, which they use to understand who their customers are. They have all this unstructured data, with which they've built their data science models so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth: clickstream data from what customers are doing on their platform, and the recommendations they want to push back out. If those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex, but by building it on Databricks, they were able to release it in record time and have grown at a record pace. >>So that's not just the platform; that's impacting product development. >>Absolutely. >>I mean, this is the difference between lagging months of product development and, like, days. >>Yes. >>That's pretty much what you're getting at. >>Yeah. >>So, total agility.
I got that. Okay, now I'm a customer, and I want to buy in the marketplace, but you've also got a direct sales force out there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on stage from AWS's leadership (Chris was up there speaking), in that moment I thought, hey, this is a CRO conference, a chief revenue officer conversation, which means someone's getting compensated. So if I'm the sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict, or is there an uplift? >>Well, I'd add to what Joel just talked about, the value of the solution: our entire offering is available on AWS Marketplace. It's not a subset; it's the entire Databricks offering. >>The flagship, all the top... >>Everything, the flagship, the complete offering. So it's not segmented, and it's not a sub-segment; you can use all of our different offerings. Now, when it comes to seller compensation, we view this two different ways. One is that AWS is also incented, versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks: our sales force wants to do the right thing for the customer if the customer wants to use Marketplace as their procurement vehicle. And that really helps customers, because if you get Databricks and five other ISVs together, and let's say you're spending a million dollars with each ISV, you have $5 million of spend. You put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we do it. >>So customers are driving this, it sounds like. >>Correct. For sure.
So they're looking at this and saying, hey, I'm going to get purchasing power with all my relationships, because it's a solution-architecture market, right? >>Yeah, it makes sense. Most customers will have a primary and a secondary cloud provider, and if they can consolidate multiple ISVs' spend through that same primary provider, they get pricing power. >>Okay, Joel, we're going to date ourselves. At least I will. Back in the old days, it used to be: do a Barney deal with someone, hey, let's go to market together. You've got to get paper, you do a biz dev deal, and then you've got to say, okay, now let's coordinate our sales teams. A lot of moving parts. So what you're getting at here is that the alternative for Databricks, or any company, is to go find those partners and do deals, versus now, where Amazon is the center point for the customer and you can still do those joint deals. But this seems to be flipping the script a little bit. >>Well, it is, but we still have VARs and consulting partners doing implementation work, very valuable work, advisory work, and they can actually work with Marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >>So it doesn't change your business structure; it just makes it more efficient. >>That's correct. >>That's a great way to say it. >>Yeah, that's great. >>So that's it: it just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >>Yes, absolutely. >>Economically? >>Yeah. Economically, it's the right thing to do for the customer, and it's the right thing to do for our relationship with Amazon, especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for an ISV's solution, and our teams are working backwards from those use cases to collaborate and land them. >>Yeah, I want to get that out there. Go ahead, Joel.
>>So one of the other things I might add, and why this is advantageous for companies like Databricks to work through the marketplace, is that it makes it so much easier for customers to deploy a solution. It's literally one click through the marketplace to get Databricks stood up inside your environment. So if you're looking at how to help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator. >>You know, it's interesting. I want to bring this up and get your reaction, because to me, this is the future of procurement. From a procurement standpoint, and again I'm dating myself, there was EDI back in the old days, you know, all that craziness. Now this is all internet, basically through the console. I get the infrastructure side, you know, spinning up and provisioning some servers; that's all been good, and you guys have played well there in the marketplace. But now we're getting into more of what I call the business apps, and they brought up this little nuance on stage: most enterprises aren't yet integrating tech from the business apps into the stack. This is where I think you guys are a success story, because you've been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. I want to integrate, so now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. You don't buy it; you build it. You've got to build stuff. And that's the nuance. What's your reaction to that? Am I getting this right, or am I off? Because no one's going to be buying software like they used to; they buy software to integrate it. >>Yeah. >>Because everything's integrated. >>I think AWS has done a great job of creating a partner ecosystem, right, to give customers the right tools for the right jobs.
And those might be with third parties. Databricks is doing the same thing with our Partner Connect program, right? We've got partners like Fivetran and dbt that augment and enhance our platform. So you're looking at multi-ISV architectures, and all of that can be procured through the AWS Marketplace. >>Yeah. It's almost like bundling and unbundling. I was talking about this with Dave Vellante regarding Supercloud: why wouldn't a customer want the best solution in their architecture, period, best in class? If someone's got API security or an API gateway, well, I don't want to be forced to buy something because it's part of a suite. That's where you see things get suboptimized: someone dominates a category and says, oh, you've got to buy my version of this. >>Yeah. Joel and I were talking, and we were actually saying that what's really important about Databricks is that customers control the data, right? Do you want to comment on that? >>Yeah, I would say what you're pushing on there is, we think, extraordinarily important; it's the way the market is going to go. Customers want a lot of control over how they build their data stack, and everyone's unique in which tools are the right ones for them. Philosophically, one of the really strong places where Databricks and AWS have lined up is that we both take the approach that you should have maximum flexibility on the platform. And as we think about the lakehouse, one thing we've always been extremely committed to as a company is building the data platform on an open foundation. We do that primarily through Delta Lake, making sure, to Jack's point, that with Databricks the data is always in your control, and it's always stored in a completely open format.
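Joel's point that the data stays in an open format can be made concrete. Under the open Delta Lake protocol, a table is a directory of Parquet data files plus a `_delta_log/` folder of newline-delimited JSON commit files, named by a 20-digit zero-padded version number, where each line holds one action such as `add` or `remove`. The stdlib-only Python sketch below replays those commits to recover the table's current set of data files, with no Databricks software involved. It is a simplified illustration, not Databricks' or any library's code: it ignores checkpoints, log compaction, and the other actions and fields the protocol defines, and `write_commit` is a hypothetical toy helper that writes deliberately incomplete actions for the demo.

```python
import json
import re
import tempfile
from pathlib import Path

# Commit files are named by a 20-digit, zero-padded version number.
COMMIT_NAME = re.compile(r"\d{20}\.json")


def active_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data file paths.

    Simplified: ignores checkpoints, log compaction and every action type
    other than "add" and "remove", so it assumes the full JSON history exists.
    """
    log_dir = Path(table_path) / "_delta_log"
    # Zero-padding means sorting by name is the same as sorting by version.
    commits = sorted(p for p in log_dir.iterdir() if COMMIT_NAME.fullmatch(p.name))
    live = set()
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one action object per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


def write_commit(table_path, version, actions):
    """Hypothetical toy helper: write one commit file with minimal actions.

    Real writers include required fields (size, modificationTime, ...) and
    must never overwrite an existing version; this demo does neither.
    """
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    (log_dir / f"{version:020d}.json").write_text(
        "".join(json.dumps(a) + "\n" for a in actions)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        write_commit(table, 0, [{"add": {"path": "part-0.parquet"}},
                                {"add": {"path": "part-1.parquet"}}])
        # Version 1 replaces part-0 with part-2 (e.g. after an update).
        write_commit(table, 1, [{"remove": {"path": "part-0.parquet"}},
                                {"add": {"path": "part-2.parquet"}}])
        print(sorted(active_files(table)))  # ['part-1.parquet', 'part-2.parquet']
```

Because the log is plain JSON and the data files are Parquet, any engine that implements the protocol can read the same table, which is the interoperability the open-format argument relies on.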
And that is one of the things that has allowed Databricks to have the breadth of integrations it has with all the other data tools out there: you're not tied into any proprietary format, and instead you're able to take advantage of all the innovation happening in the open source ecosystem. >>When you see other solutions out there that aren't as open as you guys (and you guys are very open, by the way; we love that, and we think it's a great strategy), what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, and I think open source is the software industry, then integration is a big deal, because software's going to be plentiful. Let's face it, it's a good time to be in the software business, and cloud's booming. So from the Databricks perspective, when you see a buyer clicking on Databricks versus that alternative, what should they potentially be nervous about down the road if they go with a more proprietary or locked-in approach? >>Well, I think the challenge with proprietary ecosystems is that you become beholden to the ability of that provider to build relationships and convince other vendors that they should invest in that format. But you're also beholden to the pace at which that provider is able to innovate. And I think we've seen lots of times over history where a proprietary format may run ahead for a while on a lot of innovation, but as that market control begins to solidify, the desire to innovate begins to degrade, whereas in the open format... >>So, extracting rents versus innovating. >>Exactly. Yeah, exactly. But I'll say that in the open world, you have to continue to innovate. >>Yeah. >>And the open source world is always innovating.
If you look at the last 10 to 15 years, I challenge you to find an example where the innovation in the data and AI world is not coming from open source. So by investing in open ecosystems, you're always going to be at the forefront of what is the latest. >>You know, again, not to date myself, but look back at the eighties and nineties: the protocol stacks were proprietary. SNA at IBM, DECnet at Digital, and so on. Then TCP/IP was part of the open systems interconnect revolution; OSI was a big part of that, as was my school. It didn't standardize the whole stack; it stopped at IP and TCP, but that helped interoperability and created a nice de facto standard. So this is a big part of this mid-game. I call it the chessboard: you've got the opening game and the mid-game, then you've got the end game, and we're not at the end game yet in the cloud. >>There's always some form of lock-in, right? Andy Jassy will address that. But if you're going to make a decision, you want to reduce it, because you don't want to be limited. So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer is trying to become right now, it's an AI-driven business, right? So it comes down to: can you get that data out of silos? Can you organize it and secure it? And can you work with data scientists to feed those models in a very consistent manner? The tools of tomorrow, to Joel's point, will be open, and we want interoperability with those tools. >>And choice matters too. I would say the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly, too.
Redshift competes directly with a lot of other stuff, but they can't play the bundling game, because customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all. And they're onto it. >>This is to Amazon's credit: by having these solutions that may compete with native services in the marketplace, they are providing customers with choice, low price >>And access to the core value. Exactly, which is the hardware. >>Which is their platform. Okay. So I wanna get your thoughts on something else I see emerging; this is again kind of a CUBE rumination moment. So on stage, Chris unpacked a lot of stuff. I mean, with this marketplace, they're touching a lot of hot buttons here, you know: pricing, compensation, workflows, services behind the curtain. And one of the things he mentioned was resellers or channel partners, depending upon what you call them. We believe, Dave and I believe on theCUBE, that the entire indirect sales channel of the industry is gonna be disrupted radically, because those players were selling hardware and software in the old days, and that game is gonna change. You mentioned you guys have a program; I want to get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in, which means that the old reseller channels are gonna be rewritten. They're gonna be refactored with these new kinds of access, 'cause you've got scale, you've got money, you've got product, and you've got customers coming into the marketplace. So if you're a reseller that sold computers or software to data centers, you know, a value-added reseller or VAR business, >>You've gotta evolve. >>You gotta be here. Yes. How are you guys working with those partners? 'Cause you say you have a part in your marketplace there. How do I make money?
If I'm a reseller with Databricks, with Amazon, take me through that use case. >>Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how. When we're seeing customers do mass migrations to the cloud, or Hadoop-specific migrations, or data transformation implementations, they need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well, well, that's another aspect of their business, but I really think it is the expertise that the partners bring to help customers get outcomes. >>Joel, the channel: a big opportunity for Amazon to reimagine this. >>For sure. Yeah. And I think, you know, to your comment about how resellers take advantage of that, I think what Jack was pushing on is spot on, which is it's becoming more and more about the expertise you bring to the table, and not just transacting the software, but actually helping customers make the right choices. And we're seeing, you know, SIs begin to be able to resell solutions and finding a lot of opportunity in that. Yeah. And I think we're seeing traditional resellers begin to move into that SI model as well. And that's gonna be the evolution that this gets. >>At the end of the day, it's about services. >>For sure, for sure. >>You've got a great service, you're gonna have high gross profits. >>And I think that the managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >>I think that's gonna be a really hot button for you guys. I think, the way you guys are open, this channel partner services model coming into the fold really makes for that supercloud-like experience where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on within Databricks, for sure, on top of this ecosystem. How does that work?
This hasn't been written up in business school case studies yet; this is new. What is this? >>I think, you know, what it comes down to is you're seeing ecosystems begin to evolve around the data platforms, and that's gonna be one of the big new horizons for us. As we think about what drives ecosystems, it's going to be: well, what's the data platform that I'm using, and then all the tools that have to encircle that to get my business done. And so I think there are, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data, analytics and AI. And then, to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well, as customers are looking at it: well, if I'm standing this lakehouse up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >>I mean, you think about ecosystem theory, we're living a whole other dream, and I'm not kidding. It hasn't yet been written up in business school case studies: we're now in a whole new connective-tissue ecology where you have dependencies and value-proposition economics, connectedness. So you have relationships in these ecosystems. >>And I think one of the great things about relationships in these ecosystems is that there's a high degree of overlap. Yeah. So you're seeing, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >>Joel, Jack, I love it, because you know what it means? The best ecosystem will win. If you keep it open, sure, you can see everything. If you're gonna do it in the dark, you know, you don't know the outcome. I mean, this is really what we're talking about.
>>And John, can I just add that when I was at Amazon, we had a theory that there are buyers and builders, right? There are very innovative companies that want to build things themselves. We're seeing now that builders want to buy a platform, right? Yeah. And so there's a platform decision being made, and the ecosystem's gonna evolve around the platform. >>Yeah. And I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our supercloud panel, it was called the innovator's dilemma with a slash through it, the integrator's dilemma. Innovation is the digital transformation. So absolutely, that becomes cliché in a way, but it really becomes more of: are you open? Are you integrating? If APIs are the connective tissue, what's automation, what does the service mesh look like? I mean, a whole other set of thinking goes on in these new ecosystems and these new products. >>And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks, and actually share data between those two different clouds: that is the next layer on top of the native cloud solution. >>Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base and a great, growing ecosystem. And again, I think it's a shining example of what every enterprise is going to do: build on top of something, get that operating model, and drive revenue. >>Yeah. >>Whether you're Goldman Sachs or Capital One or XYZ Corporation. >>S&P Global, NASDAQ, right. We've got, you know, the biggest verticals in the world solving tough problems with Databricks.
I think we'd be remiss, 'cause if Ali was here, he would really want to thank Amazon for all of the investments across all of the different functions, whether it's the relationship we have with our engineering and service teams, our marketing teams, you know, product development. And we're gonna be at re:Invent, with a big presence at re:Invent. We're looking forward to seeing you there again. >>Yeah. We'll see you guys there. Again, good ecosystem. I love the ecosystem evolution that's happening; this next-gen cloud is here. We're seeing this evolve: new economics, new value propositions, scaling up, producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, great to see you, and Jack. >>Thanks for having us. >>Thanks. >>Okay, theCUBE coverage here. The world's changing as APN comes together with the marketplace in a new partner organization at Amazon Web Services, and theCUBE's got it covered. This should be a very big, growing ecosystem as this continues, with billions being sold through the marketplace. Of course, the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching.

Published Date : Sep 21 2022


Rob Emsley, Dell Technologies


 

>>Welcome back to A Blueprint for Trusted Infrastructure. We're here with Rob Emsley, who's the director of product marketing for data protection and cyber security. Rob, good to see you. A new role. >>Yeah. Good to be back, Dave. Good to see you. Yeah, it's been a while since we chatted last, and you know, one of the changes in my world is that I've expanded my responsibilities beyond data protection marketing to also focus on cyber security marketing, specifically for our Infrastructure Solutions Group. So certainly that's, you know, something that really has driven us to come and have this conversation with you today. >>So data protection obviously has become an increasingly important component of the cyber security space. I don't necessarily think of, you know, traditional backup and recovery as security; to me, it's an adjacency. I know some companies have said, oh yeah, now we're a security company. They're kind of chasing the valuation bubble, for sure. Dell's interesting because you have, you know, data protection in the form of backup and recovery and data management, but you also have direct security capabilities. So you're sort of bringing those two worlds together, and it sounds like your responsibility is to connect those dots. Is that right? >>Absolutely. Yeah. I mean, I think the reality is that security is a multi-layer discipline. I think the days of thinking that there's one technology or process you can use to make your organization secure are long gone. I mean, certainly you're actually correct. If you think about the backup and recovery space, people have been doing that for years. Certainly backup and recovery is all about the recovery. It's all about getting yourself back up and running when bad things happen. And one of the realities, unfortunately, today is that one of the worst things that can happen is cyber attacks.
You know, ransomware and malware are top of mind for all organizations today. And that's why you see a lot of technology and a lot of innovation going into the backup and recovery space, because if you have a good copy of your data, then that is really the first place you go to recover from a cyber attack. >>And that's why it's so important. The reality is that, unfortunately, the cyber criminals keep on getting smarter. I don't know how it happens, but one of the things that is happening is that they no longer go after just your production data; they go after your backup data as well. So over the last half a decade, Dell Technologies, with its backup and recovery portfolio, has introduced the concept of isolated cyber recovery vaults. And that is really, you know, we've had many conversations about that over the years. Yeah. And that's really a big tenet of what we do in the data protection portfolio. >>So this idea of cybersecurity resilience, that definition is evolving. What does it mean to you? >>Yeah, I think the analyst team over at Gartner wrote a very insightful paper called "You Will Be Hacked, Embrace the Breach." And the whole basis of this analysis is that so much money has been spent on prevention that what's out of balance is the amount of budget companies have spent on cyber resilience, and cyber resilience is based upon the premise that you will be hacked. You have to embrace that fact and be ready and prepared to bring yourself back into business.
You know, and that's really where cyber resiliency is very, very different from cyber security and prevention. And I think that balance is: get your security disciplines well funded, get your defenses as good as you can get them, but make sure that if the inevitable happens and you find yourself compromised, you have a great recovery plan. And a great recovery plan is really the basis of any good, solid data protection, backup and recovery philosophy. >>So if I had to do a SWOT analysis, we don't have to do the W, O and T, but let's focus on the S: what would you say are Dell's strengths in this cybersecurity space, as it relates to data protection? >>One is we've been doing it a long time. You know, we talk a lot about Dell's data protection being proven and modern. Certainly the experience we've had over literally three decades of providing enterprise-scale data protection solutions to our customers has really allowed us to have a lot of insight into what works and what doesn't. As I mentioned to you, one of the unique differentiators of our solution is the cyber recovery vaulting solution that we introduced a little over five years ago, five, six years. PowerProtect Cyber Recovery has become a unique capability for customers to adopt on top of their investment in Dell Technologies data protection. You know, the unique elements of our solution are threefold, and we call them the three I's: isolation, immutability and intelligence. And the isolation part is really so important because you need to reduce the attack surface of your good known copies of data. >>You know, you need to put it in a location that the bad actors can't get to. And that really is the essence of a cyber recovery vault.
Interestingly enough, you're starting to see the market throw out that word, you know, from many other places, but really it comes down to having a real discipline: you don't allow the security of your cyber recovery vault to be compromised by allowing it to be controlled from outside of the vault, for example by your backup application. Our cyber recovery vaulting technology is independent of the backup infrastructure. It uses it, but it controls its own security. And that is so, so important. It's like having a vault where the only way to open it is from the inside, you know, and think about that. If you think about vaults in banks or vaults in your home, normally you have a keypad on the outside; think of our cyber recovery vault as having its security controlled from inside of the vault. >>So nobody can get in, nothing can get in unless it's already in. And if it's already in, then it's trusted. >>Exactly. Yeah, exactly. >>Yeah. So isolation's the key. And then you mentioned immutability is the second piece. >>Yeah, so immutability is also something which has been around for a long time. People talk about backup immutability or immutable backup copies. So immutability is just the additional technology that allows the data that's inside of the vault to be unchangeable. But again, with immutability your mileage varies when you look across the different offers out there in the market, especially in the backup industry. You made a very valid point earlier that the backup vendors in the market seem to be security-washing their marketing messages. I mean, everybody is leaning into the ever-present danger of cyber security, which is not a bad thing, but the reality is that you have to have the technology to back it up, you know, quite literally. >>Yeah, yeah, no pun intended. Right. And then actually, pun intended. Now what about the intelligence piece of it?
That's AI/ML? Where does that fit? >>For sure. So the intelligence piece is delivered by a solution called CyberSense. And CyberSense for us is what really gives you the confidence that what you have in your cyber recovery vault is a good, clean copy of data. So it's looking at the backup copies that get driven into the cyber vault, and it's looking for anomalies. It's not looking for signatures of malware. You know, that's what your antivirus software does. That's what your endpoint protection software does. That's on the prevention side of the equation. What we're looking to ensure is that the data you need when all hell breaks loose is good, and that when you get a request to restore and recover your business, you go, right, let's go and do it. And you don't have any concern that what you have in the vault has been compromised. >>So CyberSense is really a unique analytics solution in the market, based upon the fact that it isn't looking at cursory indicators of malware infection or ransomware introduction; it's doing full content analytics, you know, looking at whether the data has in any way changed. Has it suddenly become encrypted? Has it suddenly become different from how it was in the previous scan? So that anomaly detection is very, very different. It's looking for different characteristics that really are an indicator that something is going on. And of course, if it sees it, you immediately get flagged. But the good news is that you always have, in the vault, the previous copy of good known data, which now becomes your restore point. >>So we're talking to Rob Emsley about how data protection fits into what Dell calls DTI, Dell Trusted Infrastructure. And I want to come back, Rob, to this notion of "and" versus "or", 'cause I think a lot of people are skeptical. Like, how can I have great security and not introduce friction into my organization? Is that an automation play?
How does Dell tackle that problem? >>I mean, I think a lot of it is that across our infrastructure, security has to be built in. I mean, intrinsic security within our servers, within our storage devices, within elements of our backup infrastructure. I mean, security, multifactor authentication, you know, elements that make the overall infrastructure secure. We have capabilities that allow us to identify whether or not configurations have changed. We'll probably be talking about that a little bit more with you later in the segment, but the essence is that security is not a bolt-on. It has to be part of the overall infrastructure. And that's so true, certainly in the data protection space. >>Give us the bottom line on how you see Dell's key differentiators. Dell, of course, always talks about its portfolio, but why should customers, you know, lean in to Dell in this whole cyber resilience space? >>You know, staying on the data protection space: as I mentioned, the work we've been doing to introduce this cyber resiliency solution for data protection is, in our opinion, as good as it gets. You know, you've spoken to a number of our best customers, whether it be Bob Bender from Founders Federal, or more recently at Dell Technologies World, you spoke to Tony Bryson (yep) from the Town of Gilbert. And these are customers that we've had for many years that have implemented cyber recovery vaults. And at the end of the day, they can now sleep at night. You know, that's really the peace of mind they have: the insurance that a Dell PowerProtect Cyber Recovery solution gives them really allows them to have the assurance that they don't have to pay a ransom, or worry if they have an insider threat issue.
And, you know, all the way down to data deletion, they know that what's in the cyber recovery vault is good and ready for them to recover from. >>Great. Well, Rob, congratulations on the new scope of responsibility. I like how, you know, your organization is expanding as the threat surface is expanding. As we said, data protection is becoming an adjacency to security: not security in and of itself, but a key component of a comprehensive security strategy. Rob Emsley, thank you for coming back on theCUBE. Good to see you again. >>You too, Dave. Thanks. >>All right. In a moment, I'll be back to wrap up A Blueprint for Trusted Infrastructure. You're watching theCUBE.
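Rob describes immutability as the technology that keeps the data inside the vault unchangeable. As a minimal sketch of that idea, assuming a simple write-once store with a retention lock, the Python below refuses to overwrite an existing copy and refuses to delete it before its retention period expires. The class `RetentionLockedStore`, its methods and the `ImmutableError` exception are hypothetical illustrations, not a Dell PowerProtect or CyberSense API.

```python
"""Hypothetical write-once (WORM-style) store with a retention lock.

Illustration only: the names and behavior are assumptions, not a Dell API.
The current time is passed in explicitly so the behavior is easy to test.
"""


class ImmutableError(Exception):
    """Raised when a caller tries to change or delete a locked copy."""


class RetentionLockedStore:
    def __init__(self) -> None:
        # copy_id -> (content, time at which the retention lock expires)
        self._copies: dict[str, tuple[bytes, float]] = {}

    def write(self, copy_id: str, data: bytes, now: float, retain_for: float) -> None:
        """Store a new copy; an existing copy is never overwritten."""
        if copy_id in self._copies:
            raise ImmutableError(f"{copy_id!r} already exists and cannot be overwritten")
        self._copies[copy_id] = (bytes(data), now + retain_for)

    def read(self, copy_id: str) -> bytes:
        """Return the stored content of a copy."""
        return self._copies[copy_id][0]

    def delete(self, copy_id: str, now: float) -> None:
        """Delete a copy, but only once its retention lock has expired."""
        _, locked_until = self._copies[copy_id]
        if now < locked_until:
            raise ImmutableError(f"{copy_id!r} is locked until t={locked_until}")
        del self._copies[copy_id]


if __name__ == "__main__":
    store = RetentionLockedStore()
    store.write("daily-001", b"backup image", now=0.0, retain_for=86_400.0)
    try:
        store.delete("daily-001", now=3_600.0)  # an early delete attempt is refused
    except ImmutableError as err:
        print("blocked:", err)
    store.delete("daily-001", now=90_000.0)  # allowed once the lock has expired
```

In a real vault the lock would have to be enforced by the storage layer itself, using its own trusted clock, so that a compromised backup server could not simply call `delete` with a forged timestamp; passing `now` as a parameter here is only for testability.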
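Rob says CyberSense does full-content analytics, comparing each scan of the vaulted copies with the previous one and asking, among other things, whether data has suddenly become encrypted. The interview does not describe how CyberSense actually does this, so the Python below is only a hypothetical sketch of one well-known heuristic for that question: encrypted data looks nearly random, so its Shannon entropy approaches 8 bits per byte. The function names, the two thresholds and the choice to flag files that vanished between scans are all assumptions for illustration.

```python
"""Hypothetical sketch: flag files that look newly encrypted or deleted
between two scans of a backup copy. This is NOT CyberSense's algorithm;
it is a simple Shannon-entropy heuristic used for illustration only."""
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def flag_suspicious(previous: dict[str, bytes], current: dict[str, bytes],
                    jump: float = 2.0, ceiling: float = 7.5) -> list[str]:
    """Compare two scans (path -> content) and return suspicious paths, sorted.

    A path is flagged if it disappeared since the previous scan, or if its
    entropy rose by more than `jump` bits/byte AND now exceeds `ceiling`.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    flagged = []
    for path, old in previous.items():
        new = current.get(path)
        if new is None:
            flagged.append(path)  # possible mass deletion
            continue
        before, after = shannon_entropy(old), shannon_entropy(new)
        if after - before > jump and after > ceiling:
            flagged.append(path)  # content suddenly looks random
    return sorted(flagged)


if __name__ == "__main__":
    previous_scan = {"a.txt": b"hello world " * 100,
                     "b.txt": b"quarterly report " * 50,
                     "c.txt": b"notes " * 10}
    # bytes(range(256)) * 5 is perfectly uniform: a stand-in for encrypted content.
    current_scan = {"a.txt": bytes(range(256)) * 5,
                    "b.txt": b"quarterly report " * 50}
    print(flag_suspicious(previous_scan, current_scan))  # → ['a.txt', 'c.txt']
```

Entropy alone produces false positives on files that are legitimately compressed (ZIP archives, JPEG images), which is one reason a production tool would combine many signals, as Rob suggests, rather than rely on a single threshold. When a file is flagged, the previous clean copy in the vault remains the restore point.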

Published Date : Sep 20 2022


Parasar Kodati, Dell Technologies


 

>>Okay, we're back, digging into trusted infrastructure with Parasar Kodati. He's a senior consultant for product marketing and storage at Dell Technologies. Parasar, welcome to theCUBE. Good to see you. >>Great to be with you, Dave. >>Yeah, coming from Hyderabad. Awesome. So I really appreciate you coming on the program. Let's start by talking about your point of view on what cyber security resilience means to Dell generally, but storage specifically. >>Yeah, so for something like storage, you know, we are talking about the data layer. And if you look at cyber security, it's all about securing your data, applications and infrastructure. It has been a very mature field at the network and application layers, and there are a lot of great technologies, from enabling zero trust to advanced authentication, identity management systems and so on. And in fact, with the advent of artificial intelligence and machine learning, these detection tools for cyber security have really evolved in the network and application spaces. So for storage, what it means is: how can you bring them to the data layer? How can you bring the principles of zero trust to the data layer? How can you leverage artificial intelligence and machine learning to look at access patterns, make intelligent decisions about what may be an indicator of compromise, and identify them ahead of time, just like it's happening in other layers for applications? And when it comes to cyber resilience, it's basically a strategy which assumes that a threat is imminent, and it's a good assumption given the severity and the frequency of the attacks that are happening. The question is how do we fortify the infrastructure to withstand those attacks, and have a response plan where we can recover the data and make sure business continuity is not affected. So that's really cyber security and cyber resiliency at
storage layer. And of course there are technologies like network isolation and immutability, and all these principles need to be applied at the storage level as well. >>Let me have a follow-up on that, if I may. The intelligence that you talked about, that AI and machine learning: do you build that into the infrastructure, or is that sort of a separate software module that points at various infrastructure components? How does that work? >>Both, Dave, right. At the data storage level, we have come up with various data characteristics depending on the nature of the data, and we developed a lot of signals to see what could be a good indicator of compromise. And there are also additional applications; CloudIQ is the best example, which is an infrastructure-wide health monitoring system for Dell infrastructure, and now we have elevated that to include cyber security as well. So these signals are being gathered at the CloudIQ level and in other applications as well, so that we can make those decisions about compromise, and we can cascade that intelligence and alert stream upstream to security teams, so that they can take actions in platforms like SIEM systems, XDR systems and so on. But when it comes to which layer the intelligence is in, it has to be at every layer where it makes sense, where we have the information to make a decision. And being closest to the data, we are basically monitoring the various parameters of data access: who is accessing, are they crossing any geofencing, is there any mass deletion or mass encryption happening. And we are able to detect those patterns and flag them as indicators of compromise, allowing automated response, manual control and so on for IT teams. >>Yeah, thank you for that explanation. So at Dell Technologies World, we were there in May. It was one of the first live shows that we did in the spring, certainly one of the largest, and I interviewed
Shannon Champion, and my huge takeaway from the storage side was the degree to which you guys emphasized security within the operating systems. I mean, really, with PowerMax, more than half of the features, I think, were security-related, but also across the rest of the portfolio. So can you talk about the security aspects of the Dell storage portfolio specifically? >>Yeah, yeah. So when it comes to data security, and broadly data availability, in the context of cyber resiliency, these elements have been a core strength of the Dell storage portfolio and a source of differentiation for it, with decades of collective experience building highly resilient architectures for mission-critical data, in something like the PowerMax system, which is the most secure storage platform for high-end enterprises. And now, with the increased focus on cyber security, we are extending those core technologies of high availability and adding modern detection systems and modern data isolation techniques to offer a comprehensive solution to the customer, so that they don't have to piece together multiple things to ensure data security or data resiliency; instead, a well-designed, well-architected solution is delivered to them to ensure cyber protection at the data layer. >>Got it. You know, we were talking earlier to Steve Kenniston and Pete Gerr about this notion of Dell Trusted Infrastructure. How does storage fit into that as a component of that overall theme? And let me say this, if you could address it, because a lot of people might be skeptical that I can actually have security and at the same time not constrict my organizational agility. That's old thinking; you know, it's not an "or", it's an "and". How do you actually do that? If you could address both of those, that would be great. >>Definitely. So for Dell Trusted Infrastructure, cyber resiliency is a key component. And just as I mentioned, air-gap isolation really started with PowerProtect Cyber Recovery, the solution we launched more than three years ago. That was a first in the industry, and it paved the way for data isolation to become a core element of data management for data infrastructure. Since then we have implemented these technologies within different storage platforms as well, so customers have the flexibility, depending on their data landscape, to choose the right data isolation architecture: either natively from the storage platform, or by consolidating things into the backup platform and isolating from there. And the other key thing we focus on in Dell Trusted Infrastructure is the goal of simplifying security for customers. One good example here: being able to respond to these cyber threats or indicators of compromise is one thing, but an IT security team may not be looking at the dashboard of the storage systems constantly, right? Storage admins may be looking at it. So how can we build this intelligence and provide it to these upstream platforms, so that security teams have a single pane of glass to understand the security landscape across applications, across networks and firewalls, as well as storage infrastructure and compute infrastructure? That's one of the key ways we are helping simplify the ability to detect and respond to these threats in real time for security teams. And you mentioned zero trust, and how it's a balance of not restricting users or putting a heavy burden on them with multi-factor authentication and so on. This really starts with what we are doing, which is to provide all the tools: when it comes to advanced authentication, supporting external identity management systems, multi-factor authentication, encryption, all these things are intrinsically built into
these platforms now the question is the customers are actually one of the key steps is to identify uh what are the most critical parts of their business or what are the applications uh that the most critical business operations depend on and similarly identify uh mission critical data where part of your response plan where it cannot be compromised where you need to have a way to recover once you do this identification then the level of security can be really determined uh by uh by the security teams by the infrastructure teams and you know another you know intelligence that gives a lot of flexibility for for even developers to do this is today we have apis um that so you can not only track these alerts at the data infrastructure level but you can use our apis to take concrete actions like blocking a certain user or increasing the level of authentication based on the threat level that has been perceived at the application layer or at the network layer so there is a lot of flexibility that is built into this by design so that depending on the criticality of the data criticality of the application number of users affected these decisions have to be made from time to time and it's as you mentioned it's it's a balance right and sometimes you know if if an organization had a recent attack you know the level of awareness is very high uh against cyber attacks so for a time you know these these settings may be a bit difficult to deal with but then it's a decision that has to be made by security teams as well got it so you're surfacing what may be hidden kpis that are being buried inside for instance the storage system through apis upstream into a dashboard so that somebody you know dig into the storage tunnel extract that data and then somehow you know populate that dashboard you're saying you're automating that that that workflow that's a great example and you may have others but is that the correct understanding absolutely and it's a two-way integration let's say a 
detector an attack has been detected at a completely different layer right in the application layer or at a firewall we can respond to those as well so it's a two-way integration we can cascade things up as well as uh respond to threats that have been detected elsewhere uh through the api that's great all right api for power skill is the best example for that uh excellent so thank you appreciate that give us the last word put a bow on this and and bring this segment home please absolutely so a dell uh storage portfolio um using advanced data isolation um with air gap having machine learning based algorithms to detect uh indicators of compromise and having ripple mechanisms um with granular snapshots being able to recover data and restore applications to maintain business continuity is what we deliver to customers uh and these are areas where a lot of innovation is happening a lot of product focus as well as you know if you look at the professional services all the way from engineering to professional services the way we build these systems the very we configure and architect these systems cyber security and protection is a key focus uh for all these activities and dell.com securities is where you can learn a lot about these initiatives that's great thank you you know at the recent uh reinforce uh event in in boston we heard a lot uh from aws about you know detent and response and devops and machine learning and some really cool stuff we heard a little bit about ransomware but i'm glad you brought up air gaps because we heard virtually nothing in the keynotes about air gaps that's an example of where you know this the cso has to pick up from where the cloud leaves off but as i was in front and so number one and number two we didn't hear a ton about how the cloud is making the life of the cso simpler and that's really my takeaway is is in part anyway your job and companies like dell so paris i really appreciate the insights thank you for coming on thecube thank you 
very much dave it's always great to be in these uh conversations all right keep it right there we'll be right back with rob emsley to talk about data protection strategies and what's in the dell portfolio you're watching the cube [Music] you

Published Date : Sep 20 2022

ENTITIES

Entity | Category | Confidence
rob emsley | PERSON | 0.99+
hyderabad | LOCATION | 0.99+
boston | LOCATION | 0.99+
Dell Technologies | ORGANIZATION | 0.99+
two-way | QUANTITY | 0.99+
both | QUANTITY | 0.97+
steve kenniston | PERSON | 0.96+
first | QUANTITY | 0.95+
paris | ORGANIZATION | 0.95+
dell.com | ORGANIZATION | 0.94+
one | QUANTITY | 0.94+
more than three years ago | DATE | 0.93+
today | DATE | 0.93+
cloud iq | TITLE | 0.92+
more than half | QUANTITY | 0.92+
dell technologies | ORGANIZATION | 0.91+
dave | PERSON | 0.87+
a lot of people | QUANTITY | 0.84+
one of the key ways | QUANTITY | 0.83+
single pane of glass | QUANTITY | 0.79+
zero | QUANTITY | 0.77+
pete gear | PERSON | 0.73+
one thing | QUANTITY | 0.73+
Parasar Kodati | PERSON | 0.73+
delta | ORGANIZATION | 0.71+
dell so paris | ORGANIZATION | 0.68+
zero trust | QUANTITY | 0.67+
may | DATE | 0.66+
uh | EVENT | 0.62+
shannon champion | TITLE | 0.61+
decades | QUANTITY | 0.59+
cloud | TITLE | 0.59+
almost | QUANTITY | 0.57+
two | QUANTITY | 0.53+
layer | QUANTITY | 0.5+
steps | QUANTITY | 0.49+
dell | ORGANIZATION | 0.43+
spring | DATE | 0.37+

AMD Oracle Partnership Elevates MySQL HeatWave


 

(upbeat music) >> For those of you who've been following the cloud database space, you know that MySQL HeatWave has been on a technology tear over the last 24 months, with Oracle claiming record-breaking benchmarks relative to other database platforms. So far, those benchmarks remain industry-leading, as competitors have chosen not to respond, perhaps because they don't feel the need to, or maybe because they don't feel that doing so would serve their interest. Regardless, the HeatWave team at Oracle has been very aggressive about its performance claims: making lots of noise, challenging the competition to respond, and publishing their scripts to GitHub. So far there are no takers, but customers seem to be picking up on these moves by Oracle, and it's likely the performance numbers resonate with them. Now, the other area we want to explore, which we haven't thus far, is the engine behind HeatWave, and that is AMD. AMD's EPYC processors have been the powerhouse on OCI running MySQL HeatWave since day one. Today we're going to explore how these two technology companies are working together to deliver these performance gains and some compelling TCO metrics. In fact, a recent Wikibon analysis from senior analyst Marc Staimer made some TCO comparisons in OLAP workloads relative to AWS, Snowflake, GCP, and Azure databases; you can find that research on wikibon.com. And with that, let me introduce today's guests: Nipun Agarwal, senior vice president of MySQL HeatWave, and Kumaran Siva, corporate vice president for strategic business development at AMD. Welcome to theCUBE, gentlemen. >> Welcome. Thank you. >> Thank you, Dave. >> Hey, Nipun, you and I have talked a lot about this. You've been on theCUBE a number of times talking about MySQL HeatWave. But for viewers who may not have seen those episodes, maybe you could give us an overview of HeatWave and how it's different from competitive cloud database offerings. >> Sure.
So MySQL HeatWave is a fully managed MySQL database service offering from Oracle. It's a single database which can be used to run transaction processing, analytics, and machine learning workloads. In the past, MySQL was designed and optimized for transaction processing, so when MySQL customers had to run analytics or machine learning, they would need to extract the data out of MySQL into some other database or service. MySQL HeatWave offers a single database for running all kinds of workloads, so customers don't need to extract data into some other database. In addition to being a single database, MySQL HeatWave is also very performant compared to OLAP databases, and it is very price competitive. So the advantages are: a single database, very high performance, and very good price performance. >> Yes. And you've published some pretty impressive price-performance numbers against competitors. Maybe you could describe those benchmarks and highlight some of the results, please. >> Sure. So the first thing to note is that the performance advantage of any database is going to vary based on the size of the data and the specific workloads, so mileage varies. What we have done is publish multiple benchmarks: we have benchmarks on TPC-H and TPC-DS, and we have benchmarks on different data sizes, because the mileage is going to vary based on the customer's workload, and we want to give customers a broad range of comparisons so that they can decide for themselves. In one specific case, running a 30-terabyte TPC-H workload, HeatWave has about 18 times better price performance than Redshift, about 33 times better price performance than Snowflake, and 42 times better price performance than Google BigQuery. So this is on 30-terabyte TPC-H.
Now, if the data size is different or the workload is different, the characteristics may vary slightly, but this is just to give a flavor of the kind of performance advantage MySQL HeatWave offers. >> And then my last question before we bring in Kumaran. We've talked about the secret sauce being the tight integration between hardware and software, but would you add anything to that? What is the secret sauce in HeatWave that enables you to achieve these performance results, and what does it mean for customers? >> So there are three parts to this. One is that HeatWave has been designed with a scale-out architecture in mind, so we have invented and implemented new algorithms for scale-out query processing for analytics. The second aspect is that HeatWave has been really optimized for the cloud, commodity cloud, and that's where AMD comes in. For instance, many of the partitioning schemes we use for processing in HeatWave are optimized for the L3 cache of the AMD processor. The thing which is very important to our customers is not just the sheer performance but the price performance, and that's where we have had a very good partnership with AMD, because AMD helps us provide not only very good performance but also very good price performance, right? A big part of all the numbers I was showing comes from the fact that we are running on AMD, which provides very good price performance. So that's the second aspect. And the third aspect is MySQL Autopilot, which provides machine-learning-based automation. So it's really these three things: new algorithms designed for scale-out query processing, optimization for commodity cloud hardware, specifically AMD processors, and third, MySQL Autopilot, which together give us this performance advantage. >> Great, thank you. So that's a good segue to AMD and Kumaran. Kumaran, what is AMD bringing to the table?
What are, for instance, the relevant specs of the chips that are used in Oracle Cloud Infrastructure, and what makes them unique? >> Yeah, thanks, Dave. That's a good question. So, OCI is a great customer of ours. They use what we call the top-of-stack devices, meaning that they have the highest core count, and they are also very, very fast cores. These are currently Zen 3 cores. I think the HeatWave product is right now deployed on Zen 2, but will shortly be on the Zen 3 core as well. In the case of OCI, we provide 64 cores, which are the largest devices that we build. What actually happens is that, because of the large number of CPUs in a single package, and therefore the increased density of the node, you end up with this fantastic TCO equation, and the cost per performance, the cost for deployed services like HeatWave, actually ends up being extraordinarily competitive. That's a big part of the contribution that we're bringing here. >> So Zen 3 is the AMD microarchitecture which you introduced, I think, in 2017, and it's the basis for EPYC, which is sort of the enterprise grade that you really attacked the enterprise with. Maybe you could elaborate a little bit, double-click on how your chips contribute specifically to HeatWave's price-performance results. >> Yeah, absolutely. So in the case of HeatWave, as Nipun alluded to, we have very large L3 caches, right? In our very top-end parts, like the Milan-X devices, we can go all the way up to 768 megabytes of L3 cache, and that gives you just enormous performance gains. That's part of what we're seeing with HeatWave today, and note that they're currently on the second-generation Rome-based product, the 7002-based product line, running with 64 cores. But as time goes on, they'll be adopting the next-generation Milan as well.
And the other part of it, too, is that our chiplet architecture has evolved. From the first-generation Naples, way back in 2017, where we had multiple memory domains and a sort of NUMA architecture at the time, today we've really optimized that architecture. We use a common I/O die that has all of the memory channels attached to it. What that means is that scale-out applications like HeatWave are able to scale very efficiently as they go from a small domain of CPUs to, for example, the entire chip, all 64 cores. That scaling has been a key focus for AMD, and being able to design and build architectures that can take advantage of it, and then have applications like HeatWave that scale so well on it, has been a key aim of ours. >> And Gen 3, moving up the Italian countryside. Nipun, you've taken the somewhat unusual step of posting the benchmark parameters, making them public on GitHub. Now, HeatWave is relatively new. People felt that when Oracle gained ownership of MySQL, it would let it wilt on the vine in favor of Oracle Database, so you lost some ground, and now you're getting very aggressive with HeatWave. What's the reason for publishing those benchmark parameters on GitHub? >> So, the main reason for us to publish price-performance numbers for HeatWave is to communicate to our customers a sense of the benefits they're going to get when they use HeatWave. But we want to be very transparent because, as I said, the performance advantages for customers may vary based on the data size and the specific workloads. So one of the reasons for us to publish all these scripts on GitHub is transparency. We want customers to take a look at the scripts, know what we have done, and be confident that we stand by the numbers we are publishing, and they're very welcome to try these numbers themselves.
In fact, we have had customers who have downloaded the scripts from GitHub and run them on our service to validate them. The second aspect is that in some cases there may be some deviations between what we are publishing and what the customer would like to run in their production deployments, so it provides an easy way for customers to take the scripts, modify them in ways which suit their real-world scenario, and run them to see what the performance advantages are. So those are the main reasons: A, transparency, so customers can see what we are doing in the comparison, and B, if they want to modify the scripts to suit their needs and then see what the performance of HeatWave is, they're very welcome to do so. >> So have customers done that? Have they taken the benchmarks? I mean, if I were a competitor, honestly, I wouldn't get into that food fight unless I had to, because of the impressive performance. But have customers picked up on that, Nipun? >> Absolutely. In fact, we have had many customers who have benchmarked the performance of MySQL HeatWave against other services. The fact that the scripts are available gives them a very good starting point, and in some cases they've also tweaked those queries to see what the delta would be. And in some cases, customers came back to us saying, hey, the performance advantage of HeatWave is actually slightly higher than what was published; what is the reason? The reason was that the customers were trying the latest version of the service, while our benchmark results had been posted, let's say, two months earlier. The service had improved in those two to three months, and customers actually saw better performance. So yes, absolutely. We have seen customers download the scripts, try them, modify them to some extent, and then compare HeatWave with other services. >> Interesting. Maybe a question for both of you: how is the competition responding to this?
They haven't said, "Hey, we're going to come up with our own benchmarks," which is very common; you oftentimes see that. Although, for instance, Snowflake hasn't responded to Databricks, so that's not their game. But if customers are actually putting a lot of faith in the benchmarks and using them for buying decisions, then it's inevitable. So how have you seen the competition respond to the MySQL HeatWave and AMD combo? >> So maybe I can take the first crack, from the database service standpoint. When customers have more choice, it is invariably advantageous for the customer, because then the competition is going to react, right? The way we have seen the reaction is that we believe the other database services are going to keep a closer eye on price performance, because if you're offering such good price performance, the vendors are already looking at it. And we have seen instances where they have offered, let's say, discounts to customers to at least close the gap to some extent. The second thing would be in terms of capability. One of the things I should have mentioned early on is that MySQL HeatWave on AMD provides very good price performance not only on, say, a small cluster, but all the way up to a cluster size of 64 nodes, which has about 1,000 cores. So the point is that HeatWave performs very well both on a small system and at huge scale-out. This is again one of those things which is a differentiator compared to other services, so we expect that other database services will also have to improve their offerings to provide the same good scale factor, which customers are now starting to expect with MySQL HeatWave. >> Kumaran, anything you'd add to that? I mean, you guys are an arms dealer, you love all your OEMs, but at the same time, you've got chip competitors, silicon competitors.
How do you see the competitive-- >> I'd say the broader answer and the big picture for AMD is that we're very maniacally focused on our customers, right? OCI and Oracle are huge and important customers for us, and this particular use case is extremely interesting, both in that it takes very good advantage of our architecture and in that it pulls out some of the value that AMD brings. From a big-picture standpoint, our aim is to execute: to build and bring out generations of CPUs and, you know, say what we do and do what we say. From that point of view, we're hitting the schedules that we say, and we're able to bring out the latest technology in a TCO value proposition that generationally keeps OCI and HeatWave ahead. That's the crux of our partnership here. >> Yeah, the execution has been obvious for the last several years. Kumaran, staying with you, how would you characterize the collaboration between the AMD engineers and the HeatWave engineering team? How do you guys work together? >> No, I'd say we're in a very, very deep collaboration. There are a few aspects where we've actually been working together very closely on the code, to be able to optimize for the large L3 cache that AMD has and take advantage of it, and also to take advantage of the scaling. Our architecture is chiplet-based: we have the CPU cores on what we call CCDs, and in the inter-CCD communication there are opportunities to optimize at the application level, and that's something we've been engaged in. In the broader engagement, we now go back multiple generations with OCI, and there's a lot of their input that now resonates in the product line itself. So we value this very close collaboration with HeatWave and OCI. >> Yeah, and the cadence, Nipun, you and I have talked about this quite a bit. The cadence has been quite rapid.
It's like this constant cycle: every couple of months I turn around, there's something new on HeatWave. But a question, again, for both of you: what new things do you think organizations and customers are going to be able to do with MySQL HeatWave if you look out over the next 12 to 18 months? Is there anything you can share at this time about future collaborations? >> Right, look, 12 to 18 months is a long time. There's going to be a lot of innovation and a lot of new capabilities coming out in MySQL HeatWave. But even based on what we are currently offering, the trend we are seeing is that customers are bringing more classes of workloads. We started off with OLTP for MySQL, then it went to analytics, then we increased it to mixed workloads, and now we offer machine learning as well. So one, we are seeing more and more classes of workloads come to MySQL HeatWave. And the second is scale: the data volumes people are using HeatWave for, to process these mixed workloads of analytics, machine learning, and OLTP, are increasing. Along the way, we are making it simpler to use and more cost-effective to use. For instance, the last time we talked, we had introduced real-time elasticity, and that's a very, very popular feature, because customers want the ability to scale out or scale down very efficiently. We also provided support for compression. So all of these capabilities are making it more efficient for customers to run a larger part of their workloads on MySQL HeatWave, and we will continue to make it richer in the next 12 to 18 months. >> Thank you. Kumaran, anything you'd add to that? We'll give you the last word as we've got to wrap. >> No, absolutely. So, you know, in the next 12 to 18 months we will have our Zen 4 CPUs out, so this could potentially go into the next generation of the OCI infrastructure.
This would be with the Genoa and then Bergamo CPUs, taking us to 96 and 128 cores, with 12 channels of DDR5. When this capability is applied to an application like HeatWave, you can see that it'll potentially open up another order of magnitude of use cases, right? And we're excited to see what customers can do with that. I think it will certainly make this service, and the cloud in general, and this cloud migration, even more attractive. So we're pretty excited to see how things evolve in this period of time. >> Yeah, the innovations are coming together. Guys, thanks so much. We've got to leave it there; really appreciate your time. >> Thank you. >> All right, and thank you for watching this special CUBE Conversation. This is Dave Vellante, and we'll see you next time. (soft calm music)

Published Date : Sep 14 2022

ENTITIES

Entity | Category | Confidence
Marc Staimer | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Nipun | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
2017 | DATE | 0.99+
Dave | PERSON | 0.99+
OCI | ORGANIZATION | 0.99+
Zen 3 | COMMERCIAL_ITEM | 0.99+
7,002 | QUANTITY | 0.99+
Kumaran | PERSON | 0.99+
second aspect | QUANTITY | 0.99+
Nipun Agarwal | PERSON | 0.99+
AMD | ORGANIZATION | 0.99+
12 | QUANTITY | 0.99+
64 cores | QUANTITY | 0.99+
768 megabytes | QUANTITY | 0.99+
two | QUANTITY | 0.99+
MySQL | TITLE | 0.99+
third aspect | QUANTITY | 0.99+
12 channels | QUANTITY | 0.99+
Kumaran Siva | PERSON | 0.99+
HeatWave | ORGANIZATION | 0.99+
96 | QUANTITY | 0.99+
18 times | QUANTITY | 0.99+
Bergamo | ORGANIZATION | 0.99+
three parts | QUANTITY | 0.99+
Delta | ORGANIZATION | 0.99+
three months | QUANTITY | 0.99+
MySQL HeatWave | TITLE | 0.99+
42 times | QUANTITY | 0.99+
both | QUANTITY | 0.99+
18 months | QUANTITY | 0.99+
Zen 2 | COMMERCIAL_ITEM | 0.99+
one | QUANTITY | 0.99+
GitHub | ORGANIZATION | 0.99+
One | QUANTITY | 0.98+
second generation | QUANTITY | 0.98+
single database | QUANTITY | 0.98+
128 cores | QUANTITY | 0.98+
18 months | QUANTITY | 0.98+
three things | QUANTITY | 0.98+

Bharath Chari, Confluent & Sam Kassoumeh, SecurityScorecard | AWS Startup Showcase S2 E4


 

>>Hey, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. This is season two, episode four of our ongoing series featuring exciting startups within the AWS ecosystem. This episode's theme is cybersecurity: protect and detect against threats. I'm your host, Lisa Martin. I've got two guests here with me. Please welcome back to the program Sam Kassoumeh, COO and co-founder of SecurityScorecard, and Bharath Chari, team lead, solutions marketing at Confluent. Guys, it's great to have you on the program talking about cybersecurity. >>Thanks for having us, Lisa. >>Sam, let's go ahead and kick off with you. You've been on theCUBE before, but give the audience just a little bit of context about SecurityScorecard, or SSC, as they're going to hear it referred to. >>Yeah, absolutely. Thank you for that. Well, the easiest way to put it is this: when people want to know about their credit risk, they consult one of the major credit scoring companies, and when companies want to know about their cybersecurity risk, they turn to SecurityScorecard to get that holistic view of their security posture. The way it works is that SSC is continuously, 24/7, collecting signals from across the entire internet, the entire IPv4 space, to identify vulnerable and misconfigured digital assets. We were just looking back over a three-year period, from 2019 to 2022. Through our techniques, we assessed over a million and a half organizations and found that over half of them had at least one open critical vulnerability exposed to the internet. What was even more shocking was that 20% of those organizations had amassed over a thousand vulnerabilities each. >>So at SSC, we're in the business of really building solutions for customers. We mine the data from dozens of digital sources and help discover the risks and flaws that are inherent to their business.
And that becomes increasingly important as companies grow and find new sources of risk, and as new threat vectors emerge on the internet for themselves and for their vendor and business partner ecosystem. The last thing I'll mention is the platform that we provide. It relies on data collection and processing being done in an extremely accurate and real-time way. That's key; that's what has allowed us to scale. In order for us to accomplish this, SecurityScorecard engineering teams used a really novel combination of Confluent Cloud and Confluent Platform to build really robust data streaming pipelines, and the data streaming pipelines enabled by Confluent allow us at SecurityScorecard to collect data from a lot of various sources for risk analysis. The data then gets further analyzed and provided to customers as an easy-to-understand summary of analytics. >>Bharath, let's bring you into the conversation. Talk about Confluent, give the audience that overview, and then talk about what you're doing together with SSC. >>Yeah, and I wanted to say Sam did a great job of setting up the context about what Confluent is, so I appreciate that. But a really simple way to think about it, Lisa, is that Confluent is a data streaming platform that is pioneering a fundamentally new category of data infrastructure, which is at the core of what SSC does. Like Sam said, the key is really to collect data accurately, at scale, and in real time, and that's where our cloud-native offering really empowers organizations like SSC to build great customer experiences for their customers. The other thing we do is help organizations build sophisticated real-time backend operations. So at a high level, that's the best way to think about Confluent. >>Got it. Talk about data streaming, how it's being used in cybersecurity, and what the data streaming pipelines enabled by Confluent allow SSC to do for its customers.
>>Yeah, I think Sam can definitely share his thoughts on this, but one of the things I know we are all experiencing is the rise of cyber threats, whether it's online from a B2B business perspective, or as consumers, just by our data, the data that we're generating, and the companies that have access to it. So as the need to protect data really grows, companies and organizations need to effectively detect, respond to, and protect their environments. And the best way to do this comes down to three things: scale, speed, and cost. Going back to the points I brought up earlier, with Confluent you can really gain real-time data ingestion and enable the analytics that Sam talked about previously, while optimizing for cost and scale. Doing all of this at the same time, as you can imagine, is not easy, and that's where we excel. >>And so the entire premise of data streaming is built on the concept that data is not static, but constantly moving across your organization, and that's why we call it data streams. At its core, we've leveraged the open source foundation of Apache Kafka, but we have re-architected it for the cloud with a totally new cloud-native experience. Ultimately, for customers like SSC, we have taken away the need to manage a lot of the operational tasks that come with Apache Kafka. The other thing we've done is add a ton of proprietary IP, including security features like role-based access control, and that really allows you to securely connect to any data, no matter where it resides, at scale and at speed.
And it... >>Bharath, sticking with you, can you talk about some of the improvements, and maybe this is actually a question for Sam, that have been achieved on the SSC side as a result of the Confluent partnership? I understand things are much faster and you're able to do much more. >>Sam, take it away. >>I can maybe kick us off, and then Bharath, feel free to chime in. Lisa, the problem we're talking about was a longstanding challenge for us. We're about a nine-year-old company. We're a high-growth startup, and data collection has always been in our DNA. It's at the core of what we do, and getting the insights and analytics that we synthesize from that data into customers' hands as quickly as possible is the name of the game, because they're trying to make decisions and we're empowering them to make those decisions faster. We always had challenges in that arena because partners like Confluent didn't exist when we started SecurityScorecard. We're a customer, but we've thought of it as a partnership ever since we found Confluent's technology, and you can hear why from Bharath's description. >>We shared a common vision, and they understood some of the pain points we were experiencing on a very visceral and intimate level. And for us, that was really exciting, right? Just to have partners who are there saying, we understand your problem, this is exactly the problem we're solving, we're here to help. What the technology has done for us since then is not only allow us to process the data faster and get the analytics to the customer, but also allow us to create more value for customers, which I'll talk about in a bit, including new products and new modules that we didn't have the capabilities to deliver before. >>And we'll talk about those new products in a second, exciting stuff coming out there from SSC. Bharath,
Talk about the partnership from Confluent's perspective. How has it enabled Confluent to enhance its technology as a result of seeing and learning what SSC is able to do with it? >>Yeah, first of all, I completely agree with Sam. It's more of a partnership because, like Sam said, we share the same vision, and that is to really make sure organizations have access to the data, like I said earlier, no matter where it resides, so that you can scan for and identify potential security threats. From our perspective, what's really helped us in partnering with SSC is just looking at the data volumes they're working with. A stat we talked about recently was scanning billions of records and thousands of ports on a daily basis. And that's where, like I mentioned earlier, our technology really excels, because you can ingest and amplify the volumes of data you're processing so that you can scan for and detect those threats in real time. >>Especially with data volumes increasing year by year, being able to respond quickly is paramount. And so what's really helped us is seeing what SSC is doing in terms of scanning the web ports or data systems that are at potential risk, and being able to support their use cases, whether it's data sharing between their different teams internally or empowering customers to detect and scan their data systems. So the learning for us is really seeing how those millions and billions of records get processed. >>Got it. Sounds like a really synergistic partnership you've had there for the last year or so. Sam, let's go back over to you. You mentioned some new products. I see SSC just released an attack surface intelligence product.
That's detecting thousands of vulnerabilities per minute. Talk to us about that, the importance of it, and another release you're making. >>There are some really exciting products we have released recently and are releasing at SecurityScorecard. When we think about ratings and risk, we think about it not just for our companies or our third parties, but in the broader sense of an ecosystem, because it's important to have data on third parties, but we also want to have data on their third parties as well. Nobody's operating in a vacuum. Everybody's operating in this hyperconnected ecosystem, and the risk can live not just in the third parties: they might be storing and processing data in a myriad of other technology solutions, which we want to understand. But it's really hard to get that visibility, because today the way it's done is companies ask their third parties, hey, send me a list of your third parties where my data is stored. >>It's very manual, it's very labor intensive, and it's a trust-based exercise that makes it really difficult to validate. What we've done is develop a technology called AVD, automatic vendor detection. What AVD does is go out and, for any company, your own company or another business partner you work with, detect all of the third-party connections we see that have a live network connection or data connection to that organization. So it's an awareness and discovery tool, because now we can pull the veil back and see what the bigger ecosystem and its connectivity look like, thus allowing customers to hold accountable not just their third parties but their fourth parties, fifth parties, really nth parties. And they can only do that by using SecurityScorecard.
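Sam's "fourth parties, fifth parties, really nth parties" can be pictured as a walk over a graph: each organization points to the vendors it has been observed connecting to, and a breadth-first search labels every reachable vendor with its tier. The sketch below assumes the connection data has already been collected, which is the hard part AVD addresses. It is not SecurityScorecard's implementation; the `vendor_tiers` function and the `observed` connection map are hypothetical.

```python
from collections import deque


def vendor_tiers(connections: dict[str, set[str]], root: str) -> dict[str, int]:
    """Breadth-first search over observed vendor connections (hypothetical data).

    Returns vendor -> tier, where tier 1 is a direct third party, tier 2 a
    fourth party, and so on. The root organization itself is excluded.
    """
    tiers: dict[str, int] = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        org, depth = queue.popleft()
        # Sort neighbors so the traversal order is deterministic.
        for vendor in sorted(connections.get(org, set())):
            if vendor not in seen:  # skip already-labeled vendors, so cycles terminate
                seen.add(vendor)
                tiers[vendor] = depth + 1
                queue.append((vendor, depth + 1))
    return tiers


# Hypothetical observed connections: organization -> vendors it talks to.
observed = {
    "acme": {"payroll-saas", "crm-saas"},
    "crm-saas": {"cloud-host", "email-relay"},
    "cloud-host": {"cdn"},
}
for vendor, tier in vendor_tiers(observed, "acme").items():
    print(vendor, "tier", tier)
```

Breadth-first order means a vendor reachable through several paths is labeled with its closest tier, which matches the intuition that a directly connected vendor is a third party even if it also shows up deeper in the chain.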
The attack surface intelligence tool is really exciting for us because, before SecurityScorecard, people thought what we were doing was fairly impossible. >>It was really hard to get instant visibility into any company and any business partner. At the same time, it was of critical importance to have that instant visibility into the risk, because companies are trying to make faster decisions and they need the risk data to steer those decisions. So when I think about that problem and managing this evolving landscape, what it requires is insightful, actionable, real-time security data. And that relies on a couple of things: talent and tech. On the talent side, it starts with people. We have an amazing R&D team. We invest heavily; it's the heartbeat of what we do. That team really excels in the areas of data collection, analysis, and scaling large data sets. On the tech side, we figured out some breakthrough techniques, and it also requires partners like Confluent to help with the real-time streaming. >>What we realized was that those capabilities are very much in demand in the market, and we created a new product from them called Attack Surface Intelligence. Attack Surface Intelligence focuses less on the rating. There's a persona of users who really value the rating. It's easy to understand; it's a bridge language between technical and non-technical stakeholders. That's on one end of the spectrum. On the other end, there are very technical customers and users who may not have as much interest in a layman's rating but really want a deep dive into the threat intel data, capabilities, and insights we're producing.
So we produced ASI, which stands for Attack Surface Intelligence, which allows customers to look at the attack surface, all of the digital assets of any organization, and see all of the threats, vulnerabilities, and bad actors, sometimes including discoveries of zero-day vulnerabilities that are out in the wild and being exploited by bad guys. So we have a really strong pulse on what's happening on the internet, good and bad, and we created that product to serve a market that was interested in going deep into the data. >>So it's... >>So critical. Go ahead. >>I wanted to jump in there real quick, because of the points Sam brought up. We had a great discussion recently while we were building the case study, and I think it brings this to life. Going back to the AVD product Sam talked about, and Sam can probably do a better job of walking through the story, but the way I understand it, one of SecurityScorecard's customers approached them and told them they had an issue to resolve. This customer was using the AVD product at the time, and they said to SSC, hey, your product shows that we're using HubSpot, but we stopped using that. When SSC investigated, they did find a very recent HubSpot ping, from the marketing team in this instance. And as someone who comes from a marketing background, I can raise my hand and say I've been there, done that. So yeah, Sam can probably share his thoughts on this, but I think that's the great story that brings this all to life in terms of how customers actually go about using SSC's products. >>And Sam, go ahead on that. It sounds like one of the benefits I'm hearing is a reduction in shadow IT.
I'm sure that happens so frequently with your customers. That's a great example you gave, of the IT folks saying, we don't use HubSpot, haven't in years, and then marketing spins up an instance. Talk about reducing shadow IT as one of the benefits for customers. There's got to be many more benefits from a security perspective. >>Yeah, there's a big challenge today because the market moved to the cloud, and that makes it really easy for anybody in an organization to go sign up, put in a credit card, or get a free trial of any product. And that product can very easily connect into the corporate system and access the data. Because of the nature of how cloud products work and how easy they are to sign up for, a byproduct is that they circumvent the traditional risk assessment process that organizations go through, and organizations invest a lot of money in that, right? There's a lot of time, money, and energy invested in having good procurement and risk management life cycles and making sure contracts are buttoned up. So on one side, you have companies investing loads of energy. And on the other side, any employee can circumvent that process with just a few clicks, signing up and purchasing a product. >>And that causes a disparity, a delta, between the technology and security team's understanding of the landscape and what reality is. We're trying to close that gap, right? We want to reduce any window of time or opportunity in which a hacker can discover some misconfigured cloud asset that somebody signed up for and maybe forgot to turn off. A lot of it is just human error, and it happens, as in the example Bharath gave, and this is why understanding the third parties is so important. A customer contacted us and said, hey, your AVD detection product has an error. It's showing we're using a product.
I think it was HubSpot, but we stopped using that, right? And we don't understand why you're still showing it. It has to be a false positive. >>So we investigated and found that there was a very recent live HubSpot connection, a ping being made. Sure enough, when we went back to the customer and said we're very confident the data's accurate, they looked into it. They found that the marketing team had started experimenting with another instance of HubSpot on the side. They were putting real customer data into that instance, and it triggered a security assessment. We see all sorts of permutations of this. Large multinational companies spin up a satellite office, and a contractor setting up the network equipment misconfigures it and inadvertently leaves the administrator portal of a Cisco router exposed on the public internet, and they forget to change the default administrative credentials. So if a hacker stumbles on that, they have direct access to the network. We're trying to catch those things and surface them to the client before the hackers find them. >>So we're giving them this hacker's-eye view. Without the continuous data analysis, without the stream processing, the customer wouldn't have known about those risks. But if you can automatically know about the risks as they happen, that prevents a million shoulder taps, because the customer doesn't have to go tap the marketing team on the shoulder, tap other employees, and manually interview them. They have the data already, whether for their own company or for any company they're doing business with where their data is stored and processed. That's a huge time savings and a huge risk reduction. >>Huge risk reduction. It's like you're taking off blinders they didn't even know were there.
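The HubSpot story comes down to comparing what the IT team believes is in use with what the network actually shows. A minimal sketch, assuming a hypothetical feed of observed `(vendor, timestamp)` connection events and a hypothetical inventory of vendors IT considers active or retired, can flag two things: vendors seen recently that IT does not track at all, and vendors IT believes were retired but that are still connecting. The `audit_vendors` function, the seven-day window, and the sample data are illustrative assumptions, not how AVD works internally.

```python
from datetime import datetime, timedelta


def audit_vendors(events, believed_active, believed_retired, now, window=timedelta(days=7)):
    """Compare recently observed vendor connections with the IT team's inventory.

    events: iterable of (vendor, timestamp) pairs from a hypothetical network feed.
    Returns (unknown, still_active) as sorted lists:
      unknown      - seen within the window but absent from the inventory
      still_active - seen within the window although believed retired
    """
    cutoff = now - window
    recent = {vendor for vendor, ts in events if ts >= cutoff}
    unknown = recent - believed_active - believed_retired
    still_active = recent & believed_retired
    return sorted(unknown), sorted(still_active)


now = datetime(2022, 9, 7, 12, 0)
events = [
    ("hubspot", now - timedelta(hours=3)),           # recent ping from a marketing trial
    ("salesforce", now - timedelta(days=1)),
    ("legacy-ftp", now - timedelta(days=40)),        # old, outside the window
    ("unknown-analytics", now - timedelta(days=2)),
]
unknown, still_active = audit_vendors(
    events,
    believed_active={"salesforce"},
    believed_retired={"hubspot", "legacy-ftp"},
    now=now,
)
print(unknown)       # ['unknown-analytics']
print(still_active)  # ['hubspot']
```

The time window matters: without it, every vendor ever seen would count as in use, and the "we stopped using that" claim could not be checked against current activity.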
And I can imagine, Sam, that in the last couple of years, as SaaS usage and the use of collaboration tools skyrocketed just to keep the lights on and let organizations communicate, there's a lot of opportunity in your customer base and prospective customer base to engage with you and get that full 360-degree view of their entire organization: third parties, fourth parties, et cetera. >>Absolutely. Customers are more engaged than they've ever been, because that challenge of the market moving to the cloud hasn't stopped. We've been talking about it for a long time, but there are still a lot of big organizations that are just starting to dip their toe in the pool and cut over from what was traditionally an in-house data center in the basement of the headquarters to the cloud. And on top of that, cloud providers like Azure and especially AWS make it so easy for any company to sign up, get access, build a product, and launch that product to market. We see more and more organizations sitting on AWS launching products and software. The barrier to entry is very, very low, and the value in those products is very, very high, so that's drawing organizations to sign up and engage. >>The challenge then becomes that we don't know who has control over this data, right? We don't know who has control over and visibility into our data. We're bringing that to the surface. And for vendors themselves, especially companies that sit in AWS, and I think, Lisa, this is what you're alluding to, when companies engage with their own scorecard, there's a bit of a social aspect to it. When they look good on our platform, other companies follow them, right? So now, all of a sudden, they can make one motion to look good, get their scorecard buttoned up, and everybody who's looking at them sees that they're doing the right things.
We actually have a lot of vendors who are customers, and they're winning more competitive bake-offs and deals because they're proving to their clients faster that they can be trusted to store the data. >>So, you know, we're in a kind of two-sided market. You have folks who are assessing other folks. It's fun to look at others, see how they're doing, and hold them accountable. But if you're on the receiving end, that can be stressful. So what we've done is take that situation and turn it into a really positive and productive environment, where companies, whether they're looking at someone else or looking at themselves to prove themselves to their clients or to the board, have a very productive experience. >>Oh... >>Yeah, that validation. Go ahead, Bharath. >>I was going to ask Sam his thoughts on one particular aspect. Sam, in terms of the industries you're seeing really moving to the cloud, with this need for secure data and for making sure the data can be trusted, are there specific verticals that are doing that better than others, or do you see it across the board? >>I think some industries have it easier and some have it harder. Definitely in industries like healthcare and financial services, we see heavier activity on both sides, right? They're certainly becoming more and more proactive in their investments, but the attacks against them are not stopping, especially in healthcare, because the data is so valuable and healthcare was historically an underinvested space, right? Hospitals were always strapped for IT folks. Now they're starting to wake up, pay very close attention, and make heavier investments. >>That's pretty interesting. >>Tremendous opportunity there, guys. I'm sorry, we are out of time, but this is such an interesting conversation.
We could keep going. I want to ask you both: where can prospective, interested customers go to learn more on the SSC side, on the Confluent side, and through the AWS Marketplace? >>I'll let Sam go first. >>Sure. Thank you. On the SecurityScorecard side: look, SecurityScorecard, with the help of Confluent, has made it possible to instantly rate the security posture of any company in the world. We have 12 million organizations rated today, and that's going up every day. We invite any company in the world to try SecurityScorecard for free and experience how easy it is to get your own rating and see the security rating of any company, and any company can claim its score. There's no charge. They can go to securityscorecard.com, and we actually have a special URL, securityscorecard.com/free-account/aws marketplace. And even better, if someone's already on AWS, you can view our security posture with the AWS Marketplace Vendor Insights plugin to quickly and securely procure products. >>Awesome. Guys, this has been fantastic information. I'm sorry, Bharath, did you want to add one more thing? >>Yeah, I just wanted to give a quick call-out, Lisa. Anyone who wants to learn more about data streaming can go to www.confluent.io. There's also an upcoming event in October, which has a separate URL, where you can learn all about data streaming, and that URL is current event.io. So those are the two URLs I wanted to quickly call out. >>Awesome, guys. Thanks again so much for partnering with theCUBE on season two, episode four of our AWS Startup Showcase. We appreciate your insights and your time. And for those of you watching, thank you so much. Keep it right here for more action on theCUBE. For my guests, I'm Lisa Martin. We'll see you next time.

Published Date : Sep 7 2022

Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst


 

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're going to get into lie number two, and that is: an open source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant, and its stack has evolved and matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years, to the advent of the first data lake, really around Hadoop, it probably was true that you couldn't get the performance you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days people would say you'll never get performance, because you need to be columnar; you need to store data in a columnar format. And then columnar formats were introduced to the data lake: you have Parquet, ORC, and Avro, which were created ultimately to deliver performance. So, okay, we largely got over the performance hurdle. More recently, people would say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got new data formats, like Iceberg, Delta, and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. I remember a quote from Curt Monash many years ago, where he said it takes six or seven years to build a functional database. I think that's right, and now we've had almost a decade go by. So these technologies have matured to deliver very, very close to the same level of performance and functionality as cloud data warehouses.
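Justin's point about columnar formats comes down to how much data a query has to touch. The toy Python sketch below is not Parquet or ORC, which add encoding, compression, and per-column statistics on top of the basic layout. It stores the same hypothetical records row-wise and column-wise and counts how many values an aggregate over a single column must read in each layout; the table, its field names, and the "values read" cost model are all illustrative assumptions.

```python
from typing import Any

# Hypothetical table: 1,000 records with four fields each.
rows: list[dict[str, Any]] = [
    {"patient_id": i, "region": "north" if i % 2 else "south", "age": 30 + i % 40, "visits": i % 5}
    for i in range(1_000)
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one contiguous list per column."""
    return {name: [r[name] for r in records] for name in records[0]}


columns = to_columns(rows)


def avg_age_row_layout(records: list[dict[str, Any]]) -> tuple[float, int]:
    """Row layout: model the cost as reading every field of every record to reach 'age'."""
    values_read = 0
    total = 0
    for record in records:
        values_read += len(record)  # the whole row is scanned
        total += record["age"]
    return total / len(records), values_read


def avg_age_column_layout(cols: dict[str, list[Any]]) -> tuple[float, int]:
    """Column layout: only the 'age' column is read."""
    ages = cols["age"]
    return sum(ages) / len(ages), len(ages)


print(avg_age_row_layout(rows))        # same average, 4000 values read
print(avg_age_column_layout(columns))  # same average, 1000 values read
```

The saving grows with table width: an analytic query that touches a few columns of a wide table reads only those columns, which is the performance hurdle columnar formats helped data lakes clear.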
So I think the reality is that it's become a lie, and now we have giant hyperscale internet companies that don't have a traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so, Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, open is a moving target. I remember Unix used to be "open systems," so it's an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, and what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different from what people expect, and what you don't want is to back yourself into a corner that then prevents you from innovating. So if you've chosen a technology and stored trillions of records in it, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage of it; your competitive edge might depend upon it. And so for us, we acknowledge that we don't have perfect vision of what the future might be. By backing open storage technologies, we can apply a number of different technologies to the processing of that data, and that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. So we don't have the concern that we lack the hardware today to achieve what we want to do. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge.
>>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that data mesh is built on open source tooling and is not proprietary. You're not going to buy a data mesh; you're going to build it with open source tooling, and vendors like you are going to support it. But come back to today: you can get to market with a proprietary solution faster. I'm going to make that statement, and you tell me if it's a lie. Then you can say, okay, we support Apache Iceberg, we're going to support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and every new open source thing that comes along; they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance they were able to provide. It reminds me, again, of 10 or 12 years ago, when everybody had a connector to Hadoop and thought that was the solution, right? But the reality was that a connector was not the same as running workloads in Hadoop back then. And similarly, being able to connect to an external table that lives in an open data format is not going to give you the performance your customers are accustomed to. And at the end of the day, vendors are always going to be predisposed, always incentivized, to get that data ingested into the data warehouse, because that's where they have control.
And the bottom line is that the database industry has really been built around vendor lock-in, from the start. How many people love Oracle today? Yet they're customers nonetheless. I think lock-in is part of this industry, and that's really what we're trying to change with open data formats. >>Well, it's interesting. It reminds me of when I see the gas price, I drive up, and then I say, oh, that's the cash price; with a credit card I've got to pay 20 cents more. But okay. So let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but you're going to get better performance if you keep the data in our closed system? Are you saying that long term that's going to come back and bite you? You mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie we've all seen before, at least those of us who've been in the industry long enough to see it play out a couple of times. So I do think that's the future. And I loved what Richard said; I actually wrote it down because I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is that by using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and use Starburst to query via SQL, that's totally cool. They can both work off the exact same data sets. By contrast, if you're focused on a proprietary model, then you're locked into that model again.
I think the same applies to data sharing, to data products, and to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you off and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it, because I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. So my question to you, Richard, is: do the, let's call them data warehouse systems or proprietary systems, deliver a greater ROI sooner? Is that an allure that attracts customers, or can open platforms deliver as fast an ROI? >>I think the answer to that is that it can depend a bit. It depends on your business's skill set. We are lucky that we have a number of proprietary database teams that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. So those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can back very tight SLAs with them, and we can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for analytics workloads, which is increasingly where our business is growing, we can't do better than open data formats with cloud-based, data mesh-type technologies. So it's not a simple answer that one will always be right for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily replicate with open technologies. >>Yeah. Richard, staying with you.
You mentioned, you know, some things before that strike me. You know, the Databricks/Snowflake, you know, thing is always a lot of fun for analysts like me. You've got Databricks coming at it. Richard, you mentioned you have a lot of rockstar data engineers. Databricks is coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like PJI Mohan said, you know what? I think it's actually harder to play in the data engineering; i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So my advice to people who are starting here or trying to build teams to capitalize on data assets is to begin with open license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting.
Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and in the world of Oracle, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in Oracle. It's the license cost that is by far the biggest component in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys.
Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or is it the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage.

Published Date : Aug 22 2022

SUMMARY :

give you the performance and control that you can get with a proprietary We got, you know, largely over the performance hurdle, you know, more recently people will say, And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it is an evolving, you know, spectrum, but from your perspective, in a direction slightly different to what people expect and what you don't want to end up So Justin, let me play devil's advocate here a little bit, and I've talked to Zhamak about this and you know, And I think similarly, you know, being able to connect to an external table that lives in an open data format, Well, it's interesting, it reminds me of when I, you know, I see the gas price, the Tesoro gas price And I think, you know, I loved what Richard said. you know, they're jamming us on price and the license cost, but we do get value out And so for those different teams, they can get to an you know, the Databricks/Snowflake, you know, thing is always a lot of fun for analysts like me. So the advice that I saw years ago was if you have open source technologies, years and in the world of Oracle, you know, normally it's the staff, to discover and consume via, you know, the creation of data products as well. data model that we see emerging and the so-called modern data stack is

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jess Borgman | PERSON | 0.99+
Richard | PERSON | 0.99+
20 cents | QUANTITY | 0.99+
six | QUANTITY | 0.99+
Justin | PERSON | 0.99+
Richard Jarvis | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Kurt Monash | PERSON | 0.99+
20% | QUANTITY | 0.99+
Jess | PERSON | 0.99+
pythons | TITLE | 0.99+
seven years | QUANTITY | 0.99+
Today | DATE | 0.99+
Javas | TITLE | 0.99+
Teradata | ORGANIZATION | 0.99+
VMware | ORGANIZATION | 0.98+
millions | QUANTITY | 0.98+
EVAs | ORGANIZATION | 0.98+
JAK | PERSON | 0.98+
Starburst | ORGANIZATION | 0.98+
both | QUANTITY | 0.97+
10 | DATE | 0.97+
12 years ago | DATE | 0.97+
Starbust | TITLE | 0.96+
today | DATE | 0.95+
Apache iceberg | ORGANIZATION | 0.94+
Google | ORGANIZATION | 0.93+
12 years | QUANTITY | 0.92+
single point | QUANTITY | 0.92+
two worlds | QUANTITY | 0.92+
10 | QUANTITY | 0.91+
Hudu | LOCATION | 0.91+
Unix | TITLE | 0.9+
one thing | QUANTITY | 0.87+
trillions of records | QUANTITY | 0.83+
first data lake | QUANTITY | 0.82+
Starburst | TITLE | 0.8+
PJI | ORGANIZATION | 0.79+
years ago | DATE | 0.76+
IE | TITLE | 0.75+
Lie 2 | TITLE | 0.72+
many years ago | DATE | 0.72+
over a couple times | QUANTITY | 0.7+
TCO | ORGANIZATION | 0.7+
Parque | ORGANIZATION | 0.67+
Number two | QUANTITY | 0.64+
Kubernetes | ORGANIZATION | 0.59+
a decade | QUANTITY | 0.58+
plus years | DATE | 0.57+
Azure | TITLE | 0.57+
S3 | TITLE | 0.55+
Delta | TITLE | 0.54+
20 | QUANTITY | 0.49+
last | DATE | 0.48+
Mohan | PERSON | 0.44+
ORC | ORGANIZATION | 0.27+

Starburst The Data Lies FULL V2b


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads. And that sucks." Let's face it, more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of the organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable?
Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, so partners that are not within the company, right? External partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with, you know, your constituencies? >>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patients' lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business that needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost-effective way to serve the business; otherwise I got duplication? What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason is really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients; they're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles back to those: you got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges: self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product, ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lake, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a column format. And then, you know, column formats were introduced to data lakes: you have Parquet, ORC file and Avro that were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses. So I think the reality is that's become a lie, and now we have large, giant, hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and, you know, obviously her vision is that the data mesh is open source, open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it kind of reminds me, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, 'cause that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start, how many people love Oracle today? But they're customers, nonetheless. I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I, you know, I see the gas price, the Tesoro gas price, I drive up and then I say, oh, that's the cash price; with a credit card I gotta pay 20 cents more, but okay. But so the argument then, so let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, 'cause you're gonna end up... you mentioned Oracle, you mentioned Teradata. Yeah. So by implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before, at least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down.
'Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it. 'Cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or the proprietary systems, are they gonna deliver a greater ROI sooner? And is that the allure that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh-type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, staying with you. You mentioned some things before that strike me. You know, the Databricks-Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like PJI Mohan said, you know what? I think it's actually harder to play in the data engineering world; i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was that if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can help us innovate beyond our competitors.
So my advice to people who are starting here, or trying to build teams to capitalize on data assets, is to begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors in your analytics journey. >>It's interesting. Again, analysts like myself do a lot of TCO work and have over the last 20-plus years. And normally it's the staff that's the biggest nut in total cost of ownership, but not in the world of Oracle. There, the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, and the lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is that we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have. But to have comprehensive analytics and to truly understand your business holistically, you need to be able to access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, to provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately to make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or whether it's the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data, in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single-source-of-truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. But okay, there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and at Snowflake, which you mentioned, and its competitors. So that's different, is it not? Is that not at least one aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking, you know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to even get it prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist; it's just a little bit more elastic now because the cloud offers that. >>So Teresa, let me go to you, because you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? The centralized cloud as we know it, maybe a data lake or data warehouse in a central place, is not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, right, it's very much gonna follow the same thing. There's gonna be more edge. There's gonna be more on-premise, because of data sovereignty, data gravity, and because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there are a lot of reasons why the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure that's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale. Just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing in a way that is very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? That's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden, we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't have scaled both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects of the modern data stack are. Think about the past five, seven years: cloud obviously has given us a different pricing model, de-risked experimentation, and, as we talked about, the ability to scale up and scale down. But I'm taking away, based on what Richard just said, that that's not enough. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though it's early days. So if you had to think about, you know, putting some guardrails and definitions around the modern data stack, what are the critical aspects? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how it is? >>Yeah, what it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate, and that does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do I believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So it's creating that flexibility to really future-proof yourself against the inevitable change that you will encounter over time. >>So thank you. So, Teresa, based on what Justin just said, my takeaway is that it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data, Teresa? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations of data mesh I've seen frankly really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Like we mentioned, there are a lot of great reasons for cloud, around the economics and the ability to tap into the innovation that the cloud providers are delivering around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones where it makes business sense, because they need better performance, they need, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, you know, talking about data as a product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating takes healthcare data across a wide variety of different settings (information about patients' demographics, about their treatment, about their medications, and so on) and puts it into a standard format that can be utilized by a wide variety of different researchers. Misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. In any business that's clearly not a desirable outcome, but when that insight is as critical as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible and discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, yeah, but also, you know, to external folks as well.
So you've architected that capability today? >>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you cut through a lot of the centralized teams, the technical teams that are data agnostic and don't really have that context. All right. Let's end with Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs and store more in a data lake rather than a data warehouse. It provides the fastest time to insight, by accessing the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make it an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time, management, and in some cases double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this comes with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, because your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 22 2022


Starburst The Data Lies FULL V1


 

>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said the best minds of my generation are thinking about how to get people to click on ads, and that sucks. Let's face it, more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advancing the mission of the organization. Now, that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives, beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness, we've made progress. But the hard truth is that the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to The Data Doesn't Lie... or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: Is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios, about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >>Thanks for having us. >>You're very welcome. Let's get right into it. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite Teradata being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would really echo Justin's experience, in that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, and to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and building something brand new you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kind of saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, and certainly for very core data sets, you have a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So different parts of the business have to be able to really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything, but at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with, you know, your constituencies? >>Yeah. So we absolutely have got rock star data scientists and data guardians, if you like, people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patients' lives. And so that becomes a consulting type experience from a set of rock stars to help a more decentralized business that needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team and that's the most cost-effective way to serve the business. Otherwise I've got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason is really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best. >>And so, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients; they're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or maybe it's just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's not going to be enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and the mesh has to make sure you can actually use all of them. >>So, Richard, let me ask you. Take Zhamak's principles back to those: you've got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially taking, you know, analytic queries to data that's dispersed all over the place. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so we're really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers. Now, with Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years, with the advent of the first data lake really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar; you need to store data in a column format. And then, you know, column formats were introduced to data lakes: you have Parquet, ORC file, and Avro, which were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. More recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
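Justin's point about column formats comes down to how much data a query has to touch. The toy sketch below is an illustration only, not how Parquet or ORC are actually encoded (real columnar formats add compression, encodings, and per-column statistics): summing one column in a row-oriented layout reads every field of every row, while a column-oriented layout reads only that column's values. The table and its fields are made up.

```python
# Toy contrast between row-oriented and column-oriented layouts.
# Real formats such as Parquet and ORC add encoding, compression, and
# per-column statistics; none of that is modeled here.

rows = [
    {"patient_id": 1, "drug": "A", "quantity": 30},
    {"patient_id": 2, "drug": "B", "quantity": 10},
    {"patient_id": 3, "drug": "A", "quantity": 20},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def sum_row_layout(records, column):
    # A row store reads every field of every row, then picks out one column.
    values_read = sum(len(r) for r in records)
    return sum(r[column] for r in records), values_read


def sum_column_layout(columns, column):
    # A column store reads only the values of the requested column.
    values = columns[column]
    return sum(values), len(values)


columnar = to_columnar(rows)
print(sum_row_layout(rows, "quantity"))         # (60, 9)
print(sum_column_layout(columnar, "quantity"))  # (60, 3)
```

Both layouts return the same answer; the difference is the second number, the count of values read, which grows with table width in the row layout but not in the columnar one.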
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, look, closed. Open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system? What are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to have done is backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and, you know, obviously her vision is that the data mesh is open source, open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh; you're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me, kind of, of, again, going back 10 or 12 years, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, 'cause that's where they have control. And, you know, the bottom line is the database industry has really been built around vendor lock-in, from the start. How many people love Oracle today? But they're customers nonetheless. I think, you know, lock-in is part of this industry, and I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I, you know, see the gas price, the teaser gas price. I drive up, and then I say, oh, that's the cash price; with a credit card I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, 'cause you're gonna end up... You mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before, at least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down.
'Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, and I'd love to get your thoughts on it, 'cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems, or the proprietary systems, deliver a greater ROI sooner? And is that the allure that customers, you know, are attracted to? Or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can hold them to very tight SLAs.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing, we can't do better than open data formats with cloud-based, data mesh type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, staying with you. You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it, and Richard, you mentioned you have a lot of rock star data engineers; Databricks is coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in data engineering, i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was: if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So my advice to people who are starting here, or trying to build teams to capitalize on data assets, is to begin with open license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And normally, you know, it's the staff that's the biggest nut in total cost of ownership, but not in the world of Oracle. There, the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics, and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so the role that we wanna play is to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or is it the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach, one that embraces distributed data.
One that enables fast and secure access to any of your data, from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess that's the lie: is it that it's not modern? Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >>So let me come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. You really, you know, there's a lot of money being thrown at that by venture capitalists, and Snowflake, you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern, dial it up, dial it down? So what do you say to that?
>>Well, it certainly is taking, you know, what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to even get it prepared for analysis. So a lot of the same sort of structural constraints that existed with the old enterprise data warehouse model on-prem still exist; it's just, yes, a little bit more elastic now because the cloud offers that. >>So Teresa, let me go to you, 'cause you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake or data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure; it's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack?
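Teresa's description of query services that reach beyond a single location, and Justin's idea of a single point of access, can be loosely illustrated with Python's built-in `sqlite3` module. This is an analogy, not Trino, Starburst, or any cloud provider's query service: one connection attaches two separate in-memory databases standing in for a data lake table and an operational system, and a single SQL statement joins across them without first copying the data into one central store. All table and column names are hypothetical.

```python
import sqlite3

# One connection acts as the "single point of access"; two attached
# in-memory databases stand in for separate systems (a data lake table
# and an operational database). An analogy only, not a federated engine.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")
conn.execute("ATTACH DATABASE ':memory:' AS ops")

conn.execute("CREATE TABLE lake.prescriptions (patient_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO lake.prescriptions VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "A")])

conn.execute("CREATE TABLE ops.patients (patient_id INTEGER, region TEXT)")
conn.executemany("INSERT INTO ops.patients VALUES (?, ?)",
                 [(1, "North"), (2, "South"), (3, "North")])

# One query spans both "systems" without moving data into a central store first.
result = conn.execute("""
    SELECT p.region, COUNT(*) AS prescriptions
    FROM lake.prescriptions AS rx
    JOIN ops.patients AS p ON p.patient_id = rx.patient_id
    GROUP BY p.region
    ORDER BY p.region
""").fetchall()
print(result)  # [('North', 2), ('South', 1)]
```

A real federated engine does this across separate systems over the network, typically through a connector per source; this single-process toy does not attempt any of that.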
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low, because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required? And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects of the modern data stack are. So you think about the past, you know, five, seven years: cloud obviously has given a different pricing model, de-risked experimentation, you know, the ability to scale up and scale down that we talked about. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though it's early days. So what are the critical aspects? If you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack, what does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how it is? >>Yeah, what it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up, mature out of that ideal state. They go buy a business, and now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So it's creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So based on what Justin just said, my takeaway there is it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Like we mentioned, there are a lot of great reasons for cloud, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are already working. It's meaningful for the business. At the same time, you can modernize the ones where that makes business sense, because they need better performance, they need, you know, something that is cheaper, or maybe you just want to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds that, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, you know, talking about data as a product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and putting it into a standard format that can be utilized by a wide variety of different researchers. Misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. In any business that's clearly not a desirable outcome, but when that insight is as critical as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data and presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you provide it appropriately internally. Yeah. But also, you know, to external folks as well.
So you've architected that capability today? >>We have, and because the data is standardized, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, that makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, let's end with you, Justin. How does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs, storing more in a data lake rather than a data warehouse. It provides the fastest time to insight, by accessing the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. We're big believers in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resources section. So check that out. Thanks for watching "The Data Doesn't Lie... or Does It?", made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time and management and, in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this comes with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, because your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.

Published Date : Aug 20 2022

ENTITIES

Entity | Category | Confidence
Richard | PERSON | 0.99+
Dave Lanta | PERSON | 0.99+
Jess Borgman | PERSON | 0.99+
Justin | PERSON | 0.99+
Theresa | PERSON | 0.99+
Justin Borgman | PERSON | 0.99+
Teresa | PERSON | 0.99+
Jeff Ocker | PERSON | 0.99+
Richard Jarvis | PERSON | 0.99+
Dave Valante | PERSON | 0.99+
Justin Boardman | PERSON | 0.99+
six | QUANTITY | 0.99+
Dani | PERSON | 0.99+
Massachusetts | LOCATION | 0.99+
20 cents | QUANTITY | 0.99+
Teradata | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Jamma | PERSON | 0.99+
UK | LOCATION | 0.99+
FINRA | ORGANIZATION | 0.99+
40 years | QUANTITY | 0.99+
Kurt Monash | PERSON | 0.99+
20% | QUANTITY | 0.99+
two | QUANTITY | 0.99+
five | QUANTITY | 0.99+
Jess | PERSON | 0.99+
2011 | DATE | 0.99+
Starburst | ORGANIZATION | 0.99+
10 | QUANTITY | 0.99+
Accenture | ORGANIZATION | 0.99+
seven years | QUANTITY | 0.99+
thousands | QUANTITY | 0.99+
pythons | TITLE | 0.99+
Boston | LOCATION | 0.99+
GDPR | TITLE | 0.99+
Today | DATE | 0.99+
two models | QUANTITY | 0.99+
Zolando Comcast | ORGANIZATION | 0.99+
Gemma | PERSON | 0.99+
Starbust | ORGANIZATION | 0.99+
JPMC | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Javas | TITLE | 0.99+
today | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
millions | QUANTITY | 0.99+
first lie | QUANTITY | 0.99+
10 | DATE | 0.99+
12 years | QUANTITY | 0.99+
one place | QUANTITY | 0.99+
Tomorrow | DATE | 0.99+

Breaking Analysis: Further defining Supercloud w/ tech leaders VMware, Snowflake, Databricks & others


 

From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. At our inaugural Supercloud22 event, we further refined the concept of a supercloud, iterating on the definition, the salient attributes, and some examples of what is and what is not a supercloud. Welcome to this week's Wikibon CUBE Insights, powered by ETR. You know, Snowflake has always been what we feel is one of the strongest examples of a supercloud, and in this Breaking Analysis from our studios in Palo Alto, we unpack our interview with Benoit Dageville, co-founder and president of products at Snowflake, and we test our supercloud definition on the company's Data Cloud platform. We're really looking forward to your feedback. First, let's examine how we define supercloud. Very importantly, one of the goals of Supercloud22 was to get the community's input on the definition and iterate on previous work. Supercloud is an emerging computing architecture that comprises a set of services which are abstracted from the underlying primitives of hyperscale clouds. We're talking about services such as compute, storage, networking, security, and other native tooling like machine learning and developer tools, to create a global system that spans more than one cloud. Supercloud, as shown on this slide, has five essential properties, X number of deployment models, and Y number of service models. We're looking for community input on X and Y, and on the first point as well, so please weigh in and contribute. Now, we've identified these five essential elements of a supercloud, so let's talk about these. First, the supercloud has to run its services on more than one cloud, leveraging the cloud-native tools offered by each of the cloud providers. The builder of the supercloud platform is responsible for optimizing the underlying primitives of each cloud and optimizing for the specific needs, be it cost or performance or latency or governance, data
sharing, security, et cetera. But those primitives must be abstracted such that a common experience is delivered across the clouds for both users and developers. The supercloud has a metadata intelligence layer that can maximize efficiency for the specific purpose of the supercloud, i.e., the purpose that the supercloud is intended for, and it does so in a federated model. And it includes what we call a super PaaS. This is a prerequisite: a purpose-built component that enables ecosystem partners to customize and monetize incremental services while at the same time ensuring that the common experiences exist across clouds. Now, in terms of deployment models, we'd really like to get more feedback on this piece, but here's where we are so far, based on the feedback we got at Supercloud22. We see three deployment models. The first is one where a control plane may run on one cloud but supports data plane interactions with more than one other cloud. The second model instantiates the supercloud services on each individual cloud and within regions, and can support interactions across more than one cloud, with a unified interface connecting those instantiations, those instances, to create a common experience. And the third model superimposes its services as a layer (in the case of Snowflake, they call it a mesh) on top of the cloud providers' region or regions, with a single global instantiation of those services which spans multiple cloud providers. This is our understanding, from our conversation with Benoit Dageville, of how Snowflake approaches its solutions. For now, we're going to park the service models; we need more time to flesh that out, and we'll propose something shortly for you to comment on. Now, we peppered Benoit Dageville at Supercloud22 to test how the Snowflake Data Cloud aligns to our concepts and our definition. Let me also say that Snowflake doesn't use the term supercloud. They really want to respect, and they
don't want to denigrate, the importance of their hyperscale partners. Nor do we, but we do think the hyperscalers, today anyway, are not building what we call superclouds. The people who are building superclouds are building on top of hyperscale clouds; that is a prerequisite. So here are the questions that we tested with Snowflake. First question: how does Snowflake architect its Data Cloud, and what is its deployment model? Listen to Dageville talk about how Snowflake has architected a single system. Play the clip. There are several ways to do this, you know, superclouds, as you name them. The way we picked is to create one single system, and that's very important, right? There are several ways, right? You can instantiate your solution in every region of a cloud, and potentially that region could be AWS, that region could be GCP, so you are indeed a multi-cloud solution. But at Snowflake we did it differently. We are really creating cloud regions which are superposed on top of the cloud provider's infrastructure region. So we are building our regions, but where it's very different is that each region of Snowflake is not one instantiation of our service. Our service is global by nature. We can move data from one region to the other. When you land in Snowflake, you land in one region, but you can grow from there, and you can exist in multiple clouds at the same time. And that's very important, right? It's not different instantiations of a system; it is one single instantiation which covers many cloud regions and many cloud providers. Snowflake chose the most advanced of our three deployment models, as Dageville described, presumably so it could maintain maximum control and ensure that common experience, like the iPhone model. Next, we probed the technical enablers of the Data Cloud. Listen to Dageville talk about Snowgrid. He uses the term
mesh, and this can get confusing with Zhamak Dehghani's data mesh concept, but listen to Benoit's explanation. Well, as I said, first we start by building Snowflake regions. We have today thirty regions that span the world, so it's a worldwide system with many regions, but all these regions are connected together. They are meshed together with our technology; we name it Snowgrid. And with that, an Azure region can talk to an AWS region or GCP regions, and as a user of our cloud, you don't really see these regional differences, that regions are potentially in different clouds. When you use Snowflake, your presence as an organization can be in several regions, several clouds if you want, both geographic and cloud provider. So I can share data irrespective of the cloud, and I'm in the Snowflake Data Cloud, is that correct? I can do that today? Exactly, and that's very critical, right? What we wanted is to remove data silos. When you instantiate a system in one single region and that system is locked in that region, you cannot communicate with other parts of the world; you are locking the data in one region, right? And we didn't want to do that. We wanted data to be distributed the way customers want it to be distributed across the world, and potentially to share data at world scale. Now, maybe there are many ways to skin the cat, meaning perhaps if a platform does instantiate in multiple places, there are ways to share data, but this is how Snowflake chose to approach the problem. Next question: how do you deal with latency in this big global system? This is really important to us, because while Snowflake has some really smart people working as engineers and the like, we don't think they've solved the speed of light problem. The best people are working on it, as we often joke. Listen to Benoit Dageville's comments on this
topic. So, yes and no. It's very expensive to do that, because generally, if you want to join data which is in different regions and different clouds, it's going to be very expensive, because you need to move data every time you join it. So the way we do it is that you replicate the subset of data that you want to access in one region from other regions. So you can create this data mesh, but data is replicated to make it very cheap and very performant, too. And does Snowgrid have the metadata intelligence? Yes. Actually, can you describe that a little bit? Yeah, Snowgrid is both a way to exchange metadata, so each region of Snowflake knows about all the other regions of Snowflake. Every time we create a new region there, the metadata is distributed over our Data Cloud. Not only does each region know all the regions, but it knows every organization that exists in our clouds, where this organization is, and where data can be replicated by this organization. And then, of course, it's also used as a way to exchange data, right? So you can exchange data at petabyte scale. I was just receiving an email from one of our customers who moved more than four petabytes of data cross-region, across cloud providers, in a few days. It's a lot of data, so it takes some time to move, but they were able to do that completely online and switch over to the other region, which is very important for failover also. So when he says "yes and no," that probably means no. It sounds like Snowflake is selectively pulling small amounts of data and replicating it where necessary, but you also heard him talk about the metadata layer, which is one of the essential aspects of supercloud. Okay, next we dug into security. It's one of the most important issues, and we think one of the hardest parts related to
deploying supercloud. We've talked about how the cloud has become the first line of defense for the CISO, but now, with multi-cloud, you have multiple first lines of defense, and that means multiple shared responsibility models, multiple tool sets from different cloud providers, and an expanded threat surface. So listen to Benoit's explanation here. Please play the clip. This is a great question. Security has always been the most important aspect of Snowflake since day one, right? This is the question that every customer of ours has: how can you guarantee the security of my data? And so we secure data really tightly in region. We have several layers of security. It starts by encrypting all data at rest, and that's very important; a lot of customers are not doing that, right? You hear about these attacks on cloud, for example, where someone left their buckets open and then you can access the data because it's non-encrypted. So we are encrypting everything at rest and we are encrypting everything in transit, so a region is very secure. Now, in Snowflake you never access data in one region from another region; that's also why we replicate data. Now, the replication of that data across regions, or of the metadata for that matter, is really highly secure. So Snowgrid ensures that everything is encrypted. We have multiple encryption keys, and they're stored in hardware security modules. So we built Snowgrid such that it's secure and it allows very secure movement of data. So when we heard this explanation, we immediately went to the lowest common denominator question, meaning when you think about how AWS, for instance, deals with data in motion or data at rest, it might be different from how another cloud provider deals with it. So how do differences, for example in the AWS maturity model for various cloud capabilities, you know, let's say they've
got a faster Nitro or Graviton, how does Snowflake deal with that? Do they have to slow everything else down, like a caravan cruising across the desert, so that every truck can keep up? Let's listen. It's a great question. I mean, of course our software is abstracting all the cloud providers' infrastructure, so that when you run in one region, let's say AWS or Azure, it doesn't make any difference as far as the applications are concerned. And this abstraction, of course, is a lot of work, I mean really, really a lot of work, because it needs to be secure, it needs to be performant on every cloud, and it has to expose APIs which are uniform. And cloud providers, even though they potentially have the same concept, let's say blob storage, have completely different APIs. The way these systems are secured is completely different. The errors that you can get and the retry mechanisms are very different from one cloud to the other. Performance is also different. We discovered that when we were starting to port our software, and we had to completely rethink how to leverage blob storage in one cloud versus another, just because of performance. So we had, for example, to stripe data. All this work is work that you don't need to do as an application, because our vision really is that applications running in our Data Cloud can be abstracted from all these differences. And we provide all the services, all the workloads that these applications need, whether it's transactional access to data, analytical access to data, managing logs, or managing metrics. All of this is abstracted, too, such that they are not tied to one particular service of one cloud, and distributing an application across many regions and many clouds is very seamless. So from that answer, we know that Snowflake takes
care of everything, but we really don't understand the performance implications in that specific case. But we feel pretty certain that the promises Snowflake makes around governance and security within its data sharing construct will be kept. Now, another criterion that we've proposed for supercloud is a super PaaS layer to create a common developer experience and an enabler for ecosystem partners to monetize. Please play the clip; let's listen. We built it, you know, as a custom build, because, as you said, what exists in one cloud might not exist in another cloud provider, right? So we have to build all these components that modern applications need, and that goes to machine learning, as I say, transactional, analytical systems, and the entire thing, such that they can run in isolation, basically. And the objective is that the developer experience will be identical across those clouds? Yes, right. The developer doesn't need to worry about the cloud provider. And actually, in our system, we didn't talk about it, but the marketplace that we have, which actually allows us to deliver... We're getting there, yeah. Okay, now we're not going to go deep into ecosystem today; we've talked about Snowflake's strengths in this regard. But Snowflake pretty much ticked all the boxes on our supercloud attributes and definition. We asked Benoit Dageville to confirm that this is all shipping and available today, and he also gave us a glimpse of the future. Play the clip. And we are still developing it. You know, the transactional Unistore, as we call it, was announced at the last Summit, so they are still working on it, but that's the vision, right? And that's important because we talk about the infrastructure, right? You mentioned a lot about storage and compute, but it's not only that, right? When you think about applications, they need to use the transactional database, they need
to use an analytical system, they need to use machine learning. So you also need to provide all these services, which are consistent across all the cloud providers. So you can hear Dageville talking about expanding beyond taking advantage of the core infrastructure (storage, networking, et cetera) and bringing intelligence to the data through machine learning and AI. So of course there's more to come, and there better be at this company's valuation, despite the recent sharp pullback in a tightening Fed environment. Okay, so I know it's cliché, but everyone's comparing Snowflake and Databricks. Databricks has been pretty vocal about its open source posture compared to Snowflake's, and it just so happens that we had Ali Ghodsi on at Supercloud22 as well. He wasn't in studio; he had to do it remotely, because I guess he's presenting at an investor conference this week, so we had to bring him in remotely. Now, I didn't get to do this interview; John Furrier did, but I listened to it and captured this clip about how Databricks sees supercloud and the importance of open source. Take a listen to Ghodsi. Yeah, I mean, let me start by saying we're just big fans of open source. We think that open source is a force in software that's going to continue for, you know, decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. We saw that it could do that with the most advanced technology: Windows, a proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the data lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, data lake, machine learning, and real-time stack in open source, and we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud, and with it comes a really important protocol called Delta Sharing that
enables you, in an open way, actually for the first time ever, to share large data sets between organizations. And it uses an open protocol, so the great thing about that is you don't need to be a Databricks customer, you don't even need to like Databricks; you just need to use this open source project, and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently, with just one copy of the data, so you don't have to copy it if you're within the same cloud. So the implication of Ali Ghodsi's comments is that Databricks, with Delta Sharing, as John implied, is playing a long game. Now, I don't know enough about the Databricks architecture to comment in detail; I gotta do more research there. So I reached out to my two analyst friends, Tony Baer and Sanjeev Mohan, to see what they thought, because they cover these companies pretty closely. Here's what Tony Baer said. Quote: I've viewed the divergent lakehouse strategies of Databricks and Snowflake in the context of their roots. Prior to Delta Lake, Databricks' prime focus was the compute, not the storage layer, and more specifically, they were a compute engine, not a database. Snowflake approached from the opposite end of the pool, as they originally fit the mold of the classic database company rather than a specific compute engine per se. The lakehouse pushes both companies outside of their original comfort zones: Databricks to storage, Snowflake to compute engine. So it makes perfect sense for Databricks to embrace the open source narrative at the storage layer, and for Snowflake to continue its walled garden approach. But in the long run, their strategies are already overlapping. Databricks is not a 100% open source company. Its practitioner experience has always been proprietary, and now so is its SQL query engine. Likewise, Snowflake has had to open up with its support of Iceberg for open data lake format. The question really becomes how serious Snowflake will be in making Iceberg a first-class citizen in its
environment that is not necessarily officially branded a lakehouse, but effectively is. And likewise, can Databricks deliver the service levels associated with walled gardens through a more brute-force approach that relies heavily on the query engine? At the end of the day, those are the key requirements that will matter to Databricks and Snowflake customers. End quote.
That was some deep thought by Tony; thank you for that. Sanjeev Mohan added the following. Quote: Open source is a slippery slope. People buy mobile phones based on open source Android, but it's not fully open. Similarly, Databricks' Delta Lake was not originally fully open source, and even today its Photon execution engine is not. We are always going to live in a hybrid world. Snowflake and Databricks will support whatever model works best for them and their customers. The big question is, do customers care as deeply about which vendor has a higher degree of openness as we technology people do? I believe customers' evaluation criteria is far more nuanced than just to decipher each vendor's open source claims. End quote.
Okay, so I had to ask Dageville about their so-called walled garden approach and what their strategy is with Apache Iceberg. Here's what he said.
Iceberg is very important. So, just to give some context, Iceberg is an open table format, which was first developed by Netflix, and Netflix put it open source in the Apache community. So we embraced that open source standard because it's widely used by many companies. And also, many companies have really invested a lot of effort in building big data Hadoop solutions or data lake solutions, and they want to use Snowflake, and they couldn't really use Snowflake because all their data were in open formats. So we are embracing Iceberg to help these companies move to the cloud. But why we have been reluctant with direct access to data: direct access to data is a
little bit of a problem for us. And the reason is, when you have direct access to data, now you have direct access to storage. Now you have to understand, for example, the specificity of one cloud versus the other. So as soon as you start to have direct access to data, you lose your cloud-agnostic layer; you don't access data with an API. When you have direct access to data, it's very hard to secure data, because you need to grant direct access to tools which are not protected, and you see a lot of hacking of data because of that. So direct access to data is not serving our customers well, and that's why we have been reluctant to do that. Because it's not cloud-agnostic: you have to code that, you need a lot of intelligence, versus API access. So we want open APIs. I guess the way we embrace openness is by open APIs versus you accessing data directly.
Here's my take. Snowflake is hedging its bets, because enough people care about open source that they have to have some open data format options, and it's good optics. And you heard Benoit Dageville talk about the risks of directly accessing the data and the complexities it brings. Now, is that maybe a little FUD against Databricks? Maybe. But the same can be said for Ali's comments, maybe spreading a little FUD about the proprietary nature of Snowflake. But as both analysts pointed out, open is a spectrum. Hey, I remember when Unix used to equal open systems.
Okay, let's end with some ETR spending data, and why not compare Snowflake and Databricks spending profiles. This is an XY graph with Net Score, or spending momentum, on the y-axis and pervasiveness, or overlap in the data set, on the x-axis. This is data from the January survey, when Snowflake was holding above 80% Net Score, off the charts. Databricks was also very strong, in the upper 60s. Now let's fast forward to this next chart and show you the July ETR survey data, and
you can see Snowflake has come back down to earth. Now remember, anything above 40 Net Score is highly elevated, so both companies are doing well. But Snowflake is well off its highs, and Databricks has come down somewhat as well. Databricks is inching to the right; Snowflake rocketed to the right post its IPO, and as we know, Databricks wasn't able to get to an IPO during the COVID bubble. Ali Ghodsi is at the Morgan Stanley CEO conference this week. They've got plenty of cash to withstand a long-term recession, I'm told, and they've started messaging that they're at a billion dollars in annualized revenue. I'm not sure exactly what that means. I've seen some numbers on their gross margins; I'm not sure what that means. I've seen some numbers on their net revenue retention; again, I'll reserve judgment until we see an S-1. But it's clear both of these companies have momentum, and they're out competing in the market. Well, as always, the market will be the ultimate arbiter. Different philosophies, perhaps? Is it like Democrats and Republicans? Well, it could be, but they're both going after solving a data problem. Both companies are trying to help customers get more value out of their data, and both companies are highly valued, so they have to perform for their investors. To paraphrase Ralph Nader, the similarities may be greater than the differences.
Okay, that's it for today. Thanks to the team from Palo Alto for this awesome Supercloud studio build. Alex Myerson and Ken Schiffman are on production in the Palo Alto studios today. Kristin Martin and Cheryl Knight get the word out to our community. Rob Hof is our editor-in-chief over at SiliconANGLE. Thanks to all. Please check out etr.ai for all the survey data. Remember, these episodes are all available as podcasts wherever you listen; just search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com, and you can email me at david.vellante@siliconangle.com, DM me at @dvellante, or comment on my LinkedIn posts. And please, as I
say, ETR has got some of the best survey data in the business. We track it every quarter and are really excited to be partners with them. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.
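Ghodsi describes Delta Sharing as an open protocol that lets a recipient read shared tables without being a Databricks customer. As a minimal sketch, assuming the open Delta Sharing REST protocol as published (a JSON profile file carrying `endpoint` and `bearerToken`, table coordinates written as `<profile>#<share>.<schema>.<table>`, and a `POST .../shares/{share}/schemas/{schema}/tables/{table}/query` call), the Python below only builds the request a recipient would send; it performs no network I/O. The endpoint, token, share, schema, and table names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import quote


def parse_table_url(table_url: str) -> tuple:
    """Split '<profile-path>#<share>.<schema>.<table>' into its four parts."""
    profile_path, sep, coords = table_url.rpartition("#")
    if not sep:
        raise ValueError("expected '<profile-path>#<share>.<schema>.<table>'")
    share, schema, table = coords.split(".")  # ValueError unless exactly 3 parts
    return profile_path, share, schema, table


def build_query_request(profile_json: str, share: str, schema: str, table: str,
                        limit_hint: Optional[int] = None) -> tuple:
    """Build (method, url, headers, body) for a Delta Sharing table query.

    Nothing is sent; a real client would POST this and then read the data
    files listed in the response.
    """
    profile = json.loads(profile_json)
    endpoint = profile["endpoint"].rstrip("/")
    path = "/shares/{}/schemas/{}/tables/{}/query".format(
        quote(share, safe=""), quote(schema, safe=""), quote(table, safe=""))
    headers = {
        "Authorization": "Bearer " + profile["bearerToken"],
        "Content-Type": "application/json",
    }
    body = {} if limit_hint is None else {"limitHint": limit_hint}  # optional hint
    return "POST", endpoint + path, headers, json.dumps(body)


if __name__ == "__main__":
    profile = json.dumps({
        "shareCredentialsVersion": 1,
        "endpoint": "https://sharing.example.com/delta-sharing/",  # hypothetical
        "bearerToken": "<token>",                                   # hypothetical
    })
    _, share, schema, table = parse_table_url("/tmp/example.share#sales.retail.orders")
    method, url, _, body = build_query_request(profile, share, schema, table, limit_hint=100)
    print(method, url, body)
```

In the protocol, the query response lists the table's data files as short-lived pre-signed URLs, which is how a recipient reads the provider's single copy of the data instead of a replica.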

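The ETR charts discussed in this episode plot Net Score on the y-axis, and Vellante treats anything above 40 as highly elevated. A minimal sketch of the arithmetic, assuming Net Score is computed the way ETR commonly describes it: the percentage of respondents adopting new or increasing spend, minus the percentage decreasing or replacing, with flat spend excluded. The respondent splits below are made-up numbers, not ETR data.

```python
def net_score(new: float, increase: float, flat: float,
              decrease: float, replace: float) -> float:
    """Net Score = (% new + % increasing) - (% decreasing + % replacing).

    Flat spenders count toward the 100% but do not move the score.
    """
    total = new + increase + flat + decrease + replace
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"response shares must sum to 100, got {total}")
    return (new + increase) - (decrease + replace)


def is_highly_elevated(score: float, threshold: float = 40.0) -> bool:
    """Vellante's rule of thumb from the episode: above 40 is highly elevated."""
    return score > threshold


if __name__ == "__main__":
    # Hypothetical respondent split in percent, not actual survey data.
    score = net_score(new=20, increase=45, flat=25, decrease=7, replace=3)
    print(score, is_highly_elevated(score))
```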
Published Date : Aug 14 2022

SUMMARY :

and and the retry you know mechanism is

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
netflix | ORGANIZATION | 0.99+
john furrier | PERSON | 0.99+
palo alto | ORGANIZATION | 0.99+
tony bear | PERSON | 0.99+
boston | LOCATION | 0.99+
sanji mohan | PERSON | 0.99+
ken shiffman | PERSON | 0.99+
both | QUANTITY | 0.99+
today | DATE | 0.99+
ellie gotzi | PERSON | 0.99+
VMware | ORGANIZATION | 0.99+
Snowflake | ORGANIZATION | 0.99+
siliconangle.com | OTHER | 0.99+
more than four petabytes | QUANTITY | 0.99+
first point | QUANTITY | 0.99+
kristin martin | PERSON | 0.99+
both companies | QUANTITY | 0.99+
first question | QUANTITY | 0.99+
rob hoff | PERSON | 0.99+
more than one | QUANTITY | 0.99+
second model | QUANTITY | 0.98+
alex myerson | PERSON | 0.98+
third model | QUANTITY | 0.98+
one region | QUANTITY | 0.98+
one copy | QUANTITY | 0.98+
one region | QUANTITY | 0.98+
five essential elements | QUANTITY | 0.98+
android | TITLE | 0.98+
100 | QUANTITY | 0.98+
first line | QUANTITY | 0.98+
Databricks | ORGANIZATION | 0.98+
sheryl | PERSON | 0.98+
more than one cloud | QUANTITY | 0.98+
first | QUANTITY | 0.98+
iphone | COMMERCIAL_ITEM | 0.98+
super cloud 22 | EVENT | 0.98+
each cloud | QUANTITY | 0.98+
each | QUANTITY | 0.97+
sanjay mohan | PERSON | 0.97+
john | PERSON | 0.97+
republicans | ORGANIZATION | 0.97+
this week | DATE | 0.97+
hundreds of years | QUANTITY | 0.97+
siliconangle | ORGANIZATION | 0.97+
each week | QUANTITY | 0.97+
data lake house | ORGANIZATION | 0.97+
one single region | QUANTITY | 0.97+
january | DATE | 0.97+
dave vellante | PERSON | 0.96+
each region | QUANTITY | 0.96+
one | QUANTITY | 0.96+
dave vellante | PERSON | 0.96+
tony | PERSON | 0.96+
above 80 percent | QUANTITY | 0.95+
more than one cloud | QUANTITY | 0.95+
more than one cloud | QUANTITY | 0.95+
data lake | ORGANIZATION | 0.95+
five essential properties | QUANTITY | 0.95+
democrats | ORGANIZATION | 0.95+
first time | QUANTITY | 0.95+
july | DATE | 0.94+
linux | TITLE | 0.94+
etr | ORGANIZATION | 0.94+
devellante | ORGANIZATION | 0.93+
dodgeville | ORGANIZATION | 0.93+
each vendor | QUANTITY | 0.93+
super cloud 22 | ORGANIZATION | 0.93+
delta lake | ORGANIZATION | 0.92+
three deployment models | QUANTITY | 0.92+
first lines | QUANTITY | 0.92+
dejaville | LOCATION | 0.92+
day one | QUANTITY | 0.92+

Super Data Cloud | Supercloud22


 

(electronic music) >> Welcome back to our studios in Palo Alto, California. My name is Dave Vellante; I'm here with John Furrier, who is taking a quick break. You know, one of the early examples that we used of so-called supercloud was Snowflake. We called it a super data cloud. We had, really, a lot of fun with that. And we've started to evolve our thinking. Years ago, we said that data was going to form in the cloud around industries and ecosystems. And Benoit Dageville is a many-time guest of theCUBE. He's the co-founder and president of products at Snowflake. Benoit, thanks for spending some time with us at Supercloud 22. Good to see you. >> Thank you, thank you, Dave. >> So, you know, like I said, we've had some fun with this meme. But really, we heard on the previous panel, everybody's using Snowflake as an example: somebody who builds on top of hyperscale infrastructure. You're not building your own data centers. And so, are you building a super data cloud? >> We don't call it exactly that way. We don't like the super word; it's a bit dismissive. >> That's our term. >> About our friends, our cloud provider friends. But we call it a data cloud. And the vision, really, for the data cloud is, indeed, a cloud which overlays the hyperscaler cloud. But there is a big difference, right? There are several ways to do this supercloud, as you name them. The way we picked is to create one single system, and that's very important, right? There are several ways, right. You can instantiate your solution in every region of the cloud, and, you know, potentially that region could be AWS, that region could be GCP. So you are, indeed, a multi-cloud solution. But at Snowflake, we did it differently. We are really creating cloud regions which are superimposed on top of the cloud provider's infrastructure regions. So we are building our regions. But where it's very different is that each region of Snowflake is not one instantiation of our service.
Our service is global by nature. We can move data from one region to the other. When you land in Snowflake, you land in one region, but you can grow from there, and you can, you know, exist in multiple clouds at the same time. And that's very important, right? It's not different instantiations of a system; it's one single instantiation which covers many cloud regions and many cloud providers. >> So, we used Snowflake as an example, and we're trying to understand what the salient aspects are of your data cloud, what we call supercloud. In fact, you used the word instantiate. Kit Colbert, just earlier today, laid out, he said, there are sort of three levels: you can run it on one cloud and communicate with the other cloud, you can instantiate on the clouds, or you can have the same service running 24/7 across clouds; that's the hardest example. >> Yeah. >> The most mature. You just described, essentially, doing that. How do you enable that? What are the technical enablers? >> Yeah, so, as I said, first we start by building, you know, Snowflake regions. We have today 30 regions that span the world, so it's a worldwide system with many regions. But all these regions are connected together. They are meshed together with our technology, which we named Snowgrid, and that makes it so that, you know, an Azure region can talk to an AWS region or GCP regions, and as a user of our cloud, you don't really see these regional differences, that regions are potentially in different clouds. When you use Snowflake, your presence as an organization can be in several regions, several clouds if you want, both geographic and cloud provider. >> So, I can share data irrespective of the cloud, and I'm in the Snowflake data cloud, is that correct? I can do that today? >> Exactly, and that's very critical, right? What we wanted is to remove data silos.
And when you instantiate a system in one single region, and that system is locked in that region, you cannot communicate with other parts of the world; you are locking data in one region. Right, and we didn't want to do that. We wanted data to be distributed the way customers want it to be distributed across the world, and potentially to share data at world scale. >> Does that mean if I'm in one region and I want to run a query, if I'm in AWS in one region, and I want to run a query on data that happens to be in an Azure cloud, I can actually execute that? >> So, yes and no. Doing it that way is very expensive. Because, generally, if you want to join data which are in different regions and different clouds, it's going to be very expensive, because you need to move data every time you join it. So the way we do it is that you replicate the subset of data that you want to access from one region to the other region. So you can create this data mesh, but data is replicated to make it very cheap and very performant too. >> And does Snowgrid have the metadata intelligence to actually? >> Yes, yes. >> Can you describe that a little? >> Yeah, Snowgrid is both a way to exchange metadata, so each region of Snowflake knows about all the other regions of Snowflake. Every time we create a new region, the metadata is distributed over our data cloud: not only does each region know all the regions, but it knows every organization that exists in our cloud, where this organization is, and where data can be replicated by this organization. And then, of course, it's also used as a way to exchange data, right? So you can exchange data at large scale. And I was just receiving an email from one of our customers who moved more than four petabytes of data, cross-region, cross-cloud-provider, in, you know, a few days. And it's a lot of data, so it takes some time to move.
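Dageville's point is that joining across regions or clouds moves data on every query, while replicating the needed subset moves it once plus ongoing changes. A back-of-the-envelope sketch of that trade-off in Python, counting gigabytes that cross the region or cloud boundary as a rough proxy for egress cost; the workload numbers are hypothetical, and this is not Snowflake's cost model.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    subset_gb: float          # size of the data subset the remote region needs
    joins_per_day: int        # cross-region/cross-cloud queries against that subset
    gb_moved_per_join: float  # data shipped across the boundary by each remote join
    daily_change_gb: float    # new or changed data that replication ships each day


def remote_join_gb(w: Workload, days: int) -> float:
    """Query in place: data crosses the boundary on every join."""
    return w.joins_per_day * w.gb_moved_per_join * days


def replication_gb(w: Workload, days: int) -> float:
    """Replicate the subset once, then ship only incremental changes."""
    return w.subset_gb + w.daily_change_gb * days


def cheaper_to_replicate(w: Workload, days: int) -> bool:
    return replication_gb(w, days) < remote_join_gb(w, days)


if __name__ == "__main__":
    busy = Workload(subset_gb=500, joins_per_day=200, gb_moved_per_join=5, daily_change_gb=10)
    print(remote_join_gb(busy, 30), replication_gb(busy, 30), cheaper_to_replicate(busy, 30))
```

With frequent joins, replication wins quickly; with rare cross-cloud queries, querying in place can still move less data.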
But they were able to do that online, completely online, and switch over to the other region, which is also very important. >> So, one of the hardest parts about supercloud that I'm still struggling through is the security model. Because you've got the cloud as your sort of first line of defense. And now we've got multiple clouds, with multiple first lines of defense. I've got a shared responsibility model across those clouds; I've got different tools in each of those clouds. Do you take care of that? Where do you pick up from the cloud providers? Do you abstract that security layer? Do you bring in partners? It's very complicated. >> No, this is a great question. Security has always been the most important aspect of Snowflake since day one, right? This is the question that every customer of ours has: you know, how can you guarantee the security of my data? And so we secure data really tightly in region. We have several layers of security. It starts by encrypting all data at rest, and that's very important. A lot of customers are not doing that, right? You hear of these attacks, for example, on cloud, where someone left their buckets open, and then, you know, you can access the data because it's not encrypted. So we are encrypting everything at rest. We are encrypting everything in transit. So a region is very secure. Now, you know, from one region, you never access data from another region in Snowflake. That's why, also, we replicate data. Now, the replication of that data across regions, or of the metadata for that matter, is also secured: Snowgrid ensures that everything is encrypted. We have multiple encryption keys, and they're stored in hardware security modules. So we built Snowgrid such that it's secure and it allows very secure movement of data.
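Dageville describes encryption at rest and in transit with multiple encryption keys held in hardware security modules. A minimal sketch of one common way to get many keys from a single protected root: hierarchical key derivation with HMAC-SHA256 from Python's standard library. The account, table, and file levels and the use of HMAC derivation are illustrative assumptions, not Snowflake's documented scheme; the code derives keys but does not encrypt anything, and in a real deployment the root key would never leave the HSM.

```python
import hashlib
import hmac
import os


def derive_key(parent_key: bytes, label: str) -> bytes:
    """Derive a 32-byte child key from a parent key and a context label (HMAC-SHA256)."""
    return hmac.new(parent_key, label.encode("utf-8"), hashlib.sha256).digest()


def file_key(root_key: bytes, account: str, table: str, file_name: str) -> bytes:
    """Walk a hypothetical hierarchy: root -> account -> table -> file."""
    account_key = derive_key(root_key, "account:" + account)
    table_key = derive_key(account_key, "table:" + table)
    return derive_key(table_key, "file:" + file_name)


if __name__ == "__main__":
    root = os.urandom(32)  # stand-in for a root key that would stay inside an HSM
    k1 = file_key(root, "acme", "orders", "part-0001")
    k2 = file_key(root, "acme", "orders", "part-0002")
    print(len(k1), k1 != k2)  # each file gets its own key
```

The design point is blast radius: leaking one file key exposes one file, not the table or account above it.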
>> Okay, so, I know we're getting into the technology a lot here today, but because supercloud is the future, we actually have to have an architectural foundation on which to build. So, you mentioned a bucket, like an S3 bucket. Okay, that's storage, but you're also, for instance, taking advantage of new semiconductor technology, like Graviton, as an example, that drives efficiency. You guys talk about how you pass that on to your customers, even if it means less revenue for you. So, awesome, we love that; you'll make it up in volume. And, so. >> Exactly. >> How do you deal with the lowest common denominator problem? I was talking to somebody the other day, and this individual brought up what I thought was a really good point. What if, let's say, AWS has the best silicon, and we can run the fastest and the least expensive and at the lowest power, but another cloud provider hasn't caught up yet? How do you deal with that delta? Do you just take the best of and try to respect that? >> No, it's a great question. I mean, of course, our software is abstracting all the cloud providers' infrastructure, so that when you run in one region, let's say AWS or Azure, it doesn't make any difference as far as the applications are concerned. And this abstraction, of course, is a lot of work. I mean, really, a lot of work. Because it needs to be secure, it needs to be performant on every cloud, and it has to expose APIs which are uniform. And, you know, cloud providers, even though they potentially have the same concept, let's say block storage, have completely different APIs. The way these systems are secured is completely different. The errors that you can get, and the retry mechanisms, are very different from one cloud to the other. The performance is also different. We discovered that when we started to port our software, and we had to completely rethink how to leverage block storage in this cloud versus that cloud, just for performance too.
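Dageville notes that the same concept, such as block storage, comes with different APIs, errors, and retry behavior on each cloud, and that the abstraction layer has to hide those differences. A minimal sketch of that idea in Python: two hypothetical provider error types are normalized into one retryable error, and a single retry loop with exponential backoff and full jitter sits above them. The error classes, codes, and status values are placeholders, not any cloud SDK's actual exceptions.

```python
import random
import time
from typing import Callable


class TransientStorageError(Exception):
    """Provider-neutral, retryable error seen by code above the abstraction layer."""


class ProviderAError(Exception):
    """Hypothetical native error from 'provider A', identified by a string code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


class ProviderBError(Exception):
    """Hypothetical native error from 'provider B', identified by an HTTP status."""

    def __init__(self, status: int):
        super().__init__(status)
        self.status = status


def normalize(exc: Exception) -> Exception:
    """Map each provider's native error onto one uniform error model."""
    if isinstance(exc, ProviderAError) and exc.code in {"SlowDown", "InternalError"}:
        return TransientStorageError(exc.code)
    if isinstance(exc, ProviderBError) and exc.status in {429, 500, 503}:
        return TransientStorageError(str(exc.status))
    return exc  # anything else is treated as non-retryable


def get_with_retries(fetch: Callable[[str], bytes], key: str, attempts: int = 5,
                     base_delay: float = 0.05,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch(key), retrying transient errors with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch(key)
        except Exception as exc:
            err = normalize(exc)
            if not isinstance(err, TransientStorageError):
                raise  # non-retryable: surface the provider's original error
            if attempt == attempts - 1:
                raise err from exc  # retries exhausted: surface the uniform error
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

Callers only ever handle `TransientStorageError`; each new provider adds a branch to `normalize` rather than a new retry loop.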
And so we had, for example, to stripe data. So all this is work that you don't need to do as an application, because our vision, really, is that applications which are running in our data cloud can be abstracted from these differences. And we provide all the services, all the workloads that these applications need, whether it's transactional access to data, analytical access to data, managing logs, managing metrics. All of this is abstracted too, so that they are not tied to one particular service of one cloud, and distributing these applications across many regions, many clouds, is very seamless. >> So Snowflake has built, your team has built, a true abstraction layer across those clouds that's available today? It's actually shipping? >> Yes, and we are still developing it. You know, transactional, Unistore, as we call it, was announced at our last Summit. So it's still, you know, a work in progress. >> You're not done yet. >> But that's the vision, right? And that's important, because we talk about the infrastructure, right? You mention a lot about storage and compute, but it's not only that, right? When you think about applications, they need to use a transactional database. They need to use an analytical system. They need to use machine learning. So you need to provide, also, all these services which are consistent across all the cloud providers. >> So, let's talk developers. Because, you know, you think Snowpark; you guys announced a big application development push at Snowflake Summit recently. And we have said that a criterion of supercloud is a super PaaS layer. People wince when I say that, but okay, we're just going to go with it. But the point is, it's a purpose-built application development layer, specific to your particular agenda, that supports your vision. >> Yes. >> Have you essentially built a purpose-built PaaS layer? Or do you just take standard PaaS off the shelf and cobble it together? >> No, we custom built it.
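Dageville mentions having to stripe data to get acceptable block storage performance on some clouds. Striping cuts data into fixed-size chunks and spreads them across several objects or volumes so they can be read in parallel. A minimal round-robin sketch in Python; the chunk size and stripe width are hypothetical tuning parameters, and this is not Snowflake's actual layout.

```python
from typing import List


def stripe(data: bytes, chunk_size: int, width: int) -> List[List[bytes]]:
    """Cut data into chunk_size pieces and deal them round-robin across `width` lanes.

    Chunk k lands in lane k % width at position k // width, so each lane
    (an object or volume) can be read in parallel with the others.
    """
    if chunk_size <= 0 or width <= 0:
        raise ValueError("chunk_size and width must be positive")
    lanes: List[List[bytes]] = [[] for _ in range(width)]
    for k, start in enumerate(range(0, len(data), chunk_size)):
        lanes[k % width].append(data[start:start + chunk_size])
    return lanes


def unstripe(lanes: List[List[bytes]]) -> bytes:
    """Reassemble by reading row r of every lane in order (chunk r * width + j)."""
    out = []
    depth = max((len(lane) for lane in lanes), default=0)
    for row in range(depth):
        for lane in lanes:
            if row < len(lane):  # only trailing lanes can be one chunk short
                out.append(lane[row])
    return b"".join(out)


if __name__ == "__main__":
    print(stripe(b"abcdefg", chunk_size=2, width=3))
```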
Because, as you said, what exists in one cloud might not exist in another cloud provider, right? So we have to build all the components that an application needs. And that goes to machine learning, as I said, transactional and analytical systems, and the entire thing, so that it can run in isolation physically. >> And the objective is the developer experience will be identical across those clouds? >> Yes, developers don't need to worry about the cloud provider. And, actually, our system will have, we didn't talk about it, but a marketplace that we have, which allows, actually, to deliver. >> We're getting there. >> Yeah, okay. (both laughing) I won't divert. >> No, no, let's go there, because the other aspect of supercloud that we've talked about is the ecosystem. You have to enable an ecosystem to add incremental value; it's the power of many versus the capabilities of one. So, talk about the challenges of doing that. Not just the business challenges but, again, I'm interested in the technical and architectural challenges. >> Yeah, yeah, so, it's really about, I mean, the way we enable our ecosystem and our partners to create value on top of our data cloud is via the marketplace, where you can put shared data on the marketplace and provide listings, which are data sets. But it goes way beyond data; it goes all the way to applications. So you can think of it as the iPhone, a little bit more, all right? Your iPhone is great, not so much because the hardware is great, or because of iOS, but because of all the applications that you have. And all these applications are not necessarily developed by Apple, basically. So it's the same model with our marketplace. We foresee an environment where providers and partners are going to build these applications. We call them native applications. And we are going to help them distribute these applications across clouds, everywhere in the world, potentially.
And they don't need to worry about that. They don't need to worry about how these applications are going to be instantiated. We are going to help them monetize these applications. So that unlocks, you know, really, all the partner ecosystem that you have seen, you know, with something like the iPhone, right? It has created so many new companies that have developed these applications. >> Your detractors have criticized you for being a walled garden. I've actually used that term. I've used terms like de facto standard, which are maybe less sensitive to you, but, nonetheless, we've seen de facto standards actually deliver value. I've talked to Frank Slootman about this, and he said, Dave, we deliver value, that's what we're all about. At the same time, he even said to me, and I want your thoughts on this, look, we have to embrace open source where it makes sense. You guys announced support for Apache Iceberg. So, what are your thoughts on that? Is that to enable a developer ecosystem? Why did you do Iceberg? >> Yeah, Iceberg is very important. So, just to give some context, Iceberg is an open table format. >> Right. >> Which was first developed by Netflix, and Netflix put it open source in the Apache community. So we embraced that open source standard because it's widely used by many companies. And also, many companies have really invested a lot of effort in building big data Hadoop solutions or data lake solutions, and they want to use Snowflake. And they couldn't really use Snowflake, because all their data were in open formats. So we are embracing Iceberg to help these companies move to the cloud. But why we have been reluctant with direct access to data: direct access to data is a little bit of a problem for us. And the reason is, when you have direct access to data, now you have direct access to storage. Now you have to understand, for example, the specificity of one cloud versus the other.
So, as soon as you start to have direct access to data, you lose your cloud-agnostic layer. You don't access data with an API. When you have direct access to data, it's very hard to secure your data, because you need to grant direct access to tools which are not protected, and you see a lot of hacking of data because of that. So direct access to data is not serving our customers well, and that's why we have been reluctant to do that. Because it is not cloud-agnostic: you have to code that, you need a lot of intelligence, versus API access. So we want open APIs. That's, I guess, the way we embrace openness: by open APIs versus you accessing data directly. >> iPhone. >> Yeah, yeah, iPhone, APIs, you know. We define a set of APIs because, with APIs, you know, the implementation of the APIs can change, can improve. You can improve compression of data, for example. If you open direct access to data, now you cannot evolve. >> My point is, you've made a promise of a governed, secure data sharing ecosystem. It works the same way, so that's the path that you've chosen. Benoit Dageville, thank you so much for coming on theCUBE and participating in Supercloud 22; really appreciate that. >> Thank you, Dave. It was a great pleasure. >> All right, keep it right there; we'll be right back with our next segment, right after this short break. (electronic music)
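Dageville argues that with an API in front of the data, the implementation can change, for example by improving compression, whereas clients that read storage directly are coupled to the physical layout. A minimal sketch of that argument in Python using `json` and `zlib`; `TableService` and `direct_reader` are hypothetical names, not a Snowflake or Iceberg API. Open table formats such as Iceberg take the other route and address the same problem by versioning the on-disk format in a published specification.

```python
import json
import zlib
from typing import Dict, List


class TableService:
    """Hypothetical service that owns the physical layout and exposes a read API."""

    def __init__(self, compress: bool):
        self._compress = compress  # an internal choice clients never see
        self._blobs: Dict[str, bytes] = {}

    def put_rows(self, table: str, rows: List[dict]) -> None:
        raw = json.dumps(rows).encode("utf-8")
        self._blobs[table] = zlib.compress(raw) if self._compress else raw

    def read_rows(self, table: str) -> List[dict]:
        """The stable API: same result whatever the storage encoding is."""
        blob = self._blobs[table]
        raw = zlib.decompress(blob) if self._compress else blob
        return json.loads(raw)

    def raw_bytes(self, table: str) -> bytes:
        """What 'direct access to storage' would hand a client."""
        return self._blobs[table]


def direct_reader(blob: bytes) -> List[dict]:
    """A client written against the original, uncompressed layout."""
    return json.loads(blob)


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 9.5}]
    v1, v2 = TableService(compress=False), TableService(compress=True)
    for svc in (v1, v2):
        svc.put_rows("orders", rows)
    print(v1.read_rows("orders") == v2.read_rows("orders") == rows)  # API clients unaffected
    try:
        direct_reader(v2.raw_bytes("orders"))  # the storage change breaks this client
    except ValueError as exc:
        print("direct reader broke:", type(exc).__name__)
```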

Published Date : Aug 9 2022

SUMMARY :

You know, in one of the So, you know, like I said, We don't like the super and you can, you know, or you can have the same How do you enable that? we start by building, you know, And I'm in the Snowflake And when you insociate a So, the way we do it is that you replicate So, you can exchange data So, one of the hardest And then, you know, So, you mentioned a and the least expensive, so that when you run in one So, they are still, you know, So, you need to provide, Because, you know, you think Snowpark, And that goes to machine a marketplace that we have, I won't divert. So, talk about the of all the applications that you have. At the same time, he even said to me, So, just to give some context, You have to code that, you because APIs, you know, so that's the path that you've chosen. It was a great pleasure. with our next segment, right

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Frank Slootman | PERSON | 0.99+
Benoit | PERSON | 0.99+
Dave | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Netflix | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Kit Colbert | PERSON | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Palo Alto, California | LOCATION | 0.99+
Benoit Dogeville | PERSON | 0.99+
one region | QUANTITY | 0.99+
iOS | TITLE | 0.99+
30 regions | QUANTITY | 0.99+
more than four petabytes | QUANTITY | 0.99+
Snowflake | EVENT | 0.99+
first line | QUANTITY | 0.99+
Snowpark | ORGANIZATION | 0.99+
Snowflake | ORGANIZATION | 0.98+
today | DATE | 0.98+
Apache | ORGANIZATION | 0.98+
both | QUANTITY | 0.98+
each | QUANTITY | 0.98+
one | QUANTITY | 0.97+
Unistore | ORGANIZATION | 0.97+
Supercloud 22 | EVENT | 0.97+
first lines | QUANTITY | 0.97+
DataX Solution | ORGANIZATION | 0.97+
each region | QUANTITY | 0.96+
Snow Grid | TITLE | 0.96+
one cloud | QUANTITY | 0.96+
one single region | QUANTITY | 0.96+
first | QUANTITY | 0.96+
one region | QUANTITY | 0.96+
Hadoop Solutions | ORGANIZATION | 0.95+
WS | ORGANIZATION | 0.94+
Supercloud 22 | ORGANIZATION | 0.93+
three levels | QUANTITY | 0.93+
Snowflake | TITLE | 0.93+
one single system | QUANTITY | 0.92+
Iceberg | TITLE | 0.89+
one single instantiation | QUANTITY | 0.89+
theCube | ORGANIZATION | 0.86+
Azure | ORGANIZATION | 0.85+
Years ago | DATE | 0.83+
earlier today | DATE | 0.82+
one instantiation | QUANTITY | 0.82+
Super Data Cloud | ORGANIZATION | 0.81+
S3 | COMMERCIAL_ITEM | 0.8+
one cloud | QUANTITY | 0.76+
delta | ORGANIZATION | 0.76+
Azure | TITLE | 0.75+
one of our customers | QUANTITY | 0.72+
day one | QUANTITY | 0.72+
Supercloud22 | EVENT | 0.66+

Ali Ghodsi, Databricks | Supercloud22


 

(lighthearted music) >> Okay, welcome back to Supercloud '22. I'm John Furrier, host of theCUBE. We've got Ali Ghodsi here, co-founder and CEO of Databricks. Ali, great to see you. Thanks for spending your valuable time to come on and talk about supercloud and the future of all the structural change that's happening in cloud computing. >> My pleasure, thanks for having me. >> Well, first of all, congratulations. We've been talking for many, many years, and I still go back to the video that we have in the archive of you talking about cloud, really at the beginning of the big reboot, what I called post-Hadoop, a revitalization of data. You've been cloud-first, and now you're on multiple clouds. Congratulations to you and your team for achieving what looks like a billion dollars in annualized revenue, as reported by the Wall Street Journal. So first, congratulations. >> Thank you so much, appreciate it. >> So I was talking to some young developers, and I took a random poll: what do you think about Databricks? Oh, we love those guys; they're AI- and ML-native, and that's their advantage over the competition. So I pressed why. I don't think they knew why, but that's an interesting perspective. This idea of cloud-native, AI/ML-native, MLOps, this has been a big trend, and it's continuing. This is a big part of how this structural change is happening. How do you react to that? And how do you see Databricks evolving into this new supercloud-like multi-cloud environment? >> Yeah, look, I think it's a continuum. It starts with having data, but they want to clean it, you know, and they want to get insights out of it. But then, eventually, you'd like to start asking questions, doing reports, maybe asking what was my revenue yesterday, last week. But soon you want to start using the crystal ball, predictive technology: okay, but what will my revenue be next week? Next quarter? Who's going to churn?
And then if you can finally automate that completely, so that you can act on the predictions, right? So this credit card that got swiped, the AI thinks it's fraud, we're going to deny it. That's when you get real value. So we're trying to help all these organizations move through this data and AI maturity curve, all the way to the prescriptive, automated AI and machine learning. That's when you get real competitive advantage. And you know, we saw that with the FAANGs, right? I mean, Google wouldn't be here today if it wasn't for AI. You know, we'd be using AltaVista or something. We want to help all organizations be able to leverage data and AI the way that the FAANGs did. >> One of the things we're looking at with supercloud, and why we call it supercloud versus other things like multi-cloud, is that today a lot of the companies that started in the cloud have been successful, but have realized, and even enterprises who have gotten there by accident and maybe have done nothing with cloud have just some cloud projects on multiple clouds. So people have multiple cloud operational things going on, but it hasn't necessarily been a strategy per se. It's been more of a default reaction to things. But the ones that are innovating have been successful in one native cloud, because the use cases that drove that got scale and got value, and then they're making that super by bringing it on premises, putting in a modern data stack for modern application development, and kind of dealing with the things that you guys at Databricks are in the middle of. That is where the action is, and they don't want to lose the trajectory and all the economies of scale. So we're seeing another structural change, where the evolutionary nature of the cloud has solved a bunch of use cases, but now other use cases are emerging, on premises and at the edge, that have been driven by applications because of the developer boom that's happening. You guys are in the middle of it.
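Ghodsi's maturity curve ends with acting on predictions automatically: the card is swiped, the model flags fraud, and the transaction is denied. A minimal sketch of that last step in Python, assuming a logistic scoring function; the features, weights, and 0.5 threshold are hypothetical and untrained, standing in for whatever model an organization would actually deploy.

```python
import math
from dataclasses import dataclass


@dataclass
class Swipe:
    amount: float         # transaction amount
    km_from_home: float   # distance from the cardholder's usual location
    merchant_risk: float  # 0.0 (trusted) .. 1.0 (high risk)


# Hypothetical, untrained weights for illustration only.
WEIGHTS = {"bias": -4.0, "amount": 0.002, "km_from_home": 0.01, "merchant_risk": 3.0}


def fraud_probability(s: Swipe) -> float:
    """Logistic score: sigmoid of a weighted sum of the features."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount"] * s.amount
         + WEIGHTS["km_from_home"] * s.km_from_home
         + WEIGHTS["merchant_risk"] * s.merchant_risk)
    return 1.0 / (1.0 + math.exp(-z))


def decide(s: Swipe, threshold: float = 0.5) -> str:
    """Act on the prediction: deny when the fraud probability meets the threshold."""
    return "deny" if fraud_probability(s) >= threshold else "approve"


if __name__ == "__main__":
    print(decide(Swipe(amount=25, km_from_home=2, merchant_risk=0.1)))
    print(decide(Swipe(amount=1500, km_from_home=800, merchant_risk=0.9)))
```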
What is happening with this structural change? Are people looking for the modern data stack? Are they looking for more AI? What's your perspective on this supercloud kind of position? >> Look, it started with: now they're on multiple clouds, right? So multi-cloud has been a thing. It became a thing: 70, 80% of our customers, when you ask them, are on more than one cloud. But then they soon started realizing that, hey, you know, if I'm on multiple clouds, this data stuff is hard enough as it is. Do I want to redo it again and again with different proprietary technologies on each of the clouds? And that's when they started thinking, let's standardize this, let's figure out a way which just works across them. That's where I think open source comes in and becomes really important. Hey, can we leverage open standards? Because then we can make it work in these different environments, so that we can actually go super, as you said. That's one. The second thing is, can we simplify it? You know, and I think today, the data landscape is complicated. Conceptually, it's simple. You have data, which is essentially customer data that you have, maybe employee data, and you want to get some kind of insights from that. But how you do that is very complicated. You have to buy a data warehouse, hire data analysts. You have to store stuff in the data lake, you know, get your data engineers. If you want streaming, real-time things, that's another completely different set of technologies you have to buy. And then you have to stitch all these together, and you have to do it again and again on every cloud. So they just want simplification. So that's why we're big believers in this Delta Lakehouse concept, which is an open standard for simplifying this data stack and helping people just get value out of their data in any environment. So they can do that in this sort of supercloud, as you call it.
>> You know, we've been talking about that in previous interviews: do the heavy lifting, let them get the value. I have to ask you about how you see that going forward, because if I'm a customer, I have a lot of operational challenges, 'cause the developers are kicking butt right now. We see that clearly. Open source is growing and continues to be great. But ops and security teams really care about this stuff, and most companies don't want to spin up multiple ops teams to deal with different stacks. This is one big problem that I think is leading into the question of multi-cloud viability. How do you guys deal with that? How do you talk to customers when they say, I want fewer complications in operations? >> Yeah, you're absolutely right. You know, it's easy for a developer to adopt all these technologies, and new things are coming out all the time. The ops teams are the ones that have to make sure this works, and doing that in multiple different environments is super hard, especially when there's a different proprietary stack in each environment. So they just want standardization. They want open source; that's super important. We hear that all the time from them. They want open source technologies. They believe in the communities around them. You know, they know that the source code is open, so they can also see if there are issues with it, if there are security breaches, those kinds of things, and they have a community around it that they can actually leverage. So they're the ones that are really pushing this, and we're seeing it across the board. You know, it starts first with the digital natives, but slowly it's also now percolating to the other organizations. We're hearing it across the board. >> Where are we, Ali, on the innovation strategies for customers? Where are they on the trajectory around how they're building out their teams? How are they looking at open source?
How are they extending the value proposition of Databricks, and data at scale, as they start to build out their teams and operations? Because some are just starting, kind of a crawl, walk, run vibe. Some are big companies dealing with data all the time. Where are they in their journey? What are the core issues that they're solving? What are some of the use cases that you see that are most pressing for customers? >> Yeah, what I've seen that's really exciting about this data lakehouse concept is that we're now seeing a lot of use cases around real time. So real-time fraud detection, real-time stock ticker pricing: anyone that's doing trading wants that to work in real time. Lots of use cases around that. Lots of use cases around how we, in real time, drive more engagement on our web assets if we're a media company, right? We have all these assets; how do we get people engaged, staying on our sites, continuing to engage with the material we have? Those are real-time use cases. And the interesting thing is, A, they're real time. So, you know, it's really important now: you don't want to recommend someone, hey, you should go check out this restaurant, if they just came from that restaurant half an hour ago. So you want it to be real time, but B, it's also all based on machine learning. A lot of this is trying to predict what you want to see, what you want to do, or whether it's fraudulent. And that's also interesting because more and more machine learning is coming in. So that's super exciting to see: the combination of real time and machine learning on the lakehouse. And finally, I would say the lakehouse is really important for this because that's where the data is flowing in. If they have to take the data that's flowing into the lake and copy it into a separate warehouse, that delays the real-time use cases, and then they can't hit those real-time deadlines. So that's another catalyst for this lakehouse pattern.
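To make the restaurant example concrete, here is a minimal Python sketch of a recency filter applied at recommendation time. It is an illustration only, not Databricks code: the `Visit` record, the `recommend` function, and the 30-minute window are hypothetical choices. What it shows is that the filter can only be as fresh as the visit events it can see, which is why avoiding a copy step between the lake and a separate warehouse matters for real-time use cases.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical event type: a user was seen at a restaurant at a given time.
@dataclass(frozen=True)
class Visit:
    user: str
    restaurant: str
    at: datetime


# Assumed recency window, taken from the "half an hour ago" example.
RECENT = timedelta(minutes=30)


def recommend(user, candidates, visits, now):
    """Return candidates minus restaurants the user visited within RECENT of now."""
    recent = {
        v.restaurant
        for v in visits
        if v.user == user and now - v.at <= RECENT
    }
    return [c for c in candidates if c not in recent]


now = datetime(2022, 8, 7, 12, 0)
visits = [
    Visit("ana", "Luigi's", now - timedelta(minutes=20)),  # just left
    Visit("ana", "Sushi Go", now - timedelta(days=3)),     # long ago
]
print(recommend("ana", ["Luigi's", "Sushi Go", "Taqueria"], visits, now))
# ['Sushi Go', 'Taqueria']
```

If `visits` were only refreshed by a periodic batch copy, the 20-minute-old visit might not be there yet, and the just-visited restaurant would be recommended again.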
>> Would that be an example of how the metrics are changing? 'Cause I've heard some people say, well, you can tell if someone's doing well if there's a lot of data being transferred. And then I was saying, well, wait a minute. Data transfer costs money, right? And time. So this is an interesting dynamic: in a way, you don't want to have a lot of movement, right? >> Yeah, movement actually decreases for a lot of these real-time use cases. 'Cause what we saw in the past was that they would run batch processing over all the data, reprocessing all of it every time. But actually, if you look at what has changed since the data we had yesterday, it's actually not that much. So if you can incrementally process it in real time, you can actually reduce the cost of transfers, storage, and processing. So that's a great point. That's also one of the main things that we're seeing with these use cases: they process less, the bill shrinks, and the cost goes down. >> Yeah, and it'll be interesting to see how those KPIs evolve into industry metrics down the road around the supercloud evolution. I got to ask you about the open source concept of data platforms. You guys have been a pioneer there, doing great work, kind of picking up the baton where the Hadoop world left off, as Dave Vellante always points out. But working across clouds is super important. How are you guys looking at the ability to work across the different clouds with Databricks? Are you going to build that abstraction yourself? Do data sharing and model sharing kind of come into play there? How do you see this Databricks capability across the clouds? >> Yeah, I mean, let me start by saying we're big fans of open source. We think that open source is a force in software that's going to continue for decades, hundreds of years, and it's going to slowly replace all proprietary code in its way.
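The incremental-processing point can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular engine implements it: the data is an append-only log of `(account, amount)` pairs, the aggregate is a simple sum, and `IncrementalTotals` with its `offset` checkpoint is a hypothetical name. Both approaches produce the same totals; the incremental one reads only the rows added since the last run.

```python
from dataclasses import dataclass, field


def full_batch(log):
    """Recompute per-account totals from scratch: every run reads the whole log."""
    totals = {}
    for account, amount in log:
        totals[account] = totals.get(account, 0) + amount
    return totals, len(log)  # (result, rows read)


@dataclass
class IncrementalTotals:
    """Running per-account totals plus an offset into an append-only log."""
    totals: dict = field(default_factory=dict)
    offset: int = 0  # hypothetical checkpoint: rows already processed

    def update(self, log):
        new_rows = log[self.offset:]  # only rows appended since the last run
        for account, amount in new_rows:
            self.totals[account] = self.totals.get(account, 0) + amount
        self.offset = len(log)
        return len(new_rows)  # rows read this run


log = [("a", 100), ("b", 50), ("a", 25)]  # yesterday's data
inc = IncrementalTotals()
inc.update(log)                            # first run reads all 3 rows

log.append(("b", 10))                      # today's small change
print(inc.update(log))                     # 1
print(full_batch(log))                     # ({'a': 125, 'b': 60}, 4)
print(inc.totals)                          # {'a': 125, 'b': 60}
```

The sketch stays this simple only because the log is append-only and sums can be updated piecemeal; updates or deletes in the source data would need extra bookkeeping.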
We've seen it do that with the most advanced technology. Windows, you know, a proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the data lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, data lake, machine learning, and real-time stack, in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud. And it comes with a really important protocol called Delta Sharing that enables you, in an open way, for the first time ever, to share large data sets between organizations. The great thing about that is you don't need to be a Databricks customer. You don't even need to like Databricks; you just need to use this open source project, and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently, with just one copy of the data, so you don't have to copy it if you're within the same cloud. >> So you're playing the long game on open source. >> Absolutely. I mean, this is a force; it's going to be there. If you deny it, before you know it there's going to be something like Linux that is going to be a threat to your proprietary software. >> I totally agree, by the way. I was just talking to somebody the other day, and someone made the comment that the software industry is open source: there's no more software industry, it's called open source. It's integrations that become interesting, and integration is now really where the action is. And we had a panel with what we called the Clouderati, the people who have been around for a long time, and it was called the innovator's dilemma. And one of the comments was, it's the integrator's dilemma, not the innovator's dilemma.
And this is a big part of this piece of supercloud. Can you share your thoughts on how cloud and integration need to be tightened up to really make it super? >> Actually, that's a great point. I think the beauty of this is, look, the ecosystem of data today is vast. There's this picture that someone puts together every year of all the different vendors and how they relate, and it gets bigger and bigger, messier and messier. So we see customers use all kinds of different aspects of what exists in the ecosystem, and they want it to be integrated in whatever you're selling them. And that's where I think the power of open source comes in. With open source, you get integrations that people will do without you having to push it. So we, Databricks as a vendor, don't have to go tell people, please integrate with Databricks. With the open source technology that we contribute to, people are automatically integrating with it. Delta Lake has integrations with lots of different software out there, and Databricks as a company doesn't have to push that. So I think open source is also another thing that really helps with ecosystem integrations. Many of the companies in this data space actually have employees who are dedicated full time to this: make sure our software works well with Spark, make sure our software works well with Delta. And they contribute back to that community. That's the way you get this sort of ecosystem to flourish. >> Well, I really appreciate your time. My final question for you is, as we unpack, shape, and frame supercloud for the future, how would you see a roadmap, architecture, or outcome for companies that are clearly going to be in the cloud, where open source is going to be dominating? Integrations have got to be seamless and frictionless. An abstraction layer makes things super easy and takes away the complexity. What is supercloud to them? What does the outcome look like?
How would you define a supercloud environment for an enterprise? >> Yeah, for me, it's the simplification that you get when you standardize on open source. You get your data in one place, in one format, in one standardized way, and then you can get your insights from it without having to buy lots of different idiosyncratic proprietary software from different vendors that's different in each environment. So it's this slow standardization that's happening, and I think it's going to happen faster than we think. I think in a couple of years it's going to be a requirement: does your software work on all these different departments? Is it based on open source? Is it using this data lakehouse pattern? And if it's not, I think they're going to demand it. >> Yeah, I feel like we're close to some sort of de facto standard coming, and you guys are a big part of it. Once that clicks in, it's going to highly accelerate in the open, and I think it's going to be super valuable. Ali, thank you so much for your time, and congratulations to you and your team. We've been following you guys since the beginning. I remember the early days, and look how far it's come. And again, you guys are really making a big difference in making a super cool environment out there. Thanks for coming on and sharing. >> Thank you so much, John. >> Okay, this is Supercloud 22. I'm John Furrier. Stay with us for more coverage and more commentary after this break. (light hearted music)

Published Date : Aug 7 2022

Starburst Panel Q2


 

>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay. We're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake, really around Hadoop, that probably was true: you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses.
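Justin's point about columnar formats can be illustrated with a toy in-memory model in Python. This is a simplification, not how Parquet or ORC are actually encoded (real formats add per-column encoding, compression, and statistics), and the table and function names are hypothetical. It shows why an analytic query that needs one column reads far less data when values are stored column by column.

```python
# The same three rows stored two ways (hypothetical data).
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 200},
]

# Columnar layout: one contiguous list per column, loosely like the
# column chunks Parquet and ORC keep within each row group or stripe.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_amount_row_store(rows):
    """Row layout: every whole record is visited to pull out one field."""
    values_touched = 0
    total = 0
    for r in rows:
        values_touched += len(r)  # a row store reads all fields of each record
        total += r["amount"]
    return total, values_touched


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    col = columns["amount"]
    return sum(col), len(col)


print(sum_amount_row_store(rows))        # (400, 9)
print(sum_amount_column_store(columns))  # (400, 3)
```

With three columns the saving is 3x; on wide tables with hundreds of columns, a query that touches a handful of them skips most of the data.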
So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, closed. Open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen the technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge.
>>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and, you know, obviously her vision is that data mesh is open source, open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to today: you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it reminds me, again, of going back 10 or 12 years ago when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, 'cause that's where they have control.
And you know, the bottom line is the database industry has really been built around vendor lock-in. I mean, from the start. How many people love Oracle today? But they're customers nonetheless. I think, you know, lock-in is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting. It reminds me of when, you know, I see the gas price, the TSR gas price, I drive up, and then I say, oh, that's the cash price; with a credit card I gotta pay 20 cents more. But okay. So let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you 'cause you're gonna end up... You mentioned Oracle, you mentioned Teradata. Yeah. So, by implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before, at least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down 'cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you wanna use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model.
I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it, 'cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they jam 'em on price and license costs, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or the proprietary systems, deliver a greater ROI sooner? And is that an allure that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today. We can put very tight SLAs on them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you.
You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it. Richard, you mentioned you have a lot of rockstar data engineers; Databricks is coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what? I think it's actually harder to play in the data engineering... i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors. So my advice to people who are starting here or trying to build teams to capitalize on data assets is to begin with open license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting.
Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years, and normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle. There it's the license cost that is by far the biggest component of the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, and lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access control so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys.
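Justin describes a single point of access that enforces fine-grained access control. The Python sketch below shows the general idea of a row-level filter plus column-level restriction applied at query time. It is a hypothetical illustration, not Starburst's API: `Policy`, `POLICIES`, `query`, and the two roles are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which columns a role may see and which rows it may read."""
    columns: frozenset
    row_filter: Callable[[dict], bool]


# Invented roles for illustration only.
POLICIES = {
    "analyst": Policy(frozenset({"region", "amount"}), lambda r: True),
    "eu_support": Policy(frozenset({"id", "region"}), lambda r: r["region"] == "EU"),
}


def query(role, table):
    """Apply the role's row filter, then keep only the permitted columns."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no policy for role {role!r}")
    return [
        {k: v for k, v in row.items() if k in policy.columns}
        for row in table
        if policy.row_filter(row)
    ]


orders = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
]
print(query("eu_support", orders))  # [{'id': 1, 'region': 'EU'}]
print(query("analyst", orders))     # [{'region': 'EU', 'amount': 120}, {'region': 'US', 'amount': 80}]
```

The appeal of enforcing rules like these in one access layer, as Justin describes, is that a single set of policies can cover every source behind it rather than being re-implemented in each system.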
Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or whether it's the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage.

Published Date : Aug 2 2022
