Breaking Analysis: Databricks faces critical strategic decisions…here’s why


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Spark became a top level Apache project in 2014, and then shortly thereafter, burst onto the big data scene. Spark, along with the cloud, transformed and, in many ways, disrupted the big data market. Databricks optimized its tech stack for Spark and took advantage of the cloud to really cleverly deliver a managed service that has become a leading AI and data platform among data scientists and data engineers. However, emerging customer data requirements are shifting in a direction that will cause modern data platform players generally, and Databricks specifically, we think, to make some key directional decisions and perhaps even reinvent themselves. Hello and welcome to this week's Wikibon theCUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do a deep dive into Databricks. We'll explore its current impressive market momentum. We're going to use some ETR survey data to show that, and then we'll lay out how customer data requirements are changing and what the ideal data platform will look like in the midterm future. We'll then evaluate core elements of the Databricks portfolio against that vision, and then we'll close with some strategic decisions that we think the company faces. And to do so, we welcome in our good friend, George Gilbert, former equities analyst, market analyst, and current Principal at TechAlpha Partners. George, good to see you. Thanks for coming on. >> Good to see you, Dave. >> All right, let me set this up. We're going to start by taking a look at where Databricks sits in the market in terms of how customers perceive the company and what its momentum looks like. And this chart that we're showing here is data from ETS, the Emerging Technology Survey of private companies. The N is 1,421.
What we did is we cut the data on three sectors: analytics, database-data warehouse, and AI/ML. The vertical axis is a measure of customer sentiment, which evaluates an IT decision maker's awareness of the firm and the likelihood of engaging and/or purchase intent. The horizontal axis shows mindshare in the dataset, and we've highlighted Databricks, which has been a consistent high performer in this survey over the last several quarters. And, by the way, just as an aside, as we previously reported, OpenAI, which burst onto the scene this past quarter, leads all names, but Databricks is still prominent. You can see that ETR shows some open source tools for reference, but as far as firms go, Databricks is very impressively positioned. Now, let's see how they stack up to some mainstream cohorts in the data space, against some bigger companies and sometimes public companies. This chart shows net score on the vertical axis, which is a measure of spending momentum, and pervasiveness in the data set is on the horizontal axis. You can see that chart insert in the upper right that informs how the dots are plotted: net score against shared N. And that red dotted line at 40% indicates a highly elevated net score; anything above that we think is really, really impressive. And here we're just comparing Databricks with Snowflake, Cloudera, and Oracle. And that squiggly line leading to Databricks shows their path since 2021 by quarter. And you can see it's performing extremely well, maintaining an elevated net score in that range. Now it's comparable in the vertical axis to Snowflake, and it consistently is moving to the right and gaining share. Now, why did we choose to show Cloudera and Oracle? The reason is that Cloudera got the whole big data era started and was disrupted by Spark and, of course, the cloud, Spark and Databricks. And Oracle, in many ways, was the target of early big data players like Cloudera. Take a listen to Cloudera CEO at the time, Mike Olson.
This is back in 2010, first year of theCUBE, play the clip. >> Look, back in the day, if you had a data problem, if you needed to run business analytics, you wrote the biggest check you could to Sun Microsystems, and you bought a great big, single box, central server, and any money that was left over, you handed to Oracle for database licenses and you installed that database on that box, and that was where you went for data. That was your temple of information. >> Okay? So Mike Olson implied that monolithic model was too expensive and inflexible, and Cloudera set out to fix that. But the best laid plans, as they say. George, what do you make of the data that we just shared? >> So where Databricks has really come up out of sort of Cloudera's tailpipe was they took big data processing, made it coherent, made it a managed service so it could run in the cloud. So it relieved customers of the operational burden. Where they're really strong and where their traditional meat and potatoes or bread and butter is, is the predictive and prescriptive analytics, that is, building and training and serving machine learning models. They've tried to move into traditional business intelligence, the more traditional descriptive and diagnostic analytics, but they're less mature there. So what that means is, the reason you see Databricks and Snowflake kind of side by side is there are many, many accounts that have both Snowflake for business intelligence and Databricks for AI machine learning. Where Snowflake, I'm sorry, where Databricks also did really well was in core data engineering, refining the data, the old ETL process, which kind of turned into ELT, where you load it into the analytic repository in raw form and refine it. And so people have really used both, and each is trying to get into the other. >> Yeah, absolutely. We've reported on this quite a bit. Snowflake, kind of moving into the domain of Databricks and vice versa.
And the last bit of ETR evidence that we want to share in terms of the company's momentum comes from ETR's Round Tables. They're run by Erik Bradley, and now former Gartner analyst and George, your colleague back at Gartner, Daren Brabham. And what we're going to show here is some direct quotes of IT pros in those Round Tables. There's a data science head and a CIO as well. Just make a few call outs here, we won't spend too much time on it, but starting at the top, like all of us, we can't talk about Databricks without mentioning Snowflake. Those two get us excited. Second comment zeros in on the flexibility and the robustness of Databricks from a data warehouse perspective. And then the last point is, despite competition from cloud players, Databricks has reinvented itself a couple of times over the years. And George, we're going to lay out today a scenario that perhaps calls for Databricks to do that once again. >> Their big opportunity and their big challenge, as for every tech company, is managing a technology transition. The transition that we're talking about is something that's been bubbling up, but it's really epochal. First time in 60 years, we're moving from an application-centric view of the world to a data-centric view, because decisions are becoming more important than automating processes. So let me let you sort of develop. >> Yeah, so let's talk about that here. We're going to put up some bullets on precisely that point and the changing sort of customer environment. So IT stacks are shifting, as George just said, from application-centric silos to data-centric stacks where the priority is shifting from automating processes to automating decisions. You know, look at RPA and there's still a lot of automation going on, but from the focus of that application centricity and the data locked into those apps, that's changing.
Data has historically been on the outskirts in silos, but organizations, you think of Amazon, think Uber, Airbnb, they're putting data at the core, and logic is increasingly being embedded in the data instead of the reverse. In other words, today, the data's locked inside the app, which is why you need to extract that data and stick it in a data warehouse. The point, George, is we're putting forth this new vision for how data is going to be used. And you've used this Uber example to underscore the future state. Please explain. >> Okay, so this is hopefully an example everyone can relate to. The idea is first, you're automating things that are happening in the real world and decisions that make those things happen autonomously without humans in the loop all the time. So to use the Uber example on your phone, you call a car, you call a driver. Automatically, the Uber app then looks at what drivers are in the vicinity, what drivers are free, matches one, calculates an ETA to you, calculates a price, calculates an ETA to your destination, and then directs the driver once they're there. The point of this is that that cannot happen in an application-centric world very easily because all these little apps, the drivers, the riders, the routes, the fares, those call on data locked up in many different apps, but they have to sit on a layer that makes it all coherent. >> But George, so if Uber's doing this, doesn't this tech already exist? Isn't there a tech platform that does this already? >> Yes, and the mission of the entire tech industry is to build services that make it possible to compose and operate similar platforms and tools, but with the skills of mainstream developers in mainstream corporations, not the rocket scientists at Uber and Amazon. >> Okay, so we're talking about horizontally scaling across the industry, and actually giving a lot more organizations access to this technology.
So by way of review, let's summarize the trend that's going on today in terms of the modern data stack that is propelling the likes of Databricks and Snowflake, which we just showed you in the ETR data and is really a tailwind for them. So the trend is toward this common repository for analytic data, that could be multiple virtual data warehouses inside of Snowflake, but you're in that Snowflake environment, or Lakehouses from Databricks or multiple data lakes. And we've talked about what JP Morgan Chase is doing with the data mesh and gluing data lakes together, you've got various public clouds playing in this game, and then the data is annotated to have a common meaning. In other words, there's a semantic layer that enables applications to talk to the data elements and know that they have common and coherent meaning. So George, the good news is this approach is more effective than the legacy monolithic models that Mike Olson was talking about, so what's the problem with this in your view? >> So today's data platforms added immense value 'cause they connected the data that was previously locked up in these monolithic apps or on all these different microservices, and that supported traditional BI and AI/ML use cases. But now if we want to build apps like Uber or Amazon.com, where they've got essentially an autonomously running supply chain and e-commerce app where humans only care for and feed it, but the thing itself is figuring out what to buy, when to buy, where to deploy it, when to ship it, we need a semantic layer on top of the data. So that, as you were saying, the data that's coming from all those different apps is integrated, not just connected, and it means the same thing. And the issue is whenever you add a new layer to a stack to support new applications, there are implications for the already existing layers, like can they support the new layer and its use cases?
So for instance, if you add a semantic layer that embeds app logic with the data rather than vice versa, which we've been talking about and that's been the case for 60 years, then the new data layer faces challenges in that the way you manage that data, the way you analyze that data, is not supported by today's tools. >> Okay, so actually Alex, bring me up that last slide if you would, I mean, you're basically saying at the bottom here, today's repositories don't really do joins at scale. The future is you're talking about hundreds or thousands or millions of data connections, and today's systems, we're talking about, I don't know, 6, 8, 10 joins, and that is the fundamental problem you're saying, is that a new data era is coming and existing systems won't be able to handle it? >> Yeah, one way of thinking about it is that even though we call them relational databases, when we actually want to do lots of joins or when we want to analyze data from lots of different tables, we created a whole new industry for analytic databases where you sort of munge the data together into fewer tables. So you didn't have to do as many joins because the joins are difficult and slow. And when you're going to arbitrarily join thousands, hundreds of thousands or across millions of elements, you need a new type of database. We have them, they're called graph databases, but to query them, you go back to the prerelational era in terms of their usability. >> Okay, so we're going to come back to that and talk about how you get around that problem. But let's first lay out what the ideal data platform of the future we think looks like. And again, we're going to come back to use this Uber example. In this graphic that George put together, awesome. We got three layers. The application layer is where the data products reside. The example here is drivers, rides, maps, routes, ETA, et cetera. The digital version of what we were talking about in the previous slide, people, places and things.
The next layer is the data layer, which breaks down the silos and connects the data elements through semantics so everything is coherent. And then at the bottom layer, the legacy operational systems feed that data layer. George, explain what's different here, the graph database element, you talk about the relational query capabilities, and why can't I just throw memory at solving this problem? >> Some of the graph databases do throw memory at the problem and maybe without naming names, some of them live entirely in memory. And what you're dealing with is a prerelational in-memory database system where you navigate between elements, and the issue with that is we've had SQL for 50 years, so we don't have to navigate, we can say what we want without saying how to get it. That's the core of the problem. >> Okay. So if I may, I just want to drill into this a little bit. So you're talking about the expressiveness of a graph. Alex, if you'd bring that back out, the fourth bullet, expressiveness of a graph database with the relational ease of query. Can you explain what you mean by that? >> Yeah, so graphs are great because you can describe anything with a graph, that's why they're becoming so popular. Expressive means you can represent anything easily. They're conducive to, you might say, a world where we now want like the metaverse, like with a 3D world, and I don't mean the Facebook metaverse, I mean like the business metaverse when we want to capture data about everything, but we want it in context, we want to build a set of digital twins that represent everything going on in the world. And Uber is a tiny example of that. Uber built a graph to represent all the drivers and riders and maps and routes. But what you need out of a database isn't just a way to store stuff and update stuff. You need to be able to ask questions of it, you need to be able to query it. And if you go back to prerelational days, you had to know how to find your way to the data.
It's sort of like when you give directions to someone and they didn't have a GPS system and a mapping system, you had to give them turn by turn directions. Whereas when you have a GPS and a mapping system, which is like the relational thing, you just say where you want to go, and it spits out the turn by turn directions, which let's say, the car might follow or whoever you're directing would follow. But the point is, it's much easier in a relational database to say, "I just want to get these results. You figure out how to get it." The graph database, they have not taken over the world because in some ways, it's taking a 50 year leap backwards. >> Alright, got it. Okay. Let's take a look at how the current Databricks offerings map to that ideal state that we just laid out. So to do that, we put together this chart that looks at the key elements of the Databricks portfolio, the core capability, the weakness, and the threat that may loom. Start with the Delta Lake, that's the storage layer, which is great for files and tables. It's got true separation of compute and storage, I want you to double click on that George, as independent elements, but it's weaker for the type of low latency ingest that we see coming in the future. And some of the threats highlighted here. AWS could add transactional tables to S3, Iceberg adoption is picking up and could accelerate, that could disrupt Databricks. George, add some color here please? >> Okay, so this is the sort of a classic competitive forces where you want to look at, so what are customers demanding? What's competitive pressure? What are substitutes? Even what your suppliers might be pushing. Here, Delta Lake is at its core, a set of transactional tables that sit on an object store. So think of it in a database system, this is the storage engine. So since S3 has been getting stronger for 15 years, you could see a scenario where they add transactional tables. 
We have an open source alternative in Iceberg, which Snowflake and others support. But at the same time, Databricks has built an ecosystem out of tools, their own and others, that read and write to Delta tables, that's what makes the Delta Lake an ecosystem. So they have a catalog, the whole machine learning tool chain talks directly to the data here. That was their great advantage because in the past with Snowflake, you had to pull all the data out of the database before the machine learning tools could work with it, that was a major shortcoming. They fixed that. But the point here is that even before we get to the semantic layer, the core foundation is under threat. >> Yep. Got it. Okay. We got a lot of ground to cover. So we're going to take a look at the Spark Execution Engine next. Think of that as the refinery that runs really efficient batch processing. That's kind of what disrupted Hadoop in a large way, but it's not Python friendly and that's an issue because the data science and the data engineering crowd are moving in that direction, and/or they're using dbt. George, we had Tristan Handy on at Supercloud, really interesting discussion that you and I did. Explain why this is an issue for Databricks? >> So once the data lake was in place, what people did was they refined their data in batch, and Spark has always had streaming support and it's gotten better. The underlying storage as we've talked about is an issue. But basically they took raw data, then they refined it into tables that were like customers and products and partners. And then they refined that again into what was like gold artifacts, which might be business intelligence metrics or dashboards, which were collections of metrics. But they were running it on the Spark Execution Engine, which is a Java-based engine or it's running on a Java-based virtual machine, which means all the data scientists and the data engineers who want to work with Python are really working in sort of oil and water.
Like if you get an error in Python, you can't tell whether the problem's in Python or whether it's in Spark. There's just an impedance mismatch between the two. And then at the same time, the whole world is now gravitating towards dbt because it's a very nice and simple way to compose these data processing pipelines, and people are using either SQL in dbt or Python in dbt, and that kind of is a substitute for doing it all in Spark. So it's under threat even before we get to that semantic layer. It so happens that dbt itself is becoming the authoring environment for the semantic layer with business intelligence metrics. But again, this is the second element that's under direct substitution and competitive threat. >> Okay, let's now move down to the third element, which is Photon. Photon is Databricks' BI Lakehouse, which has integration with the Databricks tooling, which is very rich, it's newer. And it's also not well suited for high concurrency and low latency use cases, which we think are going to increasingly become the norm over time. George, the call out threat here is customers want to connect everything to a semantic layer. Explain your thinking here and why this is a potential threat to Databricks? >> Okay, so two issues here. What you were touching on, which is the high concurrency, low latency, when people are running like thousands of dashboards and data is streaming in, that's a problem because a SQL data warehouse, the query engine, something like that matures over five to 10 years. It's one of these things, the joke that Andy Jassy makes just in general, he's really talking about Azure, but there's no compression algorithm for experience. The Snowflake guys started more than five years earlier, and for a bunch of reasons, that lead is not something that Databricks can shrink. They'll always be behind. So that's why Snowflake has transactional tables now and we can get into that in another show.
But the key point is, so near term, it's struggling to keep up with the use cases that are core to business intelligence, which is highly concurrent, lots of users doing interactive query. But then when you get to a semantic layer, that's when you need to be able to query data that might have thousands or tens of thousands or hundreds of thousands of joins. And a SQL query engine, a traditional SQL query engine, is just not built for that. That's the core problem of traditional relational databases. >> Now this is a quick aside. We always talk about Snowflake and Databricks in sort of the same context. We're not necessarily saying that Snowflake is in a position to tackle all these problems. We'll deal with that separately. So we don't mean to imply that, but we're just sort of laying out some of the things that Snowflake or rather Databricks customers, we think, need to be thinking about and having conversations with Databricks about and we hope to have them as well. We'll come back to that in terms of sort of strategic options. But finally, when we come back to the table, we have Databricks' AI/ML Tool Chain, which has been an awesome capability for the data science crowd. It's comprehensive, it's a one-stop shop solution, but the kicker here is that it's optimized for supervised model building. And the concern is that foundational models like GPT could cannibalize the current Databricks tooling, but George, can't Databricks, like other software companies, integrate foundation model capabilities into its platform? >> Okay, so the sound bite answer to that is sure, IBM 3270 terminals could call out to a graphical user interface when they're running on the XT terminal, but they're not exactly good citizens in that world. The core issue is Databricks has this wonderful end-to-end tool chain for training, deploying, monitoring, running inference on supervised models.
But the paradigm there is the customer builds and trains and deploys each model for each feature or application. In a world of foundation models which are pre-trained and unsupervised, the entire tool chain is different. So it's not like Databricks can junk everything they've done and start over with all their engineers. They have to keep maintaining what they've done in the old world, but they have to build something new that's optimized for the new world. It's a classic technology transition and their mentality appears to be, "Oh, we'll support the new stuff from our old stuff." Which is suboptimal, and as we'll talk about, their biggest patron and the company that put them on the map, Microsoft, really stopped working on their old stuff three years ago so that they could build a new tool chain optimized for this new world. >> Yeah, and so let's sort of close with what we think the options are and decisions that Databricks has for its future architecture. They're smart people. I mean we've had Ali Ghodsi on many times, super impressive. I think they've got to be keenly aware of the limitations, what's going on with foundation models. But at any rate, here in this chart, we lay out sort of three scenarios. One is re-architect the platform by incrementally adopting new technologies. An example might be to layer a graph query engine on top of its stack. They could license key technologies like graph database, they could get aggressive on M&A and buy in relational knowledge graphs, semantic technologies, vector database technologies. George, as David Floyer always says, "A lot of ways to skin a cat." We've seen companies, even think about EMC, maintain their relevance through M&A for many, many years. George, give us your thought on each of these strategic options? >> Okay, I find this question the most challenging 'cause remember, I used to be an equity research analyst.
I worked for Frank Quattrone, we were one of the top tech shops in the banking industry, although this is 20 years ago. But the M&A team was the top team in the industry and everyone wanted them on their side. And I remember going to meetings with these CEOs, where Frank and the bankers would say, "You want us for your M&A work because we can do better." And they really could do better. But in software, it's not like with EMC in hardware because with hardware, it's easier to connect different boxes. With software, the whole point of a software company is to integrate and architect the components so they fit together and reinforce each other, and that makes M&A harder. You can do it, but it takes a long time to fit the pieces together. Let me give you examples. If they put a graph query engine, let's say something like TinkerPop, on top of, I don't even know if it's possible, but let's say they put it on top of Delta Lake, then you have this graph query engine talking to their storage layer, Delta Lake. But if you want to do analysis, you got to put the data in Photon, which is not really ideal for highly connected data. If you license a graph database, then most of your data is in the Delta Lake and how do you sync it with the graph database? If you do sync it, you've got data in two places, which kind of defeats the purpose of having a unified repository. I find this semantic layer option in number three actually more promising, because that's something that you can layer on top of the storage layer that you have already. You just have to figure out then how to have your query engines talk to that. What I'm trying to highlight is, it's easy as an analyst to say, "You can buy this company or license that technology." But the really hard work is making it all work together and that is where the challenge is. >> Yeah, and well look, I thank you for laying that out. We've seen it, certainly Microsoft and Oracle. 
I guess you might argue that well, Microsoft had a monopoly in its desktop software and was able to throw off cash for a decade plus while its stock was going sideways. Oracle had won the database wars and had amazing margins and cash flow to be able to do that. Databricks hasn't even gone public yet, but I want to close with some of the players to watch. Alex, if you'd bring that back up, number four here. AWS, we talked about some of their options with S3 and it's not just AWS, it's blob storage, object storage. Microsoft, as you sort of alluded to, was an early go-to market channel for Databricks. We didn't address that really. So maybe in the closing comments we can. Google obviously, Snowflake of course, we're going to dissect their options in future Breaking Analysis. dbt Labs, where do they fit? Bob Muglia's company, Relational.ai, why are these players to watch, George, in your opinion? >> So everyone is trying to assemble and integrate the pieces that would make building data applications, data products easy. And the critical part isn't just assembling a bunch of pieces, which is traditionally what AWS did. It's a Unix ethos, which is we give you the tools, you put 'em together, 'cause you then have the maximum choice and maximum power. So what the hyperscalers are doing is they're taking their key value stores, in the case of AWS it's DynamoDB, in the case of Azure it's Cosmos DB, and each are putting a graph query engine on top of those. So they have a unified storage and graph database engine, like all the data would be collected in the key value store. Then you have a graph database, that's how they're going to be presenting a foundation for building these data apps. dbt Labs is putting a semantic layer on top of data lakes and data warehouses and as we'll talk about, I'm sure in the future, that makes it easier to swap out the underlying data platform or swap in new ones for specialized use cases.
Snowflake, what they're doing, they're so strong in data management and with their transactional tables, what they're trying to do is take in the operational data that used to be in the province of many state stores like MongoDB and say, "If you manage that data with us, it'll be connected to your analytic data without having to send it through a pipeline." And that's hugely valuable. Relational.ai is the wildcard, 'cause what they're trying to do, it's almost like a holy grail where you're trying to take the expressiveness of connecting all your data in a graph but making it as easy to query as you've always had it in a SQL database or I should say, in a relational database. And if they do that, it's sort of like, it'll be as easy to program these data apps as a spreadsheet was compared to procedural languages, like BASIC or Pascal. That's the implications of Relational.ai. >> Yeah, and again, we talked before, why can't you just throw this all in memory? We're talking in that example of really getting down to differences in how you lay the data out on disk in really, new database architecture, correct? >> Yes. And that's why it's not clear that you could take a data lake or even a Snowflake and why you can't put a relational knowledge graph on those. You could potentially put a graph database, but it'll be compromised because to really do what Relational.ai has done, which is the ease of Relational on top of the power of graph, you actually need to change how you're storing your data on disk or even in memory. So you can't, in other words, it's not like, oh we can add graph support to Snowflake, 'cause if you did that, you'd have to change, or in your data lake, you'd have to change how the data is physically laid out. And then that would break all the tools that talk to that currently. >> What in your estimation, is the timeframe where this becomes critical for a Databricks and potentially Snowflake and others? 
I mentioned earlier midterm, are we talking three to five years here? Are we talking end of decade? What's your radar say? >> I think something surprising is going on that's going to sort of come up the tailpipe and take everyone by storm. All the hype around business intelligence metrics, which is what we used to put in our dashboards, were bookings, billings, revenue, customer, those things, those were the key artifacts that used to live in definitions in your BI tools, and dbt has basically created a standard for defining those so they live in your data pipeline or they're defined in their data pipeline and executed in the data warehouse or data lake in a shared way, so that all tools can use them. This sounds like a digression, it's not. All this stuff about data mesh, data fabric, all that's going on is we need a semantic layer and the business intelligence metrics are defining common semantics for your data. And I think we're going to find by the end of this year, that metrics are how we annotate all our analytic data to start adding common semantics to it. And we're going to find this semantic layer, it's not three to five years off, it's going to be staring us in the face by the end of this year. >> Interesting. And of course SVB today was shut down. We're seeing serious tech headwinds, and oftentimes in these sort of downturns or flat turns, which feels like this could be going on for a while, we emerge with a lot of new players and a lot of new technology. George, we got to leave it there. Thank you to George Gilbert for excellent insights and input for today's episode. I want to thank Alex Myerson who's on production and manages the podcast, of course Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at Siliconangle.com, he does some great editing. Remember all these episodes, they're available as podcasts.
Wherever you listen, all you got to do is search Breaking Analysis Podcast, we publish each week on wikibon.com and siliconangle.com, or you can email me at David.Vellante@siliconangle.com, or DM me @DVellante. Comment on our LinkedIn post, and please do check out ETR.ai, great survey data, enterprise tech focus, phenomenal. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.

Published Date : Mar 10 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Alex Myerson | PERSON | 0.99+
David Floyer | PERSON | 0.99+
Mike Olson | PERSON | 0.99+
2014 | DATE | 0.99+
George Gilbert | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
George | PERSON | 0.99+
Cheryl Knight | PERSON | 0.99+
Ken Schiffman | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Erik Bradley | PERSON | 0.99+
Dave | PERSON | 0.99+
Uber | ORGANIZATION | 0.99+
thousands | QUANTITY | 0.99+
Sun Microsystems | ORGANIZATION | 0.99+
50 years | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
Bob Muglia | PERSON | 0.99+
Gartner | ORGANIZATION | 0.99+
Airbnb | ORGANIZATION | 0.99+
60 years | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
Ali Ghodsi | PERSON | 0.99+
2010 | DATE | 0.99+
Databricks | ORGANIZATION | 0.99+
Kristin Martin | PERSON | 0.99+
Rob Hof | PERSON | 0.99+
three | QUANTITY | 0.99+
15 years | QUANTITY | 0.99+
Databricks' | ORGANIZATION | 0.99+
two places | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
Tristan Handy | PERSON | 0.99+
M&A | ORGANIZATION | 0.99+
Frank Quattrone | PERSON | 0.99+
second element | QUANTITY | 0.99+
Daren Brabham | PERSON | 0.99+
TechAlpha Partners | ORGANIZATION | 0.99+
third element | QUANTITY | 0.99+
Snowflake | ORGANIZATION | 0.99+
50 year | QUANTITY | 0.99+
40% | QUANTITY | 0.99+
Cloudera | ORGANIZATION | 0.99+
Palo Alto | LOCATION | 0.99+
five years | QUANTITY | 0.99+

Adam Wenchel & John Dickerson, Arthur | AWS Startup Showcase S3 E1


 

(upbeat music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase AI Machine Learning Top Startups Building Generative AI on AWS. This is season 3, episode 1 of the ongoing series covering the exciting startups from the AWS ecosystem to talk about AI and machine learning. I'm your host, John Furrier. I'm joined by two great guests here, Adam Wenchel, who's the CEO of Arthur, and Chief Scientist of Arthur, John Dickerson, to talk about how they help people build better LLM AI systems to get them into the market faster. Gentlemen, thank you for coming on. >> Yeah, thanks for having us, John. >> Well, I got to say I got to temper my enthusiasm because the last few months' explosion of interest in LLMs, with ChatGPT, has opened everybody's eyes to the reality that this is going next gen, this is it, this is the moment, this is the point we're going to look back and say, this is the time where AI really hit the scene for real applications. So, a lot of Large Language Models, also known as LLMs, foundational models, and generative AI are all booming. This is where all the alpha developers are going. This is where everyone's focusing their business model transformations on. This is where developers are seeing action. So it's all happening, the wave is here. So I got to ask you guys, what are you guys seeing right now? You're in the middle of it, it's hitting you guys right on. You're in the front end of this massive wave. >> Yeah, John, I don't think you have to temper your enthusiasm at all. I mean, what we're seeing every single day is, everything from existing enterprise customers coming in with new ways that they're rethinking, like business things that they've been doing for many years that they can now do an entirely different way, as well as all manner of new companies popping up, applying LLMs to everything from generating code and SQL statements to generating health transcripts and just legal briefs. Everything you can imagine. 
And when you actually sit down and look at these systems and the demos we get of them, the hype is definitely justified. It's pretty amazing what they're going to do. And even just internally, about a month ago, in January, we built an Arthur chatbot so customers could ask technical questions: rather than read our product documentation, they could just ask this LLM a particular question and get an answer. And at the time it was like state of the art, but then just last week we decided to rebuild it because the tooling has changed so much, and we've completely rebuilt it. It's now way better, built on an entirely different stack. And the tooling has undergone a full generation worth of change in six weeks, which is crazy. So it just tells you how much energy is going into this and how fast it's evolving right now. >> John, weigh in as a chief scientist. I mean, you must be blown away. Talk about kid in the candy store. I mean, you must be looking like this saying, I mean, you must be super busy to begin with, but the change, the acceleration, can you scope the kind of change you're seeing and be specific around the areas you're seeing movement and highly accelerated change? >> Yeah, definitely. And it is very, very exciting actually, thinking back to when ChatGPT was announced, that was a night our company was throwing an event at NeurIPS, which is maybe the biggest machine learning conference out there. And the hype when that happened was palpable and it was just shocking to see how well that performed. And then obviously over the last few months since then, as LLMs have continued to enter the market, we've seen use cases for them, like Adam mentioned, all over the place. And so, some things I'm excited about in this space are the use of LLMs and more generally, foundation models to redesign traditional operations research-style problems, logistics problems, like auctions, decisioning problems. 
So moving beyond the already amazing use cases, like creating marketing content, into more core integration in a lot of the bread-and-butter companies and tasks that drive the American ecosystem. And I think we're just starting to see some of that. And in the next 12 months, I think we're going to see a lot more. If I had to make other predictions, I think we're going to continue seeing a lot of work being done on managing, like, inference-time costs via shrinking models or distillation. And I don't know how to make this prediction, but at some point we're going to be seeing lots of these very large scale models operating on the edge as well. So the time scales are extremely compressed, like Adam mentioned, 12 months from now, hard to say. >> We were talking on theCUBE prior to this session here. We had theCUBE conversation here and then the Wall Street Journal just picked up on the same theme, which is the printing press moment created the enlightenment stage of the history. Here we're in a whole nother automating intellect efficiency, doing heavy lifting, the creative class coming back, a whole nother level of reality around the corner that's being hyped up. The question is, is this justified? Is there really a breakthrough here or is this just another result of continued progress with AI? Can you guys weigh in, because there's two schools of thought. There's the, "Oh my God, we're entering a new enlightenment tech phase, the equivalent of the printing press in all areas." Then there's, "Ah, it's just AI (indistinct) inch by inch." What's your guys' opinion? >> Yeah, I think on the one hand when you're down in the weeds of building AI systems all day, every day, like we are, it's easy to look at this as incremental progress. Like we have customers who've been building on foundation models since we started the company four years ago, particularly in computer vision for classification tasks, starting with pre-trained models, things like that. 
So that part of it doesn't feel real new, but what does feel new is just when you apply these things to language with all the breakthroughs in computational efficiency, algorithmic improvements, things like that, when you actually sit down and interact with ChatGPT or one of the other systems that's out there that's building on top of LLMs, it really is breathtaking, like, the level of understanding that they have and how quickly you can accelerate your development efforts and get an actual working system in place that solves a really important real world problem and makes people way faster, way more efficient. So I do think there's definitely something there. It's more than just incremental improvement. This feels like a real trajectory inflection point for the adoption of AI. >> John, what's your take on this? As people come into the field, I'm seeing a lot of people move from, hey, I've been coding in Python, I've been doing some development, I've been a software engineer, I'm a computer science student. I'm coding in C++ old school, OG systems person. Where do they come in? Where's the focus, where's the action? Where are the breakthroughs? Where are people jumping in and rolling up their sleeves and getting dirty with this stuff? >> Yeah, all over the place. And it's funny you mentioned students. In a different life, I wore a university professor hat and so I'm very, very familiar with the teaching aspects of this. And I will say toward Adam's point, this really is a leap forward in that techniques like Copilot, for example, everybody's using them right now and they really do accelerate the way that we develop. When I think about the areas where people are really, really focusing right now, tooling is certainly one of them. Like you and I were chatting about LangChain right before this interview started, two or three people can sit down and create an amazing set of pipes that connect different aspects of the LLM ecosystem. 
Two, I would say is in engineering. So like distributed training might be one, or just understanding better ways to even be able to train large models, understanding better ways to then distill them or run them. So like this heavy interaction now between engineering and what I might call traditional machine learning from 10 years ago where you had to know a lot of math, you had to know calculus very well, things like that. Now you also need to be, again, a very strong engineer, which is exciting. >> I interviewed Swami when he talked about the news. He's the head of Amazon's machine learning and AI, when they made the Hugging Face announcement. And I reminded him how Amazon was easy to get into if you were developing a startup back in 2007, 2008, and that the language models had that similar problem. It's step up a lot of content and a lot of expense to get provisioned up, now it's easy. So this is the next wave of innovation. So how do you guys see that from where we are right now? Are we at that point where it's that moment where it's that cloud-like experience for LLMs and large language models? >> Yeah, go ahead John. >> I think the answer is yes. We see a number of large companies that are training these and serving these, some of which are being co-interviewed in this episode. I think we're at that. Like, you can hit one of these with a simple, single line of Python, hitting an API, you can boot this up in seconds if you want. It's easy. >> Got it. >> So I (audio cuts out). >> Well let's take a step back and talk about the company. You guys being featured here on the Showcase. Arthur, what drove you to start the company? How'd this all come together? What's the origination story? Obviously you've got big customers, how'd you get started? What are you guys doing? How do you make money? Give a quick overview. >> Yeah, I think John and I come at it from slightly different angles, but for myself, I have been a part of a number of technology companies. 
I joined Capital One, they acquired my last company and shortly after I joined, they asked me to start their AI team. And so even though I've been doing AI for a long time, I started my career back at DARPA. It was the first time I was really working at scale in AI at an organization where there were hundreds of millions of dollars in revenue at stake with the operation of these models and that they were impacting millions of people's financial livelihoods. And so it just got me hyper-focused on these issues around making sure that your AI worked well and it worked well for your company and it worked well for the people who were being affected by it. At the time when I was doing this 2016, 2017, 2018, there just wasn't any tooling out there to support this production management, model monitoring phase of the life cycle. And so we basically left to start the company that I wanted. And John has his own story. I'll let you share that one, John. >> Go ahead John, you're up. >> Yeah, so I'm coming at this from a different world. So I'm on leave now from a tenured role in academia where I was leading a large lab focusing on the intersection of machine learning and economics. And so questions like fairness or the response to the dynamism of the underlying environment have been around for quite a long time in that space. And so I've been thinking very deeply about some of those more like R and D style questions as well as having deployed some automation code across a couple of different industries, some in online advertising, some in the healthcare space and so on, where concerns of, again, fairness come to bear. And so Adam and I connected to understand the space of what that might look like in the 2018-2019 realm from a quantitative and from a human-centered point of view. And so we booted things up from there. >> Yeah, bring that applied engineering R and D into the Capital One DNA that he had at scale. I could see that fit. 
I got to ask you now, next step, as you guys move out and think about LLMs and the recent AI news around the generative models and the foundational models like ChatGPT, how should we be looking at that news and everyone watching might be thinking the same thing. I know at the board level companies like, we should refactor our business, this is the future. It's that kind of moment, and the tech team's like, okay, boss, how do we do this again? Or are they prepared? How should we be thinking? How should people watching be thinking about LLMs? >> Yeah, I think they really are transformative. And so, I mean, we're seeing companies all over the place. Everything from large tech companies to a lot of our large enterprise customers are launching significant projects at core parts of their business. And so, yeah, I would be surprised, if you're serious about becoming an AI native company, which most leading companies are, then this is a trend that you need to be taking seriously. And we're seeing the adoption rate. It's funny, I would say the AI adoption in the broader business world really started, let's call it four or five years ago, and it was a relatively slow adoption rate, but I think all that kind of investment in and scaling the maturity curve has paid off because the rate at which people are adopting and deploying systems based on this is tremendous. I mean, this has all just happened in the few months and we're already seeing people get systems into production. So, now there's a lot of things you have to guarantee in order to put these in production in a way that basically is added into your business and doesn't cause more headaches than it solves. And so that's where we help customers is where how do you put these out there in a way that they're going to represent your company well, they're going to perform well, they're going to do their job and do it properly. >> So in the use case, as a customer, as I think about this, there's workflows. 
They might have had an ML AI ops team that's around IT. Their inference engines are out there. They probably don't have a visibility on say how much it costs, they're kicking the tires. When you look at the deployment, there's a cost piece, there's a workflow piece, there's fairness you mentioned John, what should be, I should be thinking about if I'm going to be deploying stuff into production, I got to think about those things. What's your opinion? >> Yeah, I'm happy to dive in on that one. So monitoring in general is extremely important once you have one of these LLMs in production, and there have been some changes versus traditional monitoring that we can dive deeper into that LLMs are really accelerated. But a lot of that bread and butter style of things you should be looking out for remain just as important as they are for what you might call traditional machine learning models. So the underlying environment of data streams, the way users interact with these models, these are all changing over time. And so any performance metrics that you care about, traditional ones like an accuracy, if you can define that for an LLM, ones around, for example, fairness or bias. If that is a concern for your particular use case and so on. Those need to be tracked. Now there are some interesting changes that LLMs are bringing along as well. So most ML models in production that we see are relatively static in the sense that they're not getting flipped in more than maybe once a day or once a week or they're just set once and then not changed ever again. With LLMs, there's this ongoing value alignment or collection of preferences from users that is often constantly updating the model. And so that opens up all sorts of vectors for, I won't say attack, but for problems to arise in production. Like users might learn to use your system in a different way and thus change the way those preferences are getting collected and thus change your system in ways that you never intended. 
So maybe that went through governance already internally at the company and now it's totally, totally changed and it's through no fault of your own, but you need to be watching over that for sure. >> Talk about the reinforcement learning from human feedback. How's that factoring in to the LLMs? Is that part of it? Should people be thinking about that? Is that a component that's important? >> It certainly is, yeah. So this is one of the big tweaks that happened with InstructGPT, which is the basis model behind ChatGPT and has since gone on to be used all over the place. So value alignment through RLHF, like you mentioned, is I think a very interesting space to get into and it's one that you need to watch over. Like, you're asking humans for feedback over outputs from a model and then you're updating the model with respect to that human feedback. And now you've thrown humans into the loop here in a way that is just going to complicate things. And it certainly helps in many ways. You can ask humans to, let's say that you're deploying an internal chat bot at an enterprise, you could ask humans to align that LLM behind the chatbot to, say, company values. And so you're eliciting feedback about these company values and that's going to scoot that chatbot that you're running internally more toward the kind of language that you'd like to use internally on like a Slack channel or something like that. Watching over that model I think in that specific case, that's a compliance and HR issue as well. So while it is part of the greater LLM stack, you can also view that as an independent bit to watch over. >> Got it, and these are important factors. When people see the Bing news, they freak out at how it's doing great, then it goes off the rails, it goes big, fails big. (laughing) So these models people see that, is that human interaction or is that feedback, is that not accepting it or how do people understand how to take that input in and how to build the right apps around LLMs? 
This is a tough question. >> Yeah, for sure. So some of the examples that you'll see online where these chatbots go off the rails are obviously humans trying to break the system, but some of them clearly aren't. And that's because these are large statistical models and we don't know what's going to pop out of them all the time. And even if you're doing as much in-house testing at the big companies like the Coheres and the OpenAIs of the world, to try to prevent things like toxicity or racism or other sorts of bad content that might lead to bad PR, you're never going to catch all of these possible holes in the model itself. And so, again, it's very, very important to keep watching over that while it's in production. >> On the business model side, how are you guys doing? What's the approach? How do you guys engage with customers? Take a minute to explain the customer engagement. What do they need? What do you need? How's that work? >> Yeah, I can talk a little bit about that. So it's really easy to get started. It's literally a matter of like just handing out an API key and people can get started. And we also offer versions that can be installed on-prem, because we find a lot of our customers have models that deal with very sensitive data. So you can run it in your cloud account or use our cloud version. And so yeah, it's pretty easy to get started with this stuff. We find people start using it a lot of times during the validation phase 'cause that way they can start baselining performance models, they can do champion challenger, they can really kind of baseline the performance of, maybe they're considering different foundation models. And so it's a really helpful tool for understanding differences in the way these models perform. 
And then from there they can just flow that into their production inferencing, so that as these systems are out there, you have really kind of real time monitoring for anomalies and for all sorts of weird behaviors as well as that continuous feedback loop that helps you make your product better, plus observability, and you can run all sorts of aggregated reports to really understand what's going on with these models when they're out there deciding. I should also add that we just today have another way to adopt Arthur and that is we are in the AWS marketplace, and so we are available there just to make it that much easier to use your cloud credits, skip the procurement process, and get up and running really quickly. >> And that's great 'cause Amazon's got SageMaker, which handles a lot of privacy stuff, all kinds of cool things, or you can get down and dirty. So I got to ask on the next one, production is a big deal, getting stuff into production. What have you guys learned that you could share to folks watching? Is there a cost issue? I got to monitor, obviously you brought that up, we talked about the even reinforcement issues, all these things are happening. What are the big learnings that you could share for people that are going to put these into production to watch out for, to plan for, or be prepared for, hope for the best, plan for the worst? What's your advice? >> I can give a couple opinions there and I'm sure Adam has some as well. Yeah, the big one from my side is, again, I had mentioned this earlier, it's just the input data streams because humans are also exploring how they can use these systems to begin with. It's really, really hard to predict the type of inputs you're going to be seeing in production. 
Especially, we always talk about chatbots, but then any generative text tasks like this, let's say you're taking in news articles and summarizing them or something like that, it's very hard to get a good sampling even of the set of news articles in such a way that you can really predict what's going to pop out of that model. So to me, it's, adversarial maybe isn't the word that I would use, but it's an unnatural shifting input distribution of like prompts that you might see for these models. That's certainly one. And then the second one that I would talk about is, it can be hard to understand the costs, the inference time costs behind these LLMs. So the pricing on these is always changing as the models change size, it might go up, it might go down based on model size, based on energy cost and so on, but you're pricing per token or per thousand tokens, and that I think can be difficult for some clients to wrap their head around. Again, you don't know how these systems are going to be used after all so it can be tough. And so again that's another metric that really should be tracked. >> Yeah, and there's a lot of trade off choices in there with like, how many tokens do you want at each step and in the sequence and based on, you have (indistinct) and you reject these tokens and so based on how your system's operating, that can make the cost highly variable. And that's if you're using like an API version that you're paying per token. A lot of people also choose to run these internally and as John mentioned, the inference time on these is significantly higher than a traditional, even NLP, classification model or tabular data model, like orders of magnitude higher. And so you really need to understand how that, as you're constantly iterating on these models and putting out new versions and new features in these models, how that's affecting the overall scale of that inference cost because you can use a lot of computing power very quickly with these prompts. 
>> Yeah, scale, performance, price all come together. I got to ask while we're here on the secret sauce of the company, if you had to describe to people out there watching, what's the secret sauce of the company? What's the key to your success? >> Yeah, so John leads our research team and they've had a number of really cool, I think AI, as much as it's been hyped for a while, commercial AI at least is still really in its infancy. And so the way we're able to pioneer new ways to think about performance for computer vision, NLP, LLMs is probably the thing that I'm proudest about. John and his team publish papers all the time at NeurIPS and other places. But I think it's really being able to define what performance means for basically any kind of model type and give people really powerful tools to understand that on an ongoing basis. >> John, secret sauce, how would you describe it? You got all the action happening all around you. >> Yeah, well, I've got to appreciate Adam talking me up like that. No, I. (all laughing) >> Furrier: Props to you. >> I would also say a couple of other things here. So we have a very strong engineering team and so I think some early hires there really set the standard at a very high bar that we've maintained as we've grown. And I think that's really paid dividends as scalability's become even more of a challenge in these spaces, right? And so that's not just scalability when it comes to LLMs, that's scalability when it comes to millions of inferences per day, that kind of thing as well in traditional ML models. And I think that's compared to potential competitors, that's really... Well, it's made us able to just operate more efficiently and pass that along to the client. >> Yeah, and I think the infancy comment is really important because it's the beginning. It really is a long journey ahead. A lot of change coming, like I said, it's a huge wave. 
So I'm sure you guys got a lot of plannings at the foundation even for your own company, so I appreciate the candid response there. Final question for you guys is, what should the top things be for a company in 2023? If I'm going to set the agenda and I'm a customer moving forward, putting the pedal to the metal, so to speak, what are the top things I should be prioritizing or I need to do to be successful with AI in 2023? >> Yeah, I think, so number one, as we talked about, we've been talking about this entire episode, the things are changing so quickly and the opportunities for business transformation and really disrupting different applications, different use cases, is almost, I don't think we've even fully comprehended how big it is. And so really digging in to your business and understanding where I can apply these new sets of foundation models is, that's a top priority. The interesting thing is I think there's another force at play, which is the macroeconomic conditions and a lot of places are, they're having to work harder to justify budgets. So in the past, couple years ago maybe, they had a blank check to spend on AI and AI development at a lot of large enterprises that was limited primarily by the amount of talent they could scoop up. Nowadays these expenditures are getting scrutinized more. And so one of the things that we really help our customers with is like really calculating the ROI on these things. And so if you have models out there performing and you have a new version that you can put out that lifts the performance by 3%, how many tens of millions of dollars does that mean in business benefit? Or if I want to go to get approval from the CFO to spend a few million dollars on this new project, how can I bake in from the beginning the tools to really show the ROI along the way? Because I think in these systems when done well for a software project, the ROI can be like pretty spectacular. 
Like we see over a hundred percent ROI in the first year on some of these projects. And so, I think in 2023, you just need to be able to show what you're getting for that spend. >> It's a needle moving moment. You see it all the time with some of these aha moments or like, whoa, blown away. John, I want to get your thoughts on this because one of the things that comes up a lot for companies that I talked to, that are on my second wave, I would say coming in, maybe not, maybe the front wave of adopters is talent and team building. You mentioned some of the hires you got were game changing for you guys and set the bar high. As you move the needle, new developers are going to need to come in. What's your advice given that you've been a professor, you've seen students, I know a lot of computer science people want to shift, they might not be yet skilled in AI, but they're proficient in programming, so that's going to be another opportunity with open source when things are happening. How do you talk to that next level of talent that wants to come in to this market to supplement teams and be on teams, lead teams? Any advice you have for people who want to build their teams and people who are out there and want to be a coder in AI? >> Yeah, I have advice, and this actually works for what it would take to be a successful AI company in 2023 as well, which is, just don't be afraid to iterate really quickly with these tools. The space is still being explored on what they can be used for. A lot of the tasks that they're used for now, right, like creating marketing content using machine learning, are not new things to do. It just works really well now. And so I'm excited to see what the next year brings in terms of folks from outside of core computer science who are, other engineers or physicists or chemists or whatever who are learning how to use these increasingly easy to use tools to leverage LLMs for tasks that I think none of us have really thought about before. 
So that's really, really exciting. And so toward that I would say iterate quickly. Build things on your own, build demos, show them to friends, host them online and you'll learn along the way and you'll have something to show for it. And also you'll help us explore that space. >> Guys, congratulations with Arthur. Great company, great picks and shovels opportunities out there for everybody. Iterate fast, get in quickly and don't be afraid to iterate. Great advice and thank you for coming on and being part of the AWS showcase, thanks. >> Yeah, thanks for having us on, John. Always a pleasure. >> Yeah, great stuff. Adam Wenchel, John Dickerson with Arthur. Thanks for coming on theCUBE. I'm John Furrier, your host. Generative AI and AWS. Keep it right there for more action with theCUBE. Thanks for watching. (upbeat music)

Published Date : Mar 9 2023

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
Adam Wenchel | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Adam | PERSON | 0.99+
John Furrier | PERSON | 0.99+
two | QUANTITY | 0.99+
John Dickerson | PERSON | 0.99+
2016 | DATE | 0.99+
2018 | DATE | 0.99+
2023 | DATE | 0.99+
3% | QUANTITY | 0.99+
2017 | DATE | 0.99+
Capital One | ORGANIZATION | 0.99+
last week | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
Arthur | PERSON | 0.99+
Python | TITLE | 0.99+
millions | QUANTITY | 0.99+
Two | QUANTITY | 0.99+
each step | QUANTITY | 0.99+
2018 20 19 | DATE | 0.99+
two schools | QUANTITY | 0.99+
couple years ago | DATE | 0.99+
once a week | QUANTITY | 0.99+
one | QUANTITY | 0.98+
first year | QUANTITY | 0.98+
Swami | PERSON | 0.98+
four years ago | DATE | 0.98+
four | DATE | 0.98+
first time | QUANTITY | 0.98+
Arthur | ORGANIZATION | 0.98+
two great guests | QUANTITY | 0.98+
next year | DATE | 0.98+
once a day | QUANTITY | 0.98+
six weeks | QUANTITY | 0.97+
10 years ago | DATE | 0.97+
ChatGPT | TITLE | 0.97+
second one | QUANTITY | 0.96+
three people | QUANTITY | 0.96+
front | EVENT | 0.95+
second wave | EVENT | 0.95+
January | DATE | 0.95+
hundreds of millions of dollars | QUANTITY | 0.95+
five years ago | DATE | 0.94+
about a month ago | DATE | 0.94+
tens of millions | QUANTITY | 0.93+
today | DATE | 0.92+
next 12 months | DATE | 0.91+
LangChain | ORGANIZATION | 0.91+
over a hundred percent | QUANTITY | 0.91+
million dollars | QUANTITY | 0.89+
millions of inferences | QUANTITY | 0.89+
theCUBE | ORGANIZATION | 0.88+

Steven Hillion & Jeff Fletcher, Astronomer | AWS Startup Showcase S3E1


 

(upbeat music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase AI/ML Top Startups Building Foundation Model Infrastructure. This is season three, episode one of our ongoing series covering exciting startups from the AWS ecosystem to talk about data and analytics. I'm your host, Lisa Martin and today we're excited to be joined by two guests from Astronomer. Steven Hillion joins us, its Chief Data Officer, and Jeff Fletcher, its director of ML. They're here to talk about machine learning and data orchestration. Guys, thank you so much for joining us today. >> Thank you. >> It's great to be here. >> Before we get into machine learning let's give the audience an overview of Astronomer. Talk about what that is, Steven. Talk about what you mean by data orchestration. >> Yeah, let's start with Astronomer. We're the Airflow company basically. The commercial developer behind the open-source project, Apache Airflow. I don't know if you've heard of Airflow. It's sort of the de facto standard these days for orchestrating data pipelines, data engineering pipelines, and as we'll talk about later, machine learning pipelines. It really is the de facto standard. I think we're up to about 12 million downloads a month. That's actually as an open-source project. I think at this point it's more popular by some measures than Slack. Airflow was created by Airbnb some years ago to manage all of their data pipelines and manage all of their workflows and now it powers the data ecosystem for organizations as diverse as Electronic Arts, Conde Nast is one of our big customers, a big user of Airflow. And also not to mention the biggest banks on Wall Street use Airflow and Astronomer to power the flow of data throughout their organizations. >> Talk about that a little bit more, Steven, in terms of the business impact. You mentioned some great customer names there. What is the business impact or outcomes that a data orchestration strategy enables businesses to achieve? 
>> Yeah, I mean, at the heart of it is quite simply, scheduling and managing data pipelines. And so if you have some enormous retailer who's managing the flow of information throughout their organization they may literally have thousands or even tens of thousands of data pipelines that need to execute every day to do things as simple as delivering metrics for the executives to consume at the end of the day, to producing on a weekly basis new machine learning models that can be used to drive product recommendations. One of our customers, for example, is a British food delivery service. And you get those recommendations in your application that says, "Well, maybe you want to have samosas with your curry." That sort of thing is powered by machine learning models that they train on a regular basis to reflect changing conditions in the market. And those are produced through Airflow and through the Astronomer platform, which is essentially a managed platform for running airflow. So at its simplest it really is just scheduling and managing those workflows. But that's easier said than done of course. I mean if you have 10 thousands of those things then you need to make sure that they all run that they all have sufficient compute resources. If things fail, how do you track those down across those 10,000 workflows? How easy is it for an average data scientist or data engineer to contribute their code, their Python notebooks or their SQL code into a production environment? And then you've got reproducibility, governance, auditing, like managing data flows across an organization which we think of as orchestrating them is much more than just scheduling. It becomes really complicated pretty quickly. >> I imagine there's a fair amount of complexity there. Jeff, let's bring you into the conversation. Talk a little bit about Astronomer through your lens, data orchestration and how it applies to MLOps. 
>> So I come from a machine learning background and for me the interesting part is that machine learning requires the expansion into orchestration. A lot of the same things that you're using to go and develop and build pipelines in a standard data orchestration space apply equally well in a machine learning orchestration space. What you're doing is you're moving data between different locations, between different tools, and then tasking different types of tools to act on that data. So extending it made logical sense from an implementation perspective. And a lot of my focus at Astronomer is really to explain how Airflow can be used well in a machine learning context. It is being used well, it is being used a lot by the customers that we have and also by users of the open source version. But it's really being able to explain to people why it's a natural extension for it and how well it fits into that. And a lot of it is also extending some of the infrastructure capabilities that Astronomer provides to those customers for them to be able to run some of the more platform specific requirements that come with doing machine learning pipelines. >> Let's get into some of the things that make Astronomer unique. Jeff, sticking with you, when you're in customer conversations, what are some of the key differentiators that you articulate to customers? >> So a lot of it is that we are not specific to one cloud provider. So we have the ability to operate across all of the big cloud providers. I know, I'm certain we have the best developers that understand how best-practice implementations for data orchestration work. So we spend a lot of time talking to not just the business outcomes and the business users of the product, but also for the technical people, how to help them better implement things that they may have come across on a Stack Overflow article or not necessarily just grown with how the product has migrated. 
So it's the ability to run it wherever you need to run it and also our ability to help you, the customer, better implement and understand those workflows that I think are two of the primary differentiators that we have. >> Lisa: Got it. >> I'll add another one if you don't mind. >> You can go ahead, Steven. >> Is lineage and dependencies between workflows. One thing we've done is to augment core Airflow with lineage services. So using the OpenLineage framework, another open source framework for tracking datasets as they move from one workflow to another, one team to another, one data source to another is a really key component of what we do and we bundle that within the service so that as a developer or as a production engineer, you really don't have to worry about lineage, it just happens. Jeff may show us some of this later that you can actually see as data flows from source through to a data warehouse out through a Python notebook to produce a predictive model or a dashboard. Can you see how those data products relate to each other? And when something goes wrong, figure out what upstream maybe caused the problem, or if you're about to change something, figure out what the impact is going to be on the rest of the organization. So lineage is a big deal for us. >> Got it. >> And just to add on to that, the other thing to think about is that traditional Airflow is actually a complicated implementation. It required quite a lot of time spent understanding what was almost a bespoke language that you needed to be able to develop in to write these DAGs, which are like fundamental pipelines. So part of what we are focusing on is tooling that makes it more accessible to say a data analyst or a data scientist who doesn't have, or really need to gain, the necessary background in how the semantics of Airflow DAGs work to still be able to get the benefit of what Airflow can do. 
So there is new features and capabilities built into the astronomer cloud platform that effectively obfuscates and removes the need to understand some of the deep work that goes on. But you can still do it, you still have that capability, but we are expanding it to be able to have orchestrated and repeatable processes accessible to more teams within the business. >> In terms of accessibility to more teams in the business. You talked about data scientists, data analysts, developers. Steven, I want to talk to you, as the chief data officer, are you having more and more conversations with that role and how is it emerging and evolving within your customer base? >> Hmm. That's a good question, and it is evolving because I think if you look historically at the way that Airflow has been used it's often from the ground up. You have individual data engineers or maybe single data engineering teams who adopt Airflow 'cause it's very popular. Lots of people know how to use it and they bring it into an organization and say, "Hey, let's use this to run our data pipelines." But then increasingly as you turn from pure workflow management and job scheduling to the larger topic of orchestration you realize it gets pretty complicated, you want to have coordination across teams, and you want to have standardization for the way that you manage your data pipelines. And so having a managed service for Airflow that exists in the cloud is easy to spin up as you expand usage across the organization. And thinking long term about that in the context of orchestration that's where I think the chief data officer or the head of analytics tends to get involved because they really want to think of this as a strategic investment that they're making. Not just per team individual Airflow deployments, but a network of data orchestrators. >> That network is key. Every company these days has to be a data company. We talk about companies being data driven. It's a common word, but it's true. 
It's whether it is a grocer or a bank or a hospital, they've got to be data companies. So talk to me a little bit about Astronomer's business model. How is this available? How do customers get their hands on it? >> Jeff, go ahead. >> Yeah, yeah. So we have a managed cloud service and we have two modes of operation. One, you can bring your own cloud infrastructure. So you can say here is an account in say, AWS or Azure and we can go and deploy the necessary infrastructure into that, or alternatively we can host everything for you. So it becomes a full SaaS offering. But we then provide a platform that connects at the backend to your internal IDP process. So however you are authenticating users to make sure that the correct people are accessing the services that they need with role-based access control. From there we are deploying through Kubernetes, the different services and capabilities into either your cloud account or into an account that we host. And from there Airflow does what Airflow does, which is its ability to then reach to different data systems and data platforms and to then run the orchestration. We make sure we do it securely, we have all the necessary compliance certifications required for GDPR in Europe and HIPAA based out of the US, and a whole bunch host of others. So it is a secure platform that can run in a place that you need it to run, but it is a managed Airflow that includes a lot of the extra capabilities like the cloud developer environment and the open lineage services to enhance the overall airflow experience. >> Enhance the overall experience. So Steven, going back to you, if I'm a Conde Nast or another organization, what are some of the key business outcomes that I can expect? As one of the things I think we've learned during the pandemic is access to realtime data is no longer a nice to have for organizations. It's really an imperative. 
It's that demanding consumer that wants to have that personalized, customized, instant access to a product or a service. So if I'm a Conde Nast or I'm one of your customers, what can I expect my business to be able to achieve as a result of data orchestration? >> Yeah, I think in a nutshell it's about providing a reliable, scalable, and easy to use service for developing and running data workflows. And talking of demanding customers, I mean, I'm actually a customer myself, as you mentioned, I'm the head of data for Astronomer. You won't be surprised to hear that we actually use Astronomer and Airflow to run all of our data pipelines. And so I can actually talk about my experience. When I started I was of course familiar with Airflow, but it always seemed a little bit unapproachable to me if I was introducing that to a new team of data scientists. They don't necessarily want to have to think about learning something new. But I think because of the layers that Astronomer has provided with our Astro service around Airflow it was pretty easy for me to get up and running. Of course I've got an incentive for doing that. I work for the Airflow company, but we went from about, at the beginning of last year, about 500 data tasks that we were running on a daily basis to about 15,000 every day. We run something like a million data operations every month within my team. And so as one outcome, just the ability to spin up new production workflows essentially in a single day you go from an idea in the morning to a new dashboard or a new model in the afternoon, that's really the business outcome is just removing that friction to operationalizing your machine learning and data workflows. >> And I imagine too, oh, go ahead, Jeff. >> Yeah, I think to add to that, one of the things that becomes part of the business cycle is a repeatable capabilities for things like reporting, for things like new machine learning models. 
And the impediment that has existed is that it's difficult to take that from a team that's an analyst team who then provide that or a data science team that then provide that to the data engineering team who have to work the workflow all the way through. What we're trying to unlock is the ability for those teams to directly get access to scheduling and orchestrating capabilities so that a business analyst can have a new report for C-suite execs that needs to be done once a week, but the time to repeatability for that report is much shorter. So it is then immediately in the hands of the person that needs to see it. It doesn't have to go into a long list of to-dos for a data engineering team that's already overworked that they eventually get it to it in a month's time. So that is also a part of it is that the realizing, orchestration I think is fairly well and a lot of people get the benefit of being able to orchestrate things within a business, but it's having more people be able to do it and shorten the time that that repeatability is there is one of the main benefits from good managed orchestration. >> So a lot of workforce productivity improvements in what you're doing to simplify things, giving more people access to data to be able to make those faster decisions, which ultimately helps the end user on the other end to get that product or the service that they're expecting like that. Jeff, I understand you have a demo that you can share so we can kind of dig into this. >> Yeah, let me take you through a quick look of how the whole thing works. So our starting point is our cloud infrastructure. This is the login. You go to the portal. You can see there's a a bunch of workspaces that are available. Workspaces are like individual places for people to operate in. 
I'm not going to delve into all the deep technical details here, but starting point for a lot of our data science customers is we have what we call our Cloud IDE, which is a web-based development environment for writing and building out DAGs without actually having to know how the underpinnings of Airflow work. This is an internal one, something that we use. You have a notebook-like interface that lets you write Python code and SQL code and a bunch of specific bespoke type of blocks if you want. They all get pulled together and create a workflow. So this is a workflow, which gets compiled to something that looks like a complicated set of Python code, which is the DAG. I then have a CI/CD process pipeline where I commit this through to my GitHub repo. So this comes to a repo here, which is where these DAGs that I created in the previous step exist. I can then go and say, all right, I want to see how those particular DAGs have been running. We then get to the actual Airflow part. So this is the managed Airflow component. So we add the ability for teams to fairly easily bring up an Airflow instance and write code inside our notebook-like environment to get it into that instance. So you can see it's been running. That same process that we built here that graph ends up here inside this, but you don't need to know how the fundamentals of Airflow work in order to get this going. Then we can run one of these, it runs in the background and we can manage how it goes. And from there, every time this runs, it's emitting to a process underneath, which is the OpenLineage service, which is the lineage integration that allows me to come in here and have a look and see this was that actual, that same graph that we built, but now it's the historic version. So I know where things started, where things are going, and how it ran. And then I can also do a comparison. 
So if I want to see how this particular run worked compared to one historically, I can grab one from a previous date and it will show me the comparison between the two. So that combination of managed Airflow, getting Airflow up and running very quickly, but the Cloud IDE that lets you write code and know how to get something into a repeatable format get that into Airflow and have that attached to the lineage process adds what is a complete end-to-end orchestration process for any business looking to get the benefit from orchestration. >> Outstanding. Thank you so much Jeff for digging into that. So one of my last questions, Steven is for you. This is exciting. There's a lot that you guys are enabling organizations to achieve here to really become data-driven companies. So where can folks go to get their hands on this? >> Yeah, just go to astronomer.io and we have plenty of resources. If you're new to Airflow, you can read our documentation, our guides to getting started. We have a CLI that you can download that is really I think the easiest way to get started with Airflow. But you can actually sign up for a trial. You can sign up for a guided trial where our teams, we have a team of experts, really the world experts on getting Airflow up and running. And they'll take you through that trial and allow you to actually kick the tires and see how this works with your data. And I think you'll see pretty quickly that it's very easy to get started with Airflow, whether you're doing that from the command line or doing that in our cloud service. And all of that is available on our website >> astronomer.io. Jeff, last question for you. What are you excited about? There's so much going on here. What are some of the things, maybe you can give us a sneak peek coming down the road here that prospects and existing customers should be excited about? 
>> I think a lot of the development around the data awareness components, so one of the things that's traditionally been complicated with orchestration is you leave your data in the place that you're operating on and we're starting to have more data processing capability being built into Airflow. And from a Astronomer perspective, we are adding more capabilities around working with larger datasets, doing bigger data manipulation with inside the Airflow process itself. And that lends itself to better machine learning implementation. So as we start to grow and as we start to get better in the machine learning context, well, in the data awareness context, it unlocks a lot more capability to do and implement proper machine learning pipelines. >> Awesome guys. Exciting stuff. Thank you so much for talking to me about Astronomer, machine learning, data orchestration, and really the value in it for your customers. Steve and Jeff, we appreciate your time. >> Thank you. >> My pleasure, thanks. >> And we thank you for watching. This is season three, episode one of our ongoing series covering exciting startups from the AWS ecosystem. I'm your host, Lisa Martin. You're watching theCUBE, the leader in live tech coverage. (upbeat music)
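The DAG and lineage ideas that run through this conversation can be illustrated in plain Python. What follows is only a minimal sketch under stated assumptions, not Airflow's or Astronomer's API: the task names, the `PIPELINE` dictionary, and the helpers `run_pipeline` and `downstream_of` are all hypothetical, and the standard-library `graphlib` module stands in for a real scheduler. It shows tasks running only after their upstream tasks, and an impact question of the kind Steven describes ("if you're about to change something, figure out what the impact is going to be").

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "load_warehouse": {"extract_orders"},
    "train_model": {"load_warehouse"},
    "build_dashboard": {"load_warehouse"},
}


def run_pipeline(dag, tasks):
    """Call each task after all of its upstream tasks; return the run order."""
    order = []
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()  # a real orchestrator would also retry, log, and alert
        order.append(name)
    return order


def downstream_of(dag, source):
    """Return every task that directly or indirectly depends on `source`."""
    affected = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for task, upstream in dag.items():
            if current in upstream and task not in affected:
                affected.add(task)
                frontier.append(task)
    return affected


if __name__ == "__main__":
    log = []
    # Placeholder task bodies that only record that they ran.
    tasks = {name: (lambda n=name: log.append(n)) for name in PIPELINE}
    print(run_pipeline(PIPELINE, tasks))
    print(sorted(downstream_of(PIPELINE, "extract_orders")))
```

A production orchestrator adds scheduling, compute allocation, failure tracking, and governance on top of this ordering logic, which is the gap the speakers describe between "just scheduling" and orchestration.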

Published Date : Mar 9 2023

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Jeff Fletcher | PERSON | 0.99+
Steven | PERSON | 0.99+
Steve | PERSON | 0.99+
Steven Hillion | PERSON | 0.99+
Lisa | PERSON | 0.99+
Europe | LOCATION | 0.99+
Conde Nast | ORGANIZATION | 0.99+
US | LOCATION | 0.99+
thousands | QUANTITY | 0.99+
two | QUANTITY | 0.99+
HIPAA | TITLE | 0.99+
AWS | ORGANIZATION | 0.99+
two guests | QUANTITY | 0.99+
Airflow | ORGANIZATION | 0.99+
Airbnb | ORGANIZATION | 0.99+
10 thousands | QUANTITY | 0.99+
One | QUANTITY | 0.99+
Electronic Arts | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
Python | TITLE | 0.99+
two modes | QUANTITY | 0.99+
Airflow | TITLE | 0.98+
10,000 workflows | QUANTITY | 0.98+
about 500 data tasks | QUANTITY | 0.98+
today | DATE | 0.98+
one outcome | QUANTITY | 0.98+
tens of thousands | QUANTITY | 0.98+
GDPR | TITLE | 0.97+
SQL | TITLE | 0.97+
GitHub | ORGANIZATION | 0.96+
astronomer.io | OTHER | 0.94+
Slack | ORGANIZATION | 0.94+
Astronomer | ORGANIZATION | 0.94+
some years ago | DATE | 0.92+
once a week | QUANTITY | 0.92+
Astronomer | TITLE | 0.92+
theCUBE | ORGANIZATION | 0.92+
last year | DATE | 0.91+
Kubernetes | TITLE | 0.88+
single day | QUANTITY | 0.87+
about 15,000 every day | QUANTITY | 0.87+
one cloud | QUANTITY | 0.86+
IDE | TITLE | 0.86+

Gabriela de Queiroz, Microsoft | WiDS 2023


 

(upbeat music) >> Welcome back to theCUBE's coverage of Women in Data Science 2023 live from Stanford University. This is Lisa Martin. My co-host is Tracy Yuan. We're excited to be having great conversations all day but you know, 'cause you've been watching. We've been interviewing some very inspiring women and some men as well, talking about all of the amazing applications of data science. You're not going to want to miss this next conversation. Our guest is Gabriela de Queiroz, Principal Cloud Advocate Manager of Microsoft. Welcome, Gabriela. We're excited to have you. >> Thank you very much. I'm so excited to be talking to you. >> Yeah, you're on theCUBE. >> Yeah, finally. (Lisa laughing) Like a dream come true. (laughs) >> I know and we love that. We're so thrilled to have you. So you have a ton of experience in the data space. I was doing some research on you. You've worked in software, financial advertisement, health. Talk to us a little bit about you. What's your background in? >> So I was trained in statistics. So I'm a statistician and then I worked in epidemiology. I worked with air pollution and public health. So I was a researcher before moving into the industry. So as I was talking today, the weekly paths, it's exactly who I am. I went back and forth and back and forth and stopped and tried something else until I figured out that I want to do data science and that I want to do different things because with data science we can... The beauty of data science is that you can move across domains. So I worked in healthcare, financial, and then different technology companies. >> Well the nice thing, one of the exciting things that data science, that I geek out about and Tracy knows 'cause we've been talking about this all day, it's just all the different, to your point, diverse, pun intended, applications of data science. You know, this morning we were talking about, we had the VP of data science from Meta as a keynote. 
She came to theCUBE talking and really kind of explaining from a content perspective, from a monetization perspective, and of course so many people in the world are users of Facebook. It makes it tangible. But we also heard today conversations about the applications of data science in police violence, in climate change. We're in California, we're expecting a massive rainstorm and we don't know what to do when it rains or snows. But climate change is real. Everyone's talking about it, and there's data science at its foundation. That's one of the things that I love. But you also have a lot of experience building diverse teams. Talk a little bit about that. You've created some very sophisticated data science solutions. Talk about your recommendation to others to build diverse teams. What's in it for them? And maybe share some data science project or two that you really found inspirational. >> Yeah, absolutely. So I do love building teams. Every time I'm given the task of building teams, I feel the luckiest person in the world because you have the option to pick like different backgrounds and all the diverse set of like people that you can find. I don't think it's easy, like people say, yeah, it's very hard. You have to be intentional. You have to go from the very first part when you are writing the job description through the interview process. So you have to be very intentional in every step. And you have to think through when you are doing that. And I love, like my last team, we had like 10 people and we were so diverse. Like just talking about languages. We had like 15 languages inside a team. So how beautiful it is. Like all different backgrounds, like myself as a statistician, but we had people from engineering background, biology, languages, and so on. So it's, yeah, like every time thinking about building a team, if you wanted your team to be diverse, you need to be intentional. 
>> I'm so glad you brought up that intention point because that is the fundamental requirement really is to build it with intention. >> Exactly, and I love to hear like how there's different languages. So like I'm assuming, or like different backgrounds, I'm assuming everybody just zigzags their way into the team and now you're all women in data science and I think that's so precious. >> Exactly. And not only women, right. >> Tracy: Not only women, you're right. >> The team was diverse not only in terms of like gender, but like background, ethnicity, and spoken languages, and languages that they use to program and backgrounds. Like as I mentioned, not everybody did the statistics in school or computer science. And it was like one of my best teams was when we had this combination also like things that I'm good at, the other person is not as good and we have this knowledge sharing all the time. Every day I would feel like I'm learning something. In a small talk or if I was reviewing something, there was always something new because of like the richness of the diverse set of people that were in your team. >> Well what you've done is so impressive, because not only have you been intentional with it, but you sound like the hallmark of a great leader of someone who hires and builds teams to fill gaps. They don't have to know less than I do for me to be the leader. They have to have different skills, different areas of expertise. That is really, honestly Gabriela, that's the hallmark of a great leader. And that's not easy to come by. So tell me, who were some of your mentors and sponsors along the way that maybe influenced you in that direction? Or is that just who you are? >> That's a great question. And I joke that I want to be the role model that I never had, right. So growing up, I didn't have anyone that I could see other than my mom probably or my sister. But there was no one that I could see, I want to become that person one day. 
And once I was tracing my path, I started to see people looking at me and like, you inspire me so much, and I'm like, oh wow, this is amazing and I want to do this over and over and over again. So I want to be that person to inspire others. And no matter, like I'll be like a VP, CEO, whoever, you know, I want to be, I want to keep inspiring people because that's so valuable. >> Lisa: Oh, that's huge. >> And I feel like when we grow professionally and then go to the next level, we sometimes lose that, you know, thing that's essential. And I think also like, it's part of who I am as I was building and all my experiences as I was going through, I became what I mentioned is a unique person that I think we all are unique somehow. >> You're a rockstar. Isn't she a rockstar? >> You dropping quotes out. >> I'm loving this. I'm like, I've inspired Gabriela. (Gabriela laughing) >> Oh my God. But yeah, 'cause we were asking our other guests about the same question, like, who are your role models? And then we're talking about how like it's very important for women to see that there is a representation, that there is someone they look up to and they want to be. And so that like, it motivates them to stay in this field and to start in this field to begin with. So yeah, I think like you are definitely filling a void and for all these women who dream to be in data science. And I think that's just amazing. >> And you're a founder too. In 2012, you founded R-Ladies. Talk a little bit about that. This is present in more than 200 cities in 55 plus countries. Talk about R-Ladies and maybe the catalyst to launch it. >> Yes, so you always start, so I'm from Brazil, I always talk about this because it's such, again, I grew up over there. So I was there my whole life and then I moved here, Silicon Valley. And when I moved to San Francisco, like the doors opened. So many things happening in the city. That was back in 2012. Data science was exploding. 
And I found out something about Meetup.com, it's a website that you can join and go in all these events. And I was going to this event and I joke that it was kind of like going to the Disneyland, where you don't know if I should go that direction or the other direction. >> Yeah, yeah. >> And I was like, should I go and learn about data visualization? Should I go and learn about SQL or should I go and learn about Hadoop, right? So I would go every day to those meetups. And I was a student back then, so you know, the budget was very restricted as a student. So we don't have much to spend. And then they would serve dinner and you would learn for free. And then I got to a point where I was like, hey, they are doing all of this as a volunteer. Like they are running this meetup and events for free. And I felt like it's a cycle. I need to do something, right. I'm taking all this in. I'm having this huge opportunity to be here. I want to give back. So that's what how everything started. I was like, no, I have to think about something. I need to think about something that I can give back. And I was using R back then and I'm like how about I do something with R. I love R, I'm so passionate about R, what about if I create a community around R but not a regular community, because by going to this events, I felt that as a Latina and as a woman, I was always in the corner and I was not being able to participate and to, you know, be myself and to network and ask questions. I would be in the corner. So I said to myself, what about if I do something where everybody feel included, where everybody can participate, can share, can ask questions without judgment? So that's how R ladies all came together. >> That's awesome. >> Talk about intentions, like you have to, you had that go in mind, but yeah, I wanted to dive a little bit into R. 
So could you please talk more about where did the passion for R come from, and like how did the special connection between you and R the language, like born, how did that come from? >> It was not love at first sight. >> No. >> Not at all. Not at all. Because that was back in Brazil. So all the documentation was in English, all the tutorials, only two. We had like very few tutorials. It was not like nowadays that we have so many tutorials and courses. There were like two tutorials, other documentation in English. So it was hard for me like as someone that didn't know much English to go through the language and then to learn to program was not an easy task. But then as I was going through the language and learning and reading books and finding the people behind the language, I don't know how I fell in love. And then when I came to San Francisco, I saw some of like the main contributors who are speaking in person and I'm like, wow, they are like humans. I don't know, it was like, I have no idea why I had this love. But I think the people and then the community was the thing that kept me with the R language. >> Yeah, the community factor is so important. And it's so, at WIDS it's so palpable. I mean I literally walk in the door, every WIDS I've done, I think I've been doing them for theCUBE since 2017. theCUBE has been here since the beginning in 2015 with our co-founders. But you walk in, you get this sense of belonging. And this sense of I can do anything, why not? Why not me? Look at her up there, and now look at you speaking in the technical talk today on theCUBE. So inspiring. One of the things that I always think is you can't be what you can't see. We need to be able to see more people that look like you and sound like you and like me and like you as well. And WIDS gives us that opportunity, which is fantastic, but it's also helping to move the needle, really. And I was looking at some of the Anitab.org stats just yesterday about 2022.
And they're showing, you know, the percentage of females in technical roles has been hovering around 25% for a while. It's a little higher now. I think it's 27.6 according to AnitaB. We're seeing more women hired in roles. But what are the challenges, and I would love to get your advice on this, for those that might be in this situation is attrition, women who are leaving roles. What would your advice be to a woman who might be trying to navigate family and work and career ladder to stay in that role and keep pushing forward? >> I'll go back to the community. If you don't have a community around you, it's so hard to navigate. >> That's a great point. >> You are lonely. There is no one that you can bounce ideas off, that you can share what you are feeling or like that you can learn as well. So sometimes you feel like you are the only person that is going through that problem or like, you maybe have a family or you are planning to have a family and you have to make a decision. But you've never seen anyone going through this. So when you have a community, you see people like you, right. So that's where we were saying about having different people and people like you so they can share as well. And you feel like, oh yeah, so they went through this, they succeeded. I can also go through this and succeed. So I think the attrition problem is still a big problem. And I'm sure it will be worse now with everything that is happening in Tech with layoffs. >> Yes and the great resignation. >> Yeah. >> We are going back, you know, a few steps, like a lot of like advancements that we did. I feel like we are going back unfortunately, but I always tell this, make sure that you have a community. Make sure that you have a mentor. Make sure that you have someone or some people, not only one mentor, different mentors, that can support you through this trajectory. Because it's not easy. But there are a lot of us out there. >> There really are. And that's a great point.
I love everything about the community. It's all about that network effect and feeling like you belong- >> That's all WIDS is about. >> Yeah. >> Yes. Absolutely. >> Like coming over here, it's like seeing the old friends again. It's like I'm so glad that I'm coming because I'm seeing all my old friends that I only see like maybe once a year. >> Tracy: Reunion. >> Yeah, exactly. And I feel like that our tank gets, you know- >> Lisa: Replenished. >> Exactly. For the rest of the year. >> Yes. >> Oh, that's precious. >> I love that. >> I agree with that. I think one of the things that when I say, you know, you can't see, I think, well, how many females in technology would I be able to recognize? And of course you can be a female in technology working in the healthcare sector or working in finance or manufacturing, but, you know, we need to be able to have more that we can see and identify. And one of the things that I recently found out, I was telling Tracy this earlier that I geeked out about was finding out that the CTO of OpenAI, ChatGPT, is a female. I'm like, (gasps) why aren't we talking about this more? She was profiled on Fast Company. I've seen a few pieces on her, Mira Murati. But we're hearing so much about ChatJTP being... ChatGPT, I always get that wrong, about being like, likening it to the launch of the iPhone, which revolutionized mobile and connectivity. And here we have a female in the technical role. Let's put her on a pedestal because that is hugely inspiring. >> Exactly, like let's bring everybody to the front. >> Yes. >> Right. >> And let's have them talk to us because like, you didn't know. I didn't know probably about this, right. You didn't know. Like, we don't know about this. It's kind of like we are hidden. We need to give them the spotlight. Every woman to give the spotlight, so they can keep inspiring the new generation. >> Or Susan Wojcicki who ran, how long did she run YouTube?
All the YouTube influencers that probably have no idea who are influential for whatever they're doing on YouTube in different social platforms that don't realize, do you realize there was a female behind the helm that for a long time that turned it into what it is today? That's outstanding. Why aren't we talking about this more? >> How about Megan Smith, was the first CTO on the Obama administration. >> That's right. I knew it had to do with Obama. Couldn't remember. Yes. Let's let's find more pedestals. But organizations like WIDS, your involvement as a speaker, showing more people you can be this because you can see it, >> Yeah, exactly. is the right direction that will help hopefully bring us back to some of the pre-pandemic levels, and keep moving forward because there's so much potential with data science that can impact everyone's lives. I always think, you know, we have this expectation that we have our mobile phone and we can get whatever we want wherever we are in the world and whatever time of day it is. And that's all data driven. The regular average person that's not in tech thinks about data as a, well I'm paying for it. What's all these data charges? But it's powering the world. It's powering those experiences that we all want as consumers or in our business lives or we expect to be able to do a transaction, whether it's something in a CRM system or an Uber transaction like that, and have the app respond, maybe even know me a little bit better than I know myself. And that's all data. So I think we're just at the precipice of the massive impact that data science will make in our lives. And luckily we have leaders like you who can help navigate us along this path. >> Thank you. >> What advice for, last question for you is advice for those in the audience who might be nervous or maybe lack a little bit of confidence to go I really like data science, or I really like engineering, but I don't see a lot of me out there. What would you say to them? 
>> Especially for people who are from like a non-linear track where like going onto that track. >> Yeah, I would say keep going. Keep going. I don't think it's easy. It's not easy. But keep going because the more you go the more, again, you advance and there are opportunities out there. Sometimes it takes a little bit, but just keep going. Keep going and following your dreams, that you get there, right. So again, data science, such a broad field that doesn't require you to come from a specific background. And I think the beauty of data science exactly is this is like the combination, the most successful data science teams are the teams that have all these different backgrounds. So if you think that we as data scientists, we started programming when we were nine, that's not true, right. You can be 30, 40, shifting careers, starting to program right now. It doesn't matter. Like you get there no matter how old you are. And no matter what's your background. >> There's no limit. >> There are no limits. >> I love that, Gabriela. Thank you so much for inspiring. I know you inspired me. I'm pretty sure you probably inspired Tracy with your story. And sometimes like what you just said, you have to be your own mentor and that's okay. Because eventually you're going to turn into a mentor for many, many others and sounds like you're already paving that path and we so appreciate it. You are now officially a CUBE alumni. >> Yes. Thank you. >> Yay. We've loved having you. Thank you so much for your time. >> Thank you. Thank you. >> For our guest and for Tracy Yuan, this is Lisa Martin. We are live at WIDS 23, the eighth annual Women in Data Science Conference at Stanford. Stick around. Our next guest joins us in just a few minutes. (upbeat music)

Published Date : Mar 8 2023

ENTITIES

Entity | Category | Confidence
Tracy Yuan | PERSON | 0.99+
Megan Smith | PERSON | 0.99+
Gabriela de Queiroz | PERSON | 0.99+
Susan Wojcicki | PERSON | 0.99+
Gabriela | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Brazil | LOCATION | 0.99+
2015 | DATE | 0.99+
2012 | DATE | 0.99+
San Francisco | LOCATION | 0.99+
San Francisco | LOCATION | 0.99+
Tracy | PERSON | 0.99+
Obama | PERSON | 0.99+
Lisa | PERSON | 0.99+
Mira Murati | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
California | LOCATION | 0.99+
Silicon Valley | LOCATION | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Uber | ORGANIZATION | 0.99+
27.6 | QUANTITY | 0.99+
two | QUANTITY | 0.99+
30 | QUANTITY | 0.99+
40 | QUANTITY | 0.99+
15 languages | QUANTITY | 0.99+
R Ladies | ORGANIZATION | 0.99+
two tutorials | QUANTITY | 0.99+
Anitab | ORGANIZATION | 0.99+
10 people | QUANTITY | 0.99+
one | QUANTITY | 0.99+
YouTube | ORGANIZATION | 0.99+
today | DATE | 0.99+
55 plus countries | QUANTITY | 0.99+
first part | QUANTITY | 0.99+
more than 200 cities | QUANTITY | 0.99+
first | QUANTITY | 0.98+
nine | QUANTITY | 0.98+
SQL | TITLE | 0.98+
theCUBE | ORGANIZATION | 0.98+
WIDS 23 | EVENT | 0.98+
Stanford University | ORGANIZATION | 0.98+
2017 | DATE | 0.98+
CUBE | ORGANIZATION | 0.97+
Stanford | LOCATION | 0.97+
Women in Data Science | TITLE | 0.97+
around 25% | QUANTITY | 0.96+
Disneyland | LOCATION | 0.96+
English | OTHER | 0.96+
one mentor | QUANTITY | 0.96+
Women in Data Science Conference | EVENT | 0.96+
once a year | QUANTITY | 0.95+
WIDS | ORGANIZATION | 0.92+
this morning | DATE | 0.91+
Meetup.com | ORGANIZATION | 0.91+
Facebook | ORGANIZATION | 0.9+
Hadoop | TITLE | 0.89+
WiDS 2023 | EVENT | 0.88+
Anitab.org | ORGANIZATION | 0.87+
ChatJTP | TITLE | 0.86+
One | QUANTITY | 0.86+
one day | QUANTITY | 0.85+
ChatGPT | TITLE | 0.84+
pandemic | EVENT | 0.81+
Fast Company | ORGANIZATION | 0.78+
CTO | PERSON | 0.76+
Open | ORGANIZATION | 0.76+

Jacqueline Kuo, Dataiku | WiDS 2023


 

(upbeat music) >> Morning guys and girls, welcome back to theCUBE's live coverage of Women in Data Science WIDS 2023 live at Stanford University. Lisa Martin here with my co-host for this segment, Tracy Zhang. We're really excited to be talking with a great female rockstar. You're going to learn a lot from her next, Jacqueline Kuo, solutions engineer at Dataiku. Welcome, Jacqueline. Great to have you. >> Thank you so much. >> Thanks for being here. >> I'm so excited to be here. >> So one of the things I have to start out with, 'cause my mom Kathy Dahlia is watching, she's a New Yorker. You are a born and raised New Yorker and I learned from my mom and others. If you're born in New York, no matter how long you've moved away, you are a New Yorker. You guys have like a secret club. (group laughs) >> I am definitely very proud of being born and raised in New York. My family immigrated to New York, New Jersey from Taiwan. So very proud Taiwanese American as well. But I absolutely love New York and I can't imagine living anywhere else. >> Yeah, yeah. >> I love it. >> So you studied, I was doing some research on you, you studied mechanical engineering at MIT. >> Yes. >> That's huge. And you discovered your passion for all things data-related. You worked at IBM as an analytics consultant. Talk to us a little bit about your career path. Were you always interested in engineering STEM-related subjects from the time you were a child? >> I feel like my interests were ranging in many different things and I ended up landing in engineering, 'cause I felt like I wanted to gain a toolkit like a toolset to make some sort of change with or use my career to make some sort of change in this world. 
And I landed on engineering and mechanical engineering specifically, because I felt like I got to, in my undergrad do a lot of hands-on projects, learn every part of the engineering and design process to build products which is super-transferable and transferable skills sort of is like the trend in my career so far. Where after undergrad I wanted to move back to New York and mechanical engineering jobs are kind of few and far between in the city. And I ended up landing at IBM doing analytics consulting, because I wanted to understand how to use data. I knew that data was really powerful and I knew that working with it could allow me to tell better stories to influence people across different industries. And that's also how I kind of landed at Dataiku to my current role, because it really does allow me to work across different industries and work on different problems that are just interesting. >> Yeah, I like the way that, how you mentioned building a toolkit when doing your studies at school. Do you think a lot of skills are still very relevant to your job at Dataiku right now? >> I think that at the core of it is just problem solving and asking questions and continuing to be curious or trying to challenge what is currently given to you. And I think in an engineering degree you get a lot of that. >> Yeah, I'm sure. >> But I think that we've actually seen that a lot in the panels today already, that you get that through all different types of work and research and that kind of thoughtfulness comes across in all different industries too. >> Talk a little bit about some of the challenges that data science is solving, because every company these days, whether it's an enterprise in manufacturing or a small business in retail, everybody has to be data-driven, because the end user, the end customer, whoever that is whether it's a person, an individual, a company, a B2B, expects to have a personalized custom experience and that comes from data. 
But you have to be able to understand that data treated properly, responsibly. Talk about some of the interesting projects that you're doing at Dataiku or maybe some that you've done in the past that are really kind of transformative across things climate change or police violence, some of the things that data science really is impacting these days. >> Yeah, absolutely. I think that what I love about coming to these conferences is that you hear about those really impactful social impact projects that I think everybody who's in data science wants to be working on. And I think at Dataiku what's great is that we do have this program called Ikig.AI where we work with nonprofits and we support them in their data and analytics projects. And so, a project I worked on was with the Clean Water, oh my goodness, the Ocean Cleanup project, Ocean Cleanup organization, which was amazing, because it was sort of outside of my day-to-day and it allowed me to work with them and help them understand better where plastic is being aggregated across the world and where it appears, whether that's on beaches or in lakes and rivers. So using data to help them better understand that. I feel like from a day-to-day though, we, in terms of our customers, they're really looking at very basic problems with data. And I say basic, not to diminish it, but really just to kind of say that it's high impact, but basic problems around how do they forecast sales better? That's a really kind of, sort of basic problem, but it's actually super-complex and really impactful for people, for companies when it comes to forecasting how much headcount they need to have in the next year or how much inventory to have if they're retail. And all of those are going to, especially for smaller companies, make a huge impact on whether they make profit or not. 
And so, what's great about working at Dataiku is you get to work on these high-impact projects and oftentimes I think from my perspective, I work as a solutions engineer on the commercial team. So it's just, we work generally with smaller customers and sometimes talking to them, me talking to them is like their first introduction to what data science is and what they can do with that data. And sort of using our platform to show them what the possibilities are and help them build a strategy around how they can implement data in their day-to-day. >> What's the difference? You were a data scientist by title and function, now you're a solutions engineer. Talk about the ascendancy into that and also some of the things that you and Tracy will talk about as those transferable, those transportable skills that probably maybe you learned in engineering, you brought data science now you're bringing to solutions engineering. >> Yeah, absolutely. So data science, I love working with data. I love getting in the weeds of things and I love, oftentimes that means debugging things or looking line by line at your code and trying to make it better. I found that on in the data science role, while those things I really loved, sometimes it also meant that I didn't, couldn't see or didn't have visibility into the broader picture of well like, well why are we doing this project? And who is it impacting? And because oftentimes your day-to-day is very much in the weeds. And so, I moved into sales or solutions engineering at Dataiku to get that perspective, because what a sales engineer does is support the sale from a technical perspective. And so, you really truly understand well, what is the customer looking for and what is going to influence them to make a purchase? And how do you tell the story of the impact of data? Because oftentimes they need to quantify well, if I purchase a software like Dataiku then I'm able to build this project and make this X impact on the business. 
And that is really powerful. That's where the storytelling comes in and that I feel like a lot of what we've been hearing today about connecting data with people who can actually do something with that data. That's really the bridge that we as sales engineers are trying to connect in that sales process. >> It's all about connectivity, isn't it? >> Yeah, definitely. We were talking about this earlier that it's about making impact and it's about people who we are analyzing data is like influencing. And I saw that one of the keywords or one of the biggest thing at Dataiku is everyday AI, so I wanted to just ask, could you please talk more about how does that weave into the problem solving and then day-to-day making an impact process? >> Yes, so I started working on Dataiku around three years ago and I fell in love with the product itself. The product that we have is we allow for people with different backgrounds. If you're coming from a data analyst background, data science, data engineering, maybe you are more of like a business subject matter expert, to all work in one unified central platform, one user interface. And why that's powerful is that when you're working with data, it's not just that data scientist working on their own and their own computer coding. We've heard today that it's all about connecting the data scientists with those business people, with maybe the data engineers and IT people who are actually going to put that model into production or other folks. And so, they all use different languages. Data scientists might use Python and R, your business people are using PowerPoint and Excel, everyone's using different tools. How do we bring them all in one place so that you can have conversations faster? So the business people can understand exactly what you're building with the data and can get their hands on that data and that model prediction faster. So that's what Dataiku does. That's the product that we have. 
And I completely forgot your question, 'cause I got so invested in talking about this. Oh, everyday AI. Yeah, so the goal of Dataiku is really to allow for those maybe less technical people with less traditional data science backgrounds. Maybe they're data experts and they understand the data really well and they've been working in SQL for all their career. Maybe they're just subject matter experts and want to get more into working with data. We allow those people to do that through our no and low-code tools within our platform. The platform is very visual as well. And so, I've seen a lot of people learn data science, learn machine learning by working in the tool itself. And that's sort of, that's where everyday AI comes in, 'cause we truly believe that there are a lot of, there's a lot of unutilized expertise out there that we can bring in. And if we did give them access to data, imagine what we could do in the kind of work that they can do and become empowered basically with that. >> Yeah, we're just scratching the surface. I find data science so fascinating, especially when you talk about some of the real world applications, police violence, health inequities, climate change. Here we are in California and I don't know if you know, we're experiencing an atmospheric river again tomorrow. Californians and the rain- >> Storm is coming. >> We are not good... And I'm a native Californian, but we all know about climate change. People probably don't associate all of the data that is helping us understand it, make decisions based on what's coming, what's happened in the past. I just find that so fascinating. But I really think we're truly at the beginning of really understanding the impact that being data-driven can actually mean whether you are investigating climate change or police violence or health inequities or you're a grocery store that needs to become data-driven, because your consumer is expecting a personalized relevant experience. 
I want you to offer me up things that I know. I was doing online grocery shopping yesterday, I just got back from Europe, and I was so thankful that my grocer is data-driven, because they made the process so easy for me. But we have that expectation as consumers that it's going to be that easy, it's going to be that personalized. And what a lot of folks don't understand is the data, the democratization of data, the AI that's helping make that a possibility that makes our lives easier. >> Yeah, I love that point around data is everywhere and the more we have, the more access we actually are providing. 'cause now compute is cheaper, data is literally everywhere, you can get access to it very easily. And so, I feel like more people are just getting themselves involved and that's, I mean this whole conference around just bringing more women into this industry and more people with different backgrounds from minority groups so that we get their thoughts, their opinions into the work is so important and it's becoming a lot easier with all of the technology and tools just being open source being easier to access, being cheaper. And that I feel really hopeful about in this field. >> That's good. Hope is good, isn't it? >> Yes, that's all we need. But yeah, I'm glad to see that we're working towards that direction. I'm excited to see what lies in the future. >> We've been talking about numbers of women, percentages of women in technical roles for years and we've seen it hover around 25%. I was looking at some AnitaB.org stats from 2022, was just looking at this yesterday, and the numbers are going up. I think the number was 26, 27.6% of women in technical roles. So we're seeing a growth there especially over pre-pandemic levels. Definitely the biggest challenge that still seems to be one of the biggest that remains is attrition. 
I would love to get your advice on what would you tell your younger self or the previous prior generation in terms of having the confidence and the courage to pursue engineering, pursue data science, pursue a technical role, and also stay in that role so you can be one of those females on stage that we saw today? >> Yeah, that's the goal right there one day. I think it's really about finding other people to lift and mentor and support you. And I talked to a bunch of people today who just found this conference through Googling it, and the fact that organizations like this exist really do help, because those are the people who are going to understand the struggles you're going through as a woman in this industry, which can get tough, but it gets easier when you have a community to share that with and to support you. And I do want to definitely give a plug to the WIDS@Dataiku team. >> Talk to us about that. >> Yeah, I was so fortunate to be a WIDS ambassador last year and again this year with Dataiku and I was here last year as well with Dataiku, but we have grown the WIDS effort so much over the last few years. So the first year we had two events in New York and also in London. Dataiku's global. So this year we additionally have one on the West Coast out here in SF and another one in Singapore which is incredible to involve that team. But what I love is that everyone is really passionate about just getting more women involved in this industry. But then also what I find fortunate too at Dataiku is that we have a strong female, just a lot of women. >> Good. >> Yeah. >> A lot of women working as data scientists, solutions engineers and sales and all across the company who even if they aren't doing data work in a day-to-day, they are super-involved and excited to get more women in the technical field. And so, 
that's like our Empower group internally that hosts events and I feel like it's a really nice safe space for all of us to speak about challenges that we encounter and feel like we're not alone in that we have a support system to make it better. So I think from an attrition standpoint every organization should have a female ERG to just support one another. >> Absolutely. There's so much value in a network in the community. I was talking to somebody who I'm blanking on, this may have been in Barcelona last week, talking about a stat that showed that a really high percentage, 78% of people couldn't identify a female role model in technology. Of course, Sheryl Sandberg's been one of our role models and I thought a lot of people know Sheryl who's leaving or has left. And then a whole, YouTube influencers that have no idea that the CEO of YouTube for years has been a woman, who has- >> And she came last year to speak at WIDS. >> Did she? >> Yeah. >> Oh, I missed that. It must have been, we were probably filming. But we need more, we need to be, and it sounds like Dataiku is doing a great job of this. Tracy, we've talked about this earlier today. We need to see what we can be. And it sounds like Dataiku is pioneering that with that ERG program that you talked about. And I completely agree with you. That should be a standard program everywhere and women should feel empowered to raise their hand ask a question, or really embrace, "I'm interested in engineering, I'm interested in data science." Then maybe there's not a lot of women in classes. That's okay. Be the pioneer, be that next Sheryl Sandberg or the CTO of ChatGPT, Mira Murati, who's a female. We need more people that we can see and lean into that and embrace it. I think you're going to be one of them. >> I think so too. Just so that young girls like me, like others who are still in school, can see, can look up to you and be like, "She's my role model and I want to be like her. 
And I know that there's someone to listen to me and to support me if I have any questions in this field." So yeah. >> Yeah, I mean that's how I feel about literally everyone that I'm surrounded by here. I find that you find role models and people to look up to in every conversation whenever I'm speaking with another woman in tech, because there's a journey that has had to happen for you to get to that place. So it's incredible, this community. >> It is incredible. WIDS is a movement we're so proud of at theCUBE to have been a part of it since the very beginning, since 2015, I've been covering it since 2017. It's always one of my favorite events. It's so inspiring and it just goes to show the power that data can have, the influence, but also just that we're at the beginning of uncovering so much. Jacqueline, it's been such a pleasure having you on theCUBE. Thank you. >> Thank you. >> For sharing your story, sharing with us what Dataiku is doing and keep going. More power to you girl. We're going to see you up on that stage one of these years. >> Thank you so much. Thank you guys. >> Our pleasure. >> Our pleasure. >> For our guest and Tracy Zhang, this is Lisa Martin, you're watching theCUBE live at WIDS '23. #EmbraceEquity is this year's International Women's Day theme. Stick around, our next guest joins us in just a minute. (upbeat music)

Published Date : Mar 8 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Sheryl | PERSON | 0.99+
Mira Murati | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Tracy Zhang | PERSON | 0.99+
Tracy | PERSON | 0.99+
Jacqueline | PERSON | 0.99+
Kathy Dahlia | PERSON | 0.99+
Jacqueline Kuo | PERSON | 0.99+
California | LOCATION | 0.99+
Europe | LOCATION | 0.99+
Dataiku | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Singapore | LOCATION | 0.99+
London | LOCATION | 0.99+
last year | DATE | 0.99+
Sheryl Sandberg | PERSON | 0.99+
YouTube | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
Barcelona | LOCATION | 0.99+
2022 | DATE | 0.99+
Taiwan | LOCATION | 0.99+
2015 | DATE | 0.99+
last week | DATE | 0.99+
two events | QUANTITY | 0.99+
26, 27.6% | QUANTITY | 0.99+
last year | DATE | 0.99+
PowerPoint | TITLE | 0.99+
Excel | TITLE | 0.99+
this year | DATE | 0.99+
yesterday | DATE | 0.99+
Python | TITLE | 0.99+
Dataiku | PERSON | 0.99+
New York, New Jersey | LOCATION | 0.99+
tomorrow | DATE | 0.99+
2017 | DATE | 0.99+
SF | LOCATION | 0.99+
MIT | ORGANIZATION | 0.99+
today | DATE | 0.98+
78% | QUANTITY | 0.98+
ChatGPT | ORGANIZATION | 0.98+
one | QUANTITY | 0.98+
Ocean Cleanup | ORGANIZATION | 0.98+
SQL | TITLE | 0.98+
next year | DATE | 0.98+
International Women's Day | EVENT | 0.97+
R | TITLE | 0.97+
around 25% | QUANTITY | 0.96+
Californians | PERSON | 0.95+
Women in Data Science | TITLE | 0.94+
one day | QUANTITY | 0.92+
theCUBE | ORGANIZATION | 0.91+
WIDS | ORGANIZATION | 0.89+
first introduction | QUANTITY | 0.88+
Stanford University | LOCATION | 0.87+
one place | QUANTITY | 0.87+

Phil Kippen, Snowflake, Dave Whittington, AT&T & Roddy Tranum, AT&T | MWC Barcelona 2023


 

(gentle music) >> Narrator: "TheCUBE's" live coverage is made possible by funding from Dell Technologies, creating technologies that drive human progress. (upbeat music) >> Hello everybody, welcome back to day four of "theCUBE's" coverage of MWC '23. We're here live at the Fira in Barcelona. Wall-to-wall coverage, John Furrier is in our Palo Alto studio, banging out all the news. Really, the whole week we've been talking about the disaggregation of the telco network, the new opportunities in telco. We're really excited to have AT&T and Snowflake here. Dave Whittington is the AVP, at the Chief Data Office at AT&T. Roddy Tranum is the Assistant Vice President, for Channel Performance Data and Tools at AT&T. And Phil Kippen, the Global Head Of Industry-Telecom at Snowflake, Snowflake's new telecom business. Snowflake just announced earnings last night. Typical Scarpelli, they beat earnings, very conservative guidance, stocks down today, but we like Snowflake long term, they're on that path to 10 billion. Guys, welcome to "theCUBE." Thanks so much >> Phil: Thank you. >> for coming on. >> Dave and Roddy: Thanks Dave. >> Dave, let's start with you. The data culture inside of telco, We've had this, we've been talking all week about this monolithic system. Super reliable. You guys did a great job during the pandemic. Everything shifting to landlines. We didn't even notice, you guys didn't miss a beat. Saved us. But the data culture's changing inside telco. Explain that. >> Well, absolutely. So, first of all IoT and edge processing is bringing forth new and exciting opportunities all the time. So, we're bridging the world between a lot of the OSS stuff that we can do with edge processing. But bringing that back, and now we're talking about working, and I would say traditionally, we talk data warehouse. Data warehouse and big data are now becoming a single mesh, all right? 
And the use cases and the way you can use those, especially I'm taking that edge data and bringing it back over, now I'm running AI and ML models on it, and I'm pushing back to the edge, and I'm combining that with my relational data. So that mesh there is making all the difference. We're getting new use cases that we can do with that. And it's just, and the volume of data is immense. >> Now, I love ChatGPT, but I'm hoping your data models are more accurate than ChatGPT. I never know. Sometimes it's really good, sometimes it's really bad. But enterprise, you got to be clean with your AI, don't you? >> Not only you have to be clean, you have to monitor it for bias and be ethical about it. We're really good about that. First of all with AT&T, our brand is Platinum. We take care of that. So, we may not be as cutting-edge risk takers as others, but when we go to market with an AI or an ML or a product, it's solid. >> Well hey, as telcos go, you guys are leaning into the Cloud. So I mean, that's a good starting point. Roddy, explain your role. You got an interesting title, Channel Performance Data and Tools, what's that all about? >> So literally anything with our consumer, retail, concenters' channels, all of our channels, from a data perspective and metrics perspective, what it takes to run reps, agents, all the way to leadership levels, scorecards, how you rank in the business, how you're driving the business, from sales, service, customer experience, all that data infrastructure with our great partners on the CDO side, as well as Snowflake, that comes from my team. >> And that's traditionally been done in a, I don't mean the pejorative, but we're talking about legacy, monolithic, sort of data warehouse technologies. >> Absolutely. >> We have a love-hate relationship with them. It's what we had. It's what we used, right? And now that's evolving. And you guys are leaning into the Cloud. >> Dramatic evolution. And what Snowflake's enabled for us is impeccable. 
We've talked about having, people have dreamed of one data warehouse for the longest time and everything in one system. Really, this is the only way that becomes a reality. The more you get in Snowflake, we can have golden source data, and instead of duplicating that 50 times across AT&T, it's in one place, we just share it, everybody leverages it, and now it's not duplicated, and the process efficiency is just incredible. >> But it really hinges on that separation of storage and compute. And we talk about the monolithic warehouse, and one of the nightmares I've lived with, is having a monolithic warehouse. And let's just go with some of my primary, traditional customers, sales, marketing and finance. They are leveraging BSS OSS data all the time. For me to coordinate a deployment, I have to make sure that each one of these units can take an outage, if it's going to be a long deployment. With the separation of storage, compute, they own their own compute cluster. So I can move faster for these people. 'Cause if finance, I can implement his code without impacting finance or marketing. This brings in CI/CD to more reality. It brings us faster to market with more features. So if he wants to implement a new comp plan for the field reps, or we're reacting to the marketplace, where one of our competitors has done something, we can do that in days, versus waiting weeks or months. >> And we've reported on this a lot. This is the brilliance of Snowflake's founders, that whole separation >> Yep. >> from compute and data. I like Dave, that you're starting with sort of the business flexibility, 'cause there's a cost element of this too. You can dial down, you can turn off compute, and then of course the whole world said, "Hey, that's a good idea." And a VC started throwing money at Amazon, but Redshift said, "Oh, we can do that too, sort of, can't turn off the compute." But I want to ask you Phil, so, >> Sure. 
>> it looks from my vantage point, like you're taking your Data Cloud message which was originally separate compute from storage simplification, now data sharing, automated governance, security, ultimately the marketplace. >> Phil: Right. >> Taking that same model, break down the silos into telecom, right? It's that same, >> Mm-hmm. >> sorry to use the term playbook, Frank Slootman tells me he doesn't use playbooks, but he's not a pattern matcher, but he's a situational CEO, he says. But the situation in telco calls for that type of strategy. So explain what you guys are doing in telco. >> I think there's, so, what we're launching, we launched last week, and it really was three components, right? So we had our platform as you mentioned, >> Dave: Mm-hmm. >> and that platform is being utilized by a number of different companies today. We also are adding, for telecom very specifically, we're adding capabilities in marketplace, so that service providers can not only use some of the data and apps that are in marketplace, but as well service providers can go and sell applications or sell data that they had built. And then as well, we're adding our ecosystem, it's telecom-specific. So, we're bringing partners in, technology partners, and consulting and services partners, that are very much focused on telecoms and what they do internally, but also helping them monetize new services. >> Okay, so it's not just sort of generic Snowflake into telco? You have specific value there. >> We're purposing the platform specifically for- >> Are you a telco guy? >> I am. You are, okay. >> Total telco guy absolutely. >> So there you go. You see that Snowflake is actually an interesting organizational structure, 'cause you're going after verticals, which is kind of rare for a company of your sort of inventory, I'll say, >> Absolutely. >> I don't mean that as a negative. (Dave laughs) So Dave, take us through the data journey at AT&T. It's a long history. 
You don't have to go back to the 1800s, but- (Dave laughs) >> Thank you for pointing out, we're a 149-year-old company. So, Jesse James was one of the original customers, (Dave laughs) and we no longer have his data. So, I'll go back. I've been 17 years with Cingular and AT&T, and I've watched it through the whole journey of, where the monolithics were growing, when the consolidation of small, wireless carriers, and we went through that boom. And then we've gone through mergers and acquisitions. But, Hadoop came out, and it was going to solve all world hunger. And we had all the aspects of, we're going to monetize and do AI and ML, and some of the things we learned with Hadoop were, we had this monolithic warehouse, we had this file-based-structured Hadoop, but we really didn't know how to bring this all together. And we were bringing items over to the relational, and we were taking the relational and bringing it over to the warehouse, and trying to, and it was a struggle. Let's just go there. And I don't think we were the only company to struggle with that, but we learned a lot. And so now as tech is finally emerging, with the cloud, companies like Snowflake, and others that can handle that, where we can create, we were discussing earlier, but it becomes more of a conducive mesh that's interoperable. So now we're able to simplify that environment. And the cloud is a big thing on that. 'Cause you could not do this on-prem with on-prem technologies. It would be just too cost prohibitive, and too heavy a lift, going back and forth, and managing the data. The simplicity the cloud brings with a smaller set of tools, and I'll say in the data space specifically, really allows us, maybe not a single instance of data for all use cases, but a greatly reduced ecosystem. And when you simplify your ecosystem, you simplify speed to market and data management. >> So I'm going to ask you, I know it's kind of internal organizational plumbing, but it'll inform my next question. 
So, Dave, you're with the Chief Data Office, and Roddy, you're kind of, you all serve in the business, but you're really serving the, you're closer to those guys, they're banging on your door for- >> Absolutely. I try to keep the 130,000 users who may or may not have issues sometimes with our data and metrics, away from Dave. And he just gets a call from me. >> And he only calls when he has a problem. He's never wished me happy birthday. (Dave and Phil laugh) >> So the reason I asked that is because, you describe Dave, some of the Hadoop days, and again love-hate with that, but we had hyper-specialized roles. We still do. You've got data engineers, data scientists, data analysts, and you've got this sort of this pipeline, and it had to be this sequential pipeline. I know Snowflake and others have come to simplify that. My question to you is, how is that those roles, how are those roles changing? How is data getting closer to the business? Everybody talks about democratizing business. Are you doing that? What's a real use example? >> From our perspective, those roles, a lot of those roles on my team for years, because we're all about efficiency, >> Dave: Mm-hmm. >> we cut across those areas, and always have cut across those areas. So now we're into a space where things have been simplified, data processes and copying, we've gone from 40 data processes down to five steps now. We've gone from five steps to one step. We've gone from days, now take hours, hours to minutes, minutes to seconds. Literally we're seeing that time in and time out with Snowflake. So these resources that have spent all their time on data engineering and moving data around, are now freed up more on what they have skills for and always have, the data analytics area of the business, and driving the business forward, and new metrics and new analysis. That's some of the great operational value that we've seen here. As this simplification happens, it frees up brain power. 
>> So, you're pumping data from the OSS, the BSS, the OKRs everywhere >> Everywhere. >> into Snowflake? >> Scheduling systems, you name it. If you can think of what drives our retail and centers and online, all that data, scheduling system, chat data, call center data, call detail data, all of that enters into this common infrastructure to manage the business on a day in and day out basis. >> How are the roles and the skill sets changing? 'Cause you're doing a lot less ETL, you're doing a lot less moving of data around. There were guys that were probably really good at that. I used to joke, when I was in the storage world, like if your job is managing LUNs, you need to look for a new job, right? So, and they did, and people moved on. So, are you able to sort of redeploy those assets, and those people, those human resources? >> These folks are highly skilled. And as we were talking about earlier, SQL hasn't gone away. Relational databases are not going away. And that's one thing that's made this migration excellent, they're just transitioning their skills. Experts in legacy systems are now rapidly becoming experts on the Snowflake side. And it has not been that hard a transition. There are certainly nuances, things that don't operate as well in the cloud environment that we have to learn and optimize. But we're making that transition. >> Dave: So just, >> Please. >> within the Chief Data Office we have a couple of missions, and Roddy is a great partner and an example of how it works. We try to bring the data for democratization, so that we have one interface, and now, hopefully, we just have a logical connection back to these Snowflake instances that we connect. But we're providing that governance and cleansing, and if there's a business rule at the enterprise level, we provide it. But the goal at CDO is to make sure that business units like Roddy's or marketing or finance, that they can come to a platform that's reliable, robust, and self-service. 
I don't want to be in his way. So I feel like I'm providing a sub-level of platform, that he can come to and anybody can come to, and utilize, that they're not having to go back and undo what's in Salesforce, or ServiceNow, or in our billers. So, I'm sort of that layer. And then making sure that that ecosystem is robust enough for him to use. >> And that self-service infrastructure is predominantly through the Azure Cloud, correct? >> Dave: Absolutely. >> And you work on other clouds, but it's predominantly through Azure? >> We're predominantly in Azure, yeah. >> Dave: That's the first-party citizen? >> Yeah. >> Okay, I like to think in terms sometimes of data products, and I know you've mentioned upfront, you're Gold standard or Platinum standard, you're very careful about personal information. >> Dave: Yeah. >> So you're not trying to sell, I'm an AT&T customer, you're not trying to sell my data, and make money off of my data. So the value prop and the business case for Snowflake is it's simpler. You do things faster, you're in the cloud, lower cost, et cetera. But I presume you're also in the business, AT&T, of making offers and creating packages for customers. I look at those as data products, 'cause it's not a, I mean, yeah, there's a physical phone, but there's data products behind it. So- >> It ultimately is, but not everybody always sees it that way. Data reporting often can be an afterthought. And we're making it more on the forefront now. >> Yeah, so I like to think in terms of data products, I mean even if the financial services business, it's a data business. So, if we can think about that sort of metaphor, do you see yourselves as data product builders? Do you have that, do you think about building products in that regard? >> Within the Chief Data Office, we have a data product team, >> Mm-hmm. 
>> and by the way, I'd be disingenuous if I said, "Oh, we're very mature in this," but no, it's where we're going, and it's somewhat of a journey, but I've got a peer, and their whole job is to go from, especially as we migrate to the cloud, if Roddy or some other group was using tables three, four and five and joining them together, it's like, "Well look, this is an offer data product, so let's combine these and put it up in the cloud, and here's the offer data set product, or here's the opportunity data product," and it's a journey. We're on the way, but we have dedicated staff and time to do this. >> I think one of the hardest parts about that is the organizational aspect of it. Like who owns the data now, right? It used to be owned by the techies, and increasingly the business lines want to have access, you're providing self-service. So there's a discussion about, "Okay, what is a data product? Who's responsible for that data product? Is it in my P&L or your P&L? Somebody's got to sign up for that number." So, it sounds like those discussions are taking place. >> They are. And, we feel like we're more the, and CDO at least, we feel more, we're like the guardians, and the shepherds, but not the owners. I mean, we have a role in it all, but he owns his metrics. >> Yeah, and even from our perspective, we see ourselves as an enabler of making whatever AT&T wants to make happen in terms of the key products and offers, trade-in offers, trade-in programs, all that requires this data infrastructure, and managing reps and agents, and what they do from a channel performance perspective. We still see ourselves as key enablers of that. And we've got to be flexible, and respond quickly to the business. >> I always had empathy for the data engineer, and he or she had to service all these different lines of business with no business context. >> Yeah. 
>> Like the business knows good data from bad data, and then they just pound that poor individual, and they're like, "Okay, I'm doing my best. It's just ones and zeros to me." So, it sounds like that's, you're on that path. >> Yeah absolutely, and I think, we do have refined, getting more and more refined owners of, since Snowflake enables these golden source data, everybody sees me and my organization, channel performance data, go to Roddy's team, we have a great team, and we go to Dave in terms of making it all happen from a data infrastructure perspective. So we, do have a lot more refined, "This is where you go for the golden source, this is where it is, this is who owns it. If you want to launch this product and services, and you want to manage reps with it, that's the place you-" >> It's a strong story. So Chief Data Office doesn't own the data per se, but it's your responsibility to provide the self-service infrastructure, and make sure it's governed properly, and in as automated way as possible. >> Well, yeah, absolutely. And let me tell you more, everybody talks about single version of the truth, one instance of the data, but there's context to that, that we are taking, trying to take advantage of that as we do data products is, what's the use case here? So we may have an entity of Roddy as a prospective customer, and we may have a entity of Roddy as a customer, high-value customer over here, which may have a different set of mix of data and all, but as a data product, we can then create those for those specific use cases. Still point to the same data, but build it in different constructs. One for marketing, one for sales, one for finance. By the way, that's where your data engineers are struggling. >> Yeah, yeah, of course. So how do I serve all these folks, and really have the context-common story in telco, >> Absolutely. >> or are these guys ahead of the curve a little bit? Or where would you put them? 
>> I think they're definitely moving a lot faster than the industry is generally. I think the enabling technologies, like for instance, having that single copy of data that everybody sees, a single pane of glass, right, that's definitely something that everybody wants to get to. Not many people are there. I think, what AT&T's doing, is most definitely a little bit further ahead than the industry generally. And I think the successes that are coming out of that, and the learning experiences are starting to generate momentum within AT&T. So I think, it's not just about the product, and having a product now that gives you a single copy of data. It's about the experiences, right? And now, how the teams are getting trained, domains like network engineering for instance. They typically haven't been a part of data discussions, because they've got a lot of data, but they're focused on the infrastructure. >> Mm. >> So, by going ahead and deploying this platform, for platform's purpose, right, and the business value, that's one thing, but also to start bringing, getting that experience, and bringing new experience in to help other groups that traditionally hadn't been data-centric, that's also a huge step ahead, right? So you need to enable those groups. >> A big complaint of course we hear at MWC from carriers is, "The over-the-top guys are killing us. They're riding on our networks, et cetera, et cetera. They have all the data, they have all the client relationships." Do you see your client relationships changing as a result of sort of your data culture evolving? >> Yes, I'm not sure I can- >> It's a loaded question, I know. >> Yeah, and then I, so, we want to start embedding as much into our network on the proprietary value that we have, so we can start getting into that OTT play, us as any other carrier, we have distinct advantages of what we can do at the edge, and we just need to start exploiting those. 
But you know, 'cause whether it's location or whatnot, so we got to eat into that. Historically, the network is where we make our money in, and we stack the services on top of it. It used to be *69. >> Dave: Yeah. >> If anybody remembers that. >> Dave: Yeah, of course. (Dave laughs) >> But you know, it was stacked on top of our network. Then we stack another product on top of it. It'll be in the edge where we start providing distinct values to other partners as we- >> I mean, it's a great business that you're in. I mean, if they're really good at connectivity. >> Dave: Yeah. >> And so, it sounds like it's still to be determined >> Dave: Yeah. >> where you can go with this. You have to be super careful with private and for personal information. >> Dave: Yep. >> Yeah, but the opportunities are enormous. >> There's a lot. >> Yeah, particularly at the edge, looking at, private networks are just an amazing opportunity. Factories and name it, hospital, remote hospitals, remote locations. I mean- >> Dave: Connected cars. >> Connected cars are really interesting, right? I mean, if you start communicating car to car, and actually drive that, (Dave laughs) I mean that's, now we're getting to visit Xen Fault Tolerance people. This is it. >> Dave: That's not, let's hold the traffic. >> Doesn't scare me as much as we actually learn. (all laugh) >> So how's the show been for you guys? >> Dave: Awesome. >> What're your big takeaways from- >> Tremendous experience. I mean, someone who doesn't go outside the United States much, I'm a homebody. The whole experience, the whole trip, city, Mobile World Congress, the technologies that are out here, it's been a blast. >> Anything, top two things you learned, advice you'd give to others, your colleagues out in general? >> In general, we talked a lot about technologies today, and we talked a lot about data, but I'm going to tell you what, the accelerator that you cannot change, is the relationship that we have. 
So when the tech and the business can work together toward a common goal, and it's a partnership, you get things done. So, I don't know how many CDOs or CIOs or CEOs are out there, but this connection is what accelerates and makes it work. >> And that is our audience Dave. I mean, it's all about that alignment. So guys, I really appreciate you coming in and sharing your story in "theCUBE." Great stuff. >> Thank you. >> Thanks a lot. >> All right, thanks everybody. Thank you for watching. I'll be right back with Dave Nicholson. Day four SiliconANGLE's coverage of MWC '23. You're watching "theCUBE." (gentle music)

Published Date : Mar 2 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Whittington | PERSON | 0.99+
Frank Slootman | PERSON | 0.99+
Roddy | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Phil | PERSON | 0.99+
Phil Kippen | PERSON | 0.99+
AT&T | ORGANIZATION | 0.99+
Jesse James | PERSON | 0.99+
AT&T. | ORGANIZATION | 0.99+
five steps | QUANTITY | 0.99+
Dave Nicholson | PERSON | 0.99+
John Furrier | PERSON | 0.99+
50 times | QUANTITY | 0.99+
Snowflake | ORGANIZATION | 0.99+
Roddy Tranum | PERSON | 0.99+
10 billion | QUANTITY | 0.99+
one step | QUANTITY | 0.99+
17 years | QUANTITY | 0.99+
130,000 users | QUANTITY | 0.99+
United States | LOCATION | 0.99+
1800s | DATE | 0.99+
last week | DATE | 0.99+
Barcelona | LOCATION | 0.99+
Palo Alto | LOCATION | 0.99+
Dell Technologies | ORGANIZATION | 0.99+
last night | DATE | 0.99+
MWC '23 | EVENT | 0.98+
telco | ORGANIZATION | 0.98+
one system | QUANTITY | 0.98+
one | QUANTITY | 0.98+
40 data processes | QUANTITY | 0.98+
today | DATE | 0.98+
one place | QUANTITY | 0.97+
P&L | ORGANIZATION | 0.97+
telcos | ORGANIZATION | 0.97+
CDO | ORGANIZATION | 0.97+
149-year-old | QUANTITY | 0.97+
five | QUANTITY | 0.97+
single | QUANTITY | 0.96+
three components | QUANTITY | 0.96+
One | QUANTITY | 0.96+

SiliconANGLE News | Beyond the Buzz: A deep dive into the impact of AI


 

(upbeat music) >> Hello, everyone, welcome to theCUBE. I'm John Furrier, the host of theCUBE in Palo Alto, California. Also it's SiliconANGLE News. Got two great guests here to talk about AI, the impact of the future of the internet, the applications, the people. Amr Awadallah, the founder and CEO, and Ed Albanese, the COO of Vectara, a new startup that emerged out of the original Cloudera, I would say, 'cause Amr's known, famous for the Cloudera founding, which was really the beginning of the big data movement. And now as AI goes mainstream, there's so much to talk about, so much to go on. And plus the new company is one of the, now what I call the wave, this next big wave, I call it the fifth wave in the industry. You know, you had PCs, you had the internet, you had mobile. This generative AI thing is real. And you're starting to see startups come out in droves. Amr obviously was founder of Cloudera, Big Data, and now Vectara. And Ed Albanese, you guys have a new company. Welcome to the show. >> Thank you. It's great to be here. >> So great to see you. Now the story is theCUBE started in the Cloudera office. Thanks to you, and your friendly entrepreneurship views that you have. We got to know each other over the years. But Cloudera had Hadoop, which was the beginning of what I call the big data wave, which then became what we now call data lakes, data oceans, and data infrastructure that's developed from that. It's almost interesting to look back 12 plus years, and see that what AI is doing now, right now, is opening up the eyes to the mainstream, and the applications are almost mind-blowing. You know, Satya Nadella called it the Mosaic Moment, didn't say Netscape, he built Netscape (laughing) but called it the Mosaic Moment. You're seeing companies and startups, kind of the alpha geeks running here, because this is the new frontier, and there's real meat on the bone, in terms of like things to do. Why? Why is this happening now? 
What is the confluence of the forces happening, that are making this happen? >> Yeah, I mean if you go back to the Cloudera days, with big data, and so on, that was more about data processing. Like how can we process data, so we can extract numbers from it, and do reporting, and maybe take some actions, like this is a fraud transaction, or this is not. And in the meanwhile, many of the researchers working in the neural network, and deep neural network space, were trying to focus on data understanding, like how can I understand the data, and learn from it, so I can take actual actions, based on the data directly, just like a human does. And we were only good at doing that at the level of somebody who was five years old, or seven years old, all the way until about 2013. And starting in 2013, which is only 10 years ago, a number of key innovations started taking place, and each one added on. It was no single major innovation that just took place. It was a couple of really incremental ones, but they added on top of each other, in a very exponentially additive way, that led to, by the end of 2019, we now have models, deep neural network models, that can read and understand human text just like we do. Right? And they can reason about it, and argue with you, and explain it to you. And I think that's what is unlocking this whole new wave of innovation that we're seeing right now. So data understanding would be the essence of it. >> So it's not a Big Bang kind of theory, it's been evolving over time, and I think that the tipping point has been the advancements and other things. I mean look at cloud computing, and look how fast it just crept up on AWS. I mean AWS, you go back three, five years ago, I was talking to Swami yesterday, and their big news about AI, expanding the Hugging Face's relationship with AWS. And just three, five years ago, there wasn't a model training models out there. 
But as compute comes out, and you got more horsepower, these large language models, these foundational models, they're flexible, they're not monolithic silos, they're interacting. There's a whole new, almost fusion of data happening. Do you see that? I mean is that part of this? >> Of course, of course. I mean this wave is building on all the previous waves. We wouldn't be at this point if we did not have hardware that can scale, in a very efficient way. We wouldn't be at this point, if we don't have data that we're collecting about everything we do, that we're able to process in this way. So this, this movement, this motion, this phase we're in, absolutely builds on the shoulders of all the previous phases. For some of the observers from the outside, when they see chatGPT for the first time, for them it was like, "Oh my god, this just happened overnight." Like it didn't happen overnight. (laughing) GPT itself, like GPT3, which is what chatGPT is based on, was released a year ahead of chatGPT, and many of us were seeing the power it can provide, and what it can do. I don't know if Ed agrees with that. >> Yeah, Ed? >> I do. Although I would acknowledge that the possibilities now, because of what we've hit from a maturity standpoint, have just opened up in an incredible way, that just wasn't tenable even three years ago. And that's what makes it, it's true that it developed incrementally, in the same way that, you know, the possibilities of a mobile handheld device, you know, in 2006 were there, but when the iPhone came out, the possibilities just exploded. And that's the moment we're in. >> Well, I've had many conversations over the past couple months around this area with chatGPT. 
John Markoff told me the other day, that he calls it, "The five dollar toy," because it's not that big of a deal, in context to what AI's doing behind the scenes, and all the work that's done on ethics, that's happened over the years, but it has woken up the mainstream, so everyone immediately jumps to ethics. "Does it work? It's not factual." And everyone who's inside the industry is like, "This is amazing." 'Cause you have two schools of thought there. One's like, people that think, "This is now the beginning of next gen, this is now we're here, this ain't your grandfather's chatbot, okay?" With NLP, it's got reasoning, it's got other things. >> I'm in that camp for sure. >> Yeah. Well I mean, everyone who knows what's going on is in that camp. And as the naysayers start to get through this, and they go, "Wow, it's not just plagiarizing homework, it's helping me be better. Like it could rewrite my memo, bring the lead to the top." So the format of the user interface is interesting, but it's still a data-driven app. >> Absolutely. >> So where does it go from here? 'Cause I'm not even calling this the first inning. This is like pregame, in my opinion. Where do you guys see this going, in terms of scratching the surface to what happens next? >> I mean, I'll start with, I just don't see how an application is going to look the same in the next three years. Who's going to want to input data manually, in a form field? Who is going to want, or expect, to have to put in some text in a search box, and then read through 15 different possibilities, and try to figure out which one of them actually most closely resembles the question they asked? You know, I don't see that happening. Who's going to start with an absolute blank sheet of paper, and expect no help? 
That is not how an application will work in the next three years, and it's going to fundamentally change how people interact and spend time with opening any element on their mobile phone, or on their computer, to get something done. >> Yes. I agree with that. Like every single application, over the next five years, will be rewritten, to fit within this model. So imagine an HR application, I don't want to name companies, but imagine an HR application, and you go into the application and you're clicking on buttons, because you want to take two weeks of vacation, and menus, and clicking here and there, reasons and managers, versus just telling the system, "I'm taking two weeks of vacation, going to Las Vegas," book it, done. >> Yeah. >> And the system just does it for you. If you weren't complete in your input, in your description, for what you want, then the system asks you back, "Did you mean this? Did you mean that? Were you trying to also do this as well?" >> Yeah. >> "What was the reason?" And that will fit it for you, and just do it for you. So I think the user interface that we have with apps, is going to change to be very similar to the user interface that we have with each other. And that's why all these apps will need to evolve. >> I know we don't have a lot of time, 'cause you guys are very busy, but I want to definitely have multiple segments with you guys, on this topic, because there's so much to talk about. There's a lot of parallels going on here. I was talking again with Swami who runs all the AI database at AWS, and I asked him, I go, "This feels a lot like the original AWS. You don't have to provision a data center." A lot of this heavy lifting on the back end, is these large language models, with these foundational models. So the bottleneck in the past, was the energy, and cost to actually do it. Now you're seeing it being stood up faster. So there's definitely going to be a tsunami of apps. I would see that clearly. What is it? We don't know yet. 
But also people who are going to leverage the fact that I can get started building value. So I see a startup boom coming, and I see an application tsunami of refactoring things. >> Yes. >> So the replatforming is already kind of happening. >> Yes. >> OpenAI, chatGPT, whatever. So that's going to be a developer environment. I mean if Amazon turns this into an API, or a Microsoft, what you guys are doing. >> We're turning it into an API as well. That's part of what we're doing as well, yes. >> This is why this is exciting. Amr, you've lived the big data dream, and we used to talk, if you didn't have a big data problem, if you weren't full of data, you weren't really getting it. Now people have all the data, and they got to stand this up. >> Yeah. >> So the analogy is again, the mobile, I like the mobile movement, and using mobile as an analogy, most companies were not building for a mobile environment, right? They were just building for the web, and legacy way of doing apps. And as soon as the user expectations shifted, that my expectation now, I need to be able to do my job on this small screen, on the mobile device with a touchscreen. Everybody had to invest in re-architecting, and re-implementing every single app, to fit within that model, and that model of interaction. And we are seeing the exact same thing happen now. And one of the core things we're focused on at Vectara, is how to simplify that for organizations, because a lot of them are overwhelmed by large language models, and ML. >> They don't have the staff. >> Yeah, yeah, yeah. They're understaffed, they don't have the skills. >> But they got developers, they've got DevOps, right? >> Yes. >> So they have the DevSecOps going on. >> Exactly, yes. >> So our goal is to simplify it enough for them that they can start leveraging this technology effectively, within their applications. >> Ed, you're the COO of the company, obviously a startup. You guys are growing. You got great backup, and good team. 
You've also done a lot of business development, and technical business development in this area. If you look at the landscape right now, and I agree the apps are coming, every company I talk to, that has that chatGPT, you know, epiphany, "Oh my God, look how cool this is. Like magic." Like okay, it's code, settle down. >> Mm hmm. >> But everyone I talk to is using it in a very horizontal way. I talk to a very senior person, very tech alpha geek, very senior person in the industry, technically. They're using it for log data, they're using it for configuration of routers. And in other areas, they're using it for, every vertical has a use case. So this is horizontally scalable from a use case standpoint. When you hear horizontally scalable, first thing that comes to my mind is cloud, right? >> Mm hmm. >> So cloud, and scalability that way. And the data is very specialized. So now you have this vertical specialization, horizontally scalable, everyone will be refactoring. What do you see, and what are you seeing from customers, that you talk to, and prospects? >> Yeah, I mean put yourself in the shoes of an application developer, who is actually trying to make their application a bit more like magic. And to have that soon-to-be, honestly, expected experience. They've got to think about things like performance, and how efficiently that they can actually execute a query, or a question. They've got to think about cost. Generative isn't cheap, like the inference of it. And so you've got to be thoughtful about how and when you take advantage of it, you can't use it as a, you know, everything looks like a nail, and I've got a hammer, and I'm going to hit everything with it, because that will be wasteful. Developers also need to think about how they're going to take advantage of, but not lose their own data. So there has to be some controls around what they feed into the large language model, if anything. 
Like, should they fine tune a large language model with their own data? Can they keep it logically separated, but still take advantage of the powers of a large language model? And they've also got to take advantage, and be aware of the fact that when data is generated, that it is a different class of data. It might not fully be their own. >> Yeah. >> And it may not even be fully verified. And so when the logical cycle starts, of someone making a request, the relationship between that request, and the output, those things have to be stored safely, logically, and identified as such. >> Yeah. >> And taken advantage of in an ongoing fashion. So these are mega problems, each one of them independently, that, you know, you can think of it as middleware companies need to take advantage of, and think about, to help the next wave of application development be logical, sensible, and effective. It's not just calling some raw API on the cloud, like OpenAI, and then just, you know, you get your answer and you're done, because that is a very brute force approach. >> Well also I will point, first of all, I agree with your statement about the apps experience, that's going to be expected, form filling. Great point. The interesting thing about chatGPT. >> Sorry, it's not just form filling, it's any action you would like to take. >> Yeah. >> Instead of clicking, and dragging, and dropping, and doing it on a menu, or on a touch screen, you just say it, and it happens perfectly. >> Yeah. It's a different interface. And that's why I love that UI/UX experiences, that's the people falling out of their chair moment with chatGPT, right? But a lot of the things with chatGPT, if you feed it right, it works great. If you feed it wrong and it goes off the rails, it goes off the rails big. >> Yes, yes. >> So the Bing catastrophes. >> Yeah. >> And that's an example of garbage in, garbage out, classic old school kind of comp-sci phrase that we all use. >> Yep. >> Yes. 
>> This is about data injection, right? It reminds me of the old SQL days, if you had to, if you can sling some SQL, you were a magician, you know, to get the right answer, it's pretty much there. So you got to feed the AI. >> You do. Some people call this, the early word to describe this is prompt engineering. You know, old school, you know, search, or, you know, engagement with data would be, I'm going to, I have a question or I have a query. New school is, I have, I have to issue it a prompt, because I'm trying to get, you know, an action or a reaction, from the system. And the act of engineering, there are a lot of different ways you could do it, all the way from, you know, raw, just I'm going to send you whatever I'm thinking. >> Yeah. >> And you get the unintended outcomes, to more constrained, where I'm going to just use my own data, and I'm going to constrain the initial inputs, the data I already know that's first party, and I trust, to, you know, hyper constrain, where the application is actually, it's looking for certain elements to respond to. >> It's interesting Amr, this is why I love this, because one we are in the media, we're recording this video now, we'll stream it. But we got all your linguistics, we're talking. >> Yes. >> This is data. >> Yep. >> So the data quality becomes now the new intellectual property, because, if you have that prompt source data, it makes data or content, in our case, the original content, intellectual property. >> Absolutely. >> Because that's the value. And that's where you see chatGPT fall down, is because they're trying to crawl the web, and people think it's search. It's not necessarily search, it's giving you something that you wanted. It is a lot of that, I remember in Cloudera, you said, "Ask the right questions." Remember that phrase you guys had, that slogan? >> Mm hmm. And that's prompt engineering. 
So that's exactly, that's the reinvention of "Ask the right question," is prompt engineering is, if you don't give these models the question in the right way, and very few people know how to frame it in the right way with the right context, then you will get garbage out. Right? That is the garbage in, garbage out. But if you specify the question correctly, and you provide with it the metadata that constrain what that question is going to be acted upon or answered upon, then you'll get much better answers. And that's exactly what we solve at Vectara. >> Okay. So before we get into the last couple minutes we have left, I want to make sure we get a plug in for the opportunity, and the profile of Vectara, your new company. Can you guys both share with me what you think the current situation is? So for the folks who are now having those moments of, "Ah, AI's bullshit," or, "It's not real, it's a lot of stuff," from, "Oh my god, this is magic," to, "Okay, this is the future." >> Yes. >> What would you say to that person, if you're at a cocktail party, or in the elevator say, "Calm down, this is the first inning." How do you explain the dynamics going on right now, to someone who's either in the industry, but not in the ropes? How would you explain like, what this wave's about? How would you describe it, and how would you prepare them for how to change their life around this? >> Yeah, so I'll go first and then I'll let Ed go. Efficiency, efficiency is the description. So we figured out a way to be a lot more efficient, a way where you can write a lot more emails, create way more content, create way more presentations. Developers can develop 10 times faster than they normally would. And that is very similar to what happened during the Industrial Revolution. I always like to look at examples from the past, to read what will happen now, and what will happen in the future. So during the Industrial Revolution, it was about efficiency with our hands, right? 
So I had to make a piece of cloth, like this piece of cloth for this shirt I'm wearing. Our ancestors, they had to spend months taking the cotton, making it into threads, taking the threads, making them into pieces of cloth, and then cutting it. And now a machine makes it just like that, right? And the ancestors now turned from the people that do the thing, to manage the machines that do the thing. And I think the same thing is going to happen now, is our efficiency will be multiplied extremely, as human beings, and we'll be able to do a lot more. And many of us will be able to do things they couldn't do before. So another great example I always like to use is the example of Google Maps, and GPS. Very few of us knew how to drive a car from one location to another, and read a map, and get there correctly. But once that efficiency of an AI, by the way, behind these things is very, very complex AI, that figures out how to do that for us. All of us now became amazing navigators that can go from any point to any point. So that's kind of how I look at the future. >> And that's a great real example of impact. Ed, your take on how you would talk to a friend, or colleague, or anyone who asks like, "How do I make sense of the current situation? Is it real? What's in it for me, and what do I do?" I mean every company's rethinking their business right now, around this. What would you say to them? >> You know, I usually like to show, rather than describe. And so, you know, the other day I just got access, I've been using an application for a long time, called Notion, and it's super popular. There's like 30 or 40 million users. And the new version of Notion came out, which has AI embedded within it. And it's AI that allows you primarily to create. 
So if you could break down the world of AI into find and create, for a minute, just kind of logically separate those two things, find is certainly going to be massively impacted in our experiences as consumers on, you know, Google and Bing, and I can't believe I just said the word Bing in the same sentence as Google, but that's what's happening now (all laughing), because it's a good example of change. >> Yes. >> But also inside the business. But on the create side, you know, Notion is a wiki product, where you try to, you know, note down things that you are thinking about, or you want to share and memorialize. But sometimes you do need help to get it down fast. And just in the first day of using this new product, like my experience has really fundamentally changed. And I think that anybody who would, you know, anybody say for example, that is using an existing app, I would show them, open up the app. Now imagine the possibility of getting a starting point right off the bat, in five seconds of, instead of having to whole cloth draft this thing, imagine getting a starting point then you can modify and edit, or just dispose of and retry again. And that's the potential for me. I can't imagine a scenario where, in a few years from now, I'm going to be satisfied if I don't have a little bit of help, in the same way that I don't manually spell check every email that I send. I automatically spell check it. I love when I'm getting type ahead support inside of Google, or anything. Doesn't mean I always take it, or when texting. >> That's efficiency too. I mean the cloud was about developers getting stuff up quick. >> Exactly. >> All that heavy lifting is there for you, so you don't have to do it. >> Right? >> And you get to the value faster. >> Exactly. I mean, if history taught us one thing, it's, you have to always embrace efficiency, and if you don't fast enough, you will fall behind. 
Again, looking at the industrial revolution, the companies that embraced the industrial revolution, they became the leaders in the world, and the ones who did not, they all like. >> Well the AI thing that we got to watch out for, is watching how it goes off the rails. If it doesn't have the right prompt engineering, or data architecture, infrastructure. >> Yes. >> It's a big part. So this comes back down to your startup, real quick, I know we got a couple minutes left. Talk about the company, the motivation, and we'll do a deeper dive on the company. But what's the motivation? What are you targeting for the market, business model? The tech, let's go. >> Actually, I would like Ed to go first. Go ahead. >> Sure, I mean, we're a developer-first, API-first platform. So the product is oriented around allowing developers who may not be superstars, in being able to either leverage, or choose, or select their own large language models for appropriate use cases, but that want to be able to instantly add the power of large language models into their application set. We started with search, because we think it's going to be one of the first places that people try to take advantage of large language models, to help find information within an application context. And we've built our own large language models, focused on making it very efficient, and elegant, to find information more quickly. So what a developer can do is, within minutes, go up, register for an account, and get access to a set of APIs, that allow them to send data, to be converted into a format that's easy to understand for large language models, vectors. And then secondarily, they can issue queries, ask questions. And they can ask them very, the questions that can be asked, are very natural language questions. 
So we're talking about long form sentences, you know, drill down types of questions, and they can get answers that either come back in depending upon the form factor of the user interface, in list form, or summarized form, where summarized equals the opportunity to kind of see a condensed, singular answer. >> All right. I have a. >> Oh okay, go ahead, you go. >> I was just going to say, I'm going to be a customer for you, because I want, my dream was to have a hologram of theCUBE host, me and Dave, and have questions be generated in the metaverse. So you know. (all laughing) >> There'll be no longer any guests here. They'll all be talking to you guys. >> Give a couple bullets, I'll spit out 10 good questions. Publish a story. This brings the automation, I'm sorry to interrupt you. >> No, no. No, no, I was just going to follow on on the same. So another way to look at exactly what Ed described is, we want to offer you chatGPT for your own data, right? So imagine taking all of the recordings of all of the interviews you have done, and having all of the content of that being ingested by a system, where you can now have a conversation with your own data and say, "Oh, last time when I met Amr, "which video games did we talk about? "Which movie or book did we use as an analogy "for how we should be embracing data science, "and big data, which is moneyball," I know you use moneyball all the time. And you start having that conversation. So, now the data doesn't become a passive asset that you just have in your organization. No. It's an active participant that's sitting with you, on the table, helping you make decisions. >> One of my favorite things to do with customers, is to go to their site or application, and show them me using it. So for example, one of the customers I talked to was one of the biggest property management companies in the world, that lets people go and rent homes, and houses, and things like that. 
And you know, I went and I showed them me searching through reviews, looking for information, and trying different words, and trying to find out like, you know, is this place quiet? Is it comfortable? And then I put all the same data into our platform, and I showed them the world of difference you can have when you start asking that question wholeheartedly, and getting real information that doesn't have anything to do with the words you asked, but is really focused on the meaning. You know, when I asked like, "Is it quiet?" You know, answers would come back like, "The wind whispered through the trees peacefully," and you know, it's like nothing to do with quiet in the literal word sense, but in the meaning sense, everything to do with it. And that was magical even for them, to see that. >> Well you guys are the front end of this big wave. Congratulations on the startup, Amr. I know you guys got great pedigree in big data, and you've got a great team, and congratulations. Vectara is the name of the company, check 'em out. Again, the startup boom is coming. This will be one of the major waves, generative AI is here. I think we'll look back, and it will be pointed out as a major inflection point in the industry. >> Absolutely. >> There's not a lot of hype behind that. People are seeing it, experts are. So it's going to be fun, thanks for watching. >> Thanks John. (soft music)
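(Illustrative sketch, not from the interview.) The difference Ed describes between matching on words and matching on meaning can be shown with a toy example. This is not Vectara's API or model: the hand-made `CONCEPTS` table is an assumed stand-in for a learned embedding model, and every name here is hypothetical. The sketch only shows why a keyword lookup for "quiet" finds nothing while a vector comparison still ranks the "wind whispered" review first.

```python
import math

# Hypothetical stand-in for a learned embedding model: each known word maps
# to a hand-picked vector over three made-up concepts (calm, noise, comfort).
CONCEPTS = {
    "quiet": (1.0, 0.0, 0.0),
    "peacefully": (0.9, 0.0, 0.2),
    "whispered": (0.8, 0.1, 0.0),
    "loud": (0.0, 1.0, 0.0),
    "traffic": (0.0, 0.9, 0.0),
    "noisy": (0.0, 1.0, 0.0),
    "comfortable": (0.2, 0.0, 1.0),
}

REVIEWS = [
    "The wind whispered through the trees peacefully.",
    "Loud traffic all night, very noisy.",
    "The couch was comfortable.",
]


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [word.strip(".,?!\"'").lower() for word in text.split()]


def embed(text):
    """Sum the concept vectors of every known word in the text."""
    total = [0.0, 0.0, 0.0]
    for word in tokenize(text):
        for i, value in enumerate(CONCEPTS.get(word, (0.0, 0.0, 0.0))):
            total[i] += value
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_hits(word, documents):
    """Literal search: keep only documents that contain the exact word."""
    return [doc for doc in documents if word in tokenize(doc)]


def semantic_rank(query, documents):
    """Meaning-based search: order documents by similarity to the query."""
    query_vector = embed(query)
    return sorted(documents, key=lambda doc: cosine(query_vector, embed(doc)),
                  reverse=True)


if __name__ == "__main__":
    print(keyword_hits("quiet", REVIEWS))
    print(semantic_rank("Is it quiet?", REVIEWS)[0])
```

A production system would replace `CONCEPTS` and `embed` with a neural model and store the vectors in an index, but the ranking step is the same idea.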

Published Date : Feb 23 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
John Markoff | PERSON | 0.99+
2013 | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
Ed Alban | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
30 | QUANTITY | 0.99+
10 times | QUANTITY | 0.99+
2006 | DATE | 0.99+
John Furrier | PERSON | 0.99+
two weeks | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Ed Albanese | PERSON | 0.99+
John | PERSON | 0.99+
five seconds | QUANTITY | 0.99+
Las Vegas | LOCATION | 0.99+
Ed | PERSON | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
10 good questions | QUANTITY | 0.99+
Swami | PERSON | 0.99+
15 different possibilities | QUANTITY | 0.99+
Palo Alto, California | LOCATION | 0.99+
Vectara | ORGANIZATION | 0.99+
Amr Awadallah | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Cloudera | ORGANIZATION | 0.99+
first time | QUANTITY | 0.99+
both | QUANTITY | 0.99+
end of 2019 | DATE | 0.99+
yesterday | DATE | 0.98+
Big Data | ORGANIZATION | 0.98+
40 million users | QUANTITY | 0.98+
two things | QUANTITY | 0.98+
two great guests | QUANTITY | 0.98+
12 plus years | QUANTITY | 0.98+
one | QUANTITY | 0.98+
five dollar | QUANTITY | 0.98+
Netscape | ORGANIZATION | 0.98+
five years ago | DATE | 0.98+
SQL | TITLE | 0.98+
first inning | QUANTITY | 0.98+
Amr | PERSON | 0.97+
two schools | QUANTITY | 0.97+
first | QUANTITY | 0.97+
10 years ago | DATE | 0.97+
One | QUANTITY | 0.96+
first day | QUANTITY | 0.96+
three | DATE | 0.96+
chatGPT | TITLE | 0.96+
first places | QUANTITY | 0.95+
Bing | ORGANIZATION | 0.95+
Notion | TITLE | 0.95+
first thing | QUANTITY | 0.94+
theCUBE | ORGANIZATION | 0.94+
Beyond the Buzz | TITLE | 0.94+
Sati Natel | PERSON | 0.94+
Industrial Revolution | EVENT | 0.93+
one location | QUANTITY | 0.93+
three years ago | DATE | 0.93+
single application | QUANTITY | 0.92+
one thing | QUANTITY | 0.91+
first platform | QUANTITY | 0.91+
five years old | QUANTITY | 0.91+

How to Make a Data Fabric Smart: A Technical Demo With Jess Jowdy


 

(inspirational music) (music ends) >> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities, including data exploration, business intelligence, natural language processing and machine learning directly within the fabric makes it faster and easier for organizations to gain new insights and power intelligent predictive and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi, yeah, thank you so much for having me. And so for this demo, we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements, and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo, and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see, and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. 
This could be, for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software, or adverse reaction warnings from a clinical risk grouping application, and so much more. So I'm really going to be simulating a patient logging in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here, and I'm going to be looking for information where the last name of this patient is Simmons, and their medical record number or their patient identifier in the system is 32345. And so as you can see, I have this single JSON payload that showed up here of, just, relevant clinical information for my patient whose last name is Simmons, all within a single response. So fantastic, right? Typically though, when we see responses that look like this there is an assumption that this service is interacting with a single backend system, and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture, we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. 
And in the middle here, we have our data fabric coordinator which is going to be in charge of this refinement and analysis, those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service, and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end, we do also support full life cycle API management within this platform. When you're dealing with APIs, I always like to make a little shout out on this, that you really want to make sure you have enough, like a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what context. >> Can I just interrupt you for a second, Jess? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean you could have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry, and API security is, like, a really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So, there's interactions with the API itself. So can I even see this API and leverage it? 
And then within an API call, you then have to deal with, all right, which endpoints or what kind of interactions within that API am I allowed to do? What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So, the way that we handle that is, like I said, same thing at different layers. There is access to a particular API, which can happen within the IRIS product, and also we see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So, that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of the security. >> And that's been designed in, it's not a bolt-on as they like to say. >> Absolutely. >> Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly, each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FHIR. Interactions with a homegrown enterprise data warehouse, for instance, may use SQL. For cloud-based solutions managed by a vendor, they may only allow you to use web service calls to pull data. So it's really important that your data fabric platform that you're using has the flexibility to connect to all of these different systems and applications. And I'm about to log out, so I'm going to (chuckles) keep my session going here. 
So therefore it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources, and all these different kinds of formats and over all of these different kinds of protocols. So let's think back on our example here. I had four different applications that I was requesting information for to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is, it has an embedded interoperability platform. So there's all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST, or SOAP, or SQL, or FTP, regardless of that protocol, there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as. In healthcare we have HL7, we have FHIR, we have CCDs. Across the industry, JSON is, you know, really hitting a market strong now, and XML payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these when I select a particular connection on the right side panel, I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application, I'm using a SQL-based connection. When I'm connecting to my EMR, I'm leveraging a standard healthcare messaging format known as FHIR, which is a REST-based protocol. And then when I'm working with my health record management system, I'm leveraging a standard HTTP adapter. 
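To make the adapter idea concrete, here is a minimal Python sketch of one adapter per protocol sitting behind a common interface. It is not IRIS code: the source names, the `SOURCE_PROTOCOLS` mapping, and the stub adapters are all hypothetical, loosely modeled on the four connections named in the demo.

```python
from typing import Callable, Dict

# Hypothetical mapping of the four demo systems to the protocol each one speaks.
SOURCE_PROTOCOLS: Dict[str, str] = {
    "chart_script": "soap",
    "clinical_risk_grouping": "sql",
    "emr": "fhir",
    "health_record_management": "http",
}


def _stub_adapter(protocol: str) -> Callable[[str, str], dict]:
    """Build a stand-in adapter; a real one would open a SOAP, SQL, FHIR, or HTTP connection."""
    def fetch(last_name: str, mrn: str) -> dict:
        return {"protocol": protocol, "last_name": last_name, "mrn": mrn}
    return fetch


# One adapter per protocol, looked up by name (assumed design, for illustration only).
ADAPTERS: Dict[str, Callable[[str, str], dict]] = {
    p: _stub_adapter(p) for p in ("soap", "sql", "fhir", "http")
}


def collect(source: str, last_name: str, mrn: str) -> dict:
    """Pick the adapter that matches the source's protocol and fetch one record."""
    protocol = SOURCE_PROTOCOLS[source]
    record = ADAPTERS[protocol](last_name, mrn)
    record["source"] = source
    return record
```

Calling `collect("emr", "Simmons", "32345")` routes through the FHIR stub, while the caller never needs to know which protocol a given silo uses.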
So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly, and be able to do it in a reliable and quick way. Because if you think about it, you could have hundreds of these different kinds of applications built out and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance my patient's last name and their MRN, and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. So why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond what's an out-of-the box or black box approach to be able to develop things that are specific to their data fabric, or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you not only get an opportunity to view how we're establishing these connections or how we're building out these processes, but you have the opportunity to inject your own kind of processing, your own kinds of pipelines into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up and coming language, right? We see more and more developers turning towards Python to do their development. 
So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see, actually calls out to our turnkey adapters. So we see a combination of out-of-the-box code that is provided in this data fabric platform from IRIS, combined with organization-specific or user-specific customizations that are included in this Python method. So it's a nice little combination of how do we bring the developer experience in and mix it with out-of-the-box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. (laughs) >> It's a lot here. You know, actually- >> I can pause. >> If I could, if we just want to sort of play that back. So we went through the connect and the collect phase. >> Yes, we're going into refine. So it's a good place to stop. >> So before we get there, so we heard a lot about fine-grained security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know, the ability to bring in different dev tools. We heard about FHIR, which of course is big in healthcare, and that's the standard, and then SQL for traditional kind of structured data, and then web services like HTTP you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely. And I think that's really important when you're dealing with a smart data fabric because what you're effectively doing is you're consolidating all of your processing, all of your collection, into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refinement. >> We're going into refinement. Exciting. (chuckles) So how do we actually do refinement? Where does refinement happen? 
And how does this whole thing end up being performant? Well the key to all of that is this SDF coordinator, or stands for Smart Data Fabric coordinator. And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's aggregating, it's collecting that information, it's aggregating it, and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like. And as you can see, it follows a flow chart like structure. So there's a start, there is an end, and then there are these different arrows that point to different activities throughout the business process. And so there's all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and we're consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL logic into this or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. 
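The fan-out-then-sync pattern described for the coordinator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fetch_from`, `SOURCES`, and `coordinate` are hypothetical names, and the thread pool stands in for whatever the platform's business process engine actually does.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Hypothetical names for the four downstream systems mentioned in the demo.
SOURCES = ["chart_script", "health_record_management", "clinical_risk_grouping", "emr"]


def fetch_from(source: str, last_name: str, mrn: str) -> Dict[str, str]:
    """Stand-in for one outbound call; a real call would go through an adapter."""
    return {"source": source, "last_name": last_name, "mrn": mrn}


def coordinate(last_name: str, mrn: str) -> Dict[str, object]:
    """Call every source concurrently, wait for all of them, then build one payload."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {s: pool.submit(fetch_from, s, last_name, mrn) for s in SOURCES}
        # The "sync" step: result() blocks until each individual call has returned.
        records = {s: f.result() for s, f in futures.items()}
    return {"patient": {"last_name": last_name, "mrn": mrn}, "records": records}
```

Conditional logic, looping, or error handling would slot in around the `result()` calls, for example by catching an exception from one source and still returning the rest.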
So like I said, we have a really very simple process here that's just calling out, grabbing all those different data elements from all those different data sources and consolidating it. We'll look back at this coordinator in a second when we introduce, or we make this data fabric a bit smarter, and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen 'cause I always like to go back and take a look at everything that we've seen. We have our initial API connection, we have our connections to our individual data sources and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There's all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging, 'cause you need to be able to know, you know, if there was an issue, where did that issue happen in which connected process, and how did it affect the other processes that are related to it? In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request from when it initially came into the smart data fabric, to when data was sent back out from that smart data fabric. So I didn't record the time, but I bet if you recorded the time it was this time that we sent that request in and you can see my patient's name and their medical record number here, and you can see that that instigated four different calls to four different systems, and they're represented by these arrows going out. 
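A trace of this kind boils down to recording, per request, who sent what to whom and when. The Python sketch below shows one way that could be modeled; the `Trace` and `TraceEvent` classes and their fields are assumptions made for illustration and do not reflect how the IRIS visual trace is implemented.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceEvent:
    """One hop in a request's history (hypothetical structure)."""
    request_id: str
    source: str
    target: str
    note: str = ""
    timestamp: float = field(default_factory=time.time)


@dataclass
class Trace:
    """Ordered history of a single inbound request."""
    request_id: str
    events: List[TraceEvent] = field(default_factory=list)

    def record(self, source: str, target: str, note: str = "") -> None:
        """Append one hop, stamped with the current time."""
        self.events.append(TraceEvent(self.request_id, source, target, note))

    def outbound_calls(self, origin: str) -> List[str]:
        """List the targets that `origin` called, in the order they were recorded."""
        return [e.target for e in self.events if e.source == origin]
```

Recording one event per outbound call from the coordinator would reproduce the four arrows described in the demo, and the same list could carry errors or developer log statements in the `note` field.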
So we sent something to chart script, to our health record management system, to our clinical risk grouping application, and to my EMR through their FHIR server. So every request, every outbound application gets a request and we pull back all of those individual pieces of information from all of those different systems, and we bundle them together. And for my FHIR lovers, here's our FHIR bundle that we got back from our FHIR server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now we change this into a JSON format before we deliver it, but this is those data elements brought together. And this screen would also be used for being able to see things like error trapping, or errors that were thrown, alerts, warnings. Developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one-stop shop for understanding what's happening behind the scenes with your data fabric. >> Sure, who did what, when, where, what did the machine do, what went wrong, and where did that go wrong? Right at your fingertips. >> Right. And I'm a visual person, so a bunch of log files to me is not the most helpful. While being able to see this happened at this time in this location gives me that understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How people are using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric, is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? 
It's bringing data in, it's analyzing that information, it's transforming that data into a format that your consumer is going to understand. It's doing any additional injection of custom logic. So really your coordinator or that orchestrator that sits in the middle is the brains behind your smart data fabric. >> And this is available today? It all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well, we can keep going. I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this, but essentially if we go back to our coordinator here, we can see here's that original, that pipeline that we saw where we're pulling data from all these different systems and we're collecting it and we're sending it out. But then we see two more at the end here, which involves getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric, but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at, and we're running it through a machine learning model that exists within the smart data fabric pipeline, and producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days, which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world is we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. 
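The create, train, execute sequence can be illustrated with a deliberately tiny model. The Python sketch below is a stand-in, not the platform's integrated ML: it fits a one-feature logistic regression by gradient descent, and the feature (number of prior admissions), the toy training rows, and the function names are all invented for illustration.

```python
import math
from typing import List, Tuple

Model = Tuple[float, float]  # (weight, bias) for a single feature


def sigmoid(z: float) -> float:
    """Map any real number to a probability-like score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs: List[float], ys: List[int], lr: float = 0.1, steps: int = 2000) -> Model:
    """Fit weight and bias by plain gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def readmission_score(model: Model, prior_admissions: float) -> float:
    """Execute the trained model for one patient."""
    w, b = model
    return sigmoid(w * prior_admissions + b)


# Toy, made-up training rows: prior admissions -> readmitted within 30 days (1) or not (0).
TOY_X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
TOY_Y = [0, 0, 0, 1, 1, 1]
```

With `model = train(TOY_X, TOY_Y)`, a patient with more prior admissions gets a higher score than one with none; a real readmission model would of course use many more features and real clinical data.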
So there's no shuffling of data, there's no external connections to make this happen. And it doesn't really require having a PhD in data science to understand how to do that. It leverages all really basic SQL-like syntax to be able to construct and execute these predictions. So, it's going one step further than the traditional data fabric example to introduce this ability to define actionable insights to our users based on the data that we've brought together. >> Well that readmission probability is huge, right? Because it directly affects the cost for the provider and the patient, you know. So if you can anticipate the probability of readmission and either do things at that moment, or, you know, as an outpatient perhaps, to minimize the probability then that's huge. That drops right to the bottom line. >> Absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day, which is what makes this so exciting. >> Awesome demo. >> Thank you! >> Jess, are you cool if people want to get in touch with you? Can they do that? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy, and we'd love to hear from you. I always love talking about this topic so we'd be happy to engage on that. >> Great stuff. Thank you Jessica, appreciate it. >> Thank you so much. >> Okay, don't go away because in the next segment, we're going to dig into the use cases where data fabric is driving business value. Stay right there. (inspirational music) (music fades)

Published Date : Feb 22 2023


Ed Walsh & Thomas Hazel | A New Database Architecture for Supercloud


 

(bright music) >> Hi, everybody, this is Dave Vellante, welcome back to Supercloud 2. Last August, at the first Supercloud event, we invited the broader community to help further define Supercloud, we assessed its viability, and identified the critical elements and deployment models of the concept. The objectives here at Supercloud 2 are, first of all, to continue to tighten and test the concept. The second is, we want to get real-world input from practitioners on the problems that they're facing and the viability of Supercloud in terms of applying it to their business. So on the program, we've got companies like Walmart, Saks, Western Union, Ionis Pharmaceuticals, NASDAQ, and others. And the third thing that we want to do is we want to drill into the intersection of cloud and data to project what the future looks like in the context of Supercloud. So in this segment, we want to explore the concept of data architectures and what's going to be required for Supercloud. And I'm pleased to welcome one of our Supercloud sponsors, ChaosSearch. Ed Walsh is the CEO of the company, with Thomas Hazel, who's the Founder, CTO, and Chief Scientist. Guys, good to see you again, thanks for coming into our Marlborough studio. >> Always great. >> Great to be here. >> Okay, so there's a little debate, I'm going to put you right on the spot. (Ed chuckling) A little debate going on in the community started by Bob Muglia, a former CEO of Snowflake, and he was at Microsoft for a long time, and he looked at the Supercloud definition, said, "I think you need to tighten it up a little bit." So, here's what he came up with. He said, "A Supercloud is a platform that provides a programmatically consistent set of services hosted on heterogeneous cloud providers." So he's calling it a platform, not an architecture, which was kind of interesting. And so presumably the platform owner is going to be responsible for the architecture, but Dr. 
Nelu Mihai, who's a computer scientist behind the Cloud of Clouds Project, he chimed in and responded with the following. He said, "Cloud is a programming paradigm supporting the entire lifecycle of applications with data and logic natively distributed. Supercloud is an open architecture that integrates heterogeneous clouds in an agnostic manner." So, Ed, words matter. Is this an architecture or is it a platform? >> Put us on the spot. So, I'm sure you have concepts, I would say it's an architectural or design principle. Listen, I look at Supercloud as a mega trend, just like cloud, just like data analytics. And some companies are using the principle, design principles, to literally get dramatically ahead of everyone else. I mean, things you couldn't possibly do if you didn't use cloud principles, right? So I think it's a Supercloud effect, you're able to do things you weren't able to do before. So I think it's more a design principle, but if you do it right, you get dramatic effect as far as customer value. >> So the conversation that we were having with Muglia, and Tristan Handy of dbt Labs, was, I'll set it up as the following, and, Thomas, would love to get your thoughts. If you have a CRM, think about applications today, it's all about forms and codifying business processes, you type a bunch of stuff into Salesforce, and all the salespeople do it, and this machine generates a forecast. What if you have this new type of data app that pulls data from the transaction system, the e-commerce, the supply chain, the partner ecosystem, et cetera, and then, without humans, actually comes up with a plan. That's their vision. And Muglia was saying, in order to do that, you need to rethink data architectures and database architectures specifically, you need to get down to the level of how the data is stored on the disk. What are your thoughts on that? >> Well, first of all, I'm going to cop out, I think it's actually both. 
I do think it's a design principle, I think it's not open technology, but open APIs, open access, and you can build a platform on that design principle architecture. Now, I'm a database person, I love solving the database problems. >> I was waiting for you to launch into this. >> Yeah, so I mean, you know, Snowflake is a database, right? It's a distributed database. And we wanted to crack those codes, because, multi-region, multi-cloud, customers wanted access to their data, and their data is in a variety of forms, all these services that you've talked about. And so what I saw as a core principle was cloud object storage, everyone streams their data to cloud object storage. From there we said, well, how about we rethink database architecture, rethink file format, so that we can take each one of these services and bring them together, whether distributively or centrally, such that customers can access and get answers, whether it's operational data, whether it's business data, AKA search, or SQL, complex distributed joins. But we had to rethink the architecture. I like to say we're not a first generation, or a second, we're a third generation distributed database on pure, pure cloud storage, no caching, no SSDs. Why? Because all that availability, the cost of time, is a struggle, and cloud object storage, we think, is the answer. >> So when you're saying no caching, so when I think about how companies are solving some, you know, pretty hairy problems, take MySQL HeatWave, everybody thought Oracle was going to just forget about MySQL, well, they come out with HeatWave. And the way they solve problems, and you see their benchmarks against Amazon, "Oh, we crush everybody," is they put it all in memory. So you said no caching? You're not getting performance through caching? How is that true, and how are you getting performance? >> Well, so five, six years ago, right? 
When you realize that cloud object storage is going to be everywhere, and it's going to be a core foundational, if you will, fabric, what would you do? Well, a lot of times the second generation say, "We'll take it out of cloud storage, put it in SSDs or something, and put it into cache." And that adds a lot of time, adds a lot of cost. But I said, what if, what if we could actually make the first read hot, the first read distributed joins and searching? And so what we went out to do was say, we can't cache, because that adds time, that adds cost. We have to make cloud object storage high performance, like it feels like a caching SSD. That's where our patents are, that's where our technology is, and we've spent many years working towards this. So, to me, if you can crack that code, a lot of these issues we're talking about, multi-region, multicloud, different services, everybody wants to send their data to the data lake, but then they move it out, we said, "Keep it right there." >> You nailed it, the data gravity. So, Bob's right, the data's coming in, and you need to get the data from everywhere, but you need an environment where you can deal with all that different schema, all the different types of technology, but also at scale. Bob's right, you cannot use memory or SSDs to cache that, that doesn't scale, it doesn't scale cost effectively. But if you could, and what you did, is you made object storage, S3 first, but object storage, the only persistence by doing that. And then we get performance, we should talk about it, it's literally, you know, hundreds of terabytes of queries, and it's done in seconds, it's done without memory caching. We have concepts of caching, but the only caching, the only persistence, is actually when we're doing caching, we're just keeping another side-eye track of things on the S3 itself. 
So we're using, actually, the object storage to be a database, which is kind of what Bob was saying, we agree, but that's what you started at, people thought you were crazy. >> And maybe make it live. Don't think of it as archival or temporary space, make it live, real time streaming, operational data. What we do is make it smart, we see the data coming in, we uniquely index it such that you can get your use cases, that are search, observability, security, or backend operational. But we don't have to have this, I dunno, static, fixed, siloed type of architecture technologies that were traditionally built prior to Supercloud thinking. >> And you don't have to move everything, essentially, you can do it wherever the data lands, whatever cloud across the globe, you're able to bring it together, you get the cost effectiveness, because the only persistence is the cheapest storage persistent layer you can buy. But the key thing is you cracked the code. >> We had to crack the code, right? That was the key thing. >> That's where the patents are. >> And then once you do that, then everything else gets easier to scale, your architecture, across regions, across cloud. >> Now, it's a general purpose database, as Bob was saying, but we use that database to solve a particular issue, which is around operational data, right? So, we agree with Bob. >> Interesting. So this brings me to this concept of data mesh, Zhamak Dehghani is one of our speakers, you know, we talk about data fabric, which is a NetApp, originally NetApp concept, Gartner's kind of co-opted it. But so, the basic concept is, data lives everywhere, whether it's an S3 bucket, or a SQL database, or a data lake, it's just a node on the data mesh. So in your view, how does this fit in with Supercloud? Ed, you've said that you've built, essentially, an enabler for that, for the data mesh, I think you're an enabler for the Supercloud-like principles. This is a big, chewy opportunity, and it requires, you know, a team approach. 
There's got to be an ecosystem, there's not going to be one Supercloud to rule them all, so where does the ecosystem fit into the discussion, and where do you fit into the ecosystem? >> Right, so we agree completely, there's not one Supercloud in effect, but we use Supercloud principles to build our platform, and then, you know, the ecosystem's going to be built on leveraging what everyone else's secret powers are, right? So our power, our superpower, based upon what we built is, we deal with, if you're having any scale, or cost effective scale issues, with data, machine generated data, like business observability or security data, we are your force multiplier, we will take that in singularly, just let it, simply put it in your object storage wherever it sits, and we give you uniform access to that using open API access, SQL, or you know, Elasticsearch API. So, that's what we do, that's our superpower. So I'll play it into data mesh, that's a perfect, we are a node on a data mesh, but I'll play it in the soup about how, the ecosystem, we see it kind of playing, and we talked about it just in the last couple of days, how we see this kind of possibly. Short term, our superpowers, we deal with this data that's coming at these environments, people, customers, building out observability or security environments, or vendors that are selling their own Supercloud, I do observability, the Datadogs of the world, dot dot dot, the Splunks of the world, dot dot dot, and security. So what we do is we fit in naturally. What we do is cost effective scale, just land it anywhere in the world, we deal with ingest, and it's cost effective, an order of magnitude, or two or three orders of magnitude more cost effective. Allows them, their customers are asking them to do the impossible, "Give me fast monitoring alerting. I want it snappy, but I want it to keep two years of data, (laughs) and I want it cost effective." It doesn't work. 
They're good at the fast monitoring alerting, we're good at the long-term retention. And yet there's some gray area between those two, but one to one is actually cheaper, so we would partner. So the first ecosystem plays, who wants to have the ability to, really, all the data's in those same environments, the security observability players, they can literally, just through API, drag our data into their point to grab. We can make it seamless for customers. Right now, we make it helpful to customers. Your Datadog, we make a button, easy go from Datadog to us for logs, save you money. Same thing with Grafana. But you can also look at ecosystem, those same vendors, it used to be a year ago it was, you know, it's all about how can you grow, like it's growth at all costs, now it's about COGS. So literally we can go into an environment, you supply what your customer wants, but we can help with COGS. And one-on-one in a partnership is better than you trying to build on your own. >> Thomas, you were saying you make the first read fast, so you think about Snowflake. Everybody wants to talk about Snowflake and Databricks. So, Snowflake, great, but you've got to get the data in there. All right, so that's, can you help with that problem? >> I mean we want simple in, right? And if you have to have structure in, you're not simple. So the idea that you have a simple in, data lake, schema-on-read type philosophy, but schema-on-write type performance. And so what I wanted to do, what we have done, is have that simple lake, and stream that data real time, and those access points of Search or SQL, to go after whatever business case you need, security observability, warehouse integration. But the key thing is, how do I make that click, click, click answer, and do it quickly? And so what we want to do is, that first read has to be fast. Why? 'Cause then you're going to do all this siloing, layers, complexity. If your first read's not fast, you're at a disadvantage, particularly in cost. 
And nobody says I want less data, but everyone has to, whether they say we're going to shorten the window, we're going to use AI to choose, but in a security moment, when you don't have that answer, you're in trouble. And that's why we are this service, this Supercloud service, if you will, providing access, well-known search, well-known SQL type access, that if you just have one access point, you're at a disadvantage. >> We actually talked about Snowflake and BigQuery, and a different platform, Databricks. That's kind of where we see the phase two of ecosystem. One is easy, the low-hanging fruit is observability and security firms. But the next one is, what we do, our superpower is dealing with this messy data that schema is changing like night and day. Pipelines are tough, and it's changing all the time, but you want these things fast, and it's big data around the world. That's the next point, just use us alongside, or inside, one of their platforms, and now we get the best of both worlds. Our superpower is keeping this messy data as a streaming, okay, not a batch thing, allow you to do that. So, that's the second one. And then to be honest, the third one, which plays you to Supercloud, it also plays perfectly in the data mesh, is if you really go to the ultimate thing, what we have done is made object storage, S3, GCS, and blob storage, we made it a database. Put, get, complex query with big joins. You know, so back to your original thing, and Muglia teed it up perfectly, we've done that. Now imagine if that's an ecosystem, who would want that? If it's, again, it's uniform available across all the regions, across all the clouds, and it's right next to where you are building a service, or a client's trying, that's where the ecosystem, I think people are going to use Superclouds for their superpowers. We're really good at this, allows that short term. I think the Snowflakes and the Databricks are the medium term, you know? 
And then I think eventually gets to, hey, listen if you can make object storage fast, you can just go after it with simple SQL queries, or elastic. Who would want that? I think that's where people are going to leverage it. It's not going to be one Supercloud, and we leverage the Superclouds. >> Our viewpoint is smart object storage can be programmable, and so we agree with Bob, but we're not saying do it here, do it here. This core, fundamental layer across regions, across clouds, that everyone has? Simple in. Right now, it's hard to get data in for access for analysis. So we said, simply, we'll automate the entire process, give you API access across regions, across clouds. And again, how do you do a distributed join that's fast? How do you do a distributed join that doesn't cost you an arm or a leg? And how do you do it at scale? And that's where we've been focused. >> So prior, the cloud object store was a niche. >> Yeah. >> S3 obviously changed that. How standard is, essentially, object store across the different cloud platforms? Is that a problem for you? Is that an easy thing to solve? >> Well, let's talk about it. I mean we've fundamentally, yeah we've abstracted it, but fundamentally, cloud object storage, put, get, and list. That's why it's so scalable, 'cause it doesn't have all these other components. That complexity is where we have moved up, and provide direct analytical API access. So because of its simplicity, and costs, and security, and reliability, it can scale naturally. I mean, really, distributed object storage is easy, it's put-get anywhere, now what we've done is we put a layer of intelligence, you know, call it smart object storage, where access is simple. So whether it's multi-region, do a query across, or multicloud, do a query across, or hunting, searching. >> We've had clients doing Amazon and Google, we have some Azure, but we see Amazon and Google more, and it's a consistent service across all of them. 
Just literally put your data in the bucket of choice, or folder of choice, click a couple buttons, literally click that to say "that's hot," and after that, it's hot, you can see it. But we're not moving data, the data gravity issue, that's the other. That it's already natively flowing to these pools of object storage across different regions and clouds. We don't move it, we index it right there, we're spinning up stateless compute, back to the Supercloud concept. But now that allows us to do all these other things, right? >> And it's no longer just cheap and deep object storage. Right? >> Yeah, we make it the same, like you have an analytic platform regardless of where you're at, you don't have to worry about that. Yeah, we deal with that, we deal with a stateless compute coming up -- >> And make it programmable. Be able to say, "I want this bucket to provide these answers." Right, that's really the hope, the vision. And the complexity to build the entire stack, and then connect them together, we said, the fabric is cloud storage, we just provide the intelligence on top. >> Let's bring it back to the customers, and one of the things we're exploring in Supercloud 2 is, you know, is Supercloud a solution looking for a problem? Is multicloud really a problem? I mean, you hear, you know, a lot of the vendor marketing says, "Oh, it's a disaster, because it's all different across the clouds." And I talked to a lot of customers even as part of Supercloud 2, they're like, "Well, I solved that problem by just going mono cloud." Well, but then you're not able to take advantage of a lot of the capabilities and the primitives that, you know, like Google's data, or you like Microsoft's simplicity, their RPA, whatever it is. So what are customers telling you, what are their near term problems that they're trying to solve today, and how are they thinking about the future? >> Listen, it's a real problem. I think it started, I think this is a mega trend, just like cloud. 
Just, cloud data, and I always add, analytics, are the mega trends. If you're looking at those, if you're not considering using the Supercloud principles, in other words, leveraging what I have, abstracting it out, and getting the most out of that, and then build value on top, I think you're not going to be able to keep up. In fact, no way you're going to keep up with this data volume. It's a geometric challenge, and you're trying to do linear things. So clients aren't necessarily asking, hey, for Supercloud, but they're really saying, I need to have a better mechanism to simplify this and get value across it, and how do you abstract that out to do that? And that's where they're obviously, our conversations are more amazed what we're able to do, and what they're able to do with our platform, because if you think of what we've done, the S3, or GCS, or object storage, is they can't imagine the ingest, they can't imagine how easy, time to glass, one minute, no matter where it lands in the world, querying this in seconds for hundreds of terabytes squared. People are amazed, but that's kind of, so they're not asking for that, but they are amazed. And then when you start talking on it, if you're an enterprise person, you're building a big cloud data platform, or doing data or analytics, if you're not trying to leverage the public clouds, and somehow leverage all of them, and then build on top, then I think you're missing it. So they might not be asking for it, but they're doing it. >> And they're looking for a lens, you mentioned all these different services, how do I bring those together quickly? You know, our viewpoint, our service, is I have all these streams of data, create a lens where they want to go after it via search, go after via SQL, bring them together instantly, no ETL-ing out, no define this table, put into this database. We said, let's have a service that creates a lens across all these streams, and then make those connections. 
I want to take my CRM with my Google AdWords, and maybe my Salesforce, how do I do analysis? Maybe I want to hunt first, maybe I want to join, maybe I want to add another stream to it. And so our viewpoint is, it's so natural to get into these lake platforms and then provide lenses to get that access. >> And they don't want it separate, they don't want something different here, and different there. They want it basically -- >> So this is our industry, right? If something new comes out, remember virtualization came out, "Oh my God, this is so great, it's going to solve all these problems." And all of a sudden it just got to be this big, more complex thing. Same thing with cloud, you know? It started out with S3, and then EC2, and now hundreds and hundreds of different services. So, it's a complex matter for a lot of people, and this creates problems for customers, especially when you got divisions that are using different clouds, and you're saying that the solution, or a solution for the part of the problem, is to really allow the data to stay in place on S3, use that standard, super simple, but then give it what, Ed, you've called superpower a couple of times, to make it fast, make it inexpensive, and allow you to do that across clouds. >> Yeah, yeah. >> I'll give you guys the last word on that. >> No, listen, I think, we think Supercloud allows you to do a lot more. And for us, data, everyone says more data, more problems, more budget issue, everyone knows more data is better, and we show you how to do it cost effectively at scale. And we couldn't have done it without the design principles of we're leveraging the Supercloud to get capabilities, and because we use super, just the object storage, we're able to get these capabilities of ingest, scale, cost effectiveness, and then we built on top of this. 
In the end, a database is a data platform that allows you to go after everything distributed, and to get one platform for analytics, no matter where it lands, that's where we think the Supercloud concepts are perfect, that's where our clients are seeing it, and we're kind of excited about it. >> Yeah a third generation database, Supercloud database, however we want to phrase it, and make it simple, but provide the value, and make it instant. >> Guys, thanks so much for coming into the studio today, I really thank you for your support of theCUBE, and theCUBE community, it allows us to provide events like this and free content. I really appreciate it. >> Oh, thank you. >> Thank you. >> All right, this is Dave Vellante for John Furrier in theCUBE community, thanks for being with us today. You're watching Supercloud 2, keep it right there for more thought provoking discussions around the future of cloud and data. (bright music)

Published Date : Feb 17 2023

Supercloud Applications & Developer Impact | Supercloud2


 

(gentle music) >> Okay, welcome back to Supercloud 2, live here in Palo Alto, California for our live stage performance. Supercloud 2 is our second Supercloud event. We're going to get these out as fast as we can every couple months. It's our second one, you'll see two and three this year. I'm John Furrier, my co-host, Dave Vellante. A panel here to break down the Supercloud momentum, the wave, and the developer impact. We're bringing back Vittorio Viarengo, who's a VP for Cross-Cloud Services at VMware. Sarbjeet Johal, industry influencer and Analyst at StackPayne, his company, Cube alumni and Influencer. Sarbjeet, great to see you. Vittorio, thanks for coming back. >> Nice to be here. >> My pleasure. >> Vittorio, you just gave a keynote where we unpacked the cross-cloud services, what VMware is doing, how you guys see it, not just from VMware's perspective, but VMware looking out broadly at the industry and developers came up and you were like, "Developers, developers, developers", kind of a goof on the Steve Ballmer famous meme that everyone's seen. This is a huge star, sorry, I mean a big piece of it. The developers are the canary in the coal mines. They're the ones who are being asked to code the digital transformation, which is fully business transformation and with the market the way it is right now in terms of the accelerated technology, every enterprise grade business model's changing. The technology is evolving, the builders are kind of, they want to go faster. I'm saying they're stuck in a way, but that's my opinion, but there's a lot of growth. >> Yeah. >> The impact, they got to get released up and let it go. Those developers need to accelerate faster. It's been a big part of productivity, and the conversations we've had. So developer impact is huge in Supercloud. What's your, what do you guys think about this? We'll start with you, Sarbjeet. >> Yeah, actually, developers are the masons of the digital empires I call 'em, right? 
They lay every brick and build all these big empires. On the left side of the SDLC, or the, you know, when you look at the system operations, developer is number one cost from economic side of things, and from technology side of things, they are tech hungry people. They are developers for that reason because developer nights are long, hours are long, they forget about when to eat, you know, like, I've been a developer, I still code. So you want to keep them happy, you want to hug your developers. We always say that, right? Vittorio said that right earlier. The key is to, in this context, in the Supercloud context, is that developers don't mind mucking around with platforms or APIs or new languages, but they hate the infrastructure part. That's a fact. They don't want to muck around with servers. It's friction for them, it is like they don't want to muck around even with the VMs. So they want the programmability to the nth degree. They want to automate everything, so that's how they think and cloud is the programmable infrastructure, industrialization of infrastructure in many ways. So they are happy with where we are going, and we need more abstraction layers for some developers. By the way, I have this sort of thinking frame for last year or so, not all developers are same, right? So if you are a developer at an ISV, you behave differently. If you are a developer at a typical enterprise, you behave differently or you are forced to behave differently because you're not writing software.- >> Well, developers, developers have changed, I mean, Vittorio, you and I were talking earlier on the keynote, and this is kind of the key point is what is a developer these days? If everything is software enabled, I mean, even hardware interviews we do with Nvidia, and Amazon and other people building silicon, they all say the same thing, "It's software on a chip." So you're seeing the role of software up and down the stack and the role of the stack is changing. 
The old days of full stack developer, what does that even mean? I mean, the cloud is a half a stack kind of right there. So, you know, developers are certainly more agile, but cloud native, I mean VMware is epitome of operations, IT operations, and the Tanzu initiative, you guys started, you went after the developers to look at them, and ask them questions, "What do you need?", "How do you transform the Ops from virtualization?" Again, back to your point, so this hardware abstraction, what is software, what is cloud native? It's kind of messy equation these days. How do you guys grapple with that? >> I would argue that developers don't want the Supercloud. I dropped that up there, so, >> Dave: Why not? >> Because developers, they, once they get comfortable in AWS or Google, because they're doing some AI stuff, which is, you know, very trendy right now, or they are in IBM, any of the hyperscalers, professional developers, system developers, they love that stuff, right? Yeah, they don't, the infrastructure gets in the way, but they're just, the problem is, and I think the Supercloud should be driven by the operators because as we discussed, the operators have been left behind because they're busy with day-to-day jobs, and in most cases IT is centralized, developers are in the business units. >> John: Yeah. >> Right? So they get the mandate from the top, say, "Our bank, they're competing against". They gave teenagers or like young people the ability to do all these new things online, and Venmo and all this integration, where are we? "Oh yeah, we can do it", and then build it, and then deploy it, "Okay, we caught up." 
but now the operators are back in the private cloud trying to keep the backend system running and so I think the Supercloud is needed for the primarily, initially, for the operators to get in front of the developers, fit in the workflow, but lay the foundation so it is secure-- >> So, so I love this thinking because I love the riff, because the riff points to what is the target audience for the value proposition and if you're a developer, Supercloud enables you so you shouldn't have to deal with Supercloud. >> Exactly. >> What you're saying is get the operating environment or operating system done properly, whether it's architecture, building the platform, this comes back to architecture platform conversations. What is the future platform? Is it a vendor supplied or is it customer created platform? >> Dave: So developers want best of breed, is what you just said. >> Vittorio: Yeah. >> Right and operators, they, 'cause developers don't want to deal with governance, they don't want to deal with security, >> No. >> They don't want to deal with spinning up infrastructure. That's the role of the operator, but that's where Supercloud enables, to John's point, the developer, so to your question, is it a platform where the platform vendor is responsible for the architecture, or is it an architectural standard that spans multiple clouds that has to emerge? Based on what you just presented earlier, Vittorio, you are the determinant of the architecture. It's got to be open, but you guys determine that, whereas the nirvana is, "Oh no, it's all open, and it just kind of works." >> Yeah, so first of all, let's all level set on one thing. You cannot tell developers what to do. >> Dave: Right, great >> At least great developers, right? Cannot tell them what to do. >> Dave: So that's what, that's the way I want to sort of, >> You can tell 'em what's possible. 
>> There's a bottle on that >> If you tell 'em what's possible, they'll test it, they'll look at it, but if you try to jam it down their throat, >> Yeah. >> Dave: You can't tell 'em how to do it, just like your point >> Let me answer your answer the question. >> Yeah, yeah. >> So I think we need to build an architect, help them build an architecture, but it cannot be proprietary, has to be built on what works in the cloud and so what works in the cloud today is Kubernetes, is you know, number of different open source project that you need to enable and then provide, use this, but when I first got exposed to Kubernetes, I said, "Hallelujah!" We had a runtime that works the same everywhere only to realize there are 12 different distributions. So that's where we come in, right? And other vendors come in to say, "Hey, no, we can make them all look the same. So you still use Kubernetes, but we give you a place to build, to set those operation policy once so that you don't create friction for the developers because that's the last thing you want to do." >> Yeah, actually, coming back to the same point, not all developers are same, right? So if you're ISV developer, you want to go to the lowest sort of level of the infrastructure and you want to shave off the milliseconds from to get that performance, right? If you're working at AWS, you are doing that. If you're working at scale at Facebook, you're doing that. At Twitter, you're doing that, but when you go to DMV and Kansas City, you're not doing that, right? So your developers are different in nature. They are given certain parameters to work with, certain sort of constraints on the budget side. They are educated at a different level as well. Like they don't go to that end of the degree of sort of automation, if you will. So you cannot have the broad stroking of developers. We are talking about a citizen developer these days. That's a extreme low, >> You mean Low-Code. 
>> Yeah, Low-Code, No-code, yeah, on the extreme side. On one side, that's citizen developers. On the left side is the professional developers, when you say developers, your mind goes to the professional developers, like the hardcore developers, they love the flexibility, you know, >> John: Well app, developers too, I mean. >> App developers, yeah. >> You're right a lot of, >> Sarbjeet: Infrastructure platform developers, app developers, yes. >> But there are a lot of customers, its a spectrum, you're saying. >> Yes, it's a spectrum >> There's a lot of customers don't want deal with that muck. >> Yeah. >> You know, like you said, AWS, Twitter, the sophisticated developers do, but there's a whole suite of developers out there >> Yeah >> That just want tools that are abstracted. >> Within a company, within a company. Like how I see the Supercloud is there shouldn't be anything which blocks the developers, like their view of the world, of the future. Like if you're blocked as a developer, like something comes in front of you, you are not developer anymore, believe me, (John laughing) so you'll go somewhere else >> John: First of all, I'm, >> You'll leave the company by the way. >> Dave: Yeah, you got to quit >> Yeah, you will quit, you will go where the action is, where there's no sort of blockage there. So like if you put in front of them like a huge amount of a distraction, they don't like it, so they don't, >> Well, the idea of a developer, >> Coming back to that >> Let's get into 'cause you mentioned platform. Get year in the term platform engineering now. >> Yeah. >> Platform developer. You know, I remember back in, and I think there's still a term used today, but when I graduated my computer science degree, we were called "Software engineers," right? Do people use that term "Software engineering", or is it "Software development", or they the same, are they different? >> Well, >> I think there's a, >> So, who's engineering what? 
Are they engineering or are they developing? Or both? Well, I think it the, you made a great point. There is a factor of, I had the, I was blessed to work with Adam Bosworth, that is the guy that created some of the abstraction layer, like Visual Basic and Microsoft Access and he had so, he made his whole career thinking about this layer, and he always talk about the professional developers, the developers that, you know, give him a user manual, maybe just go at the APIs, he'll build anything, right, from system engine, go down there, and then through abstraction, you get the more the procedural logic type of engineers, the people that used to be able to write procedural logic and Visual Basic and so on and so forth. I think those developers right now are a little cut out of the picture. There's some No-code, Low-Code environment that are maybe gain some traction, I caught up with Adam Bosworth two weeks ago in New York and I asked him "What's happening to this higher level developers?" and you know what he told me, and he is always a little bit out there, so I'm going to use his thought process here. He says, "ChatGPT", I mean, they will get to a point where this high level procedural logic will be written by, >> John: Computers. >> Computers, and so we may not need as many at the high level, but we still need the engineers down there. The point is the operation needs to get in front of them >> But, wait, wait, you seen the ChatGPT meme, I dunno if it's a Dilbert thing where it's like, "Time to tic" >> Yeah, yeah, yeah, I did that >> "Time to develop the code >> Five minutes, time to decode", you know, to debug the codes like five hours. So you know, the whole equation >> Well, this ChatGPT is a hot wave, everyone's been talking about it because I think it illustrates something that's NextGen, feels NextGen, and it's just getting started so it's going to get better. I mean people are throwing stones at it, but I think it's amazing. 
It's the equivalent of me seeing the browser for the first time, you know, like, "Wow, this is really compelling." This is game-changing, it's not just keyword chat bots. It's like this is real, this is next level, and I think the Supercloud wave that people are getting behind points to that and I think the question of Ops and Dev comes up because I think if you limit the infrastructure opportunity for a developer, I think they're going to be handicapped. I mean that's a general, my opinion, the thesis is you give more aperture to developers, more choice, more capabilities, more good things could happen, policy, and that's why you're seeing the convergence of networking people, virtualization talent, operational talent, get into the conversation because I think it's an infrastructure engineering opportunity. I think this is a seminal moment in a new stack that's emerging from an infrastructure, software virtualization, low-code, no-code layer that will be completely programmable by things like the next ChatGPT or something different, but yet still the mechanics and the plumbing will still need engineering. >> Sarbjeet: Oh yeah. >> So there's still going to be more stuff coming on. >> Yeah, we have, with the cloud, we have made the infrastructure programmable and you give the programmability to the programmer, they will be very creative with that and so we are being very creative with our infrastructure now and on top of that, we are being very creative with the silicon now, right? So we talk about that. That's part of it, by the way. So you write the code to the particular silicon now, and on the flip side, the silicon is built for certain use cases for AI Inference and all that. >> You saw this at CES? >> Yeah, I saw at CES, the scenario is this, the Bosch, I spoke to Bosch, I spoke to John Deere, I spoke to AWS guys, >> Yeah. >> They were showcasing their technology there and I spoke to Azure guys as well. So the Bosch is a good example. 
So they are building, they are right now using AWS. I have that interview on camera, I will put it online sometime later on. So they're using AWS on the back end now, but Bosch is the number one, number one or number two depending on what day it is of the year, supplier of the componentry to the auto industry, and they are creating a platform for our auto industry, so is Qualcomm actually by the way, with the Snapdragon. So they told me that customers, their customers, BMW, Audi, all the manufacturers, they demand the diversity of the backend. Like they don't want all, they, all of them don't want to go to AWS. So they want the choice on the backend. So whatever they cook in the middle has to work, they have to sprinkle the data for the data sovereign side because they have Chinese car makers as well, and for, you know, for other reasons, competitive reasons and like use. >> People don't go to, aw, people don't go to AWS either for political reasons or like competitive reasons or specific use cases, but for the most part, generally, I haven't met anyone who hasn't gone first choice with either, but that's me personally. >> No, but they're building. >> Point is the developer wants choice at the back end is what I'm hearing, but then finish that thought. >> Their developers want the choice, they want the choice on the back end, number one, because the customers are asking for, in this case, the customers are asking for it, right? But the customers requirements actually drive, their economics drives that decision making, right? So in the middle they have to, they're forced to cook up some solution which is vendor neutral on the backend or multicloud in nature. So >> Yeah, >> Every >> I mean I think that's nirvana. I don't think, I personally don't see that happening right now. I mean, I don't see the parity with clouds. So I think that's a challenge. I mean, >> Yeah, true. 
>> I mean the fact of the matter is if the development teams get fragmented, we had this chat with Kit Colbert last time, I think he's going to come on and I think he's going to talk about his keynote in a few, in an hour or so, development teams is this, the cloud is heterogenous, which is great. It's complex, which is challenging. You need skilled engineering to manage these clouds. So if you're a CIO and you go all in on AWS, it's hard. Then to then go out and say, "I want to be completely multi-vendor neutral" that's a tall order on many levels and this is the multicloud challenge, right? So, the question is, what's the strategy for me, the CIO or CISO, what do I do? I mean, to me, I would go all in on one and start getting hedges and start playing and then look at some >> Crystal clear. Crystal clear to me. >> Go ahead. >> If you're a CIO today, you have to build a platform engineering team, no question. 'Cause if we agree that we cannot tell the great developers what to do, we have to create a platform engineering team that using pieces of the Supercloud can build, and let's make this very pragmatic and give examples. First you need to be able to lay down the run time, okay? So you need a way to deploy multiple different Kubernetes environment in depending on the cloud. Okay, now we got that. The second part >> That's like table stakes. >> That are table stake, right? But now what is the advantage of having a Supercloud service to do that is that now you can put a policy in one place and it gets distributed everywhere consistently. So for example, you want to say, "If anybody in this organization across all these different buildings, all these developers don't even know, build a PCI compliant microservice, They can only talk to PCI compliant microservice." Now, I sleep tight. The developers still do that. Of course they're going to get their hands slapped if they don't encrypt some messages and say, "Oh, that should have been encrypted." So number one. 
The second thing I want to be able to say, "This service that this developer built over there better satisfy this SLA." So if the SLA is not satisfied, boom, I automatically spin up multiple instances to certify the SLA. Developers unencumbered, they don't even know. So this for me is like, CIO build a platform engineering team using one of the many Supercloud services that allow you to do that and lay down. >> And part of that is that the vendor behavior is such, 'cause the incentive is that they don't necessarily always work together. (John chuckling) I'll give you an example, we're going to hear today from Western Union. They're AWS shop, but they want to go to Google, they want to use some of Google's AI tools 'cause they're good and maybe they're even arguably better, but they're also a Snowflake customer and what you'll hear from them is Amazon and Snowflake are working together so that SageMaker can be integrated with Snowflake but Google said, "No, you want to use our AI tools, you got to use BigQuery." >> Yeah. >> Okay. So they say, "Ah, forget it." So if you have a platform engineering team, you can maybe solve some of that vendor friction and get competitive advantage. >> I think that the future proximity concept that I talk about is like, when you're doing one thing, you want to do another thing. Where do you go to get that thing, right? So that is very important. Like your question, John, is that your point is that AWS is ahead of the pack, which is true, right? They have the >> breadth of >> Infrastructure by a lot >> infrastructure service, right? They breadth of services, right? So, how do you, When do you bring in other cloud providers, right? So I believe that you should standardize on one cloud provider, like that's your primary, and for others, bring them in on as needed basis, in the subsection or sub portfolio of your applications or your platforms, what ever you can. 
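The two platform policies described in this exchange (a PCI-compliant microservice may only talk to other PCI-compliant microservices, and a service that misses its SLA automatically gets more instances) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Service`, `call_allowed`, and `instances_needed`, the latency-based SLA, and the proportional scaling rule are hypothetical and do not come from any product mentioned in the discussion.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service record kept by a platform engineering team."""
    name: str
    pci_compliant: bool
    cloud: str  # e.g. "aws" or "gcp"; illustrative only


def call_allowed(caller: Service, callee: Service) -> bool:
    """Policy 1 (assumed form): a PCI-compliant service may only call
    PCI-compliant services, whichever cloud either one runs on."""
    if caller.pci_compliant:
        return callee.pci_compliant
    return True


def instances_needed(current: int, observed_ms: float, sla_ms: float) -> int:
    """Policy 2 (assumed form): if observed latency exceeds the SLA,
    scale out proportionally; never go below the current count."""
    if observed_ms <= sla_ms:
        return current
    return max(current, math.ceil(current * observed_ms / sla_ms))


payments = Service("payments", pci_compliant=True, cloud="aws")
logging_svc = Service("logging", pci_compliant=False, cloud="gcp")

print(call_allowed(payments, logging_svc))   # PCI caller, non-PCI callee
print(instances_needed(2, observed_ms=300.0, sla_ms=200.0))
```

The point of the sketch is that both rules live in one place and are evaluated the same way on every cloud, so the developer who wrote the service never has to know they exist.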
>> So yeah, the Google AI example >> Yeah, I mean, >> Or the Microsoft collaboration software example. I mean there's always or the M and A. >> Yeah, but- >> You're going to get to run Windows, you can run Windows on Amazon, so. >> By the way, Supercloud doesn't mean that you cannot do that. So the perfect example is say that you're using Azure because you have a SQL Server-intensive workload. >> Yep >> And you're using Google for ML, great. If you are using some differentiated feature of this cloud, you'll have to go somewhere and configure this widget, but what you can abstract with the Supercloud is the lifecycle management of the service that runs on top, right? So how does the service get deployed, right? How do you monitor performance? How do you lifecycle it? How do you secure it? That you can abstract, and that's the value, and eventually value will win. So the customers will find where the value is: abstracting and making it uniform, or going deeper? >> How about identity? Like take identity for instance, you know, that's an opportunity to abstract. Whether I use Microsoft Identity or Okta, and I can abstract that. >> Yeah, and then we have APIs and standards that we can use so eventually I think where there is enough pain, the right open source will emerge to solve that problem. >> Dave: Yeah, I can use abstract things like object store, right? That's pretty simple. >> But back to the engineering question though, is that developers, developers, developers, one thing about developer psychology is if something's not right, they say, "Go get it fixed. I'm not touching it until you fix it." They're very sticky about, if something's not working, they're not going to do it again, right? So you got to get it right for developers. I mean, they'll maybe tolerate something new, but is the "juice worth the squeeze" as they say, right? So you can't go to direct say, "Hey, it's, what's a work in progress? 
We're going to get our infrastructure together and the world's going to be great for you, but just hang tight." They're going to be like, "Get your shit together then talk to me." So I think that to me is the question. It's an Ops question, but where's that value for the developer in Supercloud where the capabilities are there, there's less friction, it's simpler, it solves the complexity problem. I don't need this high-skilled labor to manage Amazon. I got services exposed. >> That's what we talked about earlier. It's like the Walmart example. They basically, they took away from the developer the need to spin up infrastructure and worry about all the governance. I mean, it's not completely there yet. So the developer could focus on what he or she wanted to do. >> But there's a big, like in our industry, there's a big sort of flaw or the contention between developers and operators. Developers want to be on the cutting edge, right? And operators want to be on the stability, you know, like we want governance. >> Yeah, totally. >> Right, so they want to control, developers are like these little bratty kids, right? And they want Legos, like they want toys, right? Some of them want toys by the way. They want Legos, they want to build there and they want to make a mess out of it. So you got to make sure. My number one advice in this context is that divvy up your application portfolio and, or your platform portfolio if you are an ISV, right? So if you are an ISV you most probably, you're building a platform these days, divvy it up in a way that you can say this portion of our applications and our platform will adhere to what you are saying, standardization, you know, like Kubernetes, like slam dunk, you know, it works across clouds and in your data center hybrid, you know, whole nine yards, but there is some subset on the next door systems of innovation. Everybody has, it doesn't matter if you're DMV of Kansas or you are, you know, metaverse, right? 
Or Meta company, right, which is Facebook, they have it, they are building something new. For that, give them some freedom to choose different things like play with non-standard things. So that is the mantra for moving forward, for any enterprise. >> Do you think developers are happy with the infrastructure now or are they wanting people to get their act together? I mean, what's your reaction, or you think. >> Developers are happy as long as they can do their stuff, which is running code. They want to write code and innovate. So to me, when Ballmer said, "Developers, developers, developers," what he meant was, all you other people get your act together so these developers can do their thing, and to me the Supercloud is the way for IT to get there and let developers be creative and go fast. Why not, without getting in trouble. >> Okay, let's wrap up this segment with a super clip. Okay, we're going to do a sound bite that we're going to make into a short video for each of you >> All right >> On you guys summarizing why Supercloud's important, why this next wave is relevant for the practitioners, for the industry and we'll turn this into an Instagram reel, YouTube short. So we'll call it a "Super clip." >> Alright, >> Sarbjeet, you want, you want some time to think about it? You want to go first? Vittorio, you want. >> I just didn't mind. (all laughing) >> No, okay, okay. >> I'll do it again. >> Go back. No, we got a fresh one. We already got that one in the can. >> I'll go. >> Sarbjeet, you go first. >> I'll go >> What's your super clip? >> In software systems, abstraction is your friend. I always say that. Abstraction is your friend, even if you're super professional developer, abstraction is your friend. We saw from the MFC library from C++ days till today. Abstract, use abstraction. Do not try to reinvent what's already being invented. Leverage cloud, leverage the platform side of the cloud. 
Not just infrastructure service, but platform as a service side of the cloud as well, and Supercloud is a meta platform built on top of these infrastructure services from three or four or five cloud providers. So use that and embrace the programmability, embrace the abstraction layer. That's the key actually, and developers who are true developers or professional developers as you said, they know that. >> Awesome. Great super clip. Vittorio, another shot at the plate here for super clip. Go. >> Multicloud is awesome. There's a reason why multicloud happened, it's because it gave our developers the ability to innovate faster than ever before. So if you are embarking on a digital transformation journey, which I call a survival journey, if you're not innovating and transforming, you're not going to be around in business three, five years from now. You have to adopt the Supercloud so the developer can be developer and keep building great, innovative digital experiences for your customers and IT can get in front of it and not get in trouble together. >> Building those super apps with Supercloud. That was a great super clip. Vittorio, thank you for sharing. >> Thanks guys. >> Sarbjeet, thanks for coming on talking about the developer impact Supercloud 2. On our next segment, coming up right now, we're going to hear from Walmart enterprise architect, how they are building and they are continuing to innovate, to build their own Supercloud. Really informative, instructive from a practitioner doing it in real time. Be right back with Walmart here in Palo Alto. Thanks for watching. (gentle music)

Published Date : Feb 17 2023


Discussion about Walmart's Approach | Supercloud2


 

(upbeat electronic music) >> Okay, welcome back to Supercloud 2, live here in Palo Alto. I'm John Furrier, with Dave Vellante. Again, all day wall-to-wall coverage, just had a great interview with Walmart, we've got a Next interview coming up, you're going to hear from Bob Muglia and Tristan Handy, two experts, both experienced entrepreneurs, executives in technology. We're here to break down what just happened with Walmart, and what's coming up with George Gilbert, former colleague, Wikibon analyst, Gartner Analyst, and now independent investor and expert. George, great to see you, I know you're following this space. Like you read about it, remember the first days when Dataverse came out, we were talking about them coming out of Berkeley? >> Dave: Snowflake. >> John: Snowflake. >> Dave: Snowflake In the early days. >> We, collectively, have been chronicling the data movement since 2010, you were part of our team, now you've got your nose to the grindstone, you're seeing the next wave. What's this all about? Walmart building their own super cloud, we got Bob Muglia talking about how these next wave of apps are coming. What are the super apps? What's the super cloud to you? >> Well, this keys off Dave's really interesting questions to Walmart, which was like, how are they building their supercloud? 'Cause it makes a concrete example. But what was most interesting about his description of the Walmart WCNP, I forgot what it stood for. >> Dave: Walmart Cloud Native Platform. >> Walmart, okay. He was describing where the logic could run in these stateless containers, and maybe eventually serverless functions. But that's just it, and that's the paradigm of microservices, where the logic is in this stateless thing, where you can shoot it, or it fails, and you can spin up another one, and you've lost nothing. >> That was their triplet model. >> Yeah, in fact, and that was what they were trying to move to, where these things move fluidly between data centers. 
>> But there's a but, right? Which is they're all stateless apps in the cloud. >> George: Yeah. >> And all their stateful apps are on-prem and VMs. >> Or the stateful part of the apps are in VMs. >> Okay. >> And so if they really want to lift their super cloud layer off of this different provider's infrastructure, they're going to need a much more advanced software platform that manages data. And that goes to the -- >> Muglia and Handy, that you and I did, that's coming up next. So the big takeaway there, George, was, I'll set it up and you can chime in, a new breed of data apps is emerging, and this highly decentralized infrastructure. And Tristan Handy of DBT Labs has a sort of a solution to begin the journey today, Muglia is working on something that's way out there, describe what you learned from it. >> Okay. So to talk about what the new data apps are, and then the platform to run them, I go back to the using what will probably be seen as one of the first data app examples, was Uber, where you're describing entities in the real world, riders, drivers, routes, city, like a city plan, these are all defined by data. And the data is described in a structure called a knowledge graph, for lack of a, no one's come up with a better term. But that means the tough, the stuff that Jack built, which was all stateless and sits above cloud vendors' infrastructure, it needs an entirely different type of software that's much, much harder to build. And the way Bob described it is, you're going to need an entirely new data management infrastructure to handle this. 
But where, you know, we had this really colorful interview where it was like Rock 'Em Sock 'Em, but they weren't really that much in opposition to each other, because Tristan is going to define this layer, starting with like business intelligence metrics, where you're defining things like bookings, billings, and revenue, in business terms, not in SQL terms -- >> Well, business terms, if I can interrupt, he said the one thing we haven't figured out how to APIify is KPIs that sit inside of a data warehouse, and that's essentially what he's doing. >> George: That's what he's doing, yes. >> Right. And so then you can now expose those APIs, those KPIs, that sit inside of a data warehouse, or a data lake, a data store, whatever, through APIs. >> George: And the difference -- >> So what does that do for you? >> Okay, so all of a sudden, instead of working at technical data terms, where you're dealing with tables and columns and rows, you're dealing instead with business entities, using the Uber example of drivers, riders, routes, you know, ETA prices. But you can define, DBT will be able to define those progressively in richer terms, today they're just doing things like bookings, billings, and revenue. But Bob's point was, today, the data warehouse that actually runs that stuff, whereas DBT defines it, the data warehouse that runs it, you can't do it with relational technology >> Dave: Relational totality, caching architecture. >> SQL, you can't -- >> SQL caching architectures in memory, you can't do it, you've got to rethink down to the way the data lake is laid out on the disk or cache. Which by the way, Thomas Hazel, who's speaking later, he's the chief scientist and founder at Chaos Search, he says, "I've actually done this," basically leave it in an S3 bucket, and I'm going to query it, you know, with no caching. 
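The idea discussed here, defining KPIs such as bookings and billings once in business terms and then exposing them through an API rather than through raw tables and SQL, can be sketched in a few lines of Python. This is a toy illustration, not dbt's actual interface: the `Metric` class, the `METRICS` registry, the `get_metric` function, and the sample `ORDERS` rows are all hypothetical names and data invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """A KPI defined once, in business terms (hypothetical structure)."""
    name: str
    description: str
    compute: Callable[[List[Row]], float]


# Made-up sample rows standing in for a warehouse table.
ORDERS: List[Row] = [
    {"booked": 1200.0, "billed": 1000.0},
    {"booked": 800.0, "billed": 800.0},
]

# The registry is the "semantic layer": callers ask for a metric by its
# business name and never see the column names behind it.
METRICS: Dict[str, Metric] = {
    "bookings": Metric(
        "bookings",
        "Total contract value booked",
        lambda rows: sum(r["booked"] for r in rows),
    ),
    "billings": Metric(
        "billings",
        "Total amount invoiced",
        lambda rows: sum(r["billed"] for r in rows),
    ),
}


def get_metric(name: str, rows: List[Row]) -> float:
    """API entry point: look up the KPI by name and evaluate it."""
    return METRICS[name].compute(rows)


print(get_metric("bookings", ORDERS))
print(get_metric("billings", ORDERS))
```

If the definition of bookings changes, only the registry entry changes; every consumer of `get_metric("bookings", ...)` picks up the new definition without touching a query.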
>> All right, so what I hear you saying then, tell me if I got this right, there are some things that are inadequate in today's world, that's not compatible with the Supercloud wave. >> Yeah. >> Specifically how you're using storage, and data, and stateful. >> Yes. >> And then the software that makes it run, is that what you're saying? >> George: Yeah. >> There's one other thing you mentioned to me, it's like, when you're using a CRM system, a human is inputting data. >> George: Nothing happens till the human does something. >> Right, nothing happens until that data entry occurs. What you're talking about is a world that self forms, pulling data from the transaction system, or the ERP system, and then builds a plan without human intervention. >> Yeah. Something in the real world happens, where the user says, "I want a ride." And then the software goes out and says, "Okay, we got to match a driver to the rider, we got to calculate how long it takes to get there, how long to deliver 'em." That's not driven by a form, other than the first person hitting a button and saying, "I want a ride." All the other stuff happens autonomously, driven by data and analytics. >> But my question was different, Dave, so I want to get specific, because this is where the startups are going to come in, this is the disruption. Snowflake is a data warehouse that's in the cloud, they call it a data cloud, they refactored it, they did it differently, the success, we all know it looks like. These areas where it's inadequate for the future are areas that'll probably be either disrupted, or refactored. What is that? >> That's what Muglia's contention is, that the DBT can start adding that layer where you define these business entities, they're like mini digital twins, you can define them, but the data warehouse isn't strong enough to actually manage and run them. 
And Muglia is behind a company that is rethinking the database, really in a fundamental way that hasn't been done in 40 or 50 years. It's the first, in his contention, the first real rethink of database technology in a fundamental way since the rise of the relational database 50 years ago. >> And I think you admit it's a real Hail Mary, I mean it's quite a long shot right? >> George: Yes. >> Huge potential. >> But they're pretty far along. >> Well, we've been talking on theCUBE for 12 years, and what, 10 years going to AWS Reinvent, Dave, that no one database will rule the world, Amazon kind of showed that with them. What's different, is it databases are changing, or you can have multiple databases, or? >> It's a good question. And the reason we've had multiple different types of databases, each one specialized for a different type of workload, but actually what Muglia is behind is a new engine that would essentially, you'll never get rid of the data warehouse, or the equivalent engine in like a Databricks data lakehouse, but it's a new engine that manages the thing that describes all the data and holds it together, and that's the new application platform. >> George, we have one minute left, I want to get real quick thought, you're an investor, and we know your history, and the folks watching, George's got a deep pedigree in investment data, and we can testify against that. If you're going to invest in a company right now, if you're a customer, I got to make a bet, what does success look like for me, what do I want walking through my door, and what do I want to send out? What companies do I want to look at? What kind of vendor do I want to evaluate? Which ones do I want to send home? 
>> Well, the first thing a customer really has to do when they're thinking about next gen applications, all the people have told you guys, "we got to get our data in order," getting that data in order means building an integrated view of all your data landscape, which is data coming out of all your applications. It starts with the data model, so, today, you basically extract data from all your operational systems, put it in this one giant, central place, like a warehouse or lake house, but eventually you want this, whether you call it a fabric or a mesh, it's all the data that describes how everything hangs together as in one big knowledge graph. There's different ways to implement that. And that's the most critical thing, 'cause that describes your Uber landscape, your Uber platform. >> That's going to power the digital transformation, which will power the business transformation, which powers the business model, which allows the builders to build -- >> Yes. >> Coders to code. That's Supercloud application. >> Yeah. >> George, great stuff. Next interview you're going to see right here is Bob Muglia and Tristan Handy, they're going to unpack this new wave. Great segment, really worth unpacking and reading between the lines with George, and Dave Vellante, and those two great guests. And then we'll come back here for the studio for more of the live coverage of Supercloud 2. Thanks for watching. (upbeat electronic music)

Published Date : Feb 17 2023


Chat w/ Arctic Wolf exec re: budget restraints could lead to lax cloud security


 

>> Now we're recording. >> All right. >> Appreciate that, Hannah. >> Yeah, so I mean, I think in general we continue to do very, very well as a company. I think like everybody, there's economic headwinds today that are unavoidable, but I think we have a couple things going for us. One, we're in the cyberspace, which I think is, for the most part, recession proof as an industry. I think the impact of a recession will impact some vendors and some categories, but in general, I think the industry is pretty resilient. It's like the power industry, no? Recession or not, you still need electricity to your house. Cybersecurity is almost becoming a utility like that as far as the needs of companies go. I think for us, we also have the ability to do the security, the security operations, for a lot of companies, and if you look at the value proposition, the ROI for the cost of less than one to maybe two or three, depending on how big you are as a customer, what you'd have to pay for half to three security operations people, we can give you a full security operations. And so the ROI is almost kind of brain dead simple, and so that keeps us going pretty well. And I think the other areas, we remove all that complexity for people. So in a world where you got other problems to worry about, handling all the security complexity is something that adds to that ROI. So for us, I think what we're seeing mostly is some of the larger deals are taking a little bit longer than they have, some of the large enterprise deals, 'cause I think they are being a little more cautious about how they spend it, but in general, business is still kind of cranking along. >> Anything you can share with me that you guys have talked about publicly in terms of any metrics, or what can you tell me other than cranking? 
>> Yeah, I mean, I would just say we're still very, very high growth, so I think our financial profile would kind of still put us clearly in the cyber unicorn position, but I think other than that, we don't really share business metrics as a private- >> Okay, so how about headcount? >> Still growing. So we're not growing as fast as we've been growing, but I don't think we were anyway. I think we kind of, we're getting to the point of critical mass. We'll start to grow in a more kind of normal course and speed. I don't think we overhired like a lot of companies did in the past, even though we added, almost doubled the size of the company in the last 18 months. So we're still hiring, but very kind of targeted to certain roles going forward 'cause I do think we're kind of at critical mass in some of the other functions. >> You disclose headcount or no? >> We do not. >> You don't, okay. And never have? >> Not that I'm aware of, no. >> Okay, on the macro, I don't know if security's recession proof, but it's less susceptible, let's say. I've had Nikesh Arora on recently, we're at Palo Alto's Ignite, and he was saying, "Look," it's just like you were saying, "Larger deal's a little harder." A lot of times customers, he was saying customers are breaking larger deals into smaller deals, more POCs, more approvals, more people to get through the approval, not whole, blah, blah, blah. Now they're a different animal, I understand, but are you seeing similar trends, and how are you dealing with that? >> Yeah, I think the exact same trends, and I think it's just in a world where spending a dollar matters, I think a lot more oversight comes into play, a lot more reviewers, and can you shave it down here? Can you reduce the scope of the project to save money there? And I think it just caused a lot of those things. 
I think, in the large enterprise, I think most of those deals for companies like us and Palo and CrowdStrike and kind of the upper tier companies, they'll still go through. I think they're just going to take a lot longer, and, yeah, maybe they're 80% of what they would've been otherwise, but there's still a lot of business to be had out there. >> So how are you dealing with that? I mean, you're talking about you double the size of the company. Is it kind of more focused on go-to-market, more sort of, maybe not overlay, but sort of SE types that are going to be doing more handholding. How have you dealt with that? Or have you just sort of said, "Hey, it is what it is, and we're not going to, we're not going to tactically respond to. We got long-term direction"? >> Yeah, I think it's more the latter. I think for us, it's we've gone through all these things before. It just takes longer now. So a lot of the steps we're taking are the same steps. We're still involved in a lot of POCs, we're involved in a lot of demos, and I don't think that changed. It's just the time between your POC and when someone sends you the PO, there's five more people now got to review things and go through a budget committee and all sorts of stuff like that. I think where we're probably focused more now is adding more and more capabilities just so we continue to be on the front foot of innovation and being relevant to the market, and trying to create more differentiators for us and the competitors. That's something that's just built into our culture, and we don't want to slow that down. And so even though the business is still doing extremely, extremely well, we want to keep investing in kind of technology. >> So the deal size, is it fair to say the initial deal size for new accounts, while it may be smaller, you're adding more capabilities, and so over time, your average contract values will go up? Are you seeing that trend? 
Or am I- >> Well, I would say I don't even necessarily see our average deal size has gotten smaller. I think in total, it's probably gotten a little bigger. I think what happens is when something like this happens, the old cream rises to the top thing, I think, comes into play, and you'll see some organizations instead of doing a deal with three or four vendors, they may want to pick one or two and really kind of put a lot of energy behind that. For them, they're maybe spending a little less money, but for those vendors who are amongst those getting chosen, I think they're doing pretty good. So our average deal size is pretty stable. For us, it's just a temporal thing. It's just the larger deals take a little bit longer. I don't think we're seeing much of a deal velocity difference in our mid-market commercial spaces, but in the large enterprise it's a little bit slower. But for us, we have ambitious plans in our strategy or on how we want to execute and what we want to build, and so I think we want to just continue to make sure we go down that path technically. >> So I have some questions on sort of the target markets and the cohorts you're going after, and I have some product questions. I know we're somewhat limited on time, but the historical focus has been on SMB, and I know you guys have gone in into enterprise. I'm curious as to how that's going. Any guidance you can give me on mix? Or when I talk to the big guys, right, you know who they are, the big managed service providers, MSSPs, and they're like, "Poo poo on Arctic Wolf," like, "Oh, they're (groans)." I said, "Yeah, that's what they used to say about the PC. It's just a toy. Or Microsoft SQL Server." But so I kind of love that narrative for you guys, but I'm curious from your words as to, what is that enterprise? How's the historical business doing, and how's the entrance into the enterprise going? What kind of hurdles are you having, blockers are you having to remove? 
Any color you can give me there would be super helpful. >> Yeah, so I think our commercial SMB business continues to do really good. Our mid-market is a very strong market for us. And I think while a lot of companies like to focus purely on large enterprise, there's a lot more mid-market companies, and a much larger piece of the IT puzzle collectively is in mid-market than it is large enterprise. That being said, we started to get pulled into the large enterprise not because we're a toy but because we're quite a comprehensive service. And so I think what we're trying to do from a roadmap perspective is catch up with some of the kind of capabilities that a large enterprise would want from us that a potential mid-market customer wouldn't. In some cases, it's not doing more. It's just doing it different. Like, so we have a very kind of hands-on engagement with some of our smaller customers, something we call our concierge. Some of the large enterprises want more of a hybrid where they do some stuff and you do some stuff. And so kind of building that capability into the platform is something that's really important for us. Just how we engage with them as far as giving 'em access to their data, the certain APIs they want, things of that nature, what we're building out for large enterprise, but the demand by large enterprise on our business is enormous. And so it's really just us kind of catching up with some of the kind of the features that they want that we lack today, but many of 'em are still signing up with us, obviously, and in lieu of that, knowing that it's coming soon. And so I think if you look at the growth of our large enterprise, it's one of our fastest growing segments, and I think it shows anything but we're a toy. I would be shocked, frankly, if there's an MSSP, and, of course, we don't see ourself as an MSSP, but I'd be shocked if any of them operate a platform at the scale that ours operates. >> Okay, so wow. A lot I want to unpack there. 
So just to follow up on that last question, you don't see yourself as an MSSP because why, you see yourselves as a technology platform? >> Yes, I mean, the vast, vast, vast majority of what we deliver is our own technology. So we integrate with third-party solutions mostly to bring in that telemetry. So we've built our own platform from the ground up. We have our own threat intelligence, our own detection logic. We do have our own agents and network sensors. MSSP is typically cobbling together other tools, third party off-the-shelf tools to run their SOC. Ours is all homegrown technology. So I have a whole group called Arctic Wolf Labs, is building, just cranking out ML-based detections, building out infrastructure to take feeds in from a variety of different sources. We have a full integration kind of effort where we integrate into other third parties. So when we go into a customer, we can leverage whatever they have, but at the same time, we produce some tech that if they're lacking in a certain area, we can provide that tech, particularly around things like endpoint agents and network sensors and the like. >> What about like identity, doing your own identity? >> So we don't do our own identity, but we take feeds in from things like Okta and Active Directory and the like, and we have detection logic built on top of that. So part of our value add is we were XDR before XDR was the cool thing to talk about, meaning we can look across multiple attack surfaces and come to a security conclusion where most EDR vendors started with looking just at the endpoint, right? And then they called themselves XDR because now they took in a network feed, but they still looked at it as a separate network detection. We actually look at the things across multiple attack surfaces and stitch 'em together to look at that from a security perspective. In some cases we have automatic detections that will fire. 
In other cases, we can surface some to a security professional who can go start pulling on that thread. >> So you don't need to purchase CrowdStrike software and integrate it. You have your own equivalent essentially. >> Well, we'll take a feed from the CrowdStrike endpoint into our platform. We don't have to rely on their detections and their alerts, and things of that nature. Now obviously anything they discover we pull in as well, it's just additional context, but we have all our own tech behind it. So we operate kind of at an MSSP scale. We have a similar value proposition in the sense that we'll use whatever the customer has, but once that data kind of comes into our pipeline, it's all our own homegrown tech from there. >> But I mean, what I like about the MSSP piece of your business is it's very high touch. It's very intimate. What I like about what you're saying is that it's software-like economics, so software, software-like part of it. >> That's what makes us the unicorn, right? Is we do have, our concierges is very hands-on. We continue to drive automation that makes our concierge security professionals more efficient, but we always want that customer to have that concierge person as, is almost an extension to their security team, or in some cases, for companies that don't even have a security team, as their security team. As we go down the path, as I mentioned, one of the things we want to be able to do is start to have a more flexible model where we can have that high touch if you want it. We can have the high touch on certain occasions, and you can do stuff. We can have low touch, like we can span the spectrum, but we never want to lose our kind of unique value proposition around the concierge, but we also want to make sure that we're providing an interface that any customer would want to use. 
>> So given that sort of software-like economics, I mean, services companies need this too, but especially in software, things like net revenue retention and churn are super important. How are those metrics looking? What can you share with me there? >> Yeah, I mean, again, we don't share those metrics publicly, but all's I can continue to repeat is, if you looked at all of our financial metrics, I think you would clearly put us in the unicorn category. I think very few companies are going to have the level of growth that we have on the amount of ARR that we have with the net revenue retention and the churn and upsell. All those aspects continue to be very, very strong for us. >> I want to go back to the sort of enterprise conversation. So large enterprises would engage with you as a complement to their existing SOC, correct? Is that a fair statement or not necessarily? >> It's in some cases. In some cases, they're looking to not have a SOC. So we run into a lot of cases where they want to replace their SIEM, and they want a solution like Arctic Wolf to do that. And so there's a poll, I can't remember, I think it was Forrester, IDC, one of them did it a couple years ago, and they found out that 70% of large enterprises do not want to build the SOC, and it's not 'cause they don't need one, it's 'cause they can't afford it, they can't staff it, they don't have the expertise. And you think about if you're a tech company or a bank, or something like that, of course you can do it, but if you're an international plumbing distributor, you're not going to (chuckles), someone's not going to graduate from Stanford with a cybersecurity degree and go, "Cool, I want to go work for a plumbing distributor in their SOC," right? So they're going to have trouble kind of bringing in the right talent, and as a result, it's difficult to go make a multimillion-dollar investment into a SOC if you're not going to get the quality people to operate it, so they turn to companies like us. 
>> Got it, so, okay, so you're talking earlier about capabilities that large enterprises require that there might be some gaps, you might lack some features. A couple questions there. One is, when you do some of those, I inferred some of that is integrations. Are those integrations sort of one-off snowflakes or are you finding that you're able to scale those across the large enterprises? That's my first question. >> Yeah, so most of the integrations are pretty straightforward. I think where we run into things that are kind of enterprise-centric, they definitely want open APIs, they want access to our platform, which we don't do today, which we are going to be doing, but we don't do that yet today. They want to do more of a SIEM replacement. So we're really kind of what we call an open XDR platform, so there's things that we would need to build to kind of do raw log ingestion. I mean, we do this today. We have raw log ingestion, we have log storage, we have log searching, but there's like some of the compliance scenarios that they need out of their SIEM. We don't do those today. And so that's kind of holding them back from getting off their SIEM and going fully onto a solution like ours. Then the other one is kind of the level of customization, so the ability to create a whole bunch of custom rules, and that ties back to, "I want to get off my SIEM. I've built all these custom rules in my SIEM, and it's great that you guys do all this automatic AI stuff in the background, but I need these very specific things to be executed on." And so trying to build an interface for them to be able to do that and then also simulate it, again, because, no matter how big they are running their SIEM and their SOC... Like, we talked to one of the largest financial institutions in the world. As far as we were told, they have the largest individual company SOC in the world, and we operate almost 15 times their size. 
So we always have to be careful because this is a cloud-based native platform, but someone creates some rule that then just craters the performance of the whole platform, so we have to build kind of those guardrails around it. So those are the things primarily that the large enterprises are asking for. Most of those issues are not holding them back from coming. They want to know they're coming, and we're working on all of those. >> Cool, and see, just aside, I was talking to CISO the other day, said, "If it weren't for my compliance and audit group, I would chuck my SIEM." I mean, everybody wants to get rid of their SIEM. >> I've never met anyone who likes their SIEM. >> Do you feel like you've achieved product market fit in the larger enterprise or is that still something that you're sorting out? >> So I think we know, like, we're on a path to do that. We're on a provable path to do that, so I don't think there's any surprises left. I think everything that we know we need to do for that is someone's writing code for it today. It's just a matter of getting it through the system and getting into production. So I feel pretty good about it. I think that's why we are seeing such a high growth rate in our large enterprise business, 'cause we share that feedback with some of those key customers. We have a Customer Advisory Board that we share a lot of this information with. So yeah, I mean, I feel pretty good about what we need to do. We're certainly operate at large enterprise scales, so taking in the amount of the volume of data they're going to have and the types of integrations they need. We're comfortable with that. It's just more or less the interfaces that a large enterprise would want that some of the smaller companies don't ask for. >> Do you have enough tenure in the market to get a sense as to stickiness or even indicators that will lead toward retention? Have you been at it long enough in the enterprise or you still, again, figuring that out? 
>> Yeah, no, I think we've been at it long enough, and our retention rates are extremely high. If anything, kind of our net retention rates, well over 100% 'cause we have opportunities to upsell into new modules and expanding the coverage of what they have today. I think the areas that if you cornered enterprise that use us and things they would complain about are things I just told you about, right? There's still some things I want to do in my Splunk, and I need an API to pull my data out and put it in my Splunk and stuff like that, and those are the things we want to enable. >> Yeah, so I can't wait till you guys go public because you got Snowflake up here, and you got Veritas down here, and I'm very curious as to where you guys go. When's the IPO? You want to tell me that? (chuckling) >> Unfortunately, it's not up to us right now. You got to get the markets- >> Yeah, I hear you. Right, if the market were better. Well, if the market were better, you think you'd be out? >> Yeah, I mean, we'd certainly be a viable candidate to go. >> Yeah, there you go. I have a question for you because I don't have a SOC. I run a small business with my co-CEO. We're like 30, 40 people W-2s, we got another 50 or so contractors, and I'm always like have one eye, sleep with one eye open 'cause of security. What is your ideal SMB customer? Think S. >> Yeah. >> Would I fit? >> Yeah, I mean you're you're right in the sweet spot. I think where the company started and where we still have a lot of value proposition, which is companies like, like you said it, you sleep with one eye open, but you don't have necessarily the technical acumen to be able to do that security for yourself, and that's where we fit in. We bring kind of this whole security, we call it Security Operations Cloud, to bear, and we have some of the best professionals in the world who can basically be your SOC for less than it would cost you to hire somebody right out of college to do IT stuff. 
And so the value proposition's there. You're going to get the best of the best, providing you a kind of a security service that you couldn't possibly build on your own, and that way you can go to bed at night and close both eyes. >> So (chuckling) I'm sure something else would keep me up. But so in thinking about that, our Amazon bill keeps growing and growing and growing. What would it, and I presume I can engage with you on a monthly basis, right? As a consumption model, or how's the pricing work? >> Yeah, so there's two models that we have. So typically the kind of the monthly billing type of models would be through one of our MSP partners, where they have monthly billing capabilities. Usually direct with us is more of a longer term deal, could be one, two, or three, or it's up to the customer. And so we have both of those engagement models. We're doing more and more and more through MSPs today because of that model you just described, and they do kind of target the very S in the SMB as well. >> I mean, rough numbers, even ranges. If I wanted to go with the MSP monthly, I mean, what would a small company like mine be looking at a month? >> Honestly, I do not even know the answer to that. >> We're not talking hundreds of thousands of dollars a month? >> No. God, no. God, no. No, no, no. >> I mean, order of magnitude, we're talking thousands, tens of thousands? >> Thousands, on a monthly basis. Yeah. >> Yeah, yeah. Thousands per month. So if I were to budget between 20 and $50,000 a year, I'm definitely within the envelope. Is that fair? I mean, I'm giving a wide range >> That's fair. just to try to make- >> No, that's fair. >> And if I wanted to go direct with you, I would be signing up for a longer term agreement, correct, like I do with Salesforce? >> Yeah, yeah, a year. A year would, I think, be the minimum for that, and, yeah, I think the budget you set aside is kind of right in the sweet spot there. >> Yeah, I'm interested, I'm going to... 
Have a sales guy call me (chuckles) somehow. >> All right, will do. >> No, I'm serious. I want to start >> I will. >> investigating these things because we sell to very large organizations. I mean, name a tech company. That's our client base, except for Arctic Wolf. We should talk about that. And increasingly they're paranoid about data protection agreements, how you're protecting your data, our data. We write a lot of software and deliver it as part of our services, so it's something that's increasingly important. It's certainly a board level discussion and beyond, and most large organizations and small companies oftentimes don't think about it or try not to. They just put their head in the sand and, "We don't want to be doing that," so. >> Yeah, I will definitely have someone get in touch with you. >> Cool. Let's see. Anything else you can tell me on the product side? Are there things that you're doing that we talked about, the gaps at the high end that you're, some of the features that you're building in, which was super helpful. Anything in the SMB space that you want to share? >> Yeah, I think the biggest thing that we're doing technically now is really trying to drive more and more automation and efficiency through our operations, and that comes through really kind of a generous use of AI. So building models around more efficient detections based upon signal, but also automating the actions of our operators so we can start to learn through the interface. When they do A and B, they always do C. Well, let's just do C for them, stuff like that. Then also building more automation as far as the response back to third-party solutions as well so we can remediate more directly on third-party products without having to get into the consoles or having our customers do it. So that's really just trying to drive efficiency in the system, and that helps provide better security outcomes but also has a big impact on our margins as well. 
>> I know you got to go, but I want to show you something real quick. I have data. I do a weekly program called "Breaking Analysis," and I have a partner called ETR, Enterprise Technology Research, and they have a platform. I don't know if you can see this. They have a survey platform, and each quarter, they do a survey of about 1,500 IT decision makers. They also have a survey on, they call ETS, Emerging Technology Survey. So it's private companies. And I don't want to go into it too much, but this is a sentiment graph. This is net sentiment. >> Just so you know, all I see is a white- >> Yeah, just a white bar. >> Oh, that's weird. Oh, whiteboard. Oh, here we go. How about that? >> There you go. >> Yeah, so this is a sentiment graph. So this is net sentiment and this is mindshare. And if I go to Arctic Wolf... So it's typical security, right? The 8,000 companies. And when I go here, what impresses me about this is you got a decent mindshare, that's this axis, but you've also got an N in the survey. It's about 1,500 in the survey, It's 479 Arctic Wolf customers responded to this. 57% don't know you. Oh, sorry, they're aware of you, but no plan to evaluate; 19% plan to evaluate, 7% are evaluating; 11%, no plan to utilize even though they've evaluated you; and 1% say they've evaluated you and plan to utilize. It's a small percentage, but actually it's not bad in the random sample of the world about that. And so obviously you want to get that number up, but this is a really impressive position right here that I wanted to just share with you. I do a lot of analysis weekly, and this is a really, it's completely independent survey, and you're sort of separating from the pack, as you can see. So kind of- >> Well, it's good to see that. And I think that just is a further indicator of what I was telling you. We continue to have a strong financial performance. >> Yeah, in a good market. Okay, well, thanks you guys. 
And hey, if I can get this recording, Hannah, I may even figure out how to write it up. (chuckles) That would be super helpful. >> Yes. We'll get that up. >> And David or Hannah, if you can send me David's contact info so I can get a salesperson in touch with him. (Hannah chuckling) >> Yeah, great. >> Yeah, we'll work on that as well. Thanks so much for both your time. >> Thanks a lot. It was great talking with you. >> Thanks, you guys. Great to meet you. >> Thank you. >> Bye. >> Bye.

Published Date : Feb 15 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
David | PERSON | 0.99+
Hannah | PERSON | 0.99+
two models | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Arctic Wolf Labs | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
80% | QUANTITY | 0.99+
70% | QUANTITY | 0.99+
Arctic Wolf | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
30 | QUANTITY | 0.99+
Palo | ORGANIZATION | 0.99+
479 | QUANTITY | 0.99+
half | QUANTITY | 0.99+
19% | QUANTITY | 0.99+
first question | QUANTITY | 0.99+
Forrester | ORGANIZATION | 0.99+
50 | QUANTITY | 0.99+
8,000 companies | QUANTITY | 0.99+
Thousands | QUANTITY | 0.99+
1% | QUANTITY | 0.99+
7% | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
57% | QUANTITY | 0.99+
IDC | ORGANIZATION | 0.99+
CrowdStrike | ORGANIZATION | 0.99+
today | DATE | 0.99+
A year | QUANTITY | 0.99+
one eye | QUANTITY | 0.99+
both | QUANTITY | 0.99+
both eyes | QUANTITY | 0.99+
each quarter | QUANTITY | 0.99+
less than one | QUANTITY | 0.98+
11% | QUANTITY | 0.98+
One | QUANTITY | 0.98+
five more people | QUANTITY | 0.98+
axis | ORGANIZATION | 0.98+
thousands | QUANTITY | 0.98+
tens of thousands | QUANTITY | 0.97+
Veritas | ORGANIZATION | 0.97+
about 1,500 IT decision makers | QUANTITY | 0.97+
20 | QUANTITY | 0.97+
a year | QUANTITY | 0.96+
Salesforce | ORGANIZATION | 0.96+
ETS | ORGANIZATION | 0.96+
Stanford | ORGANIZATION | 0.96+
40 people | QUANTITY | 0.95+
over 100% | QUANTITY | 0.95+
couple years ago | DATE | 0.95+
CISO | ORGANIZATION | 0.94+
four vendors | QUANTITY | 0.94+
$50,000 a year | QUANTITY | 0.93+
about 1,500 | QUANTITY | 0.92+
Enterprise Technology Research | ORGANIZATION | 0.92+
almost 15 times | QUANTITY | 0.91+
couple questions | QUANTITY | 0.91+
CrowdStrike | TITLE | 0.9+
hundreds of thousands of dollars a month | QUANTITY | 0.9+
ETR | ORGANIZATION | 0.88+
last 18 months | DATE | 0.87+
SQL Server | TITLE | 0.84+
three security | QUANTITY | 0.84+
Breaking Analysis | TITLE | 0.82+
Thousands per month | QUANTITY | 0.8+
XDR | TITLE | 0.79+
a month | QUANTITY | 0.74+
SIEM | TITLE | 0.74+
Arctic | ORGANIZATION | 0.74+

How to Make a Data Fabric "Smart": A Technical Demo With Jess Jowdy


 

>> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities, including data exploration, business intelligence, natural language processing, and machine learning, directly within the fabric makes it faster and easier for organizations to gain new insights and power intelligent, predictive, and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare-focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi. Yeah, thank you so much for having me. And so for this demo we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. 
This could be, for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software or adverse reaction warnings from a clinical risk grouping application and so much more. So I'm really going to be simulating a patient logging in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here and I'm going to be looking for information where the last name of this patient is Simmons and their medical record number, their patient identifier in the system, is 32345. And so as you can see I have this single JSON payload that showed up here of just relevant clinical information for my patient whose last name is Simmons, all within a single response. So fantastic, right? Typically though when we see responses that look like this there is an assumption that this service is interacting with a single backend system and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. 
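The single-response behavior described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the demo's actual implementation: the four fetch functions, the source names, and the canned data they return are all hypothetical stand-ins for the backend systems mentioned in the talk.

```python
import json

# Hypothetical stand-ins for the source systems named in the demo.
# Each returns made-up sample data; a real fabric would call out to
# the actual applications over the network.
def fetch_emr_history(last_name, mrn):
    return {"conditions": ["hypertension"]}

def fetch_transcription_notes(last_name, mrn):
    return {"notes": ["Follow-up in six weeks."]}

def fetch_risk_warnings(last_name, mrn):
    return {"adverse_reaction_warnings": ["penicillin"]}

def fetch_health_record(last_name, mrn):
    return {"last_visit": "2023-01-10"}

# Registry of data silos the (hypothetical) fabric knows about.
SOURCES = {
    "emr": fetch_emr_history,
    "transcription": fetch_transcription_notes,
    "risk_grouping": fetch_risk_warnings,
    "health_records": fetch_health_record,
}

def get_patient_summary(last_name, mrn):
    """Query every source and merge the results into one payload."""
    summary = {"patient": {"lastName": last_name, "mrn": mrn}}
    for source_name, fetch in SOURCES.items():
        summary[source_name] = fetch(last_name, mrn)
    return summary

if __name__ == "__main__":
    # The caller sees one JSON document, not four backend responses.
    print(json.dumps(get_patient_summary("Simmons", "32345"), indent=2))
```

The point of the sketch is that the caller never learns how many systems sit behind the service; adding a fifth silo means adding one entry to the registry.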
And in the middle here we have our data fabric coordinator, which is going to be in charge of this refinement and analysis, those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end we do also support full lifecycle API management within this platform. When you're dealing with APIs I always like to make a little shout out on this that you really want to make sure you have a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today, we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what context. >> Can I just interrupt you for a second? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean you can have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry and API security is a really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So there's interactions with the API itself. So can I even see this API and leverage it? And then within an API call, you then have to deal with all right, which end points or what kind of interactions within that API am I allowed to do? 
What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So the way that we handle that is, like I said, same thing at different layers. There is access to a particular API, which can happen within the IRIS product and also we see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of security. >> And that's been designed in, >> Absolutely, yes. it's not a bolt-on as they like to say. Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FHIR, interactions with a homegrown enterprise data warehouse for instance may use SQL. For a cloud-based solution managed by a vendor, they may only allow you to use web service calls to pull data. So it's really important that your data fabric platform that you're using has the flexibility to connect to all of these different systems and applications. And I'm about to log out so I'm going to keep my session going here. So therefore it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources and all these different kinds of formats and over all of these different kinds of protocols. So let's think back on our example here. 
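The layered access checks from the security exchange above (can this consumer call the API at all, and then which pieces of data may it see) can be sketched as follows. This is a minimal illustration, not the IRIS security model: the role names, the API name, and the field names are all hypothetical.

```python
# Layer 1 (hypothetical): which APIs each role may call.
ROLE_API_ACCESS = {
    "patient_app": {"patient-data-retrieval"},
    "billing_app": set(),
}

# Layer 2 (hypothetical): which fields each role may see in a response.
ROLE_FIELD_ACCESS = {
    "patient_app": {"clinical_history", "adverse_reaction_warnings"},
}

def can_call(role, api):
    """API-level check: is this role allowed to use the API at all?"""
    return api in ROLE_API_ACCESS.get(role, set())

def filter_record(role, record):
    """Data-level check: keep only the fields this role may see."""
    allowed = ROLE_FIELD_ACCESS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}

def handle_request(role, api, record):
    """Apply both layers in order; unknown roles get nothing."""
    if not can_call(role, api):
        raise PermissionError(f"role {role!r} may not call {api!r}")
    return filter_record(role, record)
```

Denying by default (an unknown role maps to an empty set at both layers) keeps a missing configuration entry from silently exposing data.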
I had four different applications that I was requesting information for to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is it has an embedded interoperability platform. So there's all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST or SOAP or SQL or FTP, regardless of that protocol there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as, in healthcare we have HL7, we have FHIR, we have CCDs across the industry. JSON is, you know, really hitting a market strong now and XML payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these, when I select a particular connection, on the right side panel I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application I'm using a SQL based connection. When I'm connecting to my EMR I'm leveraging a standard healthcare messaging format known as FHIR, which is a REST-based protocol. And then when I'm working with my health record management system I'm leveraging a standard HTTP adapter. So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly and be able to do it in a reliable and quick way. 
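One way to picture the per-protocol adapters just described is a small registry that maps each connection's protocol to a handler. The sketch below is an assumption-laden illustration, not how the platform's adapters are actually built: the `Connection` type, the adapter functions, and the `example.invalid` endpoints are all hypothetical, and the adapters only describe the request they would make rather than sending anything.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    """Settings for one data silo (all values here are hypothetical)."""
    name: str
    protocol: str
    endpoint: str

# Each adapter returns a description of the call it would make.
def soap_adapter(conn, mrn):
    return {"via": "soap", "endpoint": conn.endpoint, "body": f"<mrn>{mrn}</mrn>"}

def sql_adapter(conn, mrn):
    # Parameterized query text; the value travels separately from the SQL.
    return {"via": "sql", "endpoint": conn.endpoint,
            "query": "SELECT * FROM risk WHERE mrn = ?", "params": (mrn,)}

def rest_adapter(conn, mrn):
    return {"via": "rest", "url": f"{conn.endpoint}/Patient?identifier={mrn}"}

def http_adapter(conn, mrn):
    return {"via": "http", "url": f"{conn.endpoint}/records/{mrn}"}

ADAPTERS = {"soap": soap_adapter, "sql": sql_adapter,
            "rest": rest_adapter, "http": http_adapter}

def collect(conn, mrn):
    """Dispatch to the adapter registered for the connection's protocol."""
    try:
        adapter = ADAPTERS[conn.protocol]
    except KeyError:
        raise ValueError(f"no adapter for protocol {conn.protocol!r}") from None
    return adapter(conn, mrn)

CONNECTIONS = [
    Connection("transcription", "soap", "https://notes.example.invalid/ws"),
    Connection("risk_grouping", "sql", "db://risk.example.invalid"),
    Connection("emr", "rest", "https://emr.example.invalid/api"),
    Connection("health_records", "http", "https://hrm.example.invalid"),
]
```

Calling `collect(conn, "32345")` for each entry in `CONNECTIONS` yields one request description per silo, which is the collect step in miniature.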
Because if you think about it, you could have hundreds of these different kinds of applications built out and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance, my patient's last name and their MRN and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. So why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond what's an out of the box or black box approach to be able to develop things that are specific to their data fabric or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you not only get an opportunity to view how we're establishing these connections or how we're building out these processes but you have the opportunity to inject your own kind of processing, your own kinds of pipelines into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up and coming language, right? We see more and more developers turning towards Python to do their development. So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see, actually calls out to our turnkey adapters. 
So we see a combination of out of the box code that is provided in this data fabric platform from IRIS combined with organization specific or user specific customizations that are included in this Python method. So it's a nice little combination of how do we bring the developer experience in and mix it with out of the box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. >> It's a lot here. You know, actually, if I could >> I can pause. >> If I just want to sort of play that back. So we went through the connect and the collect phase. >> And the collect, yes, we're going into refine. So it's a good place to stop. >> Yeah, so before we get there, so we heard a lot about fine-grained security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know, the ability to bring in different dev tools. We heard about FHIR, which of course is big in healthcare. >> Absolutely. >> And that's the standard and then SQL for traditional kind of structured data and then web services like HTTP you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely, and I think that's really important when you're dealing with a smart data fabric because what you're effectively doing is you're consolidating all of your processing, all of your collection into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refine. >> We're going into refinement, exciting. So how do we actually do refinement? Where does refinement happen and how does this whole thing end up being performant? Well the key to all of that is this SDF coordinator, which stands for smart data fabric coordinator. 
And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's collecting that information, it's aggregating it, and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like and as you can see it follows a flow chart like structure. So there's a start, there is an end and then there are these different arrows that point to different activities throughout the business process. And so there's all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and we're consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL logic into this or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. So like I said, we have a very simple process here that's just calling out, grabbing all those different data elements from all those different data sources and consolidating it. 
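The coordinator flow described above (one call per data source, a sync step that waits for every response, then a single consolidated payload) can be sketched in Python. This is a minimal illustration under stated assumptions, not the IRIS implementation: the four `fetch_*` functions are hypothetical stubs named after the systems in the demo, and the payload shape is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four downstream systems in the demo.
# In the demo these are reached over SOAP, SQL, FHIR (REST), and HTTP.
def fetch_chart_script(mrn):
    return {"chart_script": {"mrn": mrn}}

def fetch_clinical_risk_group(mrn):
    return {"clinical_risk_group": {"mrn": mrn}}

def fetch_emr_bundle(mrn):
    return {"emr": {"resourceType": "Bundle", "entry": []}}

def fetch_health_record(mrn):
    return {"health_record": {"mrn": mrn}}

SOURCES = [
    fetch_chart_script,
    fetch_clinical_risk_group,
    fetch_emr_bundle,
    fetch_health_record,
]

def coordinate(mrn):
    """Fan out one request per source, wait for all, merge into one payload."""
    payload = {"mrn": mrn}
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(fetch, mrn) for fetch in SOURCES]
        # The "sync" step: result() blocks until each response is back
        # and re-raises any exception raised by a failed source.
        for future in futures:
            payload.update(future.result())
    return payload

if __name__ == "__main__":
    print(coordinate("12345"))
```

Waiting on every future before returning mirrors the sync call in the flow chart: nothing is packaged and sent out until all four responses have arrived.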
We'll look back at this coordinator in a second when we make this data fabric a bit smarter and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen 'cause I always like to go back and take a look at everything that we've seen. We have our initial API connection, we have our connections to our individual data sources and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There's all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging 'cause you need to be able to know, you know, if there was an issue, where did that issue happen, in which connected process and how did it affect the other processes that are related to it. In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request from when it initially came into the smart data fabric to when data was sent back out from that smart data fabric. So I didn't record the time but I bet if you recorded the time it was this time that we sent that request in. And you can see my patient's name and their medical record number here and you can see that that instigated four different calls to four different systems and they're represented by these arrows going out. So we sent something to chart script, to our health record management system, to our clinical risk grouping application, and to my EMR through their FHIR server. 
So every request, every outbound application gets a request and we pull back all of those individual pieces of information from all of those different systems and we bundle them together. And for my FHIR lovers, here's our FHIR bundle that we got back from our FHIR server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now we change this into a JSON format before we deliver it, but this is those data elements brought together. And this screen would also be used for being able to see things like error trapping or errors that were thrown, alerts, warnings. Developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one stop shop for understanding what's happening behind the scenes with your data fabric. >> Etcher, who did what, when, where, what did the machine do? What went wrong and where did that go wrong? >> Exactly. >> Right at your fingertips. >> Right, and I'm a visual person so a bunch of log files to me is not the most helpful. Well, being able to see this happened at this time in this location gives me that understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How people are using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric, is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? It's bringing data in, it's analyzing that information, it's transforming that data into a format that your consumer's going to understand, it's doing any additional injection of custom logic. 
So really your coordinator or that orchestrator that sits in the middle is the brains behind your smart data fabric. >> And this is available today? This all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well we can keep going. 'Cause right now, I mean we can, oh, we're at 18 minutes. God help us. You can cut some of this. (laughs) I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this but essentially if we go back to our coordinator here we can see here's that original that pipeline that we saw where we're pulling data from all these different systems and we're collecting it and we're sending it out. But then we see two more at the end here which involves getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at and we're running it through a machine learning model that exists within the smart data fabric pipeline and producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days. Which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world is we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. So there's no shuffling of data, there's no external connections to make this happen. 
And it doesn't really require having a PhD in data science to understand how to do that. It all leverages really basic SQL-like syntax to be able to construct and execute these predictions. So it's going one step further than the traditional data fabric example to introduce this ability to define actionable insights to our users based on the data that we've brought together. >> Well that readmission probability is huge. >> Yes. >> Right, because it directly affects the cost for the provider and the patient, you know. So if you can anticipate the probability of readmission and either do things at that moment or, you know, as an outpatient perhaps to minimize the probability then that's huge. That drops right to the bottom line. >> Absolutely, absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day, which is what makes this so exciting. >> Awesome demo. >> Thank you. >> Fantastic. Are you cool if people want to get in touch with you? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy, and we'd love to hear from you. I always love talking about this topic, so would be happy to engage on that. >> Great stuff, thank you Jess, appreciate it. >> Thank you so much. >> Okay, don't go away because in the next segment we're going to dig into the use cases where data fabric is driving business value. Stay right there.

Published Date : Feb 15 2023


theCUBE's New Analyst Talks Cloud & DevOps


 

(light music) >> Hi everybody. Welcome to this Cube Conversation. I'm really pleased to announce a collaboration with Rob Strechay. He's a guest Cube analyst, and we'll be working together to extract the signal from the noise. Rob is a long-time product pro, working at a number of firms including AWS, HP, HPE, NetApp, Snowplow. Did a stint as an analyst at Enterprise Strategy Group. Rob, good to see you. Thanks for coming into our Marlborough Studios. >> Well, thank you for having me. It's always great to be here. >> I'm really excited about working with you. We've known each other for a long time. You've been in the Cube a bunch. You know, you're in between gigs, and I think we can have a lot of fun together. Covering events, covering trends. So, let's get into it. What's happening out there? We've sort of exited the isolation economy. Things were booming. Now, everybody's tapping the brakes. From your standpoint, what are you seeing out there? >> Yeah. I'm seeing that people are really looking how to get more out of their data. How they're bringing things together, how they're looking at the costs of Cloud, and understanding how are they building out their SaaS applications. And understanding that when they go in and actually start to use Cloud, it's not only just using the base services anymore. They're looking at, how do I use these platforms as a service? Some are easier than others, and they're trying to understand, how do I get more value out of that relationship with the Cloud? They're also consolidating the number of Clouds that they have, I would say to try to better optimize their spend, and getting better pricing for that matter. >> Are you seeing people unhook Clouds, or just reduce maybe certain Cloud activities and going maybe instead of 60/40 going 90/10? >> Correct. It's more like the 90/10 type of rule where they're starting to say, Hey I'm not going to get rid of Azure or AWS or Google. 
I'm going to move a portion of this over that I was using on this one service. Maybe I got a great two-year contract to start with on this platform as a service or a database as a service. I'm going to unhook from that and maybe go with an independent. Maybe with something like a Snowflake or a Databricks on top of another Cloud, so that I can consolidate down. But it also gives them more flexibility as well. >> In our last breaking analysis, Rob, we identified six factors that were reducing Cloud consumption. There were factors and customer tactics. And I want to get your take on this. So, some of the factors really, you got fewer mortgage originations. FinTech, obviously big Cloud user. Crypto, not as much activity there. Lower ad spending means less Cloud. And then one of 'em, which you kind of disagreed with was less, less analytics, you know, fewer... Less frequency of calculations. I'll come back to that. But then optimizing compute using Graviton or AMD instances moving to cheaper storage tiers. That of course makes sense. And then optimize pricing plans. Maybe going from On Demand, you know, to, you know, instead of pay by the drink, buy in volume. Okay. So, first of all, do those make sense to you with the exception? We'll come back and talk about the analytics piece. Is that what you're seeing from customers? >> Yeah, I think so. I think that was pretty much dead on with what I'm seeing from customers and the ones that I go out and talk to. A lot of times they're trying to really monetize their, you know, understand how their business utilizes these Clouds. And, where their spend is going in those Clouds. Can they use, you know, lower tiers of storage? Do they really need the best processors? Do they need to be using Intel or can they get away with AMD or Graviton 2 or 3? Or do they need to move in? And, I think when you look at all of these Clouds, they always have pricing curves that are arcs from the newest to the oldest stuff. 
And you can play games with that. And understanding how you can actually lower your costs by looking at maybe some of the older generation. Maybe your application was written 10 years ago. You don't necessarily have to be on the best, newest processor for that application per se. >> So last, I want to come back to this whole analytics piece. Last June, I think it was June, Dev Ittycheria, who's the-- I call him Dev. Spelled Dev, pronounced Dave. (chuckles softly) Same pronunciation, different spelling. Dev Ittycheria, CEO of Mongo, on the earnings call. He was getting, you know, hit. Things were starting to get a little less visible in terms of, you know, the outlook. And people were pushing him like... Because you're in the Cloud, is it easier to dial down? And he said, because we're the document database, we support transaction applications. We're less discretionary than say, analytics. Well on the Snowflake earnings call, that same month or the month after, they were all over Slootman and Scarpelli. Oh, the Mongo CEO said that they're less discretionary than analytics. And Snowflake was an interesting comment. They basically said, look, we're the Cloud. You can dial it up, you can dial it down, but the area under the curve over a period of time is going to be the same, because they get their customers to commit. What do you say? You disagreed with the notion that people are running their calculations less frequently. Is that because they're trying to do a better job of targeting customers in near real time? What are you seeing out there? >> Yeah, I think they're moving away from using people and more expensive marketing. Or, they're trying to figure out what's my Google ad spend, what's my Meta ad spend? And what they're trying to do is optimize that spend. So, what is the return on advertising, or the ROAS as they would say. 
And what they're looking to do is understand, okay, I have to collect these analytics that better understand where are these people coming from? How do they get to my site, to my store, to my whatever? And when they're using it, how do they better move through that? What you're also seeing is that analytics is not only just for kind of the retail or financial services or things like that, but then they're also, you know, using that to make offers in those categories. When you move back to more, you know, take other companies that are building products and SaaS delivered products. They may actually go and use this analytics for making the product better. And one of the big reasons for that is maybe they're dialing back how many product managers they have. And they're looking to be more data driven about how they actually go and build the product out or enhance the product. So maybe they're, you know, an online video service and they want to understand why people are either using or not using the whiteboard inside the product. And they're collecting a lot of that product analytics in a big way so that they can go through that. And they're doing it in a constant manner. This first party type tracking within applications is growing rapidly by customers. >> So, let's talk about who wins in that. So, obviously the Cloud guys, AWS, Google and Azure. I want to come back and unpack that a little bit. Databricks and Snowflake, we reported on our last breaking analysis, are kind of on a collision course. You know, a couple years ago we were thinking, okay, AWS, Snowflake and Databricks, like perfect sandwich. And then of course they started to become more competitive. My sense is they still, you know, complement each other in the field, right? But, you know, publicly, they've got bigger aspirations, they get big TAMs that they're going after. But it's interesting, the data shows that-- So, Snowflake was off the charts in terms of spending momentum in our ETR surveys. 
Our partner down in New York, they kind of came into line. They're both growing in terms of market presence. Databricks couldn't get to IPO. So, we don't have as much, you know, visibility on their financials. You know, Snowflake obviously highly transparent cause they're a public company. And then you got AWS, Google and Azure. And it seems like AWS appears to be more partner friendly. Microsoft, you know, depends on what market you're in. And Google wants to sell BigQuery. >> Yeah. >> So, what are you seeing in the public Cloud from a data platform perspective? >> Yeah. I think that was pretty astute in what you were talking about there, because I think of the three, Google is definitely I think a little bit behind in how they go to market with their partners. Azure's done a fantastic job of partnering with these companies to understand and even though they may have Synapse as their go-to and where they want people to go to do AI and ML. What they're looking at is, Hey, we're going to also be friendly with Snowflake. We're also going to be friendly with a Databricks. And I think that, Amazon has always been there because that's where the market has been for these developers. So, many, like Databricks' and the Snowflake's have gone there first because, you know, Databricks' case, they built out on top of S3 first. And going and using somebody's object layer other than AWS, was not as simple as you would think it would be. Moving between those. >> So, one of the financial meetups I said meetup, but the... It was either the CEO or the CFO. It was either Slootman or Scarpelli talking at, I don't know, Merrill Lynch or one of the other financial conferences said, I think it was probably their Q3 call. Snowflake said 80% of our business goes through Amazon. And he said to this audience, the next day we got a call from Microsoft. Hey, we got to do more. 
And, we know just from reading the financial statements that Snowflake is getting concessions from Amazon, they're buying in volume, they're renegotiating their contracts. Amazon gets it. You know, lower the price, people buy more. Long term, we're all going to make more money. Microsoft obviously wants to get into that game with Snowflake. They understand the momentum. They said Google, not so much. And I've had customers tell me that they wanted to use Google's AI with Snowflake, but they can't, they got to go to BigQuery. So, honestly, I haven't like vetted that so. But, I think it's true. But nonetheless, it seems like Google's a little less friendly with the data platform providers. What do you think? >> Yeah, I would say so. I think this is a place that Google looks and wants to own. Is that now, are they doing the right things long term? I mean again, you know, you look at Google Analytics being, you know, basically outlawed in five countries in the EU because of GDPR concerns, and compliance and governance of data. And I think people are looking at Google and BigQuery in general and saying, is it the best place for me to go? Is it going to be in the right places where I need it? Still, it's still one of the largest used databases out there just because it underpins a number of the Google services. So you almost get, like you were saying, forced into BigQuery sometimes, if you want to use the tech on top. >> You do strategy. >> Yeah. >> Right? You do strategy, you do messaging. Is it the right call by Google? I mean, it's not a-- I criticize Google sometimes. But, I'm not sure it's the wrong call to say, Hey, this is our ace in the hole. >> Yeah. >> We got to get people into BigQuery. Cause, first of all, BigQuery is a solid product. I mean it's Cloud native and it's, you know, by all, it gets high marks. So, why give the competition an advantage? Let's try to force people essentially into what is we think a great product and it is a great product. 
The flip side of that is, they're giving up some potential partner TAM and not treating the ecosystem as well as one of their major competitors. What do you do if you're in that position? >> Yeah, I think that that's a fantastic question. And the question I pose back to the companies I've worked with and worked for is, are you really looking to have vendor lock-in as your key differentiator to your service? And I think when you start to look at these companies that are moving away from BigQuery, moving to even, Databricks on top of GCS in Google, they're looking to say, okay, I can go there if I have to evacuate from GCP and go to another Cloud, I can stay on Databricks as a platform, for instance. So I think it's, people are looking at what platform as a service, database as a service they go and use. Because from a strategic perspective, they don't want that vendor lock-in. >> That's where Supercloud becomes interesting, right? Because, if I can run on Snowflake or Databricks, you know, across Clouds. Even Oracle, you know, they're getting into business with Microsoft. Let's talk about some of the Cloud players. So, the big three have reported. >> Right. >> We saw AWS's Cloud growth decelerated down to 20%, which is I think the lowest growth rate since they started to disclose public numbers. And they said they exited, sorry, they said in January they grew at 15%. >> Yeah. >> Year on year. Now, they had some pretty tough compares. But nonetheless, 15%, wow. Azure, kind of mid thirties, and then Google, we had kind of low thirties. But, well behind in terms of size. And Google's losing probably almost $3 billion annually. But, that's not necessarily a bad thing by advocating and investing. What's happening with the Cloud? Is AWS just running into the law of large numbers? Do you think we can actually see a re-acceleration like we have in the past with AWS Cloud? Azure, we predicted is going to be 75% of AWS IaaS revenues. You know, we try to estimate IaaS. >> Yeah. 
>> Even though they don't share that with us. That's a huge milestone. You'd think-- There's some people who have, I think, Bob Evans predicted a while ago that Microsoft would surpass AWS in terms of size. You know, what do you think? >> Yeah, I think that Azure's going to keep to-- Keep growing at a pretty good clip. I think that for Azure, they still have really great account control, even though people like to hate Microsoft. The Microsoft sellers that are out there making those companies successful day after day have really done a good job of being in those accounts and helping people. I was recently over in the UK. And the UK market between AWS and Azure is pretty amazing, how much Azure there is. And it's growing within Europe in general. In the states, it's, you know, I think it's growing well. I think it's still growing, probably not as fast as it is outside the U.S. But, you go down to someplace like Australia, it's also Azure. You hear about Azure all the time. >> Why? Is that just because of Microsoft's software estate? It's just so convenient. >> I think it has to do with, you know, and you can go with the reasoning they don't break out, you know, Office 365 and all of that out of their numbers is because they have-- They're in all of these accounts because the office suite is so pervasive in there. So, they always have reasons to go back in and, oh by the way, you're on these old SQL licenses. Let us move you up here and we'll be able to-- We'll support you on the old version, you know, with security and all of these things. And be able to move you forward. So, they have a lot of, I guess you could say, levers to stay in those accounts and be interesting. At least as part of the Cloud estate. I think Amazon, you know, is hitting, you know, the law of large numbers. 
But I think that they're also going through, and I think this was seen in the layoffs that they were making, that they're looking to understand and have profitability in more of those services that they have. You know, over 350 odd services that they have. And you know, as somebody who went there and helped to start yet a new one, while I was there. And finally, it went to beta back in September, you start to look at the fact that, that number of services, people, their own sellers don't even know all of their services. It's impossible to comprehend and sell that many things. So, I think what they're going through is really looking to rationalize a lot of what they're doing from a services perspective going forward. They're looking to focus on more profitable services and bringing those in. Because right now it's built like a layer cake where you have, you know, S3, EBS and EC2 on the bottom of the layer cake. And then maybe you have, you're using IAM, the authorization and authentication in there and you have all these different services. And then they call it EMR on top. And so, EMR has to pay for that entire layer cake just to go and compete against somebody like Mongo or something like that. So, you start to unwind the costs of that. Whereas Azure, went and they build basically ground up services for the most part. And Google kind of falls somewhere in between in how they build their-- They're a sort of layer cake type effect, but not as many layers I guess you could say. >> I feel like, you know, Amazon's trying to be a platform for the ecosystem. Yes, they have their own products and they're going to sell. And that's going to drive their profitability cause they don't have to split the pie. But, they're taking a piece of-- They're spinning the meter, as Zeus Kerravala likes to say, every time Snowflake or Databricks or Mongo or Atlas is, you know, running on their system. They take a piece of the action. Now, Microsoft does that as well. 
But, you look at Microsoft and security, head-to-head competitors, for example, with a CrowdStrike or an Okta in identity. Whereas, it seems like at least for now, AWS is a more friendly place for the ecosystem. At the same time, you do a lot of business in Microsoft. >> Yeah. And I think that a lot of companies have always feared that Amazon would just throw, you know, bodies at it. And I think that people have come to the realization that a two pizza team, as Amazon would call it, is eight people. I think that's, you know, two slices per person. I'm a little bit fat, so I don't know if that's enough. But, you start to look at it and go, okay, if they're going to start out with eight engineers, if I'm a startup and they're part of my ecosystem, do I really fear them or should I really embrace them and try to partner closer with them? And I think the smart people and the smart companies are partnering with them because they're realizing, Amazon, unless they can see it to, you know, a hundred million, $500 million market, they're not going to throw eight to 16 people at a problem. I think when, you know, you could say, you could look at the elastic with OpenSearch and what they did there. And the licensing terms and the battle they went through. But they knew that Elastic had a huge market. Also, you had a number of ecosystem companies building on top of now OpenSearch, that are now domain on top of Amazon as well. So, I think Amazon's being pretty strategic in how they're doing it. I think some of the-- It'll be interesting. I think this year is a payout year for the cuts that they're making to some of the services internally to kind of, you know, how do we take the fat off some of those services that-- You know, you look at Alexa. I don't know how much revenue Alexa really generates for them. But it's a means to an end for a number of different other services and partners. >> What do you make of this ChatGPT? I mean, Microsoft obviously is playing that card. 
You want to, you want ChatGPT in the Cloud, come to Azure. Seems like AWS has to respond. And we know Google is, you know, sharpening its knives to come up with its response. >> Yeah, I mean Google just went and talked about Bard for the first time this week and they're in private preview or I guess they call it beta, but right at the moment to select, select AI users, which I have no idea what that means. But that's a very interesting way that they're marketing it out there. But, I think that Amazon will have to respond. I think they'll be more measured than say, what Google's doing with Bard and just throwing it out there to, hey, we're going into beta now. I think they'll look at it and see where do we go and how do we actually integrate this in? Because they do have a lot of components of AI and ML underneath the hood that other services use. And I think that, you know, they've learned from that. And I think that they've already done a good job. Especially for media and entertainment when you start to look at some of the ways that they use it for helping do graphics and helping to do drones. I think part of their buy of iRobot was the fact that iRobot was a big user of RoboMaker, which is using different models to train those robots to go around objects and things like that, so. >> Quick touch on Kubernetes, the whole DevOps world we just covered. The Cloud Native Computing Foundation, CNCF. The security conference up in Seattle last week. First time they spun that out, kind of like re:Inforce, you know, AWS spins out re:Inforce from re:Invent. Amsterdam's coming up soon, the KubeCon. What should we expect? What's hot in Kubeland? >> Yeah, I think, you know, Kubes, you're going to be looking at how OpenShift keeps growing and I think to that respect you get to see the momentum with people like Red Hat. You see others coming up and realizing how OpenShift has gone to market as being, like you were saying, partnering with those Clouds and really making it simple.
I think the simplicity and the manageability of Kubernetes is going to be at the forefront. I think a lot of the investment is still going into, how do I bring observability and DevOps and AIOps and MLOps all together. And I think that's going to be a big place where people are going to be looking to see what comes out of KubeCon in Amsterdam. I think it's that manageability, ease of use. >> Well Rob, I look forward to working with you on behalf of the whole CUBE team. We're going to do more of these and go out to some shows, extract the signal from the noise. Really appreciate you coming into our studio. >> Well, thank you for having me on. Really appreciate it. >> You're really welcome. All right, keep it right there, and thanks for watching. This is Dave Vellante for theCUBE. And we'll see you next time. (light music)

Published Date : Feb 7 2023

SUMMARY :

I'm really pleased to It's always great to be here. and I think we can have the number of Clouds that they have, contract to start with those make sense to you And, I think when you look in terms of, you know, the outlook. And they're looking to My sense is they still, you know, in how they go to market And he said to this audience, is it the best place for me to go? You do strategy, you do messaging. and it's, you know, And I think when you start Even Oracle, you know, since they started to to be 75% of AWS IAS revenues. You know, what do you think? it's, you know, I think it's growing well. Is that just because of the And be able to move you forward. I feel like, you know, I think when, you know, you could say, And we know Google is, you know, And I think that, you know, you know, AWS spins out, and I think to that respect forward to working with you Well, thank you for having me on. And we'll see you next time.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Amazon | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
Bob Evans | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
HP | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Rob | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Rob Strechay | PERSON | 0.99+
New York | LOCATION | 0.99+
September | DATE | 0.99+
Seattle | LOCATION | 0.99+
January | DATE | 0.99+
Dev Ittycheria | PERSON | 0.99+
HPE | ORGANIZATION | 0.99+
NetApp | ORGANIZATION | 0.99+
Amsterdam | LOCATION | 0.99+
75% | QUANTITY | 0.99+
UK | LOCATION | 0.99+
AWSs | ORGANIZATION | 0.99+
June | DATE | 0.99+
Snowplow | ORGANIZATION | 0.99+
eight | QUANTITY | 0.99+
80% | QUANTITY | 0.99+
Scarpelli | PERSON | 0.99+
15% | QUANTITY | 0.99+
Australia | LOCATION | 0.99+
Mongo | ORGANIZATION | 0.99+
Slootman | PERSON | 0.99+
two-year | QUANTITY | 0.99+
AMD | ORGANIZATION | 0.99+
Europe | LOCATION | 0.99+
Databricks | ORGANIZATION | 0.99+
six factors | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Merrill Lynch | ORGANIZATION | 0.99+
Last June | DATE | 0.99+
five countries | QUANTITY | 0.99+
eight people | QUANTITY | 0.99+
U.S. | LOCATION | 0.99+
last week | DATE | 0.99+
16 people | QUANTITY | 0.99+
Databricks' | ORGANIZATION | 0.99+

Jon Turow, Madrona Venture Group | CloudNativeSecurityCon 23


 

(upbeat music) >> Hello and welcome back to theCUBE. We're here in Palo Alto, California. I'm your host, John Furrier with a special guest here in the studio. As part of our CloudNativeSecurityCon coverage we had an opportunity to bring in Jon Turow who is the partner at Madrona Venture Partners formerly with AWS and to talk about machine learning, foundational models, and how the future of AI is going to be impacted by some of the innovation around what's going on in the industry. ChatGPT has taken the world by storm. A million downloads, fastest to the million downloads there. Before some were saying it's just a gimmick. Others saying it's a game changer. Jon's here to break it down, and great to have you on. Thanks for coming in. >> Thanks John. Glad to be here. >> Thanks for coming on. So first of all, I'm glad you're here. First of all, because two things. One, you were formerly with AWS, got a lot of experience running projects at AWS. Now a partner at Madrona, a great firm doing great deals, and they had this future of modern applications kind of thesis. Now you are putting out some content recently around foundational models. You're deep into computer vision. You were the IoT general manager at AWS among other things, Greengrass. So you know a lot about data. You know a lot about some of this automation, some of the edge stuff. You've been in the middle of all these kind of areas that now seem to be the next wave coming. So I wanted to ask you what your thoughts are of how the machine learning and this new automation wave is coming in, these AI tools are coming out. Is it a platform? Is it going to be smarter? What feeds AI? What's your take on this whole foundational big movement into AI? What's your general reaction to all this? >> So, thanks, John, again for having me here. Really excited to talk about these things. AI has been coming for a long time. It's been kind of the next big thing. Always just over the horizon for quite some time.
And we've seen really compelling applications in generations before and until now. Amazon and AWS have introduced a lot of them. My firm, Madrona Venture Group has invested in some of those early players as well. But what we're seeing now is something categorically different. That's really exciting and feels like a durable change. And I can try and explain what that is. We have these really large models that are useful in a general way. They can be applied to a lot of different tasks beyond the specific task that the designers envisioned. That makes them more flexible, that makes them more useful for building applications than what we've seen before. And so that, we can talk about the depths of it, but in a nutshell, that's why I think people are really excited. >> And I think one of the things that you wrote about that jumped out at me is that this seems to be this moment where there's been multiple decades of nerds and computer scientists and programmers and data thinkers around waiting for AI to blossom. And it's like they're scratching that itch. Every year is going to be, and it's like the bottleneck's always been compute power. And we've seen other areas, genome sequencing, all kinds of high-computation things that required high-performance computing. But now there's no real bottleneck to compute. You got cloud. And so you're starting to see the emergence of a massive acceleration of where AI's been and where it needs to be going. Now, it's almost like it's got a reboot. It's almost a renaissance in the AI community with a whole other set of macro-environmental things happening. Cloud, younger generation, applications proliferate from mobile to cloud native. It's the perfect storm for this kind of moment to switch over. Am I overreading that? Is that right? >> You're right. And it's been cooking for a cycle or two. And let me try and explain why that is.
We had cloud, and AWS launched in, whatever it was, 2006, and offered more compute to more people than really was possible before. Initially that was about taking existing applications and running them more easily at a bigger scale. But in that period of time what's also become possible is new kinds of computation that really weren't practical or even possible without that vast amount of compute. And so one result that came of that is something called the transformer AI model architecture. And Google came out with that, published a paper in 2017. And what that says is, with a transformer model you can actually train an arbitrarily large amount of data into a model, and see what happens. That's what Google demonstrated in 2017. The what happens is the really exciting part because when you do that, what you start to see, when models exceed a certain size that we had never really seen before all of a sudden they get what we call emergent capabilities of complex reasoning and reasoning outside a domain and reasoning with data. The kinds of things that people describe as spooky when they play with something like ChatGPT. That's the underlying term. We don't as an industry quite know why it happens or how it happens, but we can measure that it does. So cloud enables new kinds of math and science. New kinds of math and science allow new kinds of experimentation. And that experimentation has led to this new generation of models. >> So one of the debates we had on theCUBE at our Supercloud event last month was, what's the barriers to entry for say OpenAI, for instance? Obviously, I weighed in aggressively and said, "The barriers for getting into cloud are high because all the CapEx." And Howie Xu formerly VMware, now at ZScaler, he's an AI machine learning guy. He was like, "Well, you can spend $100 million and replicate it." I saw a quote that said for 180,000 I can get this other package. What's the barriers to entry? Is ChatGPT or OpenAI, does it have sustainability?
Is it easy to get into? What is the market like for AI? I mean, because a lot of entrepreneurs are jumping in. I mean, I just read a story today. San Francisco's got more inbound migration because of the AI action happening, Seattle's booming, Boston with MIT's been working on neural networks for generations. That's what we've found the answer. Get off the neural network, Boston jump on the AI bus. So there's total excitement for this. People are enthusiastic around this area. >> You can think of an iPhone versus Android tension that's happening today. In the iPhone world, there are proprietary models from OpenAI who you might consider as the leader. There's Cohere, there's AI21, there's Anthropic, Google's going to have their own, and a few others. These are proprietary models that developers can build on top of, get started really quickly. They're measured to have the highest accuracy and the highest performance today. That's the proprietary side. On the other side, there is an open source part of the world. These are a proliferation of model architectures that developers and practitioners can take off the shelf and train themselves. Typically found on Hugging Face. What people seem to think is that the accuracy and performance of the open source models is something like 18 to 20 months behind the accuracy and performance of the proprietary models. But on the other hand, there's infinite flexibility for teams that are capable enough. So you're going to see teams choose sides based on whether they want speed or flexibility. >> That's interesting. And that brings up a point I was talking to a startup and the debate was, do you abstract away from the hardware and be software-defined or software-led on the AI side and let the hardware side just extremely accelerate on its own, 'cause it's flywheel? So again, back to proprietary, that's with hardware kind of bundled in, bolted on. Is it accelerator or is it bolted on or is it part of it?
So to me, I think that the big struggle in understanding this is which one will end up being right. I mean, is it a Betamax versus VHS kind of thing going on? Or iPhone, Android, I mean iPhone makes a lot of sense, but if you're Apple, but is there an Apple moment in the machine learning? >> In proprietary models, there does seem to be a jump ball. That there's going to be a virtuous flywheel that emerges that, for example, all this excitement about ChatGPT. What's really exciting about it is it's really easy to use. The technology isn't so different from what we've seen before even from OpenAI. You mentioned a million users in a short period of time, all providing training data for OpenAI that makes their underlying models, their next generation even better. So it's not unreasonable to guess that there's going to be power laws that emerge on the proprietary side. What I think history has shown is that iPhone, Android, Windows, Linux, there seems to be gravity towards this yin and yang. And my guess, and what other people seem to think is going to be the case is that we're going to continue to see these two poles of AI. >> So let's get into the relationship with data because I've been immersing myself with ChatGPT, fascinated by the ease of use, yes, but also the fidelity of how you query it. And I felt like when I was writing SQL back in the eighties and nineties where SQL was emerging. You had to be really a guru at the SQL to get the answers you wanted. It seems like the querying into ChatGPT is a good thing if you know how to talk to it. Labeling whether your input is and it does a great job if you feed it right. If you ask generic questions like Google. It's like a Google search. It gives you great format, sounds credible, but the facts are kind of wrong. >> That's right. >> That's where general consensus is coming on. So what does that mean? That means people are on one hand saying, "Ah, it's bullshit 'cause it's wrong."
But I look at, I'm like, "Wow, that's compelling." 'Cause if you feed it the right data, so now we're in the data modeling here, so the role of data's going to be critical. Is there a data operating system emerging? Because if this thing continues to go the way it's going you can almost imagine as you would look at companies to invest in. Who's going to be right on this? What's going to scale? What's sustainable? What could build a durable company? It might not look like what people think it is. I mean, I remember when Google started everyone thought it was the worst search engine because it wasn't a portal. But it was the best organic search on the planet and became successful. So I'm trying to figure out like, okay, how do you read this? How do you read the tea leaves? >> Yeah. There are a few different ways that companies can differentiate themselves. Teams with galactic capabilities to take an open source model and then change the architecture and retrain and go down to the silicon. They can do things that might not have been possible for other teams to do. There's a company that we're proud to be investors in called RunwayML that provides video accelerated, sorry, AI accelerated video editing capabilities. They were used in Everything Everywhere All at Once and some others. In order to build RunwayML, they needed a vision of what the future was going to look like and they needed to make deep contributions to the science that was going to enable all that. But not every team has those capabilities, maybe nor should they. So as far as how other teams are going to differentiate there's a couple of things that they can do. One is called prompt engineering where they shape on behalf of their own users exactly how the prompt gets fed to the underlying model. It's not clear whether that's going to be a durable problem or whether like Google, we consumers are going to start to get more intuitive about this. That's one.
The second is what's called information retrieval. How can I get information about the world outside, information from a database or a data store or whatever service into these models so they can reason about them. And the third is, this is going to sound funny, but attribution. Just like you would do in a news report or an academic paper. If you can state where your facts are coming from, the downstream consumer or the human being who has to use that information actually is going to be able to make better sense of it and rely better on it. So that's prompt engineering, that's retrieval, and that's attribution. >> So that brings me to my next point I want to dig in on is the foundational model stack that you published. And I'll start by saying that with ChatGPT, if you take out the naysayers who are like throwing cold water on it about being a gimmick or whatever, and then you got the other side, I would call the alpha nerds who are like they can see, "Wow, this is amazing." This is truly NextGen. This isn't yesterday's chatbot nonsense. They're like, they're all over it. It's that everybody's using it right now in every vertical. I heard someone using it for security logs. I heard a data center, hardware vendor using it for pushing out appsec review updates. I mean, I've heard corner cases. We're using it for theCUBE to put our metadata in. So there's a horizontal use case of value. So to me that tells me it's a market there. So when you have horizontal scalability in the use case you're going to have a stack. So you publish this stack and it has an application at the top, applications like Jasper out there. You're seeing ChatGPT. But you go after the bottom, you got silicon, cloud, foundational model operations, the foundational models themselves, tooling, sources, actions. Where'd you get this from? How'd you put this together? Did you just work backwards from the startups or was there a thesis behind this? 
Could you share your thoughts behind this foundational model stack? >> Sure. Well, I'm a recovering product manager and my job that I think about as a product manager is who is my customer and what problem he wants to solve. And so to put myself in the mindset of an application developer and a founder who is actually my customer as a partner at Madrona, I think about what technology and resources does she need to be really powerful, to be able to take a brilliant idea, and actually bring that to life. And if you spend time with that community, which I do and I've met with hundreds of founders now who are trying to do exactly this, you can see that the stack is emerging. In fact, we first drew it in, not in January 2023, but October 2022. And if you look at the difference between the October '22 and January '23 stacks you're going to see that holes in the stack that we identified in October around tooling and around foundation model ops and the rest are organically starting to get filled because of how much demand from the developers at the top of the stack. >> If you look at the young generation coming out and even some of the analysts, I was just reading an analyst report on who's following the whole data stacks area, Databricks, Snowflake, there's variety of analytics, realtime AI, data's hot. There's a lot of engineers coming out that were either data scientists or I would call data platform engineering folks are becoming very key resources in this area. What's the skillset emerging and what's the mindset of that entrepreneur that sees the opportunity? How does these startups come together? Is there a pattern in the formation? Is there a pattern in the competency or proficiency around the talent behind these ventures? >> Yes. I would say there's two groups. The first is a very distinct pattern, John. For the past 10 years or a little more we've seen a pattern of democratization of ML where more and more people had access to this powerful science and technology. 
And since about 2017, with the rise of the transformer architecture in these foundation models, that pattern has reversed. All of a sudden what has become broader access is now shrinking to a pretty small group of scientists who can actually train and manipulate the architectures of these models themselves. So that's one. And what that means is the teams who can do that have huge ability to make the future happen in ways that other people don't have access to yet. That's one. The second is there is a broader population of people who by definition has even more collective imagination 'cause there's even more people who sees what should be possible and can use things like the proprietary models, like the OpenAI models that are available off the shelf and try to create something that maybe nobody has seen before. And when they do that, Jasper AI is a great example of that. Jasper AI is a company that creates marketing copy automatically with generative models such as GPT-3. They do that and it's really useful and it's almost fun for a marketer to use that. But there are going to be questions of how they can defend that against someone else who has access to the same technology. It's a different population of founders who has to find other sources of differentiation without being able to go all the way down to the silicon and the science. >> Yeah, and it's going to be also opportunity recognition is one thing. Building a viable venture product market fit. You got competition. And so when things get crowded you got to have some differentiation. I think that's going to be the key. And that's where I was trying to figure out and I think data with scale I think are big ones. Where's the vulnerability in the stack in terms of gaps? Where's the white space? I shouldn't say vulnerability. I should say where's the opportunity, where's the white space in the stack that you see opportunities for entrepreneurs to attack? >> I would say there's two.
At the application level, there is almost infinite opportunity, John, because almost every kind of application is about to be reimagined or disrupted with a new generation that takes advantage of this really powerful new technology. And so if there is a kind of application in almost any vertical, it's hard to rule something out. Almost any vertical that a founder wishes she had created the original app in, well, now it's her time. So that's one. The second is, if you look at the tooling layer that we discussed, tooling is a really powerful way that you can provide more flexibility to app developers to get more differentiation for themselves. And the tooling layer is still forming. This is the interface between the models themselves and the applications. Tools that help bring in data, as you mentioned, connect to external actions, bring context across multiple calls, chain together multiple models. These kinds of things, there's huge opportunity there. >> Well, Jon, I really appreciate you coming in. I had a couple more questions, but I will take a minute to read some of your bios for the audience and we'll get into, I won't embarrass you, but I want to set the context. You said you were recovering product manager, 10 plus years at AWS. Obviously, recovering from AWS, which is a whole nother dimension of recovering. In all seriousness, I talked to Andy Jassy around that time and Dr. Matt Wood and it was about that time when AI was just getting on the radar when they started. So you guys started seeing the wave coming in early on. So I remember at that time as Amazon was starting to grow significantly and even just stock price and overall growth. From a tech perspective, it was pretty clear what was coming, so you were there when this tsunami hit. >> Jon: That's right. >> And you had a front row seat building tech, you led the product teams for Computer Vision AI, Textract, AI intelligence for document processing, Rekognition for image and video analysis.
You wrote the business product plan for AWS IoT and Greengrass, which we've covered a lot in theCUBE, which extends out to the whole edge thing. So you know a lot about AI/ML, edge computing, IOT, messaging, which I call the law of small numbers that scale become big. This is a big new thing. So as a former AWS leader who's been there and at Madrona, what's your investment thesis as you start to peruse the landscape and talk to entrepreneurs as you got the stack? What's the big picture? What are you looking for? What's the thesis? How do you see this next five years emerging? >> Five years is a really long time given some of this science is only six months out. I'll start with some, no pun intended, some foundational things. And we can talk about some implications of the technology. The basics are the same as they've always been. We want, what I like to call customers with their hair on fire. So they have problems, so urgent they'll buy half a product. The joke is if your hair is on fire you might want a bucket of cold water, but you'll take a tennis racket and you'll beat yourself over the head to put the fire out. You want those customers 'cause they'll meet you more than halfway. And when you find them, you can obsess about them and you can get better every day. So we want customers with their hair on fire. We want founders who have empathy for those customers, understand what is going to be required to serve them really well, and have what I like to call founder-market fit to be able to build the products that those customers are going to need. >> And because that's a good strategy from an emerging, not yet fully baked out requirements definition. >> Jon: That's right. >> Enough where directionally they're leaning in, more than in, they're part of the product development process. >> That's right. 
And when you're doing early stage development, which is where I personally spend a lot of my time at the seed and A and a little bit beyond that stage often that's going to be what you have to go on because the future is going to be so complex that you can't see the curves beyond it. But if you have customers with their hair on fire and talented founders who have the capability to serve those customers, that's got me interested. >> So if I'm an entrepreneur, I walk in and say, "I have customers that have their hair on fire." What kind of checks do you write? What's the kind of the average you're seeing for seed and series? Probably seed, seed rounds and series As. >> It can depend. I have seen seed rounds of double digit million dollars. I have seen seed rounds much smaller than that. It really depends on what is going to be the right thing for these founders to prove out the hypothesis that they're testing that says, "Look, we have this customer with her hair on fire. We think we can build at least a tennis racket that she can use to start beating herself over the head and put the fire out. And then we're going to have something really interesting that we can scale up from there and we can make the future happen. >> So it sounds like your advice to founders is go out and find some customers, show them a product, don't obsess over full completion, get some sort of vibe on fit and go from there. >> Yeah, and I think by the time founders come to me they may not have a product, they may not have a deck, but if they have a customer with her hair on fire, then I'm really interested. >> Well, I always love the professional services angle on these markets. You go in and you get some business and you understand it. Walk away if you don't like it, but you see the hair on fire, then you go in product mode. >> That's right. >> All Right, Jon, thank you for coming on theCUBE. Really appreciate you stopping by the studio and good luck on your investments. Great to see you. 
>> You too. >> Thanks for coming on. >> Thank you, Jon. >> CUBE coverage here at Palo Alto. I'm John Furrier, your host. More coverage with CUBE Conversations after this break. (upbeat music)

Published Date : Feb 2 2023

SUMMARY :

and great to have you on. that now seem to be the next wave coming. It's been kind of the next big thing. is that this seems to be this moment and offered more compute to more people What's the barriers to entry? is that the accuracy and the debate was, do you that there's going to be power laws but also the fidelity of how you query it. going to be critical. exactly how the prompt to get So that brings me to my next point and actually bring that to life. and even some of the analysts, But there are going to be questions Yeah, and it's going to be and the applications. the radar when they started. and talk to entrepreneurs the head to put the fire out. And because that's a good of the product development process. that you can't see the curves beyond it. What kind of checks do you write? and put the fire out. to founders is go out time founders come to me and you understand it. stopping by the studio More coverage with CUBE

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Amazon | ORGANIZATION | 0.99+
Jon | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
John | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
2017 | DATE | 0.99+
January 2023 | DATE | 0.99+
Jon Turow | PERSON | 0.99+
October | DATE | 0.99+
18 | QUANTITY | 0.99+
MIT | ORGANIZATION | 0.99+
$100 million | QUANTITY | 0.99+
Palo Alto | LOCATION | 0.99+
10 plus years | QUANTITY | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Google | ORGANIZATION | 0.99+
two | QUANTITY | 0.99+
October 2022 | DATE | 0.99+
hundreds | QUANTITY | 0.99+
Madrona | ORGANIZATION | 0.99+
Apple | ORGANIZATION | 0.99+
Madrona Venture Partners | ORGANIZATION | 0.99+
January '23 | DATE | 0.99+
two groups | QUANTITY | 0.99+
Matt Wood | PERSON | 0.99+
Madrona Venture Group | ORGANIZATION | 0.99+
180,000 | QUANTITY | 0.99+
October '22 | DATE | 0.99+
Jasper | TITLE | 0.99+
Palo Alto, California | LOCATION | 0.99+
six months | QUANTITY | 0.99+
2006 | DATE | 0.99+
million downloads | QUANTITY | 0.99+
Five years | QUANTITY | 0.99+
SQL | TITLE | 0.99+
last month | DATE | 0.99+
two poles | QUANTITY | 0.99+
first | QUANTITY | 0.99+
Howie Xu | PERSON | 0.99+
VMware | ORGANIZATION | 0.99+
third | QUANTITY | 0.99+
20 months | QUANTITY | 0.99+
Greengrass | ORGANIZATION | 0.99+
Madrona Venture Group | ORGANIZATION | 0.98+
second | QUANTITY | 0.98+
One | QUANTITY | 0.98+
Supercloud | EVENT | 0.98+
RunwayML | TITLE | 0.98+
San Francisco | LOCATION | 0.98+
ZScaler | ORGANIZATION | 0.98+
yesterday | DATE | 0.98+
one | QUANTITY | 0.98+
First | QUANTITY | 0.97+
CapEx | ORGANIZATION | 0.97+
eighties | DATE | 0.97+
ChatGPT | TITLE | 0.96+
Dr. | PERSON | 0.96+

Juan Loaiza, Oracle | Building the Mission Critical Supercloud


 

(upbeat music) >> Welcome back to Supercloud two where we're gathering a number of industry luminaries to discuss the future of cloud services. And we'll be focusing on various real world practitioners today, their challenges, their opportunities with an emphasis on data, self-service infrastructure and how organizations are evolving their data and cloud strategies to prepare for that next era of digital innovation. And we really believe that support for multiple cloud estates is a first step of any Supercloud. And in that regard Oracle surprised some folks with its Azure collaboration, the Oracle Database and Exadata database services. And to discuss the challenges of developing a mission critical Supercloud we welcome Juan Loaiza, who's the executive vice president of Mission Critical Database Technologies at Oracle. Juan, you're a many-time CUBE alum so welcome back to the show. Great to see you. >> Great to see you, and happy to be here with you. >> Yeah, thank you. So a lot of people felt that Oracle was resistant to multicloud strategies and preferred to really have everything run just on the Oracle cloud infrastructure, OCI, and maybe that was a misperception, maybe you guys were misunderstood or maybe you had a change of heart. Take us through the decision to support multiple cloud platforms. >> Now we've supported multiple cloud platforms for many years, so I think that was probably a misperception. Oracle database, we partnered up with Amazon very early on in their cloud when they had kind of the first cloud out there. And we had Oracle database running on their cloud. We have backup, we have a lot of stuff running. So, yeah, part of the philosophy of Oracle has always been we partner with every platform. We're very open. We started with SQL and APIs. As we develop new technologies we push them into the SQL standard. So that's always been part of the ecosystem at Oracle. That's how we think we get an advantage by being more open. 
I think if we try to create this isolated little world it actually hurts us and hurts customers. So for us it's a win-win to be open across the clouds. >> So Supercloud is this concept that we put forth to describe a platform or some people think it's an architecture if you have an opinion, and I'd love to hear it but it provides a programmatically consistent set of services that hosted on heterogeneous cloud providers. And so we look at the Oracle database service for Azure as fitting within this definition. In your view, is this accurate? >> Yeah, I would broaden it. I'd see a little bit more than that. We just think that services should be available from everywhere, right? So, I mean, it's a little bit like if you go back to the pre-internet world, there was things like AOL and CompuServe and those were kind of islands. And if you were on AOL, you really didn't have access to anything on CompuServe and vice versa. And the cloud world has evolved a little bit like that. And we just think that's the wrong model. They shouldn't these clouds are part of the world and they need to be interconnected like all the rest of the world. It's been a long time with telephones internet, everything, everything's interconnected. Everything should work seamlessly together. So that's how we believe if you're running in one cloud and you're running let's say an application, one cloud you want to use a service from another cloud should be completely simple to do that. It shouldn't be, I can only use what's in AOL or CompuServe or whatever else. It should not be isolated. >> Well, we got a long way to go before that Nirvana exists but one example is the Oracle database service with Azure. So what exactly does that service provide? I'm interested in how consistent the service experience is across clouds. Did you create a purpose-built PaaS layer to achieve this common experience? Or is it off the shelf Terraform? Is there unique value in the PaaS layer? 
Let's dig into some of those questions. I know I just threw six at you. >> Yeah, I mean, so what this is, is what we're trying to do is very simple. Which is, for example, starting with the Oracle database we want to make that seamless to use from anywhere you're running. Whether it's on-prem, on some other cloud, anywhere else you should be able to seamlessly use the Oracle database and it should look like the internet. There's no friction. There's not a lot of hoops you got to jump just because you're trying to use a database that isn't local to you. So it's pretty straightforward. And in terms of things like Azure, it's not easy to do because all these clouds have a lot of kind of very unique technologies. So what we've done is at Oracle is we've said, "Okay we're going to make Oracle database look exactly like if it was running on Azure." That means we'll use the Azure security systems, the identity management systems, the networking, there's things like monitoring and management. So we'll push all these technologies. For example, when we have monitoring event or we have alerts we'll push those into the Azure console. So as a user, it looks to you exactly as if that Oracle database was running inside Azure. Also, the networking is a big challenge across these clouds. So we've basically made that whole thing seamless. So we create the super high bandwidth network between Azure and Oracle. We make sure that's extremely low latency, under two milliseconds round trip. It's all within the local metro region. So it's very fast, very high bandwidth, very low latency. And we take care establishing the links and making sure that it's secure and all that kind of stuff. So at a high level, it looks to you like the database is--even the look and feel of the screens. 
It's the Azure colors, it's the Azure buttons it's the Azure layout of the screens so it looks like you're running there and we take care of all the technical details underlying that which there's a lot which has taken a lot of work to make it work seamlessly. >> In the magic of that abstraction. Juan, does it happen at the PaaS layer? Could you take us inside that a little bit? Is there intelligence in there that helps you deal with latency or are there any kind of purpose-built functions for this service? >> You could think of it as... I mean it happens at a lot of different layers. It happens at the identity management layer, it happens at the networking layer, it happens at the database layer, it happens at the monitoring layer, at the management layer. So all those things have been integrated. So it's not one thing that you just go and do. You have to integrate all these different services together. You can access files in Azure from the Oracle database. Again, that's completely seamless. You, it's just like if it was local to our cloud you get your Azure files in your kind of S3 equivalent. So yeah, the, it's not one thing. There's a whole lot of pieces to the ecosystem. And what we've done is we've worked on each piece separately to make sure that it's completely seamless and transparent so you don't have to think about it, it just works. >> So you kind of answered my next question which is one of the technical hurdles. It sounds like the technical hurdles are that integration across the entire stack. That's the sort of architecture that you've built. What was the catalyst for this service? >> Yeah, the catalyst is just fulfilling our vision of an open cloud world. It's really like I said, Oracle, from the very beginning has been believed in open standards. Customers should be able to have choice customers should be able to use whatever they want from wherever they want. 
And we saw that, you know in the new world of cloud that had broken down everybody had their own authentication system management system, monitoring system networking system, configuration system. And it became very difficult. There was a lot of friction to using services across cloud. So we said, "Well, okay we can fix that." It's work, it's significant amount of work but we know how to do it and let's just go do it and make it easy for customers. >> So given Oracle is really your main focus is on mission critical workloads. You talked about this low latency network, I mean but you still have physical distances, so how are you managing that latency? What's the experience been for customers across Azure and OCI? >> Yeah, so it, it's a good point. I mean, latency can be an issue. So the good thing about clouds is we have a lot of cloud data centers. We have dozens and dozens of cloud data centers around the world. And Azure has dozens and dozens of cloud data centers. And in most cases, they're in the same metro region because there's kind of natural metro regions within each country that you want to put your cloud data centers in. So most of our data centers are actually very close to the Azure data centers. There's the kind of northern Virginia, there's London, there's Tokyo I mean, there's natural places where everybody puts their data centers Seoul et cetera. And so that's the real key. So that allows us to put a very high bandwidth and low latency network. The real problems with latency come when you're trying to go along physical distance. If you're trying to connect, you know across the Pacific or you know across the country or something like that, then you can get in trouble with latency within the same metro region. It's extremely fast. It tends to be around one, you know the highest two millisecond that's roundtrip through all the routers and connections and gateways and everything else. 
With everything taken into consideration, what we guarantee is it's always less than two milliseconds, which is a very low latency time. So that tends to not be a problem because it's extremely low latency. >> I was going to ask you less than two milliseconds. So, earlier in the program we had Jack Greenfield who runs architecture for Walmart, and he was explaining what we call their Supercloud, and it runs across Azure, GCP, and they're on-prem. They have this thing called the triplet model. So my question to you is, are you in situations where you're guaranteeing that less than two milliseconds, do you have situations where you're bringing, you know, Exadata Cloud@Customer on-prem to achieve that? Or is this just across clouds? >> Yeah, in this case, we're talking public cloud data center to public cloud data center. >> Oh okay. >> So Azure public cloud data center to Oracle public cloud data center. They're in the same metro region. We set up the connections, we do all the technology to make it seamless. And from a customer point of view they don't really see the network. Also, remember that SQL is actually designed to have very low bandwidth and latency requirements. So it is a language. So you don't go to the database and say do this one little thing for me. You send it a SQL statement that can actually access lots of data while in the database. So the real latency requirement of a SQL database is within the database. So I need to access all that data fast. So I need very fast access to storage, very fast access across nodes. That's what Exadata gives you. But you send one request and that request can do a huge amount of work and then return one answer. And that's kind of the design point of SQL. So SQL is inherently low bandwidth requirements, it was used back in the eighties when we used to have 10 megabit networks and the biggest companies in the world ran back then. So right now we're talking over hundreds of gigabits. 
So it's really not much of a challenge. When you're designed to run on 10 megabit to say, okay I'm going to give you 10,000 times what you were designed for it's really, it's a pretty low hurdle to jump. >> What about the deployment models? How do you handle this? Is it a single global instance across clouds or do you sort of instantiate in each, you got Exadata in Azure and Exadata in OCI? What's the deployment model look like? >> It's pretty straightforward. So customer decides where they want to run their application and database. So there's natural places where people go. If you're in Tokyo, you're going to choose the local Tokyo data centers for both, you know Microsoft and Oracle. If you're in London, you're going to do that. If you're in California you're going to choose maybe San Jose, something like that. So a customer just chooses. We both have data centers in that metro region. So they create their service on Azure and then they go to our console which looks just like an Azure console and say all right create me a database. And then we choose the closest Oracle data center which is generally a few miles away, and then it all gets created. So from a customer point of view, it's very straightforward. >> I'm always in awe about how simple you make things sound. All right what about security? You talked a little bit before about identity access, how you're sort of abstracting the Azure capabilities away so that you've simplified it for your customers but are there any other specific security things that you need to do? How much did you have to abstract the underlying primitives of Azure or OCI to present that common experience to customers? >> Yeah, so there's really two big things. One is the identity management. Like my name is X on Azure and I have this set of privileges. Oracle has its own identity management system, right? So what we didn't want is that you have to kind of like bridge these things yourself. It's a giant pain to do that. 
So we actually do what we call federation across these identity management systems. So you put your credentials into Azure and then they automatically get to use the exact same credentials and identity in the Oracle cloud. So again, you don't have to think about it, it just works. And then the second part is that the whole bridging the network. So within a cloud you generally have a virtual network that's private to your company. And so at Oracle, we bridge the private network that you created in, for example, Azure to the private network that we create for you in Oracle. So it is still a private network without you having to do a whole bunch of work. So it's just like if you were in your own data center other people can't get into your network. So it's secured at the network level, it's secured at the identity management, and encryption level. And again we did a lot of work to make that seamless for customers and they don't have to worry about it because we did the work. That's really as simple as it gets. >> That's what Supercloud's supposed to be all about. Alright, we were talking earlier about sort of the misperception around multicloud, your view of Open I think, which is you run the Oracle database, wherever the customer wants to run it. So you got this database service across OCI and Azure customers today, they run Oracle database in AWS. You got HeatWave, MySQL HeatWave, that you announced on AWS, Google touts a bare metal offering where you can run Oracle on GCP. Do you see a day when you extend an OCI Azure like situation across multiple clouds? Would that bring benefits to customers or will the world of database generally remain largely fenced with maybe a few exceptions like what you're doing with OCI and Azure? I'm particularly interested in your thoughts on egress fees as maybe one of the reasons that there is a barrier to this happening and why maybe these stovepipes exist today and in the future. What are your thoughts on that? 
>> Yeah, we're very open to working with everyone else out there. Like I said, we've always been big believers in customers should have choice and you should be able to run wherever you want. So that's been kind of a founding principle of Oracle. We have the Azure, we did a partnership with them, we're open to doing other partnerships and you're going to see other things coming down the pipe. On the topic of egress, yeah, the large egress fees, it's pretty obvious what goes on with that. Various vendors like to have large egress fees because they want to keep things kind of locked into their cloud. So it's not a very customer friendly thing to do. And I think everybody recognizes that it's really trying to kind of coerce or put a lot of friction on moving data out of a particular cloud. And that's not what we do. We have very, very low egress fees. So we don't really do that and we don't think anybody else should do that. But I think customers at the end of the day, will win that battle. They're going to have to go back to their vendor and say, well I have choice in clouds and if you're going to impose these limits on me, maybe I'll make a different choice. So that's ultimately how these things get resolved. >> So do you think other cloud providers are going to take a page out of what you're doing with Azure and provide similar solutions? >> Yeah, well I think customers want, I mean, I've talked to a lot of customers, this is what they want, right? I mean, there's really no doubt no customer wants to be locked into a single ecosystem. There's nobody out there that wants that. And as the competition, when they start seeing an open ecosystem evolving they're going to be like, okay, I'd rather go there than the closed ecosystem, and that's going to put pressure on the closed ecosystems. So that's the nature of competition. That's what ultimately will tip the balance on these things. 
>> So Juan, even though you have this capability of distributing a workload across multiple clouds as in our Supercloud premise it's still something that's relatively new. It's a big decision that maybe many people might consider somewhat of a risk. So I'm curious who's driving the decisions for your initial customers? What do they want to get out of it? What's the decision point there? >> Yeah, I mean, this is generally driven by customers that want a specific technology in a cloud. I think the risk, I haven't seen a lot of people worry too much about the risk. Everybody involved in this is a very well known, very reputable firm. I mean, Oracle's been around for 40 years. We run most of the world's largest companies. I think customers understand we're not going to build a solution that's going to put their technology and their business at risk. And the same thing with Azure and others. So I don't see customers too worried about this is a risky move because it's really not. And you know, everybody understands networking at the end the day networking works. I mean, how does the internet work? It's a known quantity. It's not like it's some brand new invention. What we're really doing is breaking down the barriers to interconnecting things. Automating 'em, making 'em easy. So there's not a whole lot of risk here for customers. And like I said, every single customer in the world loves an open ecosystem. It's just not a question. If you go to a customer would you rather put your technology or your business to run on a closed ecosystem or an open system? It's kind of not even worth asking a question. It's a no-brainer. >> All right, so we got to go. My last question. What do you think of the term "Supercloud"? You think it'll stick? >> We'll see. There's a lot of terms out there and it's always fun to see which terms stick. It's a cool term. I like it, but the decision makers are actually the public, what sticks and what doesn't. It's very hard to predict. 
>> Yeah well, it's been a lot of fun having you on, Juan. Really appreciate your time and always good to see you. >> All right, Dave, thanks a lot. It's always fun to talk to you. >> You bet. All right, keep it right there. More Supercloud two content from theCUBE Community Dave Vellante for John Furrier. We'll be right back. (upbeat music)
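Editor's sketch: Juan's argument in this conversation is that SQL tolerates a cross-cloud link because one statement does a large amount of work for a single round trip, and that today's links are roughly 10,000 times the 10 megabit networks SQL was designed for. The arithmetic can be illustrated with a few lines of Python. The function names, the row count, and the choice of 100 gigabits for "hundreds of gigabits" are assumptions made for illustration only; nothing here measures a real network or calls any Oracle or Azure API.

```python
# Illustrative arithmetic only. All constants are assumptions drawn loosely
# from figures quoted in the interview, not measurements.
ROUND_TRIP_US = 2_000            # "less than two milliseconds", in microseconds
OLD_LINK_BPS = 10 * 10**6        # a 10 megabit network from the eighties
NEW_LINK_BPS = 100 * 10**9       # assumed 100 gigabit cross-cloud link


def network_wait_us(round_trips: int, round_trip_us: int = ROUND_TRIP_US) -> int:
    """Total time spent waiting on the network for a number of round trips."""
    return round_trips * round_trip_us


def bandwidth_ratio(new_bps: int = NEW_LINK_BPS, old_bps: int = OLD_LINK_BPS) -> int:
    """How many times faster the new link is than the old one."""
    return new_bps // old_bps


if __name__ == "__main__":
    rows = 100_000  # hypothetical number of rows an application needs to touch

    # Chatty pattern: one request per row, so one round trip per row.
    chatty_us = network_wait_us(rows)

    # Set-based pattern: one SQL statement, one round trip, work done in the database.
    set_based_us = network_wait_us(1)

    print(f"row-at-a-time network wait: {chatty_us / 1_000_000:.1f} s")
    print(f"single-statement network wait: {set_based_us / 1_000:.1f} ms")
    print(f"link speed-up over 10 megabit: {bandwidth_ratio():,}x")
```

Under these assumed numbers the network cost of a single set-based statement stays at one round trip no matter how many rows it touches, which is the design point Juan describes; the row-at-a-time pattern is the case where even a two millisecond link would start to hurt.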

Published Date : Jan 12 2023


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Microsoft | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Walmart | ORGANIZATION | 0.99+
Juan Loaiza | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
California | LOCATION | 0.99+
Dave Vellante | PERSON | 0.99+
Tokyo | LOCATION | 0.99+
Juan | PERSON | 0.99+
London | LOCATION | 0.99+
six | QUANTITY | 0.99+
10,000 times | QUANTITY | 0.99+
Jack Greenfield | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
second part | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
less than two millisecond | QUANTITY | 0.99+
less than two milliseconds | QUANTITY | 0.99+
One | QUANTITY | 0.99+
SQL | TITLE | 0.99+
10 megabit | QUANTITY | 0.99+
both | QUANTITY | 0.99+
AOL | ORGANIZATION | 0.98+
each piece | QUANTITY | 0.98+
MySQL | TITLE | 0.98+
first cloud | QUANTITY | 0.98+
single | QUANTITY | 0.98+
each country | QUANTITY | 0.98+
John Furrier | PERSON | 0.98+
two big things | QUANTITY | 0.98+
under two milliseconds | QUANTITY | 0.98+
one | QUANTITY | 0.98+
northern Virginia | LOCATION | 0.98+
CompuServe | ORGANIZATION | 0.97+
first step | QUANTITY | 0.97+
Mission Critical Database Technologies | ORGANIZATION | 0.97+
one request | QUANTITY | 0.97+
Seoul | LOCATION | 0.97+
Azure | TITLE | 0.97+
each | QUANTITY | 0.97+
two millisecond | QUANTITY | 0.97+
Azure | ORGANIZATION | 0.96+
one cloud | QUANTITY | 0.95+
one thing | QUANTITY | 0.95+
cloud data centers | QUANTITY | 0.95+
one answer | QUANTITY | 0.95+
Supercloud | ORGANIZATION | 0.94+

Analyst Predictions 2023: The Future of Data Management


 

(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speak) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. 
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. 
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. 
So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. 
>> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing, like Carl was talking about with graph databases. It's stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, I didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. 
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating to end users. It's very cutting edge. I'd say the top, leading edge, 5% of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. 
It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. 
It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these as aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers, glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? 
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. 
And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see the Informaticas and the Fivetrans and all the other players there have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is where data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen the same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. 
So, there's some good moves in this direction. I expect to see more of this this year. Part of it's from basically the SaaS platforms adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for one, but in other words, get two services for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle MySQL HeatWave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. 
So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. 
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You're debating in the class and you were forced to take one side and defend it. 
So, I was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see the lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those, data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. Then came the lakehouse, and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for ACID. And what was the best way to do it? It was through a relational table structure. 
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. 
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. 
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? 
I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring a data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. 
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. 
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. 
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. 
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, and the analytics at the point of decision have to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned MySQL HeatWave. 
There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and in real time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023

Why Should Customers Care About SuperCloud


 

Hello and welcome back to Supercloud 2 where we examine the intersection of cloud and data in the 2020s. My name is Dave Vellante. Our Supercloud panel, our power panel is back. Maribel Lopez is the founder and principal analyst at Lopez Research. Sanjeev Mohan is former Gartner analyst and principal at Sanjeev Mohan. And Keith Townsend is the CTO advisor. Folks, welcome back and thanks for your participation today. Good to see you. >> Okay, great. >> Great to see you. >> Thanks. Let me start, Maribel, with you. Bob Muglia, we had a conversation as part of Supercloud the other day. And he said, "Dave, I like the work, you got to simplify this a little bit." So he said, quote, "A Supercloud is a platform." He said, "Think of it as a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." And then Nelu Mihai said, "Well, wait a minute. This is just going to create more stove pipes. We need more standards in an architecture," which is kind of what Berkeley Sky Computing initiative is all about. So there's a sort of a debate going on. Is supercloud an architecture, a platform? Or maybe it's just another buzzword. Maribel, do you have a thought on this? >> Well, the easy answer would be to say it's just a buzzword. And then we could just kill the conversation and be done with it. But I think the term, it's more than that, right? The term actually isn't new. You can go back to at least 2016 and find references to supercloud in Cornell University or assist in other documents. So, having said this, I think we've been talking about Supercloud for a while, so I assume it's more than just a fancy buzzword. But I think it really speaks to that undeniable trend of moving towards an abstraction layer to deal with the chaos of what we consider managing multiple public and private clouds today, right? 
So one definition of the technology platform speaks to a set of services that allows companies to build and run that technology smoothly without worrying about the underlying infrastructure, which really gets back to something that Bob said. And some of the question is where that lives. And you could call that an abstraction layer. You could call it cross-cloud services, hybrid cloud management. So I see momentum there, like legitimate momentum with enterprise IT buyers that are trying to deal with the fact that they have multiple clouds now. So where I think we're moving is trying to define what are the specific attributes and frameworks of that that would make it so that it could be consistent across clouds. What is that layer? And maybe that's what the supercloud is. But one of the things I struggle with with supercloud is, what are we really trying to do here? Are we trying to create differentiated services in the supercloud layer? Is a supercloud just another variant of what AWS, GCP, or others do? You've spoken to Walmart about its cloud native platform, and that's an example of somebody deciding to do it themselves because they need to deal with this today and not wait for some big standards thing to happen. So whatever it is, I do think it's something. I think trying to maybe create an architecture out of it would be a better way of saying it, so that it does get to that set of principles, but it also needs to be edge aware. I think whenever we talk about supercloud, we're always talking about like the big centralized cloud. And I think we need to think about all the distributed clouds that we're looking at at the edge as well. So that might be one of the ways that supercloud evolves. >> So thank you, Maribel. Keith, Brian Gracely, Gracely's law, things kind of repeat themselves. We've seen it all before. And so what Muglia brought to the forefront is this idea of a platform where the platform provider is really responsible for the architecture. 
Of course, the drawback is then you get a bunch of stovepipe architectures. But practically speaking, that's kind of the way the industry has always evolved, right? >> So if we look at this from the practitioner's perspective and we talk about platforms, traditionally vendors have provided the platforms for us, whether it's a distribution of Linux managed by or provided by Red Hat, Windows, servers, .NET, databases, Oracle. We think of those as platforms, things that are fundamental we can build on top. Supercloud isn't today that. It is a framework or idea, kind of a visionary goal to get to a point that we can have a platform or a framework. But what we're seeing repeated throughout the industry in customers, whether it's the Walmarts that's kind of supersized the idea of supercloud, or if it's regular end user organizations that are coming out with platform groups, groups who normalize cloud native infrastructure, AWS multi-cloud, VMware resources to look like one thing internally to their developers. We're seeing this trend that there's a desire for a platform that provides the capabilities of a supercloud. >> Thank you for that. Sanjeev, we often use Snowflake as a supercloud example, and that presumably would be a platform with an architecture that's determined by the vendor. Maybe Databricks is pushing for a more open architecture, maybe more of that nirvana that we were talking about before to solve for supercloud. But regardless, the practitioner discussions show that, at least currently, there's not a lot of cross-cloud data sharing. I think it could be a killer use case, but egress charges are a barrier. But how do you see it? Will that change? Will we hide that underlying complexity and start sharing data across cloud? Is that something that you think Snowflake or others will be able to achieve? >> So I think we are already starting to see some of that happen. Snowflake is definitely one example that gets cited a lot. 
We don't even talk about MongoDB in this light, but you could have a MongoDB cluster, for instance, with nodes sitting in different cloud providers. So there are companies that are starting to do it. The advantage that these companies have, let's take Snowflake as an example, it's a centralized proprietary platform. And they are building the capabilities that are needed for supercloud. So they're building things like you can push down your data transformations. They have the entire security and privacy suite. Data ops, they're adding those capabilities. And if I'm not mistaken, it'll be very soon, we will see them offer data observability. So it all works great as long as you are in one platform. And if you want resilience, then Snowflake, Supercloud, great example. But if your primary goal is to choose the most cost-effective service irrespective of which cloud it sits in, then things start falling sideways. For example, I may be a very big Snowflake user. And I like Snowflake's resilience. I can move from one cloud to another cloud. Snowflake does it for me. But what if I want to train a very large model? Maybe Databricks is a better platform for that. So how do I move my workload from one platform to another platform? That tooling does not exist. So we need some sort of a hybrid, cross-cloud data ops platform. Walmart has done a great job, but they built it by themselves. Not every company is Walmart. Like Maribel and Keith said, we need standards, we need reference architectures, we need some sort of a cost control. I was just reading recently, Accenture has been public about their AWS bill. Every time they get the bill, it is tens of millions of lines, tens of millions, 'cause there are over a thousand teams using AWS. If we have not been able to corral the usage of a single cloud, now we're talking about supercloud, we've got multiple clouds, and hybrid, on-prem, and edge. 
So till we've got some cross-platform tooling in place, I think this will still take quite some time for it to take shape. >> It's interesting. Maribel, Walmart would tell you that their on-prem infrastructure is cheaper to run than the stuff in the cloud, but at the same time, they want the flexibility and the resiliency of their three-legged stool model. So, to the point Sanjeev was making about hybrid, it's an interesting balance, isn't it, between getting your lowest cost and at the same time having best of breed and scale? >> It's basically what you're trying to optimize for, as you said, right? And by the way, to the earlier point, not everybody is at Walmart's scale, so it's not actually cheaper for everybody to have the purchasing power to make the cloud cheaper to have it on-prem. But I think what you see almost every company, large or small, moving towards is this concept of like, where do I find the agility? And is the agility in building the infrastructure for me? And typically, the thing that gives you outsized advantage as an organization is not how you constructed your cloud computing infrastructure. It might be how you structured your data analytics as an example, which cloud is related to that. But how do you marry those two things? And getting back to sort of Sanjeev's point. We're in a real struggle now where on one hand we want to have best of breed services and on the other hand we want it to be really easy to manage, secure, do data governance. And those two things are really at odds with each other right now. So if you want all the knobs and switches of a service like geospatial analytics in BigQuery, you're going to have to use Google tools, right? Whereas if you want visibility across all the clouds for your application estate and understand the security and governance of that, you're kind of looking for something that's more cross-cloud tooling at that point. 
But whenever you talk to somebody about cross-cloud tooling, they look at you like that's not really possible. So it's a very interesting time in the market. Now, we're kind of layering this concept of supercloud on it. And some people think supercloud's about basically multi-cloud tooling, and some people think it's about a whole new architectural stack. So we're just not there yet. But it's not all about cost. I mean, cloud has not been about cost for a very, very long time. Cloud has been about how do you really make the most of your data. And this gets back to cross-cloud services like Snowflake. Why did they even exist? They existed because we had data everywhere, but we need to treat data as a unified object so that we can analyze it and get insight from it. And so that's where some of the benefit of these cross-cloud services are moving today. Still a long way to go, though, Dave. >> Keith, I reached out to my friends at ETR given the macro headwinds. And you're right, Maribel, cloud hasn't really been just about cost savings. But I reached out to the ETR guys: what does your data show in terms of how customers are dealing with the economic headwinds? And they said, by far, their number one strategy to cut cost is consolidating redundant vendors. And a distant second, but still notable, was optimizing cloud costs. Maybe using reserved instances, or using more volume buying. Nowhere in there. And I asked them, "Could you go look and see if you can find it?" Do we see repatriation? And you hear this a lot. You hear people whispering as analysts, "You better look into that repatriation trend." It's pretty big. You can't find it. But some of the Walmarts in the world, maybe even not repatriating, but they maybe have better cost structure on-prem. Keith, what are you seeing from the practitioners that you talk to in terms of how they're dealing with these headwinds? 
>> Yeah, I just got into a conversation about this just this morning with (indistinct) who is an analyst over at GigaOm. He's reading the same headlines. Repatriation is happening at large scale. I think this is kind of, we have these quiet terms now. We have quiet quitting, we have quiet hiring. I think we have quiet repatriation. Most people haven't done away with their data centers. They're still there. Whether they're completely on-premises data centers, and they own assets, or they're partnerships with QTS, Equinix, et cetera, they have these private cloud resources. What I'm seeing practically is a rebalancing of workloads. Do I really need to pay AWS for this instance of SAP that's on 24 hours a day versus just having it on-prem, moving it back to my data center? I've talked to quite a few customers who were early on to moving their static SAP workloads onto the public cloud, and they simply moved them back. Surprisingly, I was at VMware Explore. And we can talk about this a little bit later on. But our customers, net new, not a lot that were born in the cloud. And they get to this point where their workloads are static. And they look at something like Kubernetes, or OpenShift, or VMware Tanzu. And they ask the question, "Do I need the scalability of cloud?" I might consider being a net new VMware customer to deliver this base capability. So are we seeing repatriation as the number one reason? No, I think internal IT operations are just naturally coming to this realization. Hey, I have these resources on premises. The private cloud technologies have moved far along enough that I can just simply move this workload back. I'm not calling it repatriation, I'm calling it rightsizing for the operating model that I have. >> Makes sense. Yeah. >> Go ahead. >> If I missed something, Dave, why are we on this topic of repatriation? I'm actually surprised that we are talking about repatriation as a very big thing. 
I think repatriation is happening, no doubt, but it's such a small percentage of cloud migration that to me it's a rounding error in my opinion. I think there's a bigger problem. The problem is that people don't know where the cost is. If they knew where the cost was being wasted in the cloud, they could do something about it. But if you don't know, then the easy answer is cloud costs a lot and moving it back to on-premises. I mean, take like Capital One as an example. They got rid of all the data centers. Where are they going to repatriate to? They're all in the cloud at this point. So I think my point is that data observability is one of the places that has seen a lot of traction is because of cost. Data observability, when it first came into existence, it was all about data quality. Then it was all about data pipeline reliability. And now, the number one killer use case is FinOps. >> Maribel, you had a comment? >> Yeah, I'm kind of in violent agreement with both Sanjeev and Keith. So what are we seeing here? So the first thing that we see is that many people wildly overspent in the big public cloud. They had stranded cloud credits, so to speak. The second thing is, some of them still had infrastructure that was useful. So why not use it if you find the right workloads to what Keith was talking about, if they were more static workloads, if it was already there? So there is a balancing that's going on. And then I think fundamentally, from a trend standpoint, these things aren't binary. Everybody, for a while, everything was going to go to the public cloud and then people are like, "Oh, it's kind of expensive." Then they're like, "Oh no, they're going to bring it all on-prem 'cause it's really expensive." And it's like, "Well, that doesn't necessarily get me some of the new features and functionalities I might want for some of my new workloads." So I'm going to put the workloads that have a certain set of characteristics that require cloud in the cloud. 
And if I have enough capability on-prem and enough IT resources to manage certain things on site, then I'm going to do that there 'cause that's a more cost-effective thing for me to do. It's not binary. That's why we went to hybrid. And then we went to multi just to describe the fact that people added multiple public clouds. And now we're talking about super, right? So I don't look at it as a one-size-fits-all for any of this. >> A number of practitioners leading up to Supercloud 2 have told us that they're solving their cloud complexity by going monocloud. So they're putting on the blinders. Even though across the organization, there's other groups using other clouds. You're like, "In my group, we use AWS, or my group, we use Azure. And those guys over there, they use Google. We just kind of keep it separate." Are you guys hearing this? In your view, is that risky? Are they missing out on some potential to tap best of breed? What do you guys think about that? >> Everybody thinks they're monocloud. Is anybody really monocloud? It's like a group is monocloud, right? >> Right. >> This genie is out of the bottle. We're not putting the genie back in the bottle. You might think you're monocloud and you go like three doors down and figure out the guy or gal is on a fundamentally different cloud, running some analytics workload that you didn't know about. So, to Sanjeev's earlier point, they don't even know where their cloud spend is. So I think the concept of monocloud, how that's actually really realized by practitioners is primary and then secondary sources. So they have a primary cloud that they run most of their stuff on, and that they try to optimize. And we still have forked workloads. Somebody decides, "Okay, this SAP runs really well on this, or these analytics workloads run really well on that cloud." And maybe that's how they parse it. 
But if you really looked at it, there are very few companies where, if you really peeked under the hood and did an analysis, you could find an actual monocloud structure. They just want to pull it back in and make it more manageable. And I respect that. You want to do what you can to try to streamline the complexity of that. >> Yeah, we're- >> Sorry, go ahead, Keith. >> Yeah, we're doing this thing where we review an AWS service every day. Just in your inbox, learn about a new AWS service, cursory. There's 238 AWS products just on the AWS cloud itself. Some of them are redundant, but you get the idea. So the concept of monocloud, I'm in violent agreement with Maribel on this that, yes, a group might say I want a primary cloud. And that primary cloud may be AWS. But have you tried to license Oracle Database on AWS? It is really tempting to license Oracle on Oracle Cloud, Microsoft on Microsoft. And I can't get RDS anywhere but Amazon. So while I'm driven to desire the simplicity, the reality is whether it be M&A, licensing, data sovereignty, I am forced into a multi-cloud management style. But I do agree most people kind of do this one, this primary cloud, secondary cloud. And I guarantee you're going to have a third cloud or a fourth cloud whether you want to or not via shadow IT, latency, technical reasons, et cetera. >> Thank you. Sanjeev, you had a comment? >> Yeah, so I just wanted to mention, as an organization, I'm in complete agreement, no organization is monocloud, at least if it's a large organization. Large organizations use all kinds of combinations of cloud providers. But when you talk about a single workload, that's where the problem arises. As Keith said, there are 238 services in AWS. How in the world am I going to be an expert in AWS, but then say let me bring GCP or Azure into a single workload? 
And that's where I think we probably will still see monocloud as being predominant because the team has developed its expertise on a particular cloud provider, and they just don't have the time of the day to go learn yet another stack. However, there are some interesting things that are happening. For example, if you look at a multi-cloud example where Oracle and Microsoft Azure have that interconnect, so that's a beautiful thing that they've done because now in the newest iteration, it's literally a few clicks. And then behind the scenes, your .NET application and your Oracle database in OCI will be configured, the identities in Active Directory are federated. And you can just start using a database in one cloud, which is OCI, and an application, your .NET in Azure. So till we see this kind of a solution coming out of the providers, I think it's unrealistic to expect the end users to be able to figure out multiple clouds. >> Well, I have to share with you. I can't remember if he said this on camera or if it was off camera so I'll hold off. I won't tell you who it is, but this individual was sort of complaining a little bit saying, "With AWS, I can take their best AI tools like SageMaker and I can run them on my Snowflake." He said, "I can't do that in Google. Google forces me to go to BigQuery if I want their excellent AI tools." So he was sort of pushing, kind of tweaking a little bit some of the vendor talk that, "Oh yeah, we're so customer-focused." Not to pick on Google, but I mean everybody will say that. And then you say, "If you're so customer-focused, why wouldn't you do A, B, C?" So it's going to be interesting to see who leads that integration and how broadly it's applied. But I digress. Keith, our first supercloud event, that was on August 9th. And it was only a few months after Broadcom announced the VMware acquisition. A lot of people, myself included said, "All right, cuts are coming." 
Generally, Tanzu is probably going to be under the radar, but at Supercloud 22 and presumably VMware Explore, the company really... Well, certainly in the US, touted its Tanzu capabilities. I wasn't at VMware Explore Europe, but I bet you heard similar things. Hock Tan has been blogging and very vocal about cross-cloud services and multi-cloud, which doesn't happen without Tanzu. So what did you hear, Keith, in Europe? What's your latest thinking on VMware's prospects in cross-cloud services/supercloud? >> So I think our friend and Cube, along host still be even more offended at this statement than he was when I sat in the Cube. This was maybe five years ago. There's no company better suited to help industries or companies cross the cloud chasm than VMware. That's not a compliment. That's a reality of the industry. This is a very difficult, almost intractable problem. What I heard at VMware Explore Europe were customers serious about this problem, even more so than in the US. Data sovereignty is a real problem in the EU. Try being a company in Switzerland and having the Swiss data sovereignty issues. And there's no local cloud presence there large enough to accommodate your data needs. They had very serious questions about this. I talked to open source project leaders. Open source project leaders were asking me, why should I use the public cloud to host Kubernetes-based workloads, my projects that are building around Kubernetes, and the CNCF infrastructure? Why should I use AWS, Google, or even Azure to host these projects when that's undifferentiated? I know how to run Kubernetes, so why not run it on-premises? I don't want to deal with the hardware problems. So again, really great questions. And then there was always the specter of the problem, I think, we all had with the acquisition of VMware by Broadcom potentially. 4.5 billion in increased profitability in three years is an unbelievable amount of money when you look at the size of the problem. 
So a lot of the conversation in Europe was about industry at large. How do we do what regulators are asking us to do in a practical way from a true technology sense? Is VMware cross-cloud great? >> Yeah. So, VMware, obviously, to your point. OpenStack is another way of it. Actually, OpenStack uptake is still alive and well, especially in those regions where there may not be a public cloud, or there's public policy dictating that. Walmart's using OpenStack. As you know in IT, some things never die. Question for Sanjeev. And it relates to this new breed of data apps. And Bob Muglia and Tristan Handy from dbt Labs who are participating in this program really got us thinking about this. You got data that resides in different clouds, it's maybe even on-prem. And the machine pulls data from different systems. No humans involved, e-commerce, ERP, et cetera. It creates a plan, outcomes. No human involvement. Today, you're on a CRM system, you're inputting, you're doing forms, you're automating processes. We're talking about a new breed of apps. What are your thoughts on this? Is it real? Is it just way off in the distance? How does machine intelligence fit in? And how does supercloud fit? >> So great point. In fact, the data apps that you're talking about, I call them data products. Data products first came into the limelight in the last couple of years when Zhamak Dehghani started talking about data mesh. I am taking data products out of the data mesh concept because data mesh, whether data mesh happens or not is analogous to data products. Data products, basically, are taking a product management view of bringing data from different sources based on what the consumer needs. We were talking earlier today about maybe it's my vacation rentals, or it may be a retail data product, it may be an investment data product. So it's a pre-packaged extraction of data from different sources. But now I have a product that has a whole lifecycle. I can version it. 
I have new features that get added. And it's very business data consumer centric. It uses machine learning. For instance, I may be able to tell whether this data product has stale data. Who is using that data? Based on the usage of the data, I may have new data products that get allocated. I may even have the ability to take existing data products, mash them up into something that I need. So if I'm going to have that kind of power to create a data product, then having a common substrate underneath, it can be very useful. And that could be supercloud where I am making API calls. I don't care where the ERP, the CRM, the survey data, the pricing engine where they sit. For me, there's a logical abstraction. And then I'm building my data product on top of that. So I see a new breed of data products coming out. To answer your question, how early we are or is this even possible? My prediction is that in 2023, we will start seeing more of data products. And then it'll take maybe two to three years for data products to become mainstream. But it's starting this year. >> Subprime mortgages were a data product; there definitely were humans involved. All right, let's talk about some of the supercloud, multi-cloud players and what their future looks like. You can kind of pick your favorites. VMware, Snowflake, Databricks, Red Hat, Cisco, Dell, HP, Hashi, IBM, CloudFlare. There's many others. cohesive rubric. Keith, I wanted to start with CloudFlare because they actually use the term supercloud. And just simplifying what they said, they look at it as taking serverless to the max. You write your code and then you can deploy it in seconds worldwide, of course, across the CloudFlare infrastructure. You don't have to spin up containers, you don't go to provision instances. CloudFlare worries about all that infrastructure. What are your thoughts on CloudFlare, this approach, and their chances to disrupt the current cloud landscape? 
>> As Larry Ellison said famously once before, the network is the computer, right? >> I thought that was Scott McNealy. >> It wasn't Scott McNealy. I knew it was an Oracle line. >> Oracle owns that now, owns that line. >> By purpose or acquisition. >> They should have just called it cloud. >> Yeah, they should have just called it cloud. >> Easier. >> Go ahead. >> But if you think about the Cloudflare capability, Cloudflare in its own right is becoming a decent sized cloud provider. If you have compute out at the edge, when we talk about edge in the sense of Cloudflare and points of presence, literally across the globe, you have all of this excess compute, what do you do with it? First offering, let's disrupt data in the cloud. We can't start the conversation talking about data. When they say we're going to give you object-oriented or object storage in the cloud without egress charges, that's disruptive. That we can start to think about supercloud capability of having compute EC2 run in AWS, pushing and pulling data from Cloudflare. And now, I've disrupted this roach motel data structure, and that I'm freely giving away bandwidth, basically. Well, the next layer is not that much more difficult. And I think part of Cloudflare's serverless approach or supercloud approach is so that they don't have to commit to a certain type of compute. It is advantageous. It is a feature for me to be able to go to EC2 and pick a memory heavy model, or a compute heavy model, or a network heavy model. Cloudflare has taken away those knobs, and I'm just giving code and allowing that to run. Cloudflare has a massive network. If I can put the code closest using the Cloudflare Workers, if I can put that code closest to where the data is at or residing, super compelling observation. The question is, does it scale? I don't get the 238 services. While serverless is great, I have to know what I'm going to build. 
I don't have a Cognito, or RDS, or all these other services that make AWS, GCP, and Azure appealing from a builder's perspective. So it is a very interesting nascent start. It's great because now they can hide compute. If they don't have the capacity, they can outsource that maybe at a cost to one of the other cloud providers, but kind of hiding the compute behind the serverless architecture is a really unique approach. >> Yeah. And they're dipping their toe in the water. And they've announced an object store and a database platform and more to come. We got to wrap. So I wonder, Sanjeev and Maribel, if you could maybe pick some of your favorites from a competitive standpoint. Sanjeev, I felt like just watching Snowflake, I said, okay, in my opinion, they had the right strategy, which was to run on all the clouds, and then try to create that abstraction layer and data sharing across clouds. Even though, let's face it, most of it might be happening across regions if it's happening, but certainly outside of an individual account. But I felt like just observing them that anybody who's a traditional on-prem player moving into the clouds or anybody who's a cloud native, it just makes total sense to write to the various clouds. And to the extent that you can simplify that for users, it seems to be a logical strategy. Maybe as I said before, what multi-cloud should have been. But are there companies that you're watching that you think are ahead in the game, or ones that you think are a good model for the future? >> Yes, Snowflake, definitely. In fact, one of the things we have not touched upon very much, and Keith mentioned a little bit, was data sovereignty. Data residency rules can require that certain data should be written into a certain region of a certain cloud. And if my cloud provider can abstract that or my database provider, then that's perfect for me. So right now, I see Snowflake is way ahead of this pack. I would not put MongoDB too far behind. 
They don't really talk about this thing. They are in a different space, but now they have a lakehouse, and they've got all of these other SQL access and new capabilities that they're announcing. So I think they would be quite good with that. Oracle is always a dark horse. Oracle seems to have revived its cloud mojo to some extent. And it's doing some interesting stuff. Databricks is the other one. I have not seen Databricks. They've been very focused on lakehouse, Unity Catalog, and some of those pieces. But they would be the obvious challenger. And if they come into this space of supercloud, then they may bring some open source technologies that others can rely on like Delta Lake as a table format. >> Yeah. If I were one of these infrastructure players, Dell, HPE, Cisco, even IBM, I mean, I would be making my infrastructure as programmable and cloud friendly as possible. That seems like table stakes. But Maribel, any companies that stand out to you that we should be paying attention to? >> Well, we already mentioned a bunch of them, so maybe I'll go a slightly different route. I'm watching two companies pretty closely to see what kind of traction they get, and they're established companies. One we already talked about, which is VMware. And the thing that's interesting about VMware is they're everywhere. And they also have the benefit of having a foot in both camps. If you want to do it the old way, the way you've always done it with VMware, they got all that going on. If you want to try to do a more cross-cloud, multi-cloud native style thing, they're really trying to build tools for that. So I think they have really good access to buyers. And that's one of the reasons why I'm interested in them to see how they progress. The other thing, I think, could be a sleeping horse oddly enough is Google Cloud. They've spent a lot of work and time on Anthos. They really need to create a certain set of differentiators. 
Well, it's not necessarily in their best interest to be the best multi-cloud player. If they decide that they want to differentiate on a different layer of the stack, let's say they want to be like the person that is really transformative, they talk about transformation cloud with analytics workloads, then maybe they do spend a good deal of time trying to help people abstract all of the other underlying infrastructure and make sure that they get the sexiest, most meaningful workloads into their cloud. So those are two people that you might not have expected me to go with, but I think it's interesting to see not just on the things that might be considered, either startups or more established independent companies, but how some of the traditional providers are trying to reinvent themselves as well. >> I'm glad you brought that up because if you think about what Google's done with Kubernetes. I mean, would Google even be relevant in the cloud without Kubernetes? I could argue both sides of that. But it was quite a gift to the industry. And there's a motivation there to do something unique and different from maybe the other cloud providers. And I'd throw in Red Hat as well. They're obviously a key player in Kubernetes. And HashiCorp seems to be becoming the standard for application deployment, with Terraform, for cross-cloud, and there are many, many others. I know we're leaving lots out, but we're out of time. Folks, I got to thank you so much for your insights and your participation in Supercloud2. Really appreciate it. >> Thank you. >> Thank you. >> Thank you. >> This is Dave Vellante for John Furrier and the entire theCUBE community. Keep it right there for more content from Supercloud2.
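Editor's note: Sanjeev's point in this panel about a provider abstracting data residency can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: `RegionStore` and `ResidencyRouter` are hypothetical names, the stores are in-memory dictionaries rather than real cloud services, and the policy table is made up. It shows one read/write interface where a policy, not the caller, decides which cloud region holds the data.

```python
from dataclasses import dataclass, field


@dataclass
class RegionStore:
    """Hypothetical stand-in for one provider region (in-memory only)."""
    provider: str
    region: str
    rows: dict = field(default_factory=dict)


class ResidencyRouter:
    """One put/get interface; a residency policy picks the store."""

    def __init__(self, policy, stores):
        # policy maps a country code to a (provider, region) pair.
        self.policy = policy
        self.stores = {(s.provider, s.region): s for s in stores}

    def _store_for(self, country):
        try:
            return self.stores[self.policy[country]]
        except KeyError:
            raise ValueError(f"no residency rule or store for {country!r}") from None

    def put(self, country, key, value):
        store = self._store_for(country)
        store.rows[key] = value
        return store.provider, store.region

    def get(self, country, key):
        return self._store_for(country).rows[key]


# Illustrative policy: German data stays in an EU region, US data in a US region.
stores = [RegionStore("aws", "eu-central-1"), RegionStore("gcp", "us-central1")]
policy = {"DE": ("aws", "eu-central-1"), "US": ("gcp", "us-central1")}
router = ResidencyRouter(policy, stores)
placement = router.put("DE", "cust-1", {"name": "Example GmbH"})
```

The caller only names the data's country of origin; which provider and region end up holding the record is a policy decision hidden behind the interface, which is the kind of abstraction the panel describes.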

Published Date : Jan 10 2023


Bob Muglia, George Gilbert & Tristan Handy | How Supercloud will Support a new Class of Data Apps


 

(upbeat music) >> Hello, everybody. This is Dave Vellante. Welcome back to Supercloud2, where we're exploring the intersection of data analytics and the future of cloud. In this segment, we're going to look at how the Supercloud will support a new class of applications, not just work that runs on multiple clouds, but rather a new breed of apps that can orchestrate things in the real world. Think Uber for many types of businesses. These applications, they're not about codifying forms or business processes. They're about orchestrating people, places, and things in a business ecosystem. And I'm pleased to welcome my colleague and friend, George Gilbert, former Gartner analyst, Wikibon market analyst, former equities analyst as my co-host. And we're thrilled to have Tristan Handy, who's the founder and CEO of DBT Labs and Bob Muglia, who's the former President of Microsoft's Enterprise business and former CEO of Snowflake. Welcome all, gentlemen. Thank you for coming on the program. >> Good to be here. >> Thanks for having us. >> Hey, look, I'm going to start actually with the SuperCloud because both Tristan and Bob, you've read the definition. Thank you for doing that. And Bob, you have some really good input, some thoughts on maybe some of the drawbacks and how we can advance this. So what are your thoughts in reading that definition around SuperCloud? >> Well, I thought first of all that you did a very good job of laying out all of the characteristics of it and helping to define it overall. But I do think it can be tightened a bit, and I think it's helpful to do it in as short a way as possible. And so in the last day I've spent a little time thinking about how to take it and write a crisp definition. And here's my go at it. This is one day old, so gimme a break if it's going to change. And of course we have to follow the industry, and so that, and whatever the industry decides, but let's give this a try. 
So in the way I think you're defining it, what I would say is a SuperCloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. >> Boom. Nice. Okay, great. I'm going to go back and read the script on that one and tighten that up a bit. Thank you for spending the time thinking about that. Tristan, would you add anything to that or what are your thoughts on the whole SuperCloud concept? >> So as I read through this, I fully realize that we need a word for this thing because I have experienced the inability to talk about it as well. But for many of us who have been living in the Confluent, Snowflake, you know, this world of like new infrastructure, this seems fairly uncontroversial. Like I read through this, and I'm just like, yeah, this is like the world I've been living in for years now. And I noticed that you called out Snowflake for being an example of this, but I think that there are like many folks, myself included, for whom this world like fully exists today. >> Yeah, I think that's a fair, I dunno if it's criticism, but people observe, well, what's the big deal here? It's just kind of what we're living in today. It reminds me of, you know, Tim Berners-Lee saying, well, this is what the internet was supposed to be. It was supposed to be Web 2.0, so maybe this is what multi-cloud was supposed to be. Let's turn our attention to apps. Bob first and then go to Tristan. Bob, what are data apps to you? When people talk about data products, is that what they mean? Are we talking about something more, different? What are data apps to you? >> Well, to understand data apps, it's useful to contrast them to something, and I just use the simple term people apps. I know that's a little bit awkward, but it's clear. And almost everything we work with, almost every application that we're familiar with, be it email or Salesforce or any consumer app, those are applications that are targeted at responding to people. 
You know, in contrast, a data application reacts to changes in data and uses some set of analytic services to autonomously take action. So where applications that we're familiar with respond to people, data apps respond to changes in data. And they both do something, but they do it for different reasons. >> Got it. You know, George, you and I were talking about, you know, it comes back to SuperCloud, broad definition, narrow definition. Tristan, how do you see it? Do you see it the same way? Do you have a different take on data apps? >> Oh, geez. This is like a conversation that I don't know has an end. It's like been, I write a Substack, and there's like this little community of people who all write Substacks. We argue with each other about these kinds of things. Like, you know, as many different takes on this question as you can find, but the way that I think about it is that data products are atomic units of functionality that are fundamentally data driven in nature. So a data product can be as simple as an interactive dashboard that is like actually had design thinking put into it and serves a particular user group and has like actually gone through kind of a product development life cycle. And then a data app or data application is a kind of cohesive end-to-end experience that often encompasses like many different data products. So from my perspective there, this is very, very related to the way that these things are produced, the kinds of experiences that they're provided, that like data innovates every product that we've been building in, you know, software engineering for, you know, as long as there have been computers. >> You know, Zhamak Dehghani oftentimes uses the, you know, she doesn't name Spotify, but I think it's Spotify as that kind of example she uses. But I wonder if we can maybe try to take some examples. 
If you take, like George, if you take a CRM system today, you're inputting leads, you got opportunities, it's driven by humans, they're really inputting the data, and then you got this system that kind of orchestrates the business process, like runs a forecast. But in this data driven future, are we talking about the app itself pulling data in and automatically looking at data from the transaction systems, the call center, the supply chain and then actually building a plan? George, is that how you see it? >> I go back to the example of Uber, may not be the most sophisticated data app that we build now, but it was like one of the first where you do have users interacting with their devices as riders trying to call a car or driver. But the app then looks at the location of all the drivers in proximity, and it matches a driver to a rider. It calculates an ETA to the rider. It calculates an ETA then to the destination, and it calculates a price. Those are all activities that are done sort of autonomously that don't require a human to type something into a form. The application is using changes in data to calculate an analytic product and then to operationalize that, to assign the driver to, you know, calculate a price. Those are, that's an example of what I would think of as a data app. And my question then I guess for Tristan is if we don't have all the pieces in place for sort of mainstream companies to build those sorts of apps easily yet, like how would we get started? What's the role of a semantic layer in making that easier for mainstream companies to build? And how do we get started, you know, say with metrics? How does that, how does that take us down that path? >> So what we've seen in the past, I dunno, decade or so, is that one of the most successful business models in infrastructure is taking hard things and rolling 'em up behind APIs. 
You take messaging, you take payments, and you all of a sudden increase the capability of kind of your median application developer. And you say, you know, previously you were spending all your time being focused on how do you accept credit cards, how do you send SMS messages, and now you can focus on your business logic, and just create the thing. One of, interestingly, one of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that, you know, you would imagine that the business would be able to create applications around very easily, but in fact that's not the case. It's actually quite challenging to, and involves a lot of data engineering pipeline and all this work to make these available. And so if you really want to make it very easy to create some of these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to. >> So how rich can that API layer grow if you start with metric definitions that you've defined? And DBT has, you know, the metric, the dimensions, the time grain, things like that, that's a well scoped sort of API that people can work within. How much can you extend that to say non-calculated business rules or governance information like data reliability rules, things like that, or even, you know, features for an AI/ML feature store. In other words, you started pragmatically, but how far can you grow? >> Bob is waiting with bated breath to answer this question. I'm, just really quickly, I think that we as a company and DBT as a product tend to be very pragmatic. We try to release the simplest possible version of a thing, get it out there, and see if people use it. But the idea that, the concept of a metric is really just a first landing pad. 
Really, there is a physical manifestation of the data and then there's a logical manifestation of the data. And what we're trying to do here is make it very easy to access the logical manifestation of the data, and metric is a way to look at that. Maybe an entity, a customer, a user is another way to look at that. And I'm sure that there will be more kind of logical structures as well. >> So, Bob, chime in on this. You know, what's your thoughts on the right architecture behind this, and how do we get there? >> Yeah, well first of all, I think one of the ways we get there is by what companies like DBT Labs and Tristan are doing, which is incrementally taking and building on the modern data stack and extending that to add a semantic layer that describes the data. Now the way I tend to think about this is a fairly major shift in the way we think about writing applications, which is today a code first approach to moving to a world that is model driven. And I think that's what the big change will be is that where today we think about data, we think about writing code, and we use that to produce APIs as Tristan said, which encapsulates those things together in some form of services that are useful for organizations. And that idea of that encapsulation is never going to go away. It's very, that concept of an API is incredibly useful and will exist well into the future. But what I think will happen is that in the next 10 years, we're going to move to a world where organizations are defining models first of their data, but then ultimately of their business process, their entire business process. Now the concept of a model driven world is a very old concept. I mean, I first started thinking about this and playing around with some early model driven tools, probably before Tristan was born in the early 1980s. And those tools didn't work because the semantics associated with executing the model were too complex to be written in anything other than a procedural language. 
We're now reaching a time where that is changing, and you see it everywhere. You see it first of all in the world of machine learning and machine learning models, which are taking over more and more of what applications are doing. And I think that's an incredibly important step. And learned models are an important part of what people will do. But if you look at the world today, I will claim that we've always been modeling. Modeling has existed in computers since there have been integrated circuits and any form of computers. But what we do is what I would call implicit modeling, which means that it's the model is written on a whiteboard. It's in a bunch of Slack messages. It's on a set of napkins in conversations that happen and during Zoom. That's where the model gets defined today. It's implicit. There is one in the system. It is hard coded inside application logic that exists across many applications with humans being the glue that connects those models together. And really there is no central place you can go to understand the full attributes of the business, all of the business rules, all of the business logic, the business data. That's going to change in the next 10 years. And we'll start to have a world where we can define models about what we're doing. Now in the short run, the most important models to build are data models and to describe all of the attributes of the data and their relationships. And that's work that DBT Labs is doing. A number of other companies are doing that. We're taking steps along that way with catalogs. People are trying to build more complete ontologies associated with that. The underlying infrastructure is still super, super nascent. But what I think we'll see is this infrastructure that exists today that's building learned models in the form of machine learning programs. 
You know, some of these incredible machine learning programs in foundation models like GPT and DALL-E and all of the things that are happening in these global scale models, but also all of that needs to get applied to the domains that are appropriate for a business. And I think we'll see the infrastructure developing for that, that can take this concept of learned models and put it together with more explicitly defined models. And this is where the concept of knowledge graphs comes in and then the technology that underlies that to actually implement and execute that, which I believe are relational knowledge graphs. >> Oh, oh wow. There's a lot to unpack there. So let me ask the Columbo question, Tristan, we've been making fun of your youth. We're just, we're just jealous. Columbo, I'll explain it offline maybe. >> I watch Columbo. >> Okay. All right, good. So but today if you think about the application stack and the data stack, which is largely an analytics pipeline. They're separate. Do they, those worlds, do they have to come together in order to achieve Bob's vision? When I talk to practitioners about that, they're like, well, I don't want to complexify the application stack cause the data stack today is so, you know, hard to manage. But do those worlds have to come together? And you know, through that model, I guess abstraction or translation that Bob was just describing, how do you guys think about that? Who wants to take that? >> I think it's inevitable that data and AI are going to become closer together. I think that the infrastructure there has been moving in that direction for a long time. Whether you want to use the Lakehouse portmanteau or not. There's also, there's a next generation of data tech that is still in the like early stage of being developed. There's a company that I love that is essentially Cross Cloud Lambda, and it's just a wonderful abstraction for computing. 
So I think that, you know, people have been predicting that these worlds are going to come together for a while. A16Z wrote a great post on this back in I think 2020, predicting this, and I've been predicting this since 2020. But what's not clear is the timeline, but I think that this is still just as inevitable as it's been. >> Who's that that does Cross Cloud? >> Let me follow up on. >> Who's that, Tristan, that does Cross Cloud Lambda? Can you name names? >> Oh, they're called Modal Labs. >> Modal Labs, yeah, of course. All right, go ahead, George. >> Let me ask about this vision of trying to put the semantics or the code that represents the business with the data. It gets us to a world that's sort of more data centric, where data's not locked inside or behind the APIs of different applications so that we don't have silos. But at the same time, Bob, I've heard you talk about building the semantics gradually on top of, into a knowledge graph that maybe grows out of a data catalog. And the vision of getting to that point, essentially the enterprise's metadata and then the semantics you're going to add onto it are really stored in something that's separate from the underlying operational and analytic data. So at the same time then why couldn't we gradually build semantics beyond the metric definitions that DBT has today? In other words, you build more and more of the semantics in some layer that DBT defines and that sits above the data management layer, but any requests for data have to go through the DBT layer. Is that a workable alternative? Or where, what type of limitations would you face? >> Well, I think that it is the way the world will evolve is to start with the modern data stack and, you know, which is operational applications going through a data pipeline into some form of data lake, data warehouse, the Lakehouse, whatever you want to call it. And then, you know, this wide variety of analytics services that are built together. 
To the point that Tristan made about machine learning and data coming together, you see that in every major data cloud provider. Snowflake certainly now supports Python and Java. Databricks is of course building their data warehouse. Certainly Google, Microsoft and Amazon are doing very, very similar things in terms of building complete solutions that bring together an analytics stack that typically supports languages like Python together with the data stack and the data warehouse. I mean, all of those things are going to evolve, and they're not going to go away because that infrastructure is relatively new. It's just being deployed by companies, and it solves the problem of working with petabytes of data if you need to work with petabytes of data, and nothing will do that for a long time. What's missing is a layer that understands and can model the semantics of all of this. And if you need to, if you want to model all, if you want to talk about all the semantics of even data, you need to think about all of the relationships. You need to think about how these things connect together. And unfortunately, there really is no platform today. None of our existing platforms are ultimately sufficient for this. It was interesting, I was just talking to a customer yesterday, you know, a large financial organization that is building out these semantic layers. They're further along than many companies are. And you know, I asked what they're building it on, and you know, it's not surprising they're using a, they're using combinations of some form of search together with, you know, textual based search together with a document oriented database. In this case it was Cosmos. And that really is kind of the state of the art right now. And yet those products were not built for this. They don't really, they can't manage the complicated relationships that are required. They can't issue the queries that are required. And so a new generation of database needs to be developed. 
And fortunately, you know, that is happening. The world is developing a new set of relational algorithms that will be able to work with hundreds of different relations. If you look at a SQL database like Snowflake or BigQuery, you know, you get tens of different joins coming together, and that query is going to take a really long time. Well, fortunately, technology is evolving, and it's possible with new join algorithms, worst-case optimal join algorithms they're called, where you can join hundreds of different relations together and run semantic queries that you simply couldn't run. Now that technology is nascent, but it's really important, and I think that will be a requirement to have this semantic layer reach its full potential. In the meantime, Tristan can do a lot of great things by building up on what he's got today and solve some problems that are very real. But in the long run I think we'll see a new set of databases to support these models. >> So Tristan, you got to respond to that, right? You got to, so take the example of Snowflake. We know it doesn't deal well with complex joins, but they're, they've got big aspirations. They're building an ecosystem to really solve some of these problems. Tristan, you guys are part of that ecosystem, and others, but please, your thoughts on what Bob just shared. >> Bob, I'm curious if, I would have no idea what you were talking about except that you introduced me to somebody who gave me a demo of a thing and do you not want to go there right now? >> No, I can talk about it. I mean, we can talk about it. Look, the company I've been working with is RelationalAI, and they're doing this work to actually first of all work across the industry with academics and research, you know, across many, many different, over 20 different research institutions across the world to develop this new set of algorithms. They're all fully published, just like SQL, the underlying algorithms that are used by SQL databases are. 
If you look today, every single SQL database uses a similar set of relational algorithms underneath that. And those algorithms actually go back to System R and what IBM developed in the 1970s. There's an opportunity for us to build something new that allows you to take, for example, instead of taking data and grouping it together in tables, treat all data as individual relations, you know, a key and a set of values and then be able to perform purely relational operations on it. If you go back to Codd, and what he wrote, he defined two things. He defined a relational calculus and relational algebra. And essentially SQL is a query language that is translated by the query processor into relational algebra. But however, the calculus of SQL is not even close to the full semantics of the relational mathematics. And it's possible to have systems that can do everything and that can store all of the attributes of the data model or ultimately the business model in a form that is much more natural to work with. >> So here's like my short answer to this. I think that we're dealing in different time scales. I think that there is actually a tremendous amount of work to do in the semantic layer using the kind of technology that we have on the ground today. And I think that there's, I don't know, let's say five years of like really solid work that there is to do for the entire industry, if not more. But the wonderful thing about DBT is that it's independent of what the compute substrate is beneath it. And so if we develop new platforms, new capabilities to describe semantic models in more fine grain detail, more procedural, then we're going to support that too. And so I'm excited about all of it. 
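Editor's note: Bob's description above of treating data as individual relations (a key and a set of values) and joining many of them at once can be illustrated with a toy example. The sketch below is not RelationalAI's implementation and not a production algorithm. The relation names and data are hypothetical, and the join is a simplified variable-at-a-time join in the spirit of the worst-case optimal join algorithms he mentions, without the indexing and ordering machinery a real engine would need.

```python
from collections import defaultdict


def index(pairs):
    """Index a binary relation {(key, value)} as key -> set of values."""
    out = defaultdict(set)
    for key, value in pairs:
        out[key].add(value)
    return out


def triangle_join(r, s, t):
    """Join R(a, b), S(b, c), T(a, c) one variable at a time.

    Rather than joining two relations into a large intermediate table and
    then joining the third, each variable is bound by intersecting the
    candidate sets from every relation that mentions it.
    """
    r_ab, s_bc, t_ac = index(r), index(s), index(t)
    a_keys = set(r_ab) & set(t_ac)      # values of a present in both R and T
    b_keys = set(s_bc)                  # values of b present in S
    results = []
    for a in a_keys:
        for b in r_ab[a] & b_keys:      # b must appear in R (with a) and in S
            for c in s_bc[b] & t_ac[a]: # c must appear in S (with b) and T (with a)
                results.append((a, b, c))
    return sorted(results)


# Hypothetical binary relations: who owns which account, what each account
# holds, and which products each customer is approved for.
owns = {("alice", "acct1"), ("bob", "acct2")}
holds = {("acct1", "bond"), ("acct1", "stock"), ("acct2", "stock")}
approved = {("alice", "bond"), ("bob", "bond")}

matches = triangle_join(owns, holds, approved)
```

Here `matches` contains only the combinations that satisfy all three relations at once, so bob's stock holding is never paired with an approval he does not have. A pairwise plan would first build every owner/holding pair and only then filter, which is the kind of intermediate blow-up the newer join algorithms aim to avoid.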
>> Yeah, so interpreting that short answer, you're basically saying, cause Bob was just kind of pointing to you as incremental, but you're saying, yeah, okay, we're applying it for incremental use cases today, but we can accommodate a much broader set of examples in the future. Is that correct, Tristan? >> I think you're using the word incremental as if it's not good, but I think that incremental is great. We have always been about applying incremental improvement on top of what exists today, but allowing practitioners to like use different workflows to actually make use of that technology. So yeah, yeah, we are a very incremental company. We're going to continue being that way. >> Well, I think Bob was using incremental as a pejorative. I mean, I, but to your point, a lot. >> No, I don't think so. I want to stop that. No, I don't think it's pejorative at all. I think incremental, incremental is usually the most successful path. >> Yes, of course. >> In my experience. >> We agree, we agree on that. >> Having tried many, many moonshot things in my Microsoft days, I can tell you that being incremental is a good thing. And I'm a very big believer that that's the way the world's going to go. I just think that there is a need for us to build something new and that ultimately that will be the solution. Now you can argue whether it's two years, three years, five years, or 10 years, but I'd be shocked if it didn't happen in 10 years. >> Yeah, so we all agree that incremental is less disruptive. Boom, but Tristan, you're, I think I'm inferring that you believe you have the architecture to accommodate Bob's vision, and then Bob, and I'm inferring from Bob's comments that maybe you don't think that's the case, but please. >> No, no, no. I think that, so Bob, let me put words into your mouth and you tell me if you disagree, DBT is completely useless in a world where a large scale cloud data warehouse doesn't exist. 
We were not able to bring the power of Python to our users until these platforms started supporting Python. Like DBT is a layer on top of large scale computing platforms. And to the extent that those platforms extend their functionality to bring more capabilities, we will also service those capabilities. >> Let me try and bridge the two. >> Yeah, yeah, so Bob, Bob, Bob, do you concur with what Tristan just said? >> Absolutely, I mean there's nothing to argue with in what Tristan just said. >> I wanted. >> And it's what he's doing. It'll continue to, I believe he'll continue to do it, and I think it's a very good thing for the industry. You know, I'm just simply saying that on top of that, I would like to provide Tristan and all of those who are following similar paths to him with a new type of database that can actually solve these problems in a much more architected way. And when I talk about Cosmos with something like Mongo or Cosmos together with Elastic, you're using Elastic as the join engine, okay. That's the purpose of it. It becomes a poor man's join engine. And I kind of go, I know there's a better answer than that. I know there is, but that's kind of where we are state of the art right now. >> George, we got to wrap it. So give us the last word here. Go ahead, George. >> Okay, I just, I think there's a way to tie together what Tristan and Bob are both talking about, and I want them to validate it, which is for five years we're going to be adding or some number of years more and more semantics to the operational and analytic data that we have, starting with metric definitions. My question is for Bob, as DBT accumulates more and more of those semantics for different enterprises, can that layer not run on top of a relational knowledge graph? And what would we lose by not having, by having the knowledge graph store sort of the joins, all the complex relationships among the data, but having the semantics in the DBT layer? 
>> Well, I think this, okay, I think first of all that DBT will be an environment where many of these semantics are defined. The question we're asking is how are they stored and how are they processed? And what I predict will happen is that over time, as companies like DBT begin to build more and more richness into their semantic layer, they will begin to experience challenges that customers want to run queries, they want to ask questions, they want to use this for things where the underlying infrastructure becomes an obstacle. I mean, this has always happened in history, right? I mean, you see major advances in computer science when the data model changes. And I think we're on the verge of a very significant change in the way data is stored and structured, or at least metadata is stored and structured. Again, I'm not saying that anytime in the next 10 years, SQL is going to go away. In fact, more SQL will be written in the future than has been written in the past. And those platforms will mature to become the engines, the slicer dicers of data. I mean that's what they are today. They're incredibly powerful at working with large amounts of data, and that infrastructure is maturing very rapidly. What is not maturing is the infrastructure to handle all of the metadata and the semantics that that requires. And that's where I say knowledge graphs are what I believe will be the solution to that. >> But Tristan, bring us home here. It sounds like, let me posit this, is that whatever happens in the future, we're going to leverage the vast system that has become cloud, what we're talking about as a supercloud, sort of where data lives irrespective of physical location. We're going to have to tap that data. It's not necessarily going to be in one place, but give us your final thoughts, please. >> 100% agree. I think that the data is going to live everywhere. 
It is the responsibility for both the metadata systems and the data processing engines themselves to make sure that we can join data across cloud providers, that we can join data across different physical regions and that we as practitioners are going to kind of start forgetting about details like that. And we're going to start thinking more about how we want to arrange our teams, how does the tooling that we use support our team structures? And that's when data mesh I think really starts to get very, very critical as a concept. >> Guys, great conversation. It was really awesome to have you. I can't thank you enough for spending time with us. Really appreciate it. >> Thanks a lot. >> All right. This is Dave Vellante for George Gilbert, John Furrier, and the entire Cube community. Keep it right there for more content. You're watching SuperCloud2. (upbeat music)

Published Date : Jan 4 2023

Brian Gracely, The Cloudcast | Does the World Really Need Supercloud?


 

(upbeat music) >> Welcome back to Supercloud 2, this is Dave Vellante. We're here exploring the intersection of data and analytics and the future of cloud. And in this segment, we're going to look at the evolution of cloud, and try to test some of the Supercloud concepts and assumptions with Brian Gracely, who is the founder and co-host, along with Aaron Delp, of the popular Cloudcast program. Amazing series, if you're not already familiar with it. The Cloudcast is one of the best ways to keep up with so many things going on in our industry. Enterprise tech, platform engineering, business models, obviously, cloud developer trends, crypto, Web 3.0. Sorry Brian, I know that's a sore spot, but Brian, thanks for coming >> That's okay. >> on the program, really appreciate it. >> Yeah, great to be with you, Dave. Happy New Year, and great to be back with everybody with SiliconANGLE again this year. >> Yeah, we love having you on. We miss working with you day-to-day, but I want to start with Gracely's theorem, which basically says, I'm going to paraphrase. For the most part, nothing new gets introduced in the enterprise tech business, patterns repeat themselves, maybe get applied in new ways. And you know this industry well, when something comes out that's new, if you take virtualization, for example, been around forever with mainframes, but then VMware applied it, solved a real problem in the client-server system. And then it's like, "Okay, this is awesome." We get really excited and then after a while we push the architecture, we break things, introduce new things to fix the things that are broken and start adding new features. And oftentimes you do that through acquisitions. So, you know, has the cloud become that sort of thing? And is Supercloud sort of same wine, new bottle, following Gracely's theorem? >> Yeah, I think there's some of both. 
I hate to be the sort of, it depends sort of answer but, I think to a certain extent, you know, obviously Cloud in and of itself was, kind of revolutionary in that, you know, it wasn't that you couldn't rent things in the past, it was just being able to do it at scale, being able to do it with such amazing self-service. And then, you know, kind of proliferation of like, look at how many services I can get from, from one cloud, whether it was Amazon or Azure or Google. And then, you know, we, we slip back into the things that we know, we go, "Oh, well, okay, now I can get computing on demand, but, now it's just computing." Or I can get database on demand and it's, you know, it's got some of the same limitations of, of say, of database, right? It's still, you know, I have to think about IOPS and I have to think about caching, and other stuff. So, I think we do go through that and then we, you know, we have these sort of next paradigms that come along. So, you know, serverless was another one of those where it was like, okay, it seems sort of new. I don't have to, again, it was another level of like, I don't have to think about anything. And I was able to do that because, you know, there was either greater bandwidth available to me, or compute got cheaper. And what's been interesting is not the sort of, that specific thing, serverless in and of itself is just another way of doing compute, but the fact that it now gets applied as, sort of a no-ops model to, you know, again, like how do I provision a database? How do I think about, you know, do I have to think about the location of a service? Does that just get taken care of for me? So I think the Supercloud concept, and I did a thing and, and you and I have talked about it, you know, behind the scenes that maybe the, maybe a better name is Super app for something like Snowflake or other, but I think we're, seeing these these sort of evolutions over and over again of what were the big bottlenecks? 
How do we, how do we solve those bottlenecks? And I think the big thing here is, it's never, it's very rarely that you can take the old paradigm of what the thing was, the concept was, and apply it to the new model. So, I'll just give you an example. So, you know, something like VMware, which we all know, wildly popular, wildly used, but when we apply like a Supercloud concept of VMware, the concept of VMware has always been around a cluster, right? It's some finite number of servers, you sort of manage it as a cluster. And when you apply that to the cloud and you say, okay, there's, you know, for example, VMware in the cloud, it's still the same concept of a cluster of VMware. But yet when you look at some of these other services that would fit more into the, you know, Supercloud kind of paradigm, whether it's a Snowflake or a MongoDB Atlas or maybe what CloudFlare is doing at the edge, those things get rid of some of those old paradigms. And I think that's where stuff, you start to go, "Oh, okay, this is very different than before." Yes, it's still computing or storage, or data access, but there's a whole nother level of something that we didn't carry forward from the previous days. And that really kind of breaks the paradigm. And so that's the way I think I've started to think about, are these things really brand new? Yes and no, but I think it's when you can see that big, that thing that you didn't leave behind isn't there anymore, you start to get some really interesting new innovation come out of it. >> Yeah. And that's why, you know, lift and shift is okay, when you talk to practitioners, they'll say, "You know, I really didn't change my operating model. And so I just kind of moved it into the cloud. there were some benefits, but it was maybe one zero not three zeros that I was looking for." >> Right. 
>> You know, we always talk about what's great about cloud, the agility, and all the other wonderful stuff that we know, what's not working in cloud, you know, tie it into multi-cloud, you know, in terms of, you hear people talk about multi-cloud by accident, okay, that's true. >> Yep. >> What's not great about cloud. And then I want to get into, you know, is multi-cloud really a problem or is it just sort of vendor hype? But, but what's not working in cloud? I mean, you mentioned serverless and serverless is kind of narrow, right, for a lot of stateless apps, right? But, what's not great about cloud? >> Well, I think there's a few things that if you ask most people they don't love about cloud. I think, we can argue whether or not sort of this consolidation around a few cloud providers has been a good thing or a bad thing. I think, regardless of that, you know, we are seeing, we are hearing more and more people that say, look, you know, the experience I used to have with cloud when I went to, for example, an Amazon and there was, you know, a dozen services, it was easy to figure out what was going on. It was easy to figure out what my billing looked like. You know, now they've become so widespread, the number of services they have, you know, the number of stories you just hear of people who went, "Oh, I started a service over in US West and I can't find it anymore 'cause it's on a different screen. And I, you know, I just got billed for it." Like, so I think the sprawl of some of the clouds has gotten, has created a user experience that a lot of people are frustrated with. I think that's one thing. And we, you know, we see people like Digital Ocean and we see others who are saying, "Hey, we're going to be that simplified version." So, there's always that yin and yang. I think people are super frustrated at network costs, right? 
So, you know, and that's kind of at a lot of, at the center of maybe why we do or don't see more of these Supercloud services is just, you know, in the data center as an application owner, I didn't have to think about, well where, where does this go to? Where are my users? Yes, somebody took care of it, but when those things become front and center, that's super frustrating. That's the one area that we've seen absolutely no cost savings, cost reduction. So I think that frustrates people a lot. And then I think the third piece is just, you know, we're, we went from super centralized IT organizations, which, you know, for decades was how it worked. It was part of the reason why the cloud expanded and became a thing, right? Sort of shadow IT and I can't get things done. And then, now what we've seen is sort of this proliferation of little pockets of groups that are your IT, for lack of a better thing, whether they're called platform engineering or SRE or DevOps. But we have this, expansion, explosion if you will, of groups that, if I'm an app dev team, I go, "Hey, you helped me make this stuff run, but then the team next to you has another group and they have another group." And so you see this explosion of, you know, we don't have any standards in the company anymore. And, so sort of self-service has created its own nightmare to a certain extent for a lot of larger companies. >> Yeah. Thank you for that. So, you know, I want, I want to explore this multi-cloud, you know, by accident thing and is a real problem. You hear that a lot from vendors and we've been talking about Supercloud as this unifying layer across cloud. You know, but when you talk to customers, a lot of them are saying, "Yes, we have multiple clouds in our organization, but my group, we have mono cloud, we know the security, edicts, we know how to, you know, deal with the primitives, whether it's, you know, S3 or Azure Blob or whatever it is. And we're very comfortable with this." 
It's, that's how we're simplifying. So, do you think this is really a problem? Does it have merit that we need that unifying layer across clouds, or is it just too early for that? >> I think, yeah, I think what you, what you've laid out is basically how the world has played out. People have picked a cloud for a specific application or a series of applications. Yeah, and I think if you talk to most companies, they would tell you, you know, holistically, yes, we're multi-cloud, not, maybe not necessarily on, I don't necessarily love the phrase where people say like, well it happened by accident. I think it happened on purpose, but we got to multi-cloud, not in the way that maybe that vendors, you know, perceived, you know, kind of laid out a map for. So it was, it was, well you will lay out this sort of Supercloud framework. We didn't call it that back then, we just called it sort of multi-cloud. Maybe it was Kubernetes or maybe it was whatever. And different groups, because central IT kind of got disbanded or got fragmented. It turned into, go pick the best cloud for your application, for what you need to do for the business. And then, you know, multiple years later it was like, "Oh, hold on, I've got 20% in Google and 50% in AWS and I've got 30% in Azure. And, you know, it's, yeah, it's been evolution. I don't know that it's, I don't know if it's a mistake. I think it's now groups trying to figure out like, should I make sense of it? You know, should I try and standardize and I backwards standardize some stuff? I think that's going to be a hard thing for, for companies to do. 'cause I think they feel okay with where the applications are. They just happen to be in multiple clouds. >> I want to run something by you, and you guys, you and Aaron have talked about this. You know, still depending on who, which keynote you listen to, small percentage of the workloads are actually in cloud. 
And when you were with us at Wikibon, I think we called it true private cloud, and we looked at things like Nutanix and there were a lot of other examples of companies that were trying to replicate the hyperscale experience on-prem. >> Yeah. >> And, we would evaluate that, you know, beyond virtualization, and so we sort of defined that and, but I think what's, maybe what's more interesting than Supercloud across clouds is if you include that on-prem estate, because that's where most of the work is being done, that's where a lot of the proprietary tools have been built, a lot of data, a lot of software. So maybe there's this concept of extending that true private cloud to true hybrid cloud. So I actually think hybrid cloud in some cases is the more interesting use case for so-called Supercloud. What are your thoughts on that? >> Yeah, I think there's a couple aspects too. I think, you know, if we were to go back five or six years even, maybe even a little further and look at like what a data center looked like, even if it was just, "Hey we're a data center that runs primarily on VMware. We use some of their automation". Versus even what you can do in your data center today. The, you know, the gains that people have seen through new types of automation, through Kubernetes, through GitOps, and a number of these things, like they've gotten significantly further along in terms of I can provision stuff really well, I can do multi-tenancy, I can do self-service. Is it, you know, is it still hard? Yeah. Because those things are hard to do, but there's been significant progress there. I don't, you know, I still look for kind of that killer application, that sort of, you know, lighthouse use case of hybrid applications, you know, between data center and between cloud. I think, you know, we see some stuff where, you know, backup is a part of it. 
So you use the cloud for storage, maybe you use the cloud for certain kinds of resiliency, especially on maybe front end load balancing and stuff. But I think, you know, I think what we get into is, this being hung up on hybrid cloud or multi-cloud as a term and go like, "Look, what are you trying to measure? Are you trying to measure, you know, efficiency of of of IT usage? Are you trying to measure how quickly can I give these business, you know, these application teams that are part of a line of business resources that they need?" I think if we start measuring that way, we would look at, you know, you'd go, "Wow, it used to be weeks and months. Now we got rid of these boards that have to review everything every time I want to do a change management type of thing." We've seen a lot more self-service. I think those are the things we want to measure on. And then to your point of, you know, where does, where do these Supercloud applications fit in? I think there are a bunch of instances where you go, "Look, I have a, you know, global application, I have a thing that has to span multiple regions." That's where the Supercloud concept really comes into play. We used to do it in the data center, right? We'd had all sorts of technologies to help with that, I think you can now start to do it in the cloud. >> You know, one of the other things, trying to understand, your thoughts on this, do you think that you, you again have talked about this, like I'm with you. It's like, how is it that Google's losing, you know, 3 billion dollars a year, whatever. I mean, because when you go back and look at Amazon, when they were at that level of revenue where Google is today, they were making money, you know, and they were actually growing faster, by the way. So it's kind of interesting what's happened with Google. 
But, the reason I bring that up is, trying to understand if you think the hyperscalers will ever be motivated to create standards across clouds, and that may be a play for Google. I mean, obviously with Kubernetes it was like a Hail Mary and kind of made them relevant. Where would Google be without Kubernetes? But then did it achieve the objectives? We could have that conversation some other time, but do you think the hyperscalers will actually say, "Okay, we're going to lean in and create these standards across clouds." Because customers would love that, I would think, but it would sub-optimize their competitive advantage. What are your thoughts? >> I think, you know, on the surface, I would say they, they probably aren't. I think if you asked 'em the question, they would say, "Well, you know, first and foremost, you know, we do deliver standards, so we deliver a, you know, standard SQL interface or a SQL you know, or a standard Kubernetes API or whatever. So, in that, from that perspective, you know, we're not locking you into, you know, an Amazon specific database, or a Google specific database." You, you can argue about that, but I think to a certain extent, like they've been very good about, "Hey, we're going to adopt the standards that people want." A lot of times the open source standards. I think the problem is, let's say they did come up with a standard for it. I think you still have the problem of the costs of migration and you know, the longer you've, I think their bet is basically the longer you've been in some cloud. And again, the more data you sort of compile there, the data gravity concept, there's just going to be a natural thing that says, okay, the hurdle to get over to say, "Look, we want to move this to another cloud", becomes so cost prohibitive that they don't really have to worry about, you know, oh, I'm going to get into a war of standards. And so far I think they sort of realize like that's the flywheel that the cloud creates. 
And you know, unless they want to get into a world where they just cut bandwidth costs, like it just kind of won't happen. You know, I think we've even seen, and you know, the one example I'll use, and I forget the name of it off the top of my head, but there's a Google service. I think it's like BigQuery external or something along those lines, that allows you to say, "Look, you can use BigQuery against like S3 buckets and against other stuff." And so I think the cloud providers have kind of figured out, I'm never going to get the application out of that other guy's cloud or you know, the other cloud. But maybe I'm going to have to figure out some interesting ways to sort of work with it. And, you know, it's a little bit, it's a little janky, but that might be, you know, a moderate step that sort of gets customers where they want to be. >> Yeah. Or you know, it'd be interesting if you ever see AWS for example, running its database in other clouds, even Oracle is doing that with Azure, which is a form of Supercloud. My last question for you is, I want to get you thinking about sort of how the future plays out. You know, think about some of the companies that we've put forth this Supercloud, and by the way, this has been a criticism of the concept. Charles Fitzgerald: "Everything is Supercloud!" Which if true would defeat the purpose of course. >> Right. >> And so right with the community effort, we really tried to put some guardrails down on the essential characteristics, the deployment models, you know, so for example, running across multiple clouds with a purpose-built PaaS, creating a common experience, metadata intelligence that solves a specific problem. I mean, the example I often use is Snowflake's governed data sharing. But yeah, Snowflake, Databricks, CloudFlare, Cohesity, you know, I just mentioned Oracle and Azure, these and others, they certainly claim to have that common experience across clouds. 
But my question is, again, I come back to, do customers need this capability? You know, is mono cloud the way to solve that problem? What are your thoughts on how this plays out in the future of, I guess, PaaS, apps and cloud? >> Yeah, I think a couple of things. So, from a technology perspective, I think, you know, the companies you name, the services you've named, have sort of proven that the concept is viable and it's viable at a reasonable size, right? These aren't completely niche businesses, right? They're multi-billion dollar businesses. So, I think there's a subset of applications that, you know, maybe a bigger-than-niche set of applications that are going to use these types of things. A lot of what you talked about is very data centric, and that's fine. That layer is figuring that out. I think we'll see messaging types of services, so like Derek Collison's company, Synadia, runs sort of a Supercloud for messaging applications. So I think there'll be places where it makes a ton of sense. I think, the thing that I'm not sure about, and because again, we've been now 10 plus years of sort of super low, you know, interest rates in terms of being able to do things, is a lot of these things come out of research that has been done previously. Then they get turned into maybe somewhat of an open source project, and then they can become something. You know, will we see as much investment into the next Snowflake if, you know, the interest rates are three or four times what they used to be, do we see VCs doing it? So that's the part that worries me a little bit, is I think we've seen what's possible. I think, you know, we've seen companies like what those services are. I think I read yesterday Snowflake was saying like, their biggest customers are growing at 30, like 50 or 60%. Like, the value they get out of it is becoming exponential. And it's just a matter of like, will the economics allow the next big thing to happen? 
Because some of these things are pretty costly, you know, expensive to get started. So I'm bullish on the idea. I don't know that it becomes, I think it's okay that it's still sort of, you know, niche plus, plus in terms of the size of it. Because, you know, if we think about all of IT it's still, you know, even microservices is a small part of bigger things. But I'm still really bullish on the idea. I like that it's been proven. I'm a little wary, like a lot of people, of the economics of, you know, what might slow things down a little bit. But yeah, I think the future is going to involve Supercloud somewhere, whatever people end up calling it. And you and I discussed that. (laughs) But I don't think it goes away. I don't think it's a fad. I think it is something that people see tremendous value in, and it's just, it's got to be, you know, for what you're trying to do, your application specific thing. >> You're making a great point on the funding of innovation and we're entering a new era of public policy as well. The R&D tax credit is now shifting. >> Yeah. >> You know, you're going to have to capitalize that over five years now. And that's something that goes back to the 1950s and many people would argue that's at least in part what has helped the United States be so, you know, competitive in tech. But Brian, always great to talk to you. Thanks so much for participating in the program. Great to see you. >> Thanks Dave, appreciate it. Good luck with the rest of the show. >> Thank you. All right, this is Dave Vellante for John Furrier, the entire Cube community. Stay tuned for more content from Supercloud2.

Published Date : Jan 4 2023


ENTITIES

Entity | Category | Confidence
Aaron Delp | PERSON | 0.99+
Dave | PERSON | 0.99+
Brian | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Charles Fitzer | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Brian Gracely | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Caya Company | ORGANIZATION | 0.99+
30% | QUANTITY | 0.99+
50% | QUANTITY | 0.99+
Aaron | PERSON | 0.99+
60% | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
20% | QUANTITY | 0.99+
three | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
50 | QUANTITY | 0.99+
third piece | QUANTITY | 0.99+
BigQuery | TITLE | 0.99+
1950s | DATE | 0.99+
10 plus years | QUANTITY | 0.99+
Oracle | ORGANIZATION | 0.99+
Snowflake | TITLE | 0.99+
Databricks | ORGANIZATION | 0.99+
Cohesity | ORGANIZATION | 0.99+
SiliconANGLE | ORGANIZATION | 0.99+
Nutanix | ORGANIZATION | 0.99+
Wikibon | ORGANIZATION | 0.98+
Digital Ocean | ORGANIZATION | 0.98+
Snowflake | ORGANIZATION | 0.98+
five | QUANTITY | 0.98+
Snowflake | EVENT | 0.98+
30 | QUANTITY | 0.98+
six years | QUANTITY | 0.98+
this year | DATE | 0.98+
four times | QUANTITY | 0.98+
yesterday | DATE | 0.98+
US West | LOCATION | 0.97+
today | DATE | 0.97+
one thing | QUANTITY | 0.97+
over five years | QUANTITY | 0.97+
S3 | TITLE | 0.96+
CloudFlare | ORGANIZATION | 0.95+
Super app | TITLE | 0.94+
Supercloud | ORGANIZATION | 0.94+
one | QUANTITY | 0.93+
Supercloud2 | ORGANIZATION | 0.93+
Azure | ORGANIZATION | 0.92+
CloudFlare | TITLE | 0.91+
one area | QUANTITY | 0.91+
both | QUANTITY | 0.9+
a dozen services | QUANTITY | 0.9+
New Year | EVENT | 0.9+
MongoDB Atlas | TITLE | 0.89+
Kubernetes | TITLE | 0.89+
VMware | TITLE | 0.88+
SQL | TITLE | 0.88+
Prem | ORGANIZATION | 0.88+
first | QUANTITY | 0.88+
multiple years later | DATE | 0.88+
3 billion dollars a year | QUANTITY | 0.86+
Mary | TITLE | 0.84+
Azure | TITLE | 0.84+
Cube | ORGANIZATION | 0.83+
The Cloudcast | ORGANIZATION | 0.8+
one cloud | QUANTITY | 0.78+

Veronika Durgin, Saks | The Future of Cloud & Data


 

(upbeat music) >> Welcome back to Supercloud 2, an open collaborative where we explore the future of cloud and data. Now, you might recall last August at the inaugural Supercloud event we validated the technical feasibility and tried to further define the essential technical characteristics, and of course the deployment models of so-called supercloud. That is, sets of services that leverage the underlying primitives of hyperscale clouds, but are creating new value on top of those clouds for organizations at scale. So we're talking about capabilities that fundamentally weren't practical or even possible prior to the ascendancy of the public clouds. And so today at Supercloud 2, we're digging further into the topic with input from real-world practitioners. And we're exploring the intersection of data and cloud, and importantly, the realities and challenges of deploying technology for a new business capability. I'm pleased to have with me in our studios, west of Boston, Veronika Durgin, who's the head of data at Saks. Veronika, welcome. Great to see you. Thanks for coming on. >> Thank you so much. Thank you for having me. So excited to be here. >> And so we have to say upfront, you're here, these are your opinions. You're not representing Saks in any way. So we appreciate you sharing your depth of knowledge with us. >> Thank you, Dave. Yeah, I've been doing data for a while. I try not to say how long anymore. It's been a while. But yeah, thank you for having me. >> Yeah, you're welcome. I mean, one of the highlights of this past year for me was hanging out at the airport with you after the Snowflake Summit. And we were just chatting about sort of data mesh, and you were saying, "Yeah, but." There was a yeah, but. You were saying there's some practical realities of actually implementing these things. So I want to get into some of that. And I guess starting from a perspective of how data has changed, you've seen a lot of the waves. 
I mean, even if we go back to pre-Hadoop, you know, that would shove everything into an Oracle database, or, you know, Hadoop was going to save our data lives. And the cloud came along and, you know, that was kind of a disruptive force. And, you know, now we see things like, whether it's Snowflake or Databricks or these other platforms on top of the clouds. How have you observed the change in data and the evolution over time? >> Yeah, so I started as a DBA in the data center, kind of like, you know, growing up trying to manage whatever, you know, physical limitations a server could give us. So we had to be very careful of what we put in our database because we were limited. We, you know, purchased that piece of hardware, and we had to use it for the next, I don't know, three to five years. So it was only, you know, we focused on only the most important critical things. We couldn't keep too much data. We had to be super efficient. We couldn't add additional functionality. And then Hadoop came along, which is like, great, we can dump all the data there, but then we couldn't get data out of it. So it was like, okay, great. Doesn't help either. And then the cloud came along, which was incredible. I was probably the most excited person. I'm lying, but I was super excited because I no longer had to worry about what I can actually put in my database. Now I have that, you know, scalability and flexibility with the cloud. So okay, great, that data's there, and I can also easily get it out of it, which is really incredible. >> Well, but so, I'm inferring from what you're saying with Hadoop, it was like, okay, no schema on write. And then you got to try to make sense out of it. But so what changed with the cloud? What was different? >> So I'll tell a funny story. I actually successfully avoided Hadoop. The only time- >> Congratulations. >> (laughs) I know, I'm like super proud of it. 
I don't know how that happened, but the only time I worked for a company that had Hadoop, all I remember is that they were running jobs that were taking over 24 hours to get data out of it. And they were realizing that, you know, dumping data without any structure into this massive thing that required, you know, really skilled engineers wasn't really helpful. So what changed, and I'm kind of thinking of like, kind of like how Snowflake started, right? They were marketing themselves as a data warehouse. For me, moving from SQL Server to Snowflake was a non-event. It was comfortable, I knew what it was, I knew how to get data out of it. And I think that's the important part, right? Cloud, this like, kind of like, vague, high-level thing, magical, but the reality is cloud is the same as what we had on prem. So it's comfortable there. It's not scary. You don't need super new additional skills to use it. >> But you're saying what's different is the scale. So you can throw resources at it. You don't have to worry about depreciating your hardware over three to five years. Hey, I have an asset that I have to take advantage of. Is that the big difference? >> Absolutely. Actually, from kind of like an operational perspective, which is funny. Like, I don't have to worry about it. I use what I need when I need it. And not to take this completely in the opposite direction, people stop thinking about using things in a very smart way, right? You like, scale and you walk away. And then, you know, the cool thing about cloud is it's scalable, but you also should not use it when you don't need it. >> So what about this idea of multicloud? You know, supercloud sort of tries to go beyond multicloud. It's like multicloud by accident. And now, you know, whether it's M&A or, you know, some skunkworks says, hey, I like Google's tools, so I'm going to use Google. And then people like you are called on to, hey, how do we clean up this mess? 
And you know, you and I, at the airport, we were talking about data mesh. And I love the concept. Like, doesn't matter if it's a data lake or a data warehouse or a data hub or an S3 bucket. It's just a node on the mesh. But then, of course, you've got to govern it. You've got to give people self-serve. But this multicloud is a reality. So from your perspective, from a practitioner's perspective, what are the advantages of multicloud? We talk about the disadvantages all the time. Kind of get that, but what are the advantages? >> So I think the first thing when I think multicloud, I actually think high-availability disaster recovery. And maybe it's just how I grew up in the data center, right? We were always worried that if something happened in one area, we want to make sure that we can bring business up very quickly. So to me that's kind of like where multicloud comes to mind because, you know, you put your data, your applications, let's pick on AWS for a second and, you know, US East in AWS, which is the busiest kind of like area that they have. If it goes down, for my business to continue, I would probably want to move it to, say, Azure, hypothetically speaking, again, or Google, whatever that is. So to me, and probably again based on my background, disaster recovery high availability comes to mind as multicloud first, but now the other part of it is that there are, you know, companies and tools and applications that are being built in, you know, pick your cloud. How do we talk to each other? And more importantly, how do we data share? You know, I work with data. You know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving. So that's where I think supercloud comes to me in my mind, is like practical applications. How do we create that mesh, that network that we can easily share data with each other? 
>> So you kind of answered my next question, is do you see use cases going beyond HA? I mean, HA/DR was, remember, that was the original cloud use case. That and bursting, you know, for, you know, Thanksgiving or, you know, for Black Friday. So you see an opportunity to go beyond that with practical use cases. >> Absolutely. I think, you know, we're getting to a world where every company is a data company. We all collect a lot of data. We want to use it for whatever that is. It doesn't necessarily mean sell it, but use it to our competitive advantage. So how do we do it in a very smooth, easy way, which opens additional opportunities for companies? >> You mentioned data sharing. And that's obviously, you know, I met you at Snowflake Summit. That's a big thing of Snowflake's. And of course, you've got Databricks trying to do similar things with open technology. What do you see as the trade-offs there? Because Snowflake, you got to come into their party, you're in their world, and you're kind of locked into that world. Now they're trying to open up. You know, and of course, Databricks, they're like, no, our world is wide open. Well, we know what that means, you know. The governance. And so now you're seeing, you saw Amazon come out with data clean rooms, which was, you know, that was a good idea that Snowflake had several years before. It's good. It's good validation. So how do you think about the trade-offs between kind of openness and freedom versus control? Is the latter just far more important? >> I'll tell you it depends, right? It's kind of like- >> Could be insulting to that. >> Yeah, I know. It depends because I don't know the answer. It depends, I think, on the use case and application. Ultimately every company wants to make money. That's the beauty of our like, capitalistic economy, right? We're driven 'cause we want to make money. But from the use, you know, how do I sell a product to somebody who's in Google if I am in AWS, right? 
It's like, we're limiting ourselves if we just do one cloud. But again, it's difficult because at the same time, every cloud provider wants for you to be locked in their cloud, which is why probably, you know, whoever has now data sharing because they want you to stay within their ecosystem. But then again, like, companies are limited. You know, there are applications that are starting to be built on top of clouds. How do we ensure that, you know, I can use that application regardless what cloud, you know, my company is using or I just happen to like. >> You know, and it's true they want you to stay in their ecosystem 'cause they'll make more money. But as well, you think about Apple, right? Does Apple do it 'cause they can make more money? Yes, but it's also they have more control, right? Am I correct that technically it's going to be easier to govern that data if it's all the sort of same standard, right? >> Absolutely. 100%. I didn't answer that question. You have to govern and you have to control. And honestly, it's like it's not like a nice-to-have anymore. There are compliances. There are legal compliances around data. Everybody at some point wants to ensure that, you know, and as a person, quite honestly, you know, not to be, you know, I don't like when my data's used when I don't know how. Like, it's a little creepy, right? So we have to come up with standards around that. But then I also go back in the day. EDI, right? Electronic data interchange. That was figured out. There was standards. Companies were sending data to each other. It was pretty standard. So I don't know. Like, we'll get there. >> Yeah, so I was going to ask you, do you see a day where open standards actually emerge to enable that? And then isn't that the great disruptor to sort of kind of the proprietary stack? >> I think so. I think for us to smoothly exchange data across, you know, various systems, various applications, we'll have to agree to have standards. 
>> From a developer perspective, you know, back to the sort of supercloud concept, one of the components of the essential characteristics is you've got this PaaS layer that provides consistency across clouds, and it has unique attributes specific to the purpose of that supercloud. So in the instance of Snowflake, it's data sharing. In the case of, you know, VMware, it might be, you know, infrastructure or self-serve infrastructure that's consistent. From a developer perspective, what do you hear from developers in terms of what they want? Are we close to getting that across clouds? >> I think developers always want freedom and ability to engineer. And oftentimes it's not, (laughs) you know, just as an engineer, I always want to build something, and it's not always for the, to use a specific, you know, it's something I want to do versus what is actually applicable. I think we'll land there, but not because we are, you know, out of the kindness of our own hearts. I think as a necessity we will have to agree to standards, and that'll, like, move the needle. Yeah. >> What are the limitations that you see of cloud and this notion of, you know, even cross cloud, right? I mean, this one cloud can't do it all. You know, but what do you see as the limitations of clouds? >> I mean, it's funny, I always think, you know, again, kind of probably my background, I grew up in the data center. We were physically limited by space, right? That there's like, you can only put, you know, so many servers in the rack and, you know, so many racks in the data center, and then you run out of space. Earth has a limited space, right? And we have so many data centers, and everybody's collecting a lot of data that we actually want to use. We're not just collecting for the sake of collecting it anymore. We truly can take advantage of it because servers have enough power, right, to crank through it. We will run out of space. So how do we balance that? 
How do we balance that data across all the various data centers? And I know I'm like, kind of maybe talking crazy, but until we figure out how to build a data center on the Moon, right, like, we will have to figure out how to take advantage of all the compute capacity that we have across the world. >> And where does latency fit in? I mean, is it as much of a problem as people sort of think it is? Maybe it depends too. It depends on the use case. But do multiple clouds help solve that problem? Because, you know, even AWS, $80 billion company, they're huge, but they're not everywhere. You know, they're doing local zones, they're doing outposts, which is, you know, less functional than their full cloud. So maybe I would choose to go to another cloud. And if I could have that common experience, that's an advantage, isn't it? >> 100%, absolutely. And potentially there's some maybe pricing tiers, right? So we're talking about latency. And again, it depends on your situation. You know, if you have some sort of medical equipment that is very latency sensitive, you want to make sure that data lives there. But versus, you know, I browse on a website. If the website takes a second versus two seconds to load, do I care? Not exactly. Like, I don't notice that. So we can reshuffle that in a smart way. And I keep thinking of Waze. If we have a Waze for data, where it's kind of like, oh, you are stuck in traffic, go this way. You know, reshuffle you through that data center. You know, maybe your data will live there. So I think it's totally possible. I know, it's a little crazy. >> No, I like it, though. But remember when you first found Waze, you're like, "Oh, this is awesome." And then now it's like- >> And it's like crowdsourcing, right? Like, it's smart. Like, okay, maybe, you know, going to pick on US East for Amazon for a little bit, their oldest, but also busiest data center that, you know, periodically goes down. 
>> But then you lose your competitive advantage 'cause now it's like traffic socialism. >> Yeah, I know. >> Right? It happened the other day where everybody's going this way up. There's all the Wazers taking. >> And also again, compliance, right? Every country is going down the path of where, you know, data needs to reside within that country. So it's not as like, socialist or democratic as we wish for it to be. >> Well, that's a great point. I mean, when you just think about the clouds, the limitation, now you go out to the edge. I mean, everybody talks about the edge in IoT. Do you actually think that there's like a whole new stovepipe that's going to get created? And does that concern you, or do you think it actually is going to be, you know, connective tissue with all these clouds? >> I honestly don't know. I live in a practical world of like, how does it help me right now? How does it, you know, help me in the next five years? And mind you, in five years, things can change a lot. Because if you think back five years ago, things weren't as they are right now. I mean, I really hope that somebody out there challenges things 'cause, you know, the whole cloud promise was crazy. It was insane. Like, who came up with it? Why would I do that, right? And now I can't imagine the world without it. >> Yeah, I mean a lot of it is same wine, new bottle. You know, but a lot of it is different, right? I mean, technology keeps moving us forward, doesn't it? >> Absolutely. >> Veronika, it was great to have you. Thank you so much for your perspectives. If there was one thing that the industry could do for your data life that would make your world better, what would it be? >> I think standards for like data sharing, data marketplace. I would love, love, love nothing more than to have some agreed-upon standards. >> I had one other question for you, actually. I forgot to ask you this. 'Cause you were saying every company's a data company. Every company's a software company. 
We're already seeing it, but how prevalent do you think it will be that companies, you've seen some of it in financial services, but companies begin to now take their own data, their own tooling, their own software, which they've developed internally, and point that to the outside world? Kind of do what AWS did. You know, working backwards from the customer and saying, "Hey, we did this for ourselves. We can now do this for the rest of the world." Do you see that as a real trend, or is that Dave's pie in the sky? >> I think it's a real trend. Every company's trying to reinvent themselves and come up with new products. And every company is a data company. Every company collects data, and they're trying to figure out what to do with it. And again, it's not necessarily to sell it. Like, you don't have to sell data to monetize it. You can use it with your partners. You can exchange data. You know, you can create products. Capital One I think created a product for Snowflake pricing. I don't recall, but it just, you know, they built it for themselves, and they decided to kind of like, monetize on it. And I'm absolutely 100% on board with that. I think it's an amazing idea. >> Yeah, Goldman is another example. Nasdaq is basically taking their exchange stack and selling it around the world. And the cloud is available to do that. You don't have to build your own data center. >> Absolutely. Or for good, right? Like, we're talking about, again, we live in a capitalist country, but use data for good. We're collecting data. We're, you know, analyzing it, we're aggregating it. How can we use it for greater good for the planet? >> Veronika, thanks so much for coming to our Marlborough studios. Always a pleasure talking to you. >> Thank you so much for having me. >> You're really welcome. All right, stay tuned for more great content. From Supercloud 2, this is Dave Vellante. We'll be right back. (upbeat music)

Published Date : Dec 27 2022


ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Veronika | PERSON | 0.99+
Veronika Durgin | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Apple | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
100% | QUANTITY | 0.99+
two seconds | QUANTITY | 0.99+
Saks | ORGANIZATION | 0.99+
$80 billion | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
three | QUANTITY | 0.99+
Snowflake | ORGANIZATION | 0.99+
last August | DATE | 0.99+
Capital One | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
M&A | ORGANIZATION | 0.99+
Skunkworks | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
Nasdaq | ORGANIZATION | 0.98+
Supercloud 2 | EVENT | 0.98+
Earth | LOCATION | 0.98+
Databricks | ORGANIZATION | 0.98+
Supercloud | EVENT | 0.98+
today | DATE | 0.98+
Snowflake Summit | EVENT | 0.98+
US East | LOCATION | 0.98+
five years ago | DATE | 0.97+
SQL Server | TITLE | 0.97+
first thing | QUANTITY | 0.96+
Boston | LOCATION | 0.95+
Black Friday | EVENT | 0.95+
Hadoop | TITLE | 0.95+
over 24 hours | QUANTITY | 0.95+
one | QUANTITY | 0.94+
first | QUANTITY | 0.94+
supercloud | ORGANIZATION | 0.94+
one thing | QUANTITY | 0.93+
Moon | LOCATION | 0.93+
Thanksgiving | EVENT | 0.93+
over three | QUANTITY | 0.92+
one other question | QUANTITY | 0.91+
one cloud | QUANTITY | 0.9+
one area | QUANTITY | 0.9+
Snowflake | TITLE | 0.89+
multicloud | ORGANIZATION | 0.86+
Azure | ORGANIZATION | 0.85+
Supercloud 2 | ORGANIZATION | 0.83+
> 100% | QUANTITY | 0.82+
Goldman | ORGANIZATION | 0.81+
Snowflake | EVENT | 0.8+
a second | QUANTITY | 0.73+
several years before | DATE | 0.72+
this past year | DATE | 0.71+
second | QUANTITY | 0.7+
Marlborough | LOCATION | 0.7+
supercloud | TITLE | 0.66+
next five years | DATE | 0.65+
multicloud | TITLE | 0.59+
PaaS | TITLE | 0.55+

Kevin Miller and Ed Walsh | AWS re:Invent 2022 - Global Startup Program


 

Hi everybody, welcome back to re:Invent 2022. This is theCUBE's exclusive coverage. We're here at the satellite set. It's up on the fifth floor of the Venetian Conference Center, and this is part of the Global Startup Program, the AWS Startup Showcase series that we've been running all through last year and into this year with AWS, featuring some of its global partners. Ed Walsh is here, the CEO of ChaosSearch, many-times CUBE alum, and Kevin Miller, who's also a CUBE alum, vice president and GM of S3 at AWS. Guys, good to see you again. >> Yeah, great to see you, Dave. >> Hi, Kevin. We call this our Super Bowl, so this must be like your, I don't know, World Cup. It's a pretty big event. >> Yeah, it's the World Cup for sure. >> Yeah, so a lot of S3 talk. You know, I mean, that's what got us all started in 2006. So, absolutely, what's new in S3? >> Yeah, it's been a great show. We've had a number of really interesting launches over the last few weeks and a few at the show as well. So, you know, we've been really focused on helping customers that are running mass-scale data lakes, whether it's structured or unstructured data. We actually announced, just an hour ago I think it was, a new capability to give customers cross-account access points for sharing data securely with other parts of the organization. And that's something that we'd heard from customers: as they are growing and have more data sets, and they're looking to get more out of their data, they are increasingly looking to enable multiple teams across their businesses to access those data sets securely. And that's what we provide with cross-account access points. We also launched yesterday our Multi-Region Access Point failover capabilities. And so again, this is where customers have data sets and they're using multiple regions for certain critical workloads. They're now able to use that to control the failover between different regions in AWS. And then one other launch I would just highlight is some
improvements we made to Storage Lens, which is really a very novel and unique capability to help customers really understand what storage they have, where, who's accessing it, and when it's being accessed. And we added a bunch of new metrics. Storage Lens has been pretty exciting for a lot of customers. In fact, we looked at the data and saw that customers who have adopted Storage Lens, typically within six months, saved more than six times what they had invested in turning Storage Lens on. And certainly in this environment right now, we have a lot of customers for whom it's pretty top of mind. They're looking for ways to optimize their costs in the cloud and take some of those savings and be able to reinvest them in new innovation. So, pretty exciting with the Storage Lens launch. >> I think what's interesting about S3 is that, you know, pre-cloud, object store was kind of a niche, right? And then of course you guys announced, you know, S3 in 2006, as I said, and okay, great, you know, cheap and deep storage, simple get, put. Now the conversation's about how to enable value from data, absolutely, analytics, and it's just a whole new world. And Ed, you've talked many times, I love the term, yeah, "we built ChaosSearch on the shoulders of giants," right? And so underlying that is S3, but the value that you can build on top of that has been key. >> And I don't think we've talked about "shoulders of giants," but we've talked about how we literally, you know, we have a big vision, right? So, hard to kind of solve the challenge of analytics at scale. We really focus on the, you know, big data environment, to get analytics. So we talk about "on the shoulders of giants," obviously Isaac Newton's, you know, metaphor of, I learned from everything before and we layer on top. So really, when you talk about all the things that come from S3, I just smile, because we picked it up naturally. We went all in on S3. And this is where I think you're going, Dave, but everyone is, so let's just
cut to the chase. Like, for any of the data platforms you're using, S3 is what you're building on. But we did it a little bit differently. So at first people were using it as cold storage, like you said, and then they ETL it up into different platforms for analytics of different sorts. Now people are using it closer; they're doing caching layers and caching out. But that's where the attributes of scale and reliability are. What we did is we actually make S3 a database. So literally, we have no persistence outside S3, and that kind of comes in. So it's working really well with clients, because most of the thing is, we pick up all these attributes of scale and reliability, and it shows up in the clients' environments. And so when you launch all these new scalable things, we just see it. Like, our clients constantly comment. Like one of our biggest customers, a fintech in Europe, they go to Black Friday, and again, Black Friday's not one day, and they scale from, what is it, 58 terabytes a day, and they're going up to 187 terabytes a day, and we don't flinch. They say, how do you do that? Well, we built our platform on S3. As long as you can stream it to S3, so they're saying, I can't overrun S3. And it's a natural play. So it's really nice that we take on those attributes. But same thing, that's why we're able to, you know, help clients. Equifax is a good example. They're able to consolidate 12 of their divisions on one platform. We couldn't have done that without the scale and the performance of what you can get from S3. But also they saved 90%. I'm able to do that, but that's really because the only persistence is S3 and what you guys are delivering. And then we really focus on "shoulders of giants": we're innovating on top of your platforms and bringing that out. So things like, you know, we have a unique data representation that makes it easy to ingest this data, because it's kind of coming at you, the four Vs of big data. We allow you to do that,
make it performant on S3. So now you're doing hot analytics on S3 as if it's just a native database in memory, but there's no memory, no SSD caching. And then multi-model: once you get it there, don't move it, leverage it in place. So, you know, Elasticsearch access, you know, Kibana, Grafana access, or SQL access with your tools. So we're seeing that constantly, but we always talk about "on the shoulders of giants." Even this week, I get comments from our customers like, how did you do that? And most of it is because we built on top of what you guys provided. So it's really working out pretty well. >> And, you know, we talk a lot about digital transformation. Of course, we had the pleasure of sitting down with Adam Selipsky prior. John Furrier flew to Seattle, sits down for his annual one-on-one with the AWS CEO, which is kind of cool. Yeah, it's good, it's like studying for the test, you know. And so one of the interesting things he said was, you know, one of our challenges going forward is, how do we go beyond digital transformation into business transformation? Like, okay, well, that's interesting. I was talking to a customer today, an AWS customer, and obviously others, because they're a 100-year-old company, and basically their business was, they call them like the Uber for servicing appliances. When your appliance breaks, you've got to get a person to service it if it's out of warranty. You know, these guys do that. So they've got to basically have, you know, a network of technicians, and they've got to deal with the customers, no phone, right? So they had a completely, you know, that was a business transformation, right? They're becoming, you know, everybody says they're becoming a software company, but they're building it, of course, right on the cloud. So I wonder if you guys could each talk about what you're seeing in terms of changing not only in the sort of IT and the digital transformation, but also the business transformation. >> Yeah, I know, I 100% agree that I think business
transformation is probably one of the top themes I'm hearing from customers of all sizes right now. Even in this environment, I think customers are looking for what can I do to drive top line or, you know, improve bottom line or just improve my customer experience, and really sort of have that effect where I'm helping customers get more done. And it is very tricky, because the customers that are doing that successfully, I think, are really getting into the lines of business and figuring out, you know, it's probably a different skill set, possibly a different culture, different norms and practices and process. And so it's a lot more than just, like you said, the technology involved. But when we sort of liquidate it down into the data, that's where absolutely we see that as a critical function for lines of business: to become more comfortable, first off, knowing what data sets they have, what data they could access but possibly aren't today, and then starting to tap into those data sources. And then, as that progresses, figuring out how to share and collaborate with data sets across a company, to correlate across those data sets and drive more insights. And then, as all that's being done, of course, it's important to measure the results and be able to really see what effect this is having, and proving that effect. And certainly I've seen plenty of customers be able to show, you know, this is a percentage increase in top or bottom line. So that pattern is playing out a lot, and actually a lot of how we think about where we're going with S3 is related to how do we make it easier for customers to do everything that I just described: to understand what data they have, to make it accessible. And, you know, it's great to have such a great ecosystem of partners that are then building on top of that and innovating to help customers connect really directly with the
businesses that they're running and driving those insights. >>Well, and customers are hours today. One of the things I loved that Adam said: he said Amazon is strategically very, very patient, but tactically we're really impatient. And the customers out there are like, "How are you going to help me increase revenue? How are you going to help me cut costs?" You know, we were talking off camera about how software can actually help do that. >>Yeah, it's deflationary. I love the quote, right? So software's deflationary: as costs come up, how do you go drive it and also free up the team? And you nailed it. It's like, okay, everyone wants to save money, but they're not putting off these projects. In fact, the digital transformation, or the business, it's actually moving forward, but they're getting a little bit bigger. But everyone's looking for creative ways to look at their architecture as it becomes larger and larger. We talked about a couple of those examples, but even things like observability: they want to give this tool set, this data, to all the developers, all their SREs, the same data to all the security team. And to do that, they need to find a way, an architecture, to do that at scale and save money simultaneously. So we constantly see people who are pairing us up with some of these larger firms, like: keep your Datadog, keep your Splunk, use us to reduce the cost. That one and one is actually cheaper than what you have. But then they use it either to save money (we're saving 50 to 80% in hard dollars) but, more importantly, to free up your team from the toil. And then they turn around and make that budget neutral, and they're then allowed to get the same tools to more people across the org, because they're sometimes constrained in getting access to everyone. >>Explain that a little bit more. Let's say I've got Splunk or Datadog and I'm sifting through, you know, logs. How exactly do you help? >>So it's pretty simple. I'll use the Datadog example. So let's say you're using Datadog for observability, so it's just your
developers, your SREs managing environments. All these platforms are really good at being a monitoring and alerting type of tool. What they're not necessarily great at is keeping the data for longer periods, like the log data, the bigger data. That's where we're strong. What you see with, like, a Datadog: let's say you're using it to keep 30 days of logs, which is not enough. Like, let's say you're running an environment and you're finding that performance issue; you kind of want to look at last quarter, last month, or maybe last Black Friday. So 30 days is not enough, but they will charge you $2.80, two dollars and eighty cents, a gigabyte (don't focus on it, just $2.80). And then if you just turn the knob and keep seven days, but keep two years of data on us, which is on S3, it goes down to 22 cents, plus our list price of 80 cents, which comes to $1.02 compared to $2.80. So here's the thing: what they're able to do is just turn a knob and get more data. We do an integration so you can go right from Datadog or Grafana directly into our platform, so the user doesn't see it, but they save money. A lot of times they don't just save the money; they use that to go fund and get Datadog to a lot more people. >>Makes sense. >>So it's creativity; they're looking at it, and they're looking at tools. We see the same thing with Grafana, if you look at the whole Grafana play, which is, hey, you can't put it in one place, but put Prometheus for metrics or traces; we fit well with logs. But they're using that to bring down their costs, because a lot of this data just really bogs down these applications. The alerting and monitoring tools are good at small data; they're not good at the big data, which is what we're really good at. And then the one and one is actually less than you paid for the one, so it works pretty well. >>So things are really unpredictable right now in the economy. You know, during the pandemic we sort of locked down and then the stock market went crazy. We're like, okay, it's going to end, it's going to end, and then
it looked like it was going to end, and then it, you know... But last year at re:Invent, just in that sweet spot before Omicron, we tucked it in, which was awesome, right? It was a great, great event. We really, really missed one physical re:Invent, you know, which was very rare, so that's cool. But I've called it the slingshot economy. It feels like, you know, you're driving down the highway and you've got to hit the brakes, and then all of a sudden you're going, okay, we're through it. Oh no, you're going to hit the brakes again. So it's very, very hard to predict. And I was listening to Jassy this morning; he was talking about consumers: they're still spending, but what they're doing is they're shopping for more features. They might be, you know, buying a TV that's less expensive, more value for the money. So okay, hopefully the consumer spending will get us out of this, but you don't really know. You know, we don't seem to have the algorithms; we've never been through something like this before. So what are you guys seeing in terms of customer behavior, given that uncertainty? >>Well, one thing I would highlight, particularly going back to what we were just talking about as far as business and digital transformation: I think some customers are still appreciating the fact that where, you know, yesterday you may have had to put out some capital and commit to something for a large upfront expenditure, today there's the value of being able to experiment and scale up and then, most importantly, scale down, dynamically, based on: is the experiment working out? Am I seeing real value from it? And doing that on a time scale of a day or a week or a few months, that is so important right now, because again it gets to: I am looking for ways to innovate and to drive top-line growth, but I can't commit to a multi-year set of costs to do that. And I think plenty of customers are finding that even a few
months of experimentation gives them some really valuable insight as far as whether this is going to be successful or not. And so I think that again, of course, with S3 and storage, from day one we've been elastic, pay for what you use: if you're not using the storage, you don't get charged for it. And I think that, particularly right now, having the applications and the rest of the ecosystem around the storage and the data be able to scale up and scale down is just ever more important. >>And when people see that, typically they're looking to do more with it. So you usually find these little department projects, but they see a way to actually move faster and save money; I think it is a mix of those two. They're looking to expand it, which can be a nightmare for sales cycles because they take longer, but people are asking, well, why don't you leverage this and go across divisions? So we do see people trying to leverage it. I don't think digital transformation is slowing down, but, to be honest, there are a lot more approvals at this point for everything. >>You know, Adam had another great quote in his keynote. He said, if you want to save money, the cloud's a place to do it. >>Absolutely. >>And I read an article recently, and I was looking through, and it said this is the first time AWS has ever seen a downturn, because the cloud was too early back then. I'm like, you weren't paying attention in 2008, because that was the first major inflection point for cloud adoption, where CFOs said, okay, stop the capex, we're going to opex, and you saw the cloud take off. And then 2010 started this, you know, amazing cycle; we really haven't seen anything like it, where they were doubling down on investments, and they were real, hardcore investments. It wasn't like 1998, '99, when it was all just going out the door for no clear reason. So that foundation is now in place, and I think it makes a lot of sense, and it could be here for a while, where people are saying, hey, I want to
optimize, and I'm going to do that on the cloud. >>Yeah, I mean, I certainly agree with Adam's quote. I think really that's been in AWS's DNA from day one, right? That ability to scale costs with the actual consumption and paying for what you use. And I think that, you know, certainly moments like now are ones that can really motivate change in an organization in a way that might not have been as palatable when it didn't feel like it was as necessary. >>All right, we've got to go. I'll give you the last word. >>I think it's been a great event. I love all your announcements; I think this is wonderful. It's been a great show. In fact, how many people are here at re:Invent? >>North of 50,000. >>Yeah, I mean, I feel like it's as big if not bigger than 2019. People have said, ah, 2019 was a record when you count out all the professors. I don't know, it feels as big if not bigger. So there's great energy. >>Yeah, it's quite amazing, and we're thrilled to be part of it. >>Guys, thanks for coming on theCUBE again. Really appreciate it, face to face. All right, thank you for watching. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. We'll be right back.
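
The retention arithmetic quoted in the interview above can be checked with a few lines of Python. This is only a sketch: the per-gigabyte figures are the ones spoken in the conversation ($2.80 for 30 days of retention, 22 cents for seven days, plus an 80-cent list price for two years kept on S3), while the function and constant names are hypothetical and do not come from any vendor's pricing API.

```python
# Per-gigabyte prices in US cents, as quoted in the interview.
# All names below are hypothetical and exist only for this illustration.
FULL_RETENTION_CENTS = 280      # 30 days of logs kept in the monitoring tool
SHORT_RETENTION_CENTS = 22      # same tool, retention turned down to 7 days
ARCHIVE_LIST_PRICE_CENTS = 80   # speaker's quoted list price for two years on S3


def combined_price_cents(short_retention: int, archive: int) -> int:
    """Price per GB when short retention is paired with the S3-backed archive."""
    return short_retention + archive


def savings_percent(before: int, after: int) -> float:
    """Relative saving of the paired setup versus the original price."""
    return 100 * (before - after) / before


paired = combined_price_cents(SHORT_RETENTION_CENTS, ARCHIVE_LIST_PRICE_CENTS)
saving = savings_percent(FULL_RETENTION_CENTS, paired)
print(f"paired price: ${paired / 100:.2f} per GB")
print(f"saving vs ${FULL_RETENTION_CENTS / 100:.2f}: {saving:.1f}%")
```

With the quoted figures, the paired price comes to $1.02 per gigabyte, a saving of roughly 64% against $2.80. Working in integer cents avoids floating-point rounding in the addition.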

Published Date : Dec 7 2022

ENTITIES

Entity | Category | Confidence
Ed Walsh | PERSON | 0.99+
Kevin Miller | PERSON | 0.99+
two years | QUANTITY | 0.99+
2006 | DATE | 0.99+
2008 | DATE | 0.99+
seven days | QUANTITY | 0.99+
Adam | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
30 days | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
50 | QUANTITY | 0.99+
Adam solipski | PERSON | 0.99+
Dave vellante | PERSON | 0.99+
two | QUANTITY | 0.99+
eighty cents | QUANTITY | 0.99+
Europe | LOCATION | 0.99+
22 cents | QUANTITY | 0.99+
Kevin | PERSON | 0.99+
80 cents | QUANTITY | 0.99+
Seattle | LOCATION | 0.99+
12 | QUANTITY | 0.99+
2010 | DATE | 0.99+
Isaac Newton | PERSON | 0.99+
Dave | PERSON | 0.99+
Super Bowl | EVENT | 0.99+
a day | QUANTITY | 0.99+
Venetian Conference Center | LOCATION | 0.99+
fifth floor | QUANTITY | 0.99+
Uber | ORGANIZATION | 0.99+
World Cup | EVENT | 0.99+
last year | DATE | 0.99+
last quarter | DATE | 0.99+
yesterday | DATE | 0.99+
S3 | TITLE | 0.99+
last month | DATE | 0.99+
more than six times | QUANTITY | 0.99+
2019 | DATE | 0.98+
Prometheus | TITLE | 0.98+
six months | QUANTITY | 0.98+
280 | QUANTITY | 0.98+
pandemic | EVENT | 0.98+
Black Friday | EVENT | 0.97+
an hour ago | DATE | 0.97+
today | DATE | 0.97+
58 terabytes a day | QUANTITY | 0.97+
100 year old | QUANTITY | 0.97+
this morning | DATE | 0.97+
a week | QUANTITY | 0.97+
Ed wallson | PERSON | 0.97+
three | QUANTITY | 0.96+
Equifax | ORGANIZATION | 0.96+
jassy | PERSON | 0.96+
one platform | QUANTITY | 0.96+
this year | DATE | 0.96+
grafana | TITLE | 0.96+
one days | QUANTITY | 0.95+
first time | QUANTITY | 0.95+
one | QUANTITY | 0.95+
black Friday | EVENT | 0.93+
this week | DATE | 0.92+
first major inflection | QUANTITY | 0.91+
one place | QUANTITY | 0.91+
SQL | TITLE | 0.9+
last | DATE | 0.89+
Store | TITLE | 0.89+

Tomer Shiran, Dremio | AWS re:Invent 2022


 

>>Hey everyone. Welcome back to Las Vegas. It's theCUBE live at AWS re:Invent 2022. This is our fourth day of coverage. Lisa Martin here with Paul Gillin. Paul, we started Monday night, we filmed and streamed for about three hours. We have had jam-packed days, Tuesday, Wednesday, Thursday. What's your takeaway? >>We're rounding the final turn as we head into the home stretch. Yeah. This is, as it has been since the beginning, a show with a lot of energy. I'm amazed, for the fourth day of a conference, how many people are still here. >>I am too. >>And how active they are and how full the sessions are. Huge crowd for the keynote this morning. You don't see that at most day-four conferences; everyone's on their way home. So people come here to learn, and they're still >>Learning. They are still learning. And we're gonna help continue that learning path. We have an alumni back with us. Tomer Shiran joins us, the CPO and co-founder of Dremio. Tomer, it's great to have you back on the program. >>Yeah, thanks for having me here. And thanks for keeping the best session for the fourth day. >>Yeah, you're right. I like that. That's a good mojo to come into this interview with, Tomer. So last year, the last time I saw you was a year ago here in Vegas at re:Invent '21. We talked about the growth of data lakes and data lakehouses. We talked about the need for open data architectures as opposed to data warehouses. And the headline of the SiliconANGLE article on the interview we did with you was, Dremio Predicts 2022 will be the year open data architectures replace the data warehouse. We're almost done with 2022. Has that prediction come true? >>Yeah, I think we're seeing almost every company out there, certainly in the enterprise, adopting data lake and data lakehouse technology, embracing open source file and table formats. And so I think that's definitely happening. Of course, nothing goes away.
So, you know, data warehouses don't go away in a year and actually don't go away ever. We still have mainframes around, but certainly the trends are all pointing in that direction. >>Describe the data lakehouse for anybody who may not be really familiar with that, and what it really means for organizations. >>Yeah. I think you could think of the data lakehouse as the evolution of the data lake, right? And so, you know, for the last decade we've had kind of these two options, data lakes and data warehouses. Warehouses had good SQL support and good performance, but you had to spend a lot of time and effort getting data into the warehouse. You got locked into them, and they were very, very expensive. That's a big problem now. And data lakes were more open and more scalable, but had all sorts of limitations. And what we've done now as an industry with the lakehouse, and especially with technologies like Apache Iceberg, is we've unlocked all the capabilities of the warehouse directly on object storage like S3. So you can insert and update and delete individual records. You can do transactions. You can do all the things you could do with a database directly in open formats, without getting locked in, at a much lower cost. >>But you're still dealing with semi-structured data as opposed to structured data. And there's work that has to be done to get that into a usable form. That's where Dremio excels. What has been happening in that area? I mean, is it formats like JSON that are enabling this to happen? How are we advancing the cause of making semi-structured data usable? >>Yeah, well, I think first of all, you know, I think that's all changed. I think that was maybe true for the original data lakes, but now with the lakehouse, you know, our bread and butter is actually structured data. It's all tables with the schema.
And, you know, you can create tables and insert records. It's really everything you can do with a data warehouse you can now do in the lakehouse. Now, that's not to say that there aren't very advanced capabilities when it comes to JSON and nested data and sparse data. We excel in that as well. But we're really seeing the lakehouse take over the bread-and-butter data warehouse use cases. >>You mentioned open a minute ago. Talk about why open is important and the value that it can deliver for customers. >>Yeah, well, I think if you look back in time and you see all the challenges that companies have had with traditional data architectures, right? A lot of that comes from the problems with data warehouses. The fact that they're very expensive. You have to ingest the data into the data warehouse in order to query it. And then it's almost impossible to get off of these systems, right? It takes an enormous effort and tremendous cost to get off of them. And so you're kind of locked in, and that's a big problem, right? You're also dependent on that one data warehouse vendor, right? You can only do things with that data that the warehouse vendor supports. And contrast that with data lakehouse and open architectures, where the data is stored in entirely open formats. >>So things like Parquet files and Apache Iceberg tables. That means you can use any engine on that data. You can use a SQL query engine, you can use Spark, you can use Flink. You know, there's a dozen different engines that you can use on that, both at the same time. But also in the future, if you ever wanted to try something new that comes out, some new open source innovation, some new startup, you just take it and point it at the same data. So that data's now at the core, at the center of the architecture, as opposed to some, you know, vendor's logo. Yeah.
>>Amazon seems to have bought into the lakehouse concept. It had big announcements on day two about eliminating the ETL stage between RDS and Redshift. Do you see the cloud vendors as pushing this concept forward? >>Yeah, a hundred percent. I mean, Amazon's a great partner of ours. We work with, you know, probably 10 different teams there. Everything from the S3 team, the Glue team, the QuickSight team, you know, everything in between. And, you know, their embracement of the lakehouse architecture, the fact that they adopted Iceberg as their primary table format, I think that's exciting. As an industry, we're all coming together around standard ways to represent data, so that at the end of the day, companies have this benefit of being able to have their own data in their own S3 account in open formats and be able to use all these different engines without losing any of the functionality that they need, right? The ability to do all these interactions with data that maybe in the past you would have had to move the data into a database or warehouse in order to do, you just don't have to do that anymore. >>Speaking of functionality, talk about what's new this year with Dremio since we've seen you last. >>Yeah, there's a lot of new things with Dremio. So we now have full Apache Iceberg support, you know, with DML commands: you can do inserts, updates, deletes, copy into, all that kind of stuff is now fully supported as a native part of the platform. We now offer two flavors of Dremio. We have Dremio Cloud, which is our SaaS version, fully hosted. You sign up with your Google or, you know, Azure account, and you're up and running in a minute. And then Dremio Software, which you can self-host, usually in the cloud, but even outside of the cloud. And then we're also very excited about this new idea of data as code. And so we've introduced a new product that's now in preview called Dremio Arctic.
And the idea there is to bring the concepts of Git or GitHub to the world of data. So things like being able to create a branch and work in isolation. If you're a data scientist, you wanna experiment on your own without impacting other people, or you're a data engineer and you're ingesting data, you want to transform it and test it before you expose it to others. You can do that in a branch. So all these ideas that, you know, we take for granted now in the world of source code and software development, we're bringing to the world of data with Dremio. And when you think about data mesh, a lot of people are talking about data mesh now and wanting to take advantage of those concepts and ideas, you know, thinking of data as a product. Well, when you think about data as a product, we think you have to manage it like code, right? And that's why we call it data as code, right? All those reasons that we use things like Git to build products, you know, if we wanna think of data as a product, we need all those capabilities also with data. You know, also the ability to go back in time. The ability to undo mistakes, to see who changed my data and when they changed that table. All of those are part of this new catalog that we've created. >>You talk about data as a product; that's sort of intrinsic to the data mesh concept. What's your opinion of data mesh? Is the world ready for that radically different approach to data ownership? >>You know, we now have dozens of customers that are using Dremio to implement enterprise-wide data mesh solutions. And at the end of the day, I think it's just, you know, what most people would consider common sense, right?
In a large organization, it is very hard for a centralized single team to understand every piece of data, to manage all the data themselves, to make sure the quality is correct, to make it accessible. And so what data mesh is first and foremost about is being able to federate, or distribute, the ownership of data. The governance of the data still has to happen, right? And so that is, I think, at the heart of the data mesh: allowing different teams, different domains, to own their own data, to really manage it like a product with all the best practices that we have with that. Super important. >>So we're doing a lot with data mesh, you know, the way that Dremio Cloud has multiple projects and the way that Dremio allows you to have multiple catalogs, and different groups can interact and share data among each other. You know, the fact that we can connect to all these different data sources, even outside your data lake, you know, with Redshift, Oracle, SQL Server, all the different databases that are out there, and join across different databases in addition to your data lake, that's all stuff that companies want with their data mesh. >>What are some of your favorite customer stories where you've really helped them accelerate that data mesh and drive business value from it, so that more people in the organization have access to data and can really make those data-driven decisions that everybody wants to make? >>I mean, there's so many of them, but, you know, one of the largest tech companies in the world is creating a data mesh where you have all the different departments in the company. You know, they were a big data warehouse user and it kind of hit the wall, right?
The costs were so high, and the ability for people to use it for just experimentation, to try new things out, to collaborate, they couldn't do it because it was so prohibitively expensive and difficult to use. And so they said, well, we need a platform where different people can collaborate, they can experiment with the data, they can share data with others. And so at a big organization like that, their ability to have a centralized platform but allow different groups to manage their own data. You know, several of the largest banks in the world are also doing data meshes with Dremio. You know, one of them has over a dozen different business units that are using Dremio, and that ability to have thousands of people on a platform and to be able to collaborate and share among each other, that's super important to these >>Guys. Can you contrast your approach to the market with Snowflake's? Cause they have some of those same concepts. >>Snowflake's a very closed system at the end of the day, right? Closed and very expensive. Right? I think, if I remember, I saw a quarter ago in one of their earnings reports that the average customer spends 70% more every year, right? Well, that's not sustainable. If you think about that over a decade, your cost is gonna increase 200x. Most companies are not gonna be able to swallow that, right? So companies need, first of all, more cost-efficient solutions that are, you know, just more approachable, right? And the second thing is, you know, we talked about the open data architecture. I think most companies now realize that if you want to build a platform for the future, you need to have the data in open formats and not be locked into one vendor, right?
And so that's kind of another important aspect. Beyond that is the ability to connect to all your data, even outside the lake, to your different databases, NoSQL databases, relational databases, and Dremio's semantic layer, where we can accelerate queries. And so typically what happens with data warehouses and other data lake query engines is that because you can't get the performance that you want, you end up creating lots and lots of copies of data. For every use case, you're creating, you know, a pre-joined copy of that data, a pre-aggregated version of that data. And, you know, then you have to redirect all your data. >>You've got a >>Governance problem, individual things. It's expensive. It's expensive, it's hard to secure that, cuz permissions don't travel with the data. So you have all sorts of problems with that, right? And so what we've done, because of our semantic layer that makes it easy to expose data in a logical way, and then our query acceleration technology, which we call reflections, which transparently accelerates queries and gives you subsecond response times without data copies and also without extracts into the BI tools. Cause if you start doing BI extracts or imports, again, you have lots of copies of data in the organization, all sorts of refresh problems, security problems. It's a nightmare, right? And just collapsing all those copies and having a simple solution where data's stored in open formats and we can give you fast access to any of that data, that's very different from what you get with, like, a Snowflake or any of these other >>Companies. Right. That's a great explanation. I wanna ask you, early this year you announced that your Dremio Cloud service would be free forever, the basic Dremio Cloud service. How has that offer gone over? What's been the uptake on that offer? >>Yeah, I mean, thousands of people have signed up, and I think it's a great service.
It's, you know, very, very simple. People can go on the website and try it out. We now have a test drive as well, if you want to get started with just some public sample data sets and, like, a tutorial. We've made that increasingly easy as well. But yeah, we continue to take that approach of making it easy, democratizing these cloud data platforms and lowering the barriers to >>Adoption. How effective has it been in driving sales of the enterprise version? >>Yeah, a lot of the business that we do when it comes to selling is, you know, folks that have educated themselves, right? They've started off, they've followed some tutorials. I think generally developers prefer the first interaction to be with a product, not with a salesperson. And so that's basically the reason we did that. >>Before we ask you the last question, can you give us a sneak peek into the product roadmap as we enter 2023? What can you share with us that we should be paying attention to where Dremio is concerned? >>Yeah. You know, actually a couple days ago here at the conference, we had a press release with all sorts of new capabilities that we just released. And there's a lot more for the coming year. You know, we will shortly be releasing a variety of different performance enhancements. So in the next quarter or two, we'll be, you know, probably twice as fast just in terms of raw query speed. That's in addition to our reflections and our query acceleration. You know, support for all the major clouds is coming. Just a lot of capabilities in Dremio that make it easier and easier to use the platform. >>Awesome. Tomer, thank you so much for joining us.
My last question to you is, if you had a billboard in your desired location and it was going to really just be like a mic drop about why customers should be looking at Dremio, what would that billboard say? >>Well, Dremio is the easy and open data lakehouse, and, you know, open architectures are just a lot better, a lot more future-proof, a lot easier, and just a much safer choice for the future for companies. And so, hard to argue with those. People should take a look. >>Exactly. >>That wasn't the best, you know, billboard. >>Okay. I think it's a great billboard. Awesome. And thank you so much for joining Paul and me on the program, sharing with us what's new and what some of the exciting things are that are coming down the pipe quite soon. We're gonna be keeping our eye on Dremio. >>Awesome. Always happy to be here. >>Thank you. Right. For our guest and for Paul Gillin, I'm Lisa Martin. You're watching theCUBE, the leader in live and emerging tech coverage.
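
The "200x in a decade" figure quoted in the interview follows from simple compounding, and a short Python sketch makes the step explicit. It assumes only what the speaker stated (spend growing 70% per year for ten years); the function name is hypothetical and the calculation says nothing about any vendor's actual pricing.

```python
def growth_factor(annual_increase: float, years: int) -> float:
    """Multiplier on the starting spend after compounding for `years` years."""
    return (1 + annual_increase) ** years


# 70% more every year, compounded over a decade, as described in the interview.
factor = growth_factor(0.70, 10)
print(f"spend multiplier after 10 years: {factor:.1f}x")
```

The multiplier comes out at roughly 201.6, which matches the rounded "200x" in the conversation.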

Published Date : Dec 1 2022

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
Paul Gillen | PERSON | 0.99+
Paul Gillin | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Tomer | PERSON | 0.99+
Tomer Shiran | PERSON | 0.99+
Toron | PERSON | 0.99+
Las Vegas | LOCATION | 0.99+
70% | QUANTITY | 0.99+
Monday night | DATE | 0.99+
Vegas | LOCATION | 0.99+
fourth day | QUANTITY | 0.99+
Paul | PERSON | 0.99+
last year | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
dozens | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
10 different teams | QUANTITY | 0.99+
Dremio | PERSON | 0.99+
early this year | DATE | 0.99+
SQL Query Engine | TITLE | 0.99+
The Cube | TITLE | 0.99+
Tuesday | DATE | 0.99+
2023 | DATE | 0.99+
one | QUANTITY | 0.98+
a year ago | DATE | 0.98+
next quarter | DATE | 0.98+
S3 | TITLE | 0.98+
a quarter ago | DATE | 0.98+
twice | QUANTITY | 0.98+
Oracle | ORGANIZATION | 0.98+
second thing | QUANTITY | 0.98+
Drio | ORGANIZATION | 0.98+
couple days ago | DATE | 0.98+
both | QUANTITY | 0.97+
DRIO | ORGANIZATION | 0.97+
2022 | DATE | 0.97+
Lake House | ORGANIZATION | 0.96+
thousands of people | QUANTITY | 0.96+
Wednesday | DATE | 0.96+
Spark | TITLE | 0.96+
200 x | QUANTITY | 0.96+
first | QUANTITY | 0.96+
Drio | TITLE | 0.95+
Dremeo | ORGANIZATION | 0.95+
two options | QUANTITY | 0.94+
about three hours | QUANTITY | 0.94+
day two | QUANTITY | 0.94+
s3 | TITLE | 0.94+
Apache Iceberg | ORGANIZATION | 0.94+
a minute ago | DATE | 0.94+
Silicon Angle | ORGANIZATION | 0.94+
hundred percent | QUANTITY | 0.93+
Apache | ORGANIZATION | 0.93+
single team | QUANTITY | 0.93+
GitHub | ORGANIZATION | 0.91+
this morning | DATE | 0.9+
a dozen different engines | QUANTITY | 0.89+
Iceberg | TITLE | 0.87+
Redshift | TITLE | 0.87+
last | DATE | 0.87+
this year | DATE | 0.86+
first interaction | QUANTITY | 0.85+
two flavors | QUANTITY | 0.84+
Thursday | DATE | 0.84+
Azure | ORGANIZATION | 0.84+
DR. Cloud | ORGANIZATION | 0.84+
SQL Server | TITLE | 0.83+
four conferences | QUANTITY | 0.82+
coming year | DATE | 0.82+
over over a dozen different business | QUANTITY | 0.81+
one vendor | QUANTITY | 0.8+
Poly | ORGANIZATION | 0.79+
Jamar | PERSON | 0.77+
GI | ORGANIZATION | 0.77+
Inre | ORGANIZATION | 0.76+
Dr. | ORGANIZATION | 0.73+
Lake house | ORGANIZATION | 0.71+
Arctic | ORGANIZATION | 0.71+
a year | QUANTITY | 0.7+
a minute | QUANTITY | 0.7+
SQL | TITLE | 0.69+
AWS Reinvent 2022 | EVENT | 0.69+
subsecond | QUANTITY | 0.68+
DML | TITLE | 0.68+

Itamar Ankorion, Qlik & Peter MacDonald, Snowflake | AWS re:Invent 2022


 

(upbeat music) >> Hello, welcome back to theCUBE's AWS re:Invent 2022 coverage. I'm John Furrier, host of theCUBE. Got a great lineup here: Itamar Ankorion, SVP Technology Alliances at Qlik, and Peter MacDonald, Vice President, Cloud Partnerships and Business Development at Snowflake. We're going to talk about bringing SAP data to life with a joint Snowflake, Qlik and AWS solution. Gentlemen, thanks for coming on theCUBE. Really appreciate it. >> Thank you. >> Thank you, great meeting you, John. >> Just to get started, introduce yourselves to the audience, then we're going to jump into what you guys are doing together, unique relationship here, really compelling solution in cloud. Big story about applications and scale this year. Let's introduce yourselves. Peter, we'll start with you. >> Great. I'm Peter MacDonald. I am Vice President of Cloud Partners and Business Development here at Snowflake. On the Cloud Partner side, that means I manage the AWS relationship along with Microsoft and Google Cloud: what we do together in terms of complementary products, GTM, co-selling, things like that. Importantly, working with other third parties like Qlik for joint solutions. On business development, it's negotiating custom commercial partnerships, large companies like Salesforce and Dell, smaller companies mostly for our venture portfolio. >> Thanks Peter and hi John. It's great to be back here. So I'm Itamar Ankorion and I'm the senior vice president responsible for technology alliances here at Qlik. With that, I own strategic alliances, including our key partners in the cloud, including Snowflake and AWS. I've been in the data and analytics enterprise software market for 20 plus years, and my main focus is product management, marketing, alliances, and business development. I joined Qlik about three and a half years ago through the acquisition of Attunity, which is now the foundation for Qlik Data Integration. 
So again, we focus in my team on creating joint solution alignment with our key partners to provide more value to our customers. >> Great to have both you guys, senior executives in the industry, on theCUBE here, talking about data. Obviously bringing SAP data to life is the theme of this segment, but this re:Invent, it's all about the data, the big data end-to-end story, a lot about data being intrinsic, as the CEO says on stage, around the organization in all aspects. Take a minute to explain what you guys are doing from a company standpoint. Snowflake and Qlik and the solutions, why here at AWS? Peter, we'll start with you at Snowflake: what you guys do as a company, your mission, your focus. >> That was great, John. Yeah, so here at Snowflake, we focus on the data platform, and until recently, data platforms required expensive on-prem hardware appliances. And despite all that expense, customers had capacity constraints and expensive maintenance, and had limited functionality, all of which impeded these organizations from reaching their goals. Snowflake is a cloud native SaaS platform, and we've become so successful because we've addressed these pain points and have other new special features. For example, securely sharing data across both the organization and the value chain without copying the data, support for new data types such as JSON and structured data, and also advanced in-database data governance. Snowflake integrates with complementary AWS services and other partner products. So we can enable holistic solutions that include, for example, here, both Qlik and AWS SageMaker and Comprehend, and bring those to joint customers. Our customers want to convert data into insights along with advanced analytics platforms and AI. That is how they make holistic data-driven solutions that will give them competitive advantage. 
With Snowflake, our approach is to focus on customer solutions that leverage data from existing systems such as SAP, wherever they are, in the cloud or on-premises. And to do this, we leverage partners like Qlik and AWS to help customers transform their businesses. We provide customers with a premier data analytics platform as a result. Itamar, why don't you talk about Qlik a little bit and then we can dive into the specific SAP solution here and some trends. >> Sounds great, Peter. So Qlik provides modern data integration and analytics software used by over 38,000 customers worldwide. Our focus is to help our customers turn data into value and help them close the gap between data all the way through insight and action. We offer Qlik Data Integration and Qlik Data Analytics. Qlik Data Integration helps to automate the data pipelines to deliver data to where they want to use it in real-time and make the data ready for analytics, and then Qlik Data Analytics is a robust platform for analytics and business intelligence that has been a leader in the Gartner Magic Quadrant for over 11 years now in the market. And both of these come together into what we call Qlik Cloud, which is our SaaS-based platform, providing a more seamless way to consume all these services and accelerate time to value with customer solutions. In terms of partnerships, both Snowflake and AWS are very strategic to us here at Qlik, so we have a very comprehensive investment to ensure a strong joint value proposition that we can bring to our mutual customers, everything from aligning our roadmaps through optimizing and validating integrations, collaborating on best practices, and packaging joint solutions like the one we'll talk about today. And with that investment, we are an elite level, top level partner with Snowflake. We've verified that our technology is Snowflake Ready across the entire product set, and we have hundreds of joint customers together, and with AWS we've also partnered for a long time. 
We're here at re:Invent. We've been here since the first re:Invent, the inaugural one, so it kind of gives you an idea of how long we've been working with AWS. We provide very comprehensive integration with AWS data analytics services, and we have several competencies ranging from data analytics to migration and modernization. So that's our focus and again, we're excited about working with Snowflake and AWS to bring solutions together to market. >> Well, I'm looking forward to unpacking the solutions specifically, and congratulations on the continued success of both your companies. We've been following them obviously for a very long time and seeing the platform evolve beyond just SaaS and a lot more going on in cloud these days, kind of next generation emerging. You know, we're seeing a lot of macro trends that are going to be powering some of the things we're going to get into real quickly. But before we get into the solution, what are some of those power dynamics in the industry that you're seeing, trends specifically that are impacting your customers and taking us down this road of getting more out of the data, specifically the SAP data, but also trends and dynamics in general? What are you hearing from your customers? Why do they care? Why are they going down this road? Peter, we'll start with you. >> Yeah, I'll go ahead and start. Thanks. Yeah, I'd say we continue to see customers being very eager to transform their businesses, and they know they need to leverage technology and data to do so. They're also increasingly depending upon the cloud to bring that agility, that elasticity, and the new functionality necessary to react in real-time to ever-evolving customer needs. You look at what's happened over the last three years, and boy, the macro environment, customers, it's all changing so fast. With our partnerships with AWS and Qlik, we've been able to bring to market innovative solutions like the one we're announcing today that spans all three companies. 
It provides a holistic solution and an integrated solution for our customers. >> Itamar, let's get into it. You've been with theCUBE, you've seen the journey, you have your own journey, many, many years, you've seen the waves. What's going on now? I mean, what's the big wave? What's the dynamic powering this trend? >> Yeah, in a nutshell I'll call it, it's all about time. You know, it's time to value and it's about real-time data. I'll kind of talk about that a bit. So, I mean, you hear a lot about data being the new oil, but definitely we see more and more customers seeing data as their critical enabler for innovation and digital transformation. They look for ways to monetize data. They look at data as the way in which they can innovate and bring different value to the customers. So we see customers want to use more data to get more value from data. We definitely see them wanting to do it faster, right, than before. And we definitely see them looking for agility and automation as ways to accelerate time to value, and also reduce overall costs. I did mention real-time data, so we definitely see more and more customers, they want to be able to act and make decisions based on fresh data. So yesterday's data is just not good enough. >> John: Yeah. >> It's got to be down to the hour, down to the minutes and sometimes even lower than that. And then I think we're also seeing customers look to their core business systems where they have a lot of value, like SAP, like mainframe, and thinking, okay, our core data is there, how can we get more value from this data? So those are the key things we see all the time with customers. >> Yeah, we did a big editorial segment this year on what we called data as code. Data as code is kind of a riff on infrastructure as code, and you start to see data proliferating into all aspects, fresh data. 
It's not just where you store it, it's how you share it, it's how you turn it into an application, intrinsically involved in all aspects. This is the big theme this year and that's driving all the conversations here at re:Invent. And I'm guaranteeing you, it's going to happen for another five to 10 years. It's not stopping. So I got to get into the solution. You guys mentioned SAP and you've announced the solution by Qlik, Snowflake and AWS for your customers using SAP. Can you share more about this solution? What's unique about it? Why is it important and why now? Peter, Itamar, we'll start with you first. >> Let me jump in. I'll jump in because I'm excited. We're very excited about this solution, and it's also a solution, by the way, where we've seen proven customer success. So to your point, it's ready to scale, it's starting, and I think we're going to see a lot of companies doing this over the next few years. But before we jump to the solution, let me maybe take a few minutes just to clarify the need, why we're seeing customers jump to do this. So customers that use SAP, they use it to manage the core of their business. So think order processing, management, finance, inventory, supply chain, and so much more. So if you're running SAP in your company, that data creates a great opportunity for you to drive innovation and modernization. So what we see customers want to do, they want to do more with their data, and more means they want to take SAP with non-SAP data and use it together to drive new insights. They want to use real-time data to drive real-time analytics, which they couldn't do to date. They want to bring together descriptive with predictive analytics, so adding machine learning and AI to drive more value from the data. And naturally they want to do it faster. So find ways to iterate faster on their solutions, have freedom with the data and agility. 
And I think this is really where cloud data platforms like Snowflake and AWS, you know, bring that value to be able to drive that. Now to do that you need to unlock the SAP data, which is also a lot of where Qlik comes in, because the typical challenge these customers run into is the complexity inherent in SAP data: tens of thousands of tables, proprietary formats, complex data models, licensing restrictions, and more. Then you have performance issues they usually run into: how do we handle the throughput, the volumes, while maintaining lower latency and impact? Where do we find the knowledge to really understand how to get all this done? So these are the things we've looked at when we came together to create a solution and make it unique. So when you think about its uniqueness, because we put together a lot, I'll go through three, four key things that come together to make this unique. First is about data delivery. How do you handle the SAP data delivery? So how do you get it from ECC, from HANA, from S/4HANA, how do you deliver the data and the metadata, and how does that integrate well into Snowflake? And what we've done is we've focused a lot on optimizing that process and the continuous ingestion, so the real-time ingestion of the data, in a way that works really well with the Snowflake system, the Data Cloud. Second thing is we looked at SAP data transformation, so once the data arrives at Snowflake, how do we turn it into being analytics ready? So that's where data transformation and data warehouse automation come in. And these are all elements of this solution. So creating derivative datasets, creating data marts, and all of that is done by, again, creating an optimized integration that pushes down SQL-based transformations, so they can be processed inside Snowflake, leveraging its powerful engine. 
And then the third element is bringing together data visualization and analytics that can also take all the data that's now organized inside Snowflake, bring other data in, bring machine learning from SageMaker, and then create a seamless integration to bring analytic applications to life. So these are all things we put together in the solution. And maybe the last point is we actually took the next step with this and we created something we refer to as solution accelerators, which we're really, really keen about. Think about these as prepackaged templates for common business analytic needs like order to cash, finance, inventory. And we can dig into that a little more later, but this brings the next level of value to the customers, all built into this joint solution. >> Yeah, I want to get to the accelerators, but real quick, Peter, your reaction to the solution, what's unique about it? And obviously Snowflake, we've been seeing the progression of data applications, more developers developing on top of Snowflake, and data as code kind of implies a developer ecosystem. This is kind of interesting. I mean, you've got partnering with Qlik and AWS, it's kind of a developer-like thinking, a real solution. What's unique about this SAP solution that's different from what customers can get anywhere else? >> Yeah, well listen, I think first of all, you have to start with the idea of the solution. These are three companies coming together to build a holistic solution that is all about, you know, creating a great opportunity to turn SAP data into value, as Itamar was talking about. That's really what we're talking about here, and there's a lot of technology underneath it. I'll talk more about the Snowflake technology, what's involved here, and then cover some of the AWS pieces as well. But you know, we're focusing on getting that value out and accelerating time to value for our joint customers. 
As Itamar was saying, you know, there's a lot of complexity with the SAP data and a lot of value there. How can we manage that in a prepackaged way, bringing together best of breed solutions with proven capabilities and bringing this to market quickly for our joint customers? You know, Snowflake and AWS have been strong partners for a number of years now, and that's not only on how Snowflake runs on top of AWS, but also how we integrate with their complementary analytics and ML products. And so, you know, we want to be able to leverage those in addition to what Qlik is bringing in terms of the data transformations, bringing data out of SAP, and the visualization as well. All very critical. And then we want to bring in the predictive analytics that AWS brings and what SageMaker brings. We'll talk about that a little bit later on. Some of the technologies that we're leveraging are some of our latest cutting edge technologies that really make things easier for both our partners and our customers. For example, Qlik leverages Snowflake's recently released Snowpark for Python functionality to push down those data transformations from Qlik into Snowflake that Itamar's mentioning. And we also leverage Snowpark for integrations with Amazon SageMaker. There's a lot of great new technology that just makes this easy and compelling for customers. >> I think that's the big word, easy button here for what may look like a complex kind of integration, kind of turnkey, a really, really compelling example of the modern era we're living in, as we always say in theCUBE. You mentioned accelerators, SAP accelerators. Can you give an example of how that works with the technology from the third party providers to deliver this business value, Itamar, 'cause that was an interesting comment. What's the example? Give an example of this acceleration. >> Yes, certainly. 
I think this is something that really makes this truly, truly unique in the industry and again, a great opportunity for customers. So we kind of talked earlier about how there's a lot of things that need to be done with SAP data to turn it into value. And these accelerators, as the name suggests, are designed to do just that, to kind of jumpstart the process and reduce the time and the risk involved in such a project. So again, these are pre-packaged templates. We basically took a lot of knowledge, and a lot of configurations, best practices about how to get things done, and we put 'em together. So think about all the steps. It includes things like data extraction, so already knowing which tables, all the relevant tables that you need to get data from in the context of the solution you're looking for, say like order to cash, we'll get back to that one. How do you continuously deliver that data into Snowflake in an efficient manner, handling things like data type mappings, metadata naming conventions and transformations? The data models you build, all the way to data mart definitions and all the transformations that the data needs to go through, moving through steps until it's fully analytics ready. And then on top of that, even adding a library of comprehensive analytic dashboards and integrations with machine learning and AI, and put all of that in a way that's pre-integrated and tested to work with Snowflake and AWS. So this is where again, you get this entire recipe that's ready. So take for example, I think I mentioned order to cash. So again, all these things I just talked about, I mean, for those who are not familiar, I mean order to cash is a critical business process for every organization. So especially if you're in retail, manufacturing, enterprise, it's a big... This is where, you know, starting with booking a sales order, followed by fulfilling the order, billing the customer, then managing the accounts receivable when the customer actually pays, right? 
So in this whole process, you've got sales order fulfillment and billing, which impact customer satisfaction, and you've got receivables and payments, you know, which impact working capital and cash liquidity. So again, as a result this order to cash process is a lifeblood for many businesses and it's critical to optimize and understand. So the solution accelerator we created specifically for order to cash takes care of understanding all these aspects and the data that needs to come with it. So everything we outlined before to make the data available in Snowflake in a way that's really useful for downstream analytics, along with dashboards that are already common for that use case. So again, this enables customers to gain real-time visibility into their sales orders, fulfillment, and accounts receivable performance. That's what the accelerators are all about. And very similarly, we have another one, for example, for finance analytics, right? So this will optimize financial data reporting and help customers get insights into P&L, financial risk and stability, or inventory analytics that helps with, you know, improved planning and inventory management, utilization, increased efficiencies, you know, in the supply chain. So again, these accelerators really help customers get a jumpstart and move faster with their solutions. >> Peter, this is the easy button we just talked about, getting things going, you know, get the ball rolling, get some acceleration. A big part of this is the three companies coming together doing this. >> Yeah, and to build on what Itamar just said, the SAP data obviously has tremendous value. Those sales orders, distribution data, financial data, bringing that into Snowflake makes it easily accessible, but it also enables it to be combined with other data too, which is one of the things that Snowflake does so well. So you can get a full view of the end-to-end process and the business overall. 
You know, for example, I'll just take one, you know, one example that may not come to mind right away, but you know, looking at the impact of weather conditions on supply chain logistics is relevant and material and of interest to our customers. How do you bring those different data sets together in an easy way, bringing the data out of SAP, bringing maybe other data out of other systems through Qlik or through Snowflake, directly bringing data in from our data marketplace, and bring that all together to make it work? You know, fundamentally the organizational silos and the data fragmentation that exist otherwise make it really difficult to drive modern analytics projects. And that in turn limits the value that our customers are getting from SAP data and these other data sets. We want to enable that and unleash it. >> Yeah, time to value. This is great stuff. Itamar, final question: you know, what customers are using this? What do you have? I'm sure you have customer examples already using the solution. Can you share kind of what these examples look like in the use cases and the value? >> Oh yeah, absolutely. Thank you. Happy to. We have customers across different sectors. You see manufacturing, retail, energy, oil and gas, CPG. So again, customers in those sectors typically have SAP. So we have customers in all of them. A great example is Siemens Energy. Siemens Energy is a global provider of gas and power services. You know, over what, 28 billion, 30 billion in revenue. 90,000 employees. They operate globally in over 90 countries. So they've used SAP HANA as a core system, so it's running on premises, in multiple locations around the world. And what they were looking for is a way to bring all this data together so they can innovate with it. And the thing is, as Peter mentioned earlier, not just the SAP data, but also bringing other data from other systems together for more value. 
That includes finance data, logistics data, customer CRM data. So they bring data from over 20 different SAP systems, okay, with Qlik Data Integration, feeding that into Snowflake in under 20 minutes, 24/7, 365, you know, days a year. Okay, they get data from over 20,000 tables, you know, over a million, hundreds of millions of records daily going in. So it is a great example of the type of scale, scalability, agility and speed that they can get to drive this kind of innovation. So that's a great example with Siemens. You know, another one that comes to mind is a global manufacturer. Very similar scenario, but you know, they're using it for real-time executive reporting. So it's more like visibility into the production data as well as for financial analytics. So think about everything from audit to tax to innovative financial intelligence, because all the data's coming from SAP. >> It's a great time to be in the data business again. It keeps getting better and better. There's more data coming. It's not stopping, you know, it's growing so fast, it keeps coming. Every year, it's the same story, Peter. It's like, it doesn't stop coming. As we wrap up here, let's just get customers some information on how to get started. I mean, obviously you're starting to see the accelerators, it's a great program there. What a great partnership between the two companies and AWS. How can customers get started to learn about the solution and take advantage of it, getting more out of their SAP data, Peter? >> Yeah, I think the first place to go is to talk to Snowflake, talk to AWS, talk to our account executives that are assigned to your account. Reach out to them and they will be able to educate you on the solution. We have it packaged up very nicely and it can be deployed very, very quickly. >> Well gentlemen, thank you so much for coming on. Appreciate the conversation. Great overview of the partnership between, you know, Snowflake and Qlik and AWS on a joint solution. 
You know, getting more out of the SAP data. It's really kind of a key, key solution, bringing SAP data to life. Thanks for coming on theCUBE. Appreciate it. >> Thank you. >> Thank you John. >> Okay, this is theCUBE coverage here at RE:Invent 2022. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)

Published Date : Dec 1 2022

SUMMARY :

bringing SAP data to life, great meeting you John. then going to jump into what On the Cloud Partner side, and I'm the senior vice and the solutions, and the value chain and accelerate time to value that are going to be powering and data to do so. What's the dynamic powering this trend? You know, it's time to value all the time with customers. and that's driving all the and it's also a solution by the way I mean, you got partnering and bringing this to market of the modern era we're living in, that the data needs to go through getting things going, you know, Yeah, and to build in the use cases and the value? agility and speed that they can get It's a great time to be to educate you on the solution. key solution, bringing SAP data to life. Okay, this is theCUBE

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Peter | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
Siemens | ORGANIZATION | 0.99+
Peter MacDonald | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Peter McDonald | PERSON | 0.99+
Qlik | ORGANIZATION | 0.99+
28 billion | QUANTITY | 0.99+
two companies | QUANTITY | 0.99+
Tens | QUANTITY | 0.99+
three companies | QUANTITY | 0.99+
Siemens Energy | ORGANIZATION | 0.99+
20 plus years | QUANTITY | 0.99+
yesterday | DATE | 0.99+
Snowflake | ORGANIZATION | 0.99+
Itamar Ankorion | PERSON | 0.99+
third element | QUANTITY | 0.99+
First | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Itamar | PERSON | 0.99+
over 20,000 tables | QUANTITY | 0.99+
both | QUANTITY | 0.99+
90,000 employees | QUANTITY | 0.99+
first | QUANTITY | 0.99+
Salesforce | ORGANIZATION | 0.99+
Cloud Partners | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
over 38,000 customers | QUANTITY | 0.99+
under 20 minutes | QUANTITY | 0.99+
10 years | QUANTITY | 0.99+
five | QUANTITY | 0.99+
Excel | TITLE | 0.99+
one | QUANTITY | 0.99+
over 11 years | QUANTITY | 0.98+
Snowpark | TITLE | 0.98+
Second thing | QUANTITY | 0.98+

Roland Lee & Hawn Nguyen Loughren | AWS re:Invent 2022 - Global Startup Program


 

>>Good afternoon everybody. I'm John Walls and welcome back to our coverage here on theCUBE of AWS re:Invent '22. We are bringing you another segment with the Global Startup Program, which is part of the AWS Startup Showcase, and it's a pleasure to welcome two new guests here to the showcase. First, immediately to my right, Hawn Nguyen Loughren. Good to see you, Hawn. Good to see you. The leader of the Enterprise Solutions Architecture at AWS. And on the far right, Roland Lee, who is the co-founder and CEO of Heimdall Data. Roland, good to see you. Great >>To be here. >>All right, good. Thanks for joining us. Well first off, for those at home who may not be familiar with Heimdall, what do you do? Why are you here? But I'll let you take it from there. >>Well, we're one of the sponsors here at AWS and great to be here. We offer a data access layer in the form of a proxy, and what it does is it provides complete visibility and the capability to enhance the interaction between the application and one's current database. And as a result, the customer will improve database scale, database security and availability. And all these features don't require any application changes. So that's sort of our marketing pitch, if you will, all these types of features to improve the experience of managing a database without any application >>Changes. And where does the cloud come into play for you then? >>So we started out actually helping out customers on premise, and a lot of enterprise customers are moving over to the cloud, and it was just a natural progression to do that. And so AWS, which is a key partner of ours, partners with us to help solve customer problems, especially on the database side, as application performance tends to have issues in the interaction between the application and the database, and we're solving that issue. >>Right. So Hawn, I mean, Roland just touched on it about on-prem, right? 
There's still some kickers and screamers out there that haven't bought in, or they're about to, but you're about to get 'em, I'm sure. But talk about that conversion or that transition, if you would, from going on-prem into a hybrid environment or into the bigger cloud environment, and how difficult that is sometimes, maybe, to get people to make that kind of a leap. >>Well, I would say that a lot of customers are wanting to focus more on product innovation and experimentation, and having to manage servers and patching, you know, takes away from that initiative that they're trying to do. So with AWS, we take on the undifferentiated heavy lifting so that they can focus on product innovation. And one of the areas, talking about Heimdall, is that from the database side, we do provide Amazon RDS, which is a database service, and also Aurora, to give them that lift so they don't have to worry about patching servers and setting up and provisioning servers as well. >>Right. So Roland, can you get the idea across to people very simply: let us take care of the hard stuff, and that will free you up to do your product innovations, to do your experimentations, to really free up your team, basically, to do the fun stuff, and let us sweat over the details, basically. Right? >>Exactly. Our motto is: why build when you can buy? So a lot of it has to do with offering the value in terms of price and the features, such that it's gonna benefit a team. Large companies like amazon.com and Google have huge teams that can build data access layers and proxies. And what we're trying to do here is commercialize those, because those are built in house and are not readily available for customers to use. And you need some type of interface between the application and the database, and we provide that. So, why build when you can buy? >>Well, I was gonna say, why Heimdall, right? I mean, what's your special sauce?
Because everybody's got something, obviously, a market differentiator that you're bringing into place here. So you started to touch on it a little bit there for me, but dive a little deeper there. I mean, what is it that you're bringing to the table with AWS that you think puts you above the crowd? >>Well, let me give you a use case here. In typical events like, let's say, Black Friday, where there's a surge in traffic that can overwhelm the database, the Heimdall data access layer database proxy provides an auto-scaling distributed architecture such that it can absorb those surges in traffic and help scale the database while keeping the data fresh and up to date. And so basically, with traffic based on season or time of day, we can adjust automatically. And all these types of features that we offer, most notably automated query caching and read/write split for ACID compliance, don't require any code changes; typically they would require the application developer to make those changes. So we're saving months, maybe years, of development and maintenance. >>Yeah, a lot of gray hairs too, right? Yeah, you're solving a lot of problems there. What about database trends just in general, Hawn, if you will? I mean, this is your space, right? I mean, we're hearing from Heimdall, you know, in terms of solutions they're providing, but what are you seeing just from the macro level in terms of what people are doing and thinking about the database and how it relates to the cloud? Right. >>And some of the things that we're seeing is that we're seeing an explosion of data, relevant data that customers need to be able to consume and also process as well. So with the explosion of data, we also see customers trying to modernize their applications through microservices, which does change the design patterns of the applications, what we call the data access patterns, as well.
So again, going back to that undifferentiated heavy lifting, we do have something called purpose-built databases, right? It's the right tool for the right purpose. And so it depends on what their, like, RPO and RTO are, and their data access pattern. Is it BASE, is it ACID? So we want to be able to provide them the options to build and also innovate. So that's why we have Amazon RDS; we also have Redshift, we also have Aurora, et cetera. Redshift is more on the BI side, but usually when you ingest the data, you have some level of processing to get more insight. So that's why customers are moving more towards managed services, so that they can get that lift and then focus on product and innovation. Yeah. >>Have we kind of caught up, or are we catching up, to just this tsunami of data to begin with, right? Because, I mean, that was it, you know, what, seven, eight years ago, when data became, or was becoming, king, and reams and reams and reams, and, you know, can't handle it, right? And are we now able to manage that process and manage that flow and get the right data into the right hands at the right time? We're doing better with that? >>I would say that it definitely has grown in size, the amount of data that we're ingesting. And so with the scalability and agility of the cloud, we're able to, I would say, adapt to the rapid changes and ingestion of the data. So that's why we have things like Aurora Serverless, to have that auto scaling, so they can run MySQL or Postgres and still, like what, you know, Heimdall is trying to do, basically not have to do any code changes. It would be a data migration. They still use the same underlying database engine and mechanisms, but here we're providing them at scale on the cloud. >>Yeah. Are proxies a must-have for all databases? I mean, is that essential these days? >>Well, good question, John. I would say yes.
And this is often built in house, as I mentioned. Large companies do build some type of data access layer or proxy, or some utilize an ORM, an object-relational mapper, to do it. And again, what we're trying to do is put this out into the market, commercially speaking, such that it can be readily used by all customers rather than building it from scratch all the time. >>You know what I didn't ask you, Roland, was how does AWS come into play for you, then? And in startup mode, the focus that they've had on startups in general, but on you in particular. I mean, talk about that partnership or that relationship and the value that you're extracting from that. >>The AWS partnership has been absolutely wonderful. The collaboration, they have one of the best managed database services. The value that it adds in terms of the durability, the manageability. What Heimdall Data does is it complements Amazon RDS and Amazon Redshift very well, in the sense that we're not replacing the database. What we're doing is allowing the customer to get the most out of the managed database service, whether it be Redshift, Aurora Serverless, or RDS, all without code changes. The analogy that I would give, John, is a car: a race car may be very fast, but it takes a driver to get to those fast speeds. We're the driver. The Heimdall proxy provides that intelligence so that you can get the most out of that database engine. >>And Hawn, if you would then touch on, first off, AWS and the emphasis that you have put on startups, and you're obviously, you know, kind of putting your money where your mouth is, right, with the way you've encouraged and nurtured that environment. And then about Heimdall in general, about where you see this going, or where you want to take this in the next, say, 12 months, 18 months.
>>I think it's more of a better-together story of how we can basically co-sell with our partners, right? And basically focusing on helping our customers drive that innovation and collaboration. So with Heimdall as an independent software vendor, an ISV, most customers can leverage that through a marketplace, where basically it integrates very nicely with AWS. So that gives 'em that lift, and it goes back to the undifferentiated heavy lifting on the Heimdall proxy side, if you will, because then you have this proxy in the middle, where it helps them with their SQL performance. And I've seen use cases where customers have some legacy system and they may not have time to modernize the application, so they use this as a lift to keep going as they try to modernize. But I've also seen customers who are trying to use it as a way to give that performance lift because they may have third-party software where they cannot change the code. Putting this in there helps optimize their lines of business or whatever that is; maybe it can be an online store or whatever. So I would say it's a better-together type of story. >>Yeah. There's gotta be a song in there somewhere. So peek around the corner, and if you wanna be the headlights here right now, in terms of 12, 18 months, I mean, what's next to solve, right? You've already slayed a few dragons along the way, but there are others, I'm sure, as always happens in innovation in this space. Just when you solve a problem, you have to deal with others that pop up as maybe unintended consequences or at least a new challenge. So what would that be in your world right now? What do you see, you know, occupying your sleepless nights here for the next year or so? >>Well, for Heimdall Data, it's all about improving database performance and scale. And those workloads change. We have OLTP, we have OLAP, and with artificial intelligence, ML.
There are different types of traffic profiles, and we're focused on improving those data profiles. It could be unstructured or structured. Right now we're focused on structured data, which is relational databases, but there's a lot of opportunity to improve the performance of data. >>Well, you're driving the car, you've got a good navigator, and I think the GPS is working. So keep up the good work, and thank you for sharing the time today. Thank you. Thank you, John. Do appreciate it. All right, you are watching theCUBE. We continue our coverage here from AWS re:Invent 22. theCUBE, of course, the leader in high tech coverage.

Published Date : Nov 30 2022

