Breaking Analysis: Databricks faces critical strategic decisions…here’s why

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Spark became a top level Apache project in 2014, and then shortly thereafter, burst onto the big data scene. Spark, along with the cloud, transformed and in many ways, disrupted the big data market. Databricks optimized its tech stack for Spark and took advantage of the cloud to really cleverly deliver a managed service that has become a leading AI and data platform among data scientists and data engineers. However, emerging customer data requirements are shifting into a direction that will cause modern data platform players generally and Databricks, specifically, we think, to make some key directional decisions and perhaps even reinvent themselves. Hello and welcome to this week's wikibon theCUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do a deep dive into Databricks. We'll explore its current impressive market momentum. We're going to use some ETR survey data to show that, and then we'll lay out how customer data requirements are changing and what the ideal data platform will look like in the midterm future. We'll then evaluate core elements of the Databricks portfolio against that vision, and then we'll close with some strategic decisions that we think the company faces. And to do so, we welcome in our good friend, George Gilbert, former equities analyst, market analyst, and current Principal at TechAlpha Partners. George, good to see you. Thanks for coming on. >> Good to see you, Dave. >> All right, let me set this up. We're going to start by taking a look at where Databricks sits in the market in terms of how customers perceive the company and what it's momentum looks like. And this chart that we're showing here is data from ETS, the emerging technology survey of private companies. The N is 1,421. What we did is we cut the data on three sectors, analytics, database-data warehouse, and AI/ML. The vertical axis is a measure of customer sentiment, which evaluates an IT decision maker's awareness of the firm and the likelihood of engaging and/or purchase intent. The horizontal axis shows mindshare in the dataset, and we've highlighted Databricks, which has been a consistent high performer in this survey over the last several quarters. And as we, by the way, just as aside as we previously reported, OpenAI, which burst onto the scene this past quarter, leads all names, but Databricks is still prominent. You can see that the ETR shows some open source tools for reference, but as far as firms go, Databricks is very impressively positioned. Now, let's see how they stack up to some mainstream cohorts in the data space, against some bigger companies and sometimes public companies. This chart shows net score on the vertical axis, which is a measure of spending momentum and pervasiveness in the data set is on the horizontal axis. You can see that chart insert in the upper right, that informs how the dots are plotted, and net score against shared N. And that red dotted line at 40% indicates a highly elevated net score, anything above that we think is really, really impressive. And here we're just comparing Databricks with Snowflake, Cloudera, and Oracle. And that squiggly line leading to Databricks shows their path since 2021 by quarter. And you can see it's performing extremely well, maintaining an elevated net score and net range. Now it's comparable in the vertical axis to Snowflake, and it consistently is moving to the right and gaining share. Now, why did we choose to show Cloudera and Oracle? The reason is that Cloudera got the whole big data era started and was disrupted by Spark. And of course the cloud, Spark and Databricks and Oracle in many ways, was the target of early big data players like Cloudera. Take a listen to Cloudera CEO at the time, Mike Olson. This is back in 2010, first year of theCUBE, play the clip. >> Look, back in the day, if you had a data problem, if you needed to run business analytics, you wrote the biggest check you could to Sun Microsystems, and you bought a great big, single box, central server, and any money that was left over, you handed to Oracle for a database licenses and you installed that database on that box, and that was where you went for data. That was your temple of information. >> Okay? So Mike Olson implied that monolithic model was too expensive and inflexible, and Cloudera set out to fix that. But the best laid plans, as they say, George, what do you make of the data that we just shared? >> So where Databricks has really come up out of sort of Cloudera's tailpipe was they took big data processing, made it coherent, made it a managed service so it could run in the cloud. So it relieved customers of the operational burden. Where they're really strong and where their traditional meat and potatoes or bread and butter is the predictive and prescriptive analytics that building and training and serving machine learning models. They've tried to move into traditional business intelligence, the more traditional descriptive and diagnostic analytics, but they're less mature there. So what that means is, the reason you see Databricks and Snowflake kind of side by side is there are many, many accounts that have both Snowflake for business intelligence, Databricks for AI machine learning, where Snowflake, I'm sorry, where Databricks also did really well was in core data engineering, refining the data, the old ETL process, which kind of turned into ELT, where you loaded into the analytic repository in raw form and refine it. And so people have really used both, and each is trying to get into the other. >> Yeah, absolutely. We've reported on this quite a bit. Snowflake, kind of moving into the domain of Databricks and vice versa. And the last bit of ETR evidence that we want to share in terms of the company's momentum comes from ETR's Round Tables. They're run by Erik Bradley, and now former Gartner analyst and George, your colleague back at Gartner, Daren Brabham. And what we're going to show here is some direct quotes of IT pros in those Round Tables. There's a data science head and a CIO as well. Just make a few call outs here, we won't spend too much time on it, but starting at the top, like all of us, we can't talk about Databricks without mentioning Snowflake. Those two get us excited. Second comment zeros in on the flexibility and the robustness of Databricks from a data warehouse perspective. And then the last point is, despite competition from cloud players, Databricks has reinvented itself a couple of times over the year. And George, we're going to lay out today a scenario that perhaps calls for Databricks to do that once again. >> Their big opportunity and their big challenge for every tech company, it's managing a technology transition. The transition that we're talking about is something that's been bubbling up, but it's really epical. First time in 60 years, we're moving from an application-centric view of the world to a data-centric view, because decisions are becoming more important than automating processes. So let me let you sort of develop. >> Yeah, so let's talk about that here. We going to put up some bullets on precisely that point and the changing sort of customer environment. So you got IT stacks are shifting is George just said, from application centric silos to data centric stacks where the priority is shifting from automating processes to automating decision. You know how look at RPA and there's still a lot of automation going on, but from the focus of that application centricity and the data locked into those apps, that's changing. Data has historically been on the outskirts in silos, but organizations, you think of Amazon, think Uber, Airbnb, they're putting data at the core, and logic is increasingly being embedded in the data instead of the reverse. In other words, today, the data's locked inside the app, which is why you need to extract that data is sticking it to a data warehouse. The point, George, is we're putting forth this new vision for how data is going to be used. And you've used this Uber example to underscore the future state. Please explain? >> Okay, so this is hopefully an example everyone can relate to. The idea is first, you're automating things that are happening in the real world and decisions that make those things happen autonomously without humans in the loop all the time. So to use the Uber example on your phone, you call a car, you call a driver. Automatically, the Uber app then looks at what drivers are in the vicinity, what drivers are free, matches one, calculates an ETA to you, calculates a price, calculates an ETA to your destination, and then directs the driver once they're there. The point of this is that that cannot happen in an application-centric world very easily because all these little apps, the drivers, the riders, the routes, the fares, those call on data locked up in many different apps, but they have to sit on a layer that makes it all coherent. >> But George, so if Uber's doing this, doesn't this tech already exist? Isn't there a tech platform that does this already? >> Yes, and the mission of the entire tech industry is to build services that make it possible to compose and operate similar platforms and tools, but with the skills of mainstream developers in mainstream corporations, not the rocket scientists at Uber and Amazon. >> Okay, so we're talking about horizontally scaling across the industry, and actually giving a lot more organizations access to this technology. So by way of review, let's summarize the trend that's going on today in terms of the modern data stack that is propelling the likes of Databricks and Snowflake, which we just showed you in the ETR data and is really is a tailwind form. So the trend is toward this common repository for analytic data, that could be multiple virtual data warehouses inside of Snowflake, but you're in that Snowflake environment or Lakehouses from Databricks or multiple data lakes. And we've talked about what JP Morgan Chase is doing with the data mesh and gluing data lakes together, you've got various public clouds playing in this game, and then the data is annotated to have a common meaning. In other words, there's a semantic layer that enables applications to talk to the data elements and know that they have common and coherent meaning. So George, the good news is this approach is more effective than the legacy monolithic models that Mike Olson was talking about, so what's the problem with this in your view? >> So today's data platforms added immense value 'cause they connected the data that was previously locked up in these monolithic apps or on all these different microservices, and that supported traditional BI and AI/ML use cases. But now if we want to build apps like Uber or Amazon.com, where they've got essentially an autonomously running supply chain and e-commerce app where humans only care and feed it. But the thing is figuring out what to buy, when to buy, where to deploy it, when to ship it. We needed a semantic layer on top of the data. So that, as you were saying, the data that's coming from all those apps, the different apps that's integrated, not just connected, but it means the same. And the issue is whenever you add a new layer to a stack to support new applications, there are implications for the already existing layers, like can they support the new layer and its use cases? So for instance, if you add a semantic layer that embeds app logic with the data rather than vice versa, which we been talking about and that's been the case for 60 years, then the new data layer faces challenges that the way you manage that data, the way you analyze that data, is not supported by today's tools. >> Okay, so actually Alex, bring me up that last slide if you would, I mean, you're basically saying at the bottom here, today's repositories don't really do joins at scale. The future is you're talking about hundreds or thousands or millions of data connections, and today's systems, we're talking about, I don't know, 6, 8, 10 joins and that is the fundamental problem you're saying, is a new data error coming and existing systems won't be able to handle it? >> Yeah, one way of thinking about it is that even though we call them relational databases, when we actually want to do lots of joins or when we want to analyze data from lots of different tables, we created a whole new industry for analytic databases where you sort of mung the data together into fewer tables. So you didn't have to do as many joins because the joins are difficult and slow. And when you're going to arbitrarily join thousands, hundreds of thousands or across millions of elements, you need a new type of database. We have them, they're called graph databases, but to query them, you go back to the prerelational era in terms of their usability. >> Okay, so we're going to come back to that and talk about how you get around that problem. But let's first lay out what the ideal data platform of the future we think looks like. And again, we're going to come back to use this Uber example. In this graphic that George put together, awesome. We got three layers. The application layer is where the data products reside. The example here is drivers, rides, maps, routes, ETA, et cetera. The digital version of what we were talking about in the previous slide, people, places and things. The next layer is the data layer, that breaks down the silos and connects the data elements through semantics and everything is coherent. And then the bottom layers, the legacy operational systems feed that data layer. George, explain what's different here, the graph database element, you talk about the relational query capabilities, and why can't I just throw memory at solving this problem? >> Some of the graph databases do throw memory at the problem and maybe without naming names, some of them live entirely in memory. And what you're dealing with is a prerelational in-memory database system where you navigate between elements, and the issue with that is we've had SQL for 50 years, so we don't have to navigate, we can say what we want without how to get it. That's the core of the problem. >> Okay. So if I may, I just want to drill into this a little bit. So you're talking about the expressiveness of a graph. Alex, if you'd bring that back out, the fourth bullet, expressiveness of a graph database with the relational ease of query. Can you explain what you mean by that? >> Yeah, so graphs are great because when you can describe anything with a graph, that's why they're becoming so popular. Expressive means you can represent anything easily. They're conducive to, you might say, in a world where we now want like the metaverse, like with a 3D world, and I don't mean the Facebook metaverse, I mean like the business metaverse when we want to capture data about everything, but we want it in context, we want to build a set of digital twins that represent everything going on in the world. And Uber is a tiny example of that. Uber built a graph to represent all the drivers and riders and maps and routes. But what you need out of a database isn't just a way to store stuff and update stuff. You need to be able to ask questions of it, you need to be able to query it. And if you go back to prerelational days, you had to know how to find your way to the data. It's sort of like when you give directions to someone and they didn't have a GPS system and a mapping system, you had to give them turn by turn directions. Whereas when you have a GPS and a mapping system, which is like the relational thing, you just say where you want to go, and it spits out the turn by turn directions, which let's say, the car might follow or whoever you're directing would follow. But the point is, it's much easier in a relational database to say, "I just want to get these results. You figure out how to get it." The graph database, they have not taken over the world because in some ways, it's taking a 50 year leap backwards. >> Alright, got it. Okay. Let's take a look at how the current Databricks offerings map to that ideal state that we just laid out. So to do that, we put together this chart that looks at the key elements of the Databricks portfolio, the core capability, the weakness, and the threat that may loom. Start with the Delta Lake, that's the storage layer, which is great for files and tables. It's got true separation of compute and storage, I want you to double click on that George, as independent elements, but it's weaker for the type of low latency ingest that we see coming in the future. And some of the threats highlighted here. AWS could add transactional tables to S3, Iceberg adoption is picking up and could accelerate, that could disrupt Databricks. George, add some color here please? >> Okay, so this is the sort of a classic competitive forces where you want to look at, so what are customers demanding? What's competitive pressure? What are substitutes? Even what your suppliers might be pushing. Here, Delta Lake is at its core, a set of transactional tables that sit on an object store. So think of it in a database system, this is the storage engine. So since S3 has been getting stronger for 15 years, you could see a scenario where they add transactional tables. We have an open source alternative in Iceberg, which Snowflake and others support. But at the same time, Databricks has built an ecosystem out of tools, their own and others, that read and write to Delta tables, that's what makes the Delta Lake and ecosystem. So they have a catalog, the whole machine learning tool chain talks directly to the data here. That was their great advantage because in the past with Snowflake, you had to pull all the data out of the database before the machine learning tools could work with it, that was a major shortcoming. They fixed that. But the point here is that even before we get to the semantic layer, the core foundation is under threat. >> Yep. Got it. Okay. We got a lot of ground to cover. So we're going to take a look at the Spark Execution Engine next. Think of that as the refinery that runs really efficient batch processing. That's kind of what disrupted the DOOp in a large way, but it's not Python friendly and that's an issue because the data science and the data engineering crowd are moving in that direction, and/or they're using DBT. George, we had Tristan Handy on at Supercloud, really interesting discussion that you and I did. Explain why this is an issue for Databricks? >> So once the data lake was in place, what people did was they refined their data batch, and Spark has always had streaming support and it's gotten better. The underlying storage as we've talked about is an issue. But basically they took raw data, then they refined it into tables that were like customers and products and partners. And then they refined that again into what was like gold artifacts, which might be business intelligence metrics or dashboards, which were collections of metrics. But they were running it on the Spark Execution Engine, which it's a Java-based engine or it's running on a Java-based virtual machine, which means all the data scientists and the data engineers who want to work with Python are really working in sort of oil and water. Like if you get an error in Python, you can't tell whether the problems in Python or where it's in Spark. There's just an impedance mismatch between the two. And then at the same time, the whole world is now gravitating towards DBT because it's a very nice and simple way to compose these data processing pipelines, and people are using either SQL in DBT or Python in DBT, and that kind of is a substitute for doing it all in Spark. So it's under threat even before we get to that semantic layer, it so happens that DBT itself is becoming the authoring environment for the semantic layer with business intelligent metrics. But that's again, this is the second element that's under direct substitution and competitive threat. >> Okay, let's now move down to the third element, which is the Photon. Photon is Databricks' BI Lakehouse, which has integration with the Databricks tooling, which is very rich, it's newer. And it's also not well suited for high concurrency and low latency use cases, which we think are going to increasingly become the norm over time. George, the call out threat here is customers want to connect everything to a semantic layer. Explain your thinking here and why this is a potential threat to Databricks? >> Okay, so two issues here. What you were touching on, which is the high concurrency, low latency, when people are running like thousands of dashboards and data is streaming in, that's a problem because SQL data warehouse, the query engine, something like that matures over five to 10 years. It's one of these things, the joke that Andy Jassy makes just in general, he's really talking about Azure, but there's no compression algorithm for experience. The Snowflake guy started more than five years earlier, and for a bunch of reasons, that lead is not something that Databricks can shrink. They'll always be behind. So that's why Snowflake has transactional tables now and we can get into that in another show. But the key point is, so near term, it's struggling to keep up with the use cases that are core to business intelligence, which is highly concurrent, lots of users doing interactive query. But then when you get to a semantic layer, that's when you need to be able to query data that might have thousands or tens of thousands or hundreds of thousands of joins. And that's a SQL query engine, traditional SQL query engine is just not built for that. That's the core problem of traditional relational databases. >> Now this is a quick aside. We always talk about Snowflake and Databricks in sort of the same context. We're not necessarily saying that Snowflake is in a position to tackle all these problems. We'll deal with that separately. So we don't mean to imply that, but we're just sort of laying out some of the things that Snowflake or rather Databricks customers we think, need to be thinking about and having conversations with Databricks about and we hope to have them as well. We'll come back to that in terms of sort of strategic options. But finally, when come back to the table, we have Databricks' AI/ML Tool Chain, which has been an awesome capability for the data science crowd. It's comprehensive, it's a one-stop shop solution, but the kicker here is that it's optimized for supervised model building. And the concern is that foundational models like GPT could cannibalize the current Databricks tooling, but George, can't Databricks, like other software companies, integrate foundation model capabilities into its platform? >> Okay, so the sound bite answer to that is sure, IBM 3270 terminals could call out to a graphical user interface when they're running on the XT terminal, but they're not exactly good citizens in that world. The core issue is Databricks has this wonderful end-to-end tool chain for training, deploying, monitoring, running inference on supervised models. But the paradigm there is the customer builds and trains and deploys each model for each feature or application. In a world of foundation models which are pre-trained and unsupervised, the entire tool chain is different. So it's not like Databricks can junk everything they've done and start over with all their engineers. They have to keep maintaining what they've done in the old world, but they have to build something new that's optimized for the new world. It's a classic technology transition and their mentality appears to be, "Oh, we'll support the new stuff from our old stuff." Which is suboptimal, and as we'll talk about, their biggest patron and the company that put them on the map, Microsoft, really stopped working on their old stuff three years ago so that they could build a new tool chain optimized for this new world. >> Yeah, and so let's sort of close with what we think the options are and decisions that Databricks has for its future architecture. They're smart people. I mean we've had Ali Ghodsi on many times, super impressive. I think they've got to be keenly aware of the limitations, what's going on with foundation models. But at any rate, here in this chart, we lay out sort of three scenarios. One is re-architect the platform by incrementally adopting new technologies. And example might be to layer a graph query engine on top of its stack. They could license key technologies like graph database, they could get aggressive on M&A and buy-in, relational knowledge graphs, semantic technologies, vector database technologies. George, as David Floyer always says, "A lot of ways to skin a cat." We've seen companies like, even think about EMC maintained its relevance through M&A for many, many years. George, give us your thought on each of these strategic options? >> Okay, I find this question the most challenging 'cause remember, I used to be an equity research analyst. I worked for Frank Quattrone, we were one of the top tech shops in the banking industry, although this is 20 years ago. But the M&A team was the top team in the industry and everyone wanted them on their side. And I remember going to meetings with these CEOs, where Frank and the bankers would say, "You want us for your M&A work because we can do better." And they really could do better. But in software, it's not like with EMC in hardware because with hardware, it's easier to connect different boxes. With software, the whole point of a software company is to integrate and architect the components so they fit together and reinforce each other, and that makes M&A harder. You can do it, but it takes a long time to fit the pieces together. Let me give you examples. If they put a graph query engine, let's say something like TinkerPop, on top of, I don't even know if it's possible, but let's say they put it on top of Delta Lake, then you have this graph query engine talking to their storage layer, Delta Lake. But if you want to do analysis, you got to put the data in Photon, which is not really ideal for highly connected data. If you license a graph database, then most of your data is in the Delta Lake and how do you sync it with the graph database? If you do sync it, you've got data in two places, which kind of defeats the purpose of having a unified repository. I find this semantic layer option in number three actually more promising, because that's something that you can layer on top of the storage layer that you have already. You just have to figure out then how to have your query engines talk to that. What I'm trying to highlight is, it's easy as an analyst to say, "You can buy this company or license that technology." But the really hard work is making it all work together and that is where the challenge is. >> Yeah, and well look, I thank you for laying that out. We've seen it, certainly Microsoft and Oracle. I guess you might argue that well, Microsoft had a monopoly in its desktop software and was able to throw off cash for a decade plus while it's stock was going sideways. Oracle had won the database wars and had amazing margins and cash flow to be able to do that. Databricks isn't even gone public yet, but I want to close with some of the players to watch. Alex, if you'd bring that back up, number four here. AWS, we talked about some of their options with S3 and it's not just AWS, it's blob storage, object storage. Microsoft, as you sort of alluded to, was an early go-to market channel for Databricks. We didn't address that really. So maybe in the closing comments we can. Google obviously, Snowflake of course, we're going to dissect their options in future Breaking Analysis. Dbt labs, where do they fit? Bob Muglia's company, Relational.ai, why are these players to watch George, in your opinion? >> So everyone is trying to assemble and integrate the pieces that would make building data applications, data products easy. And the critical part isn't just assembling a bunch of pieces, which is traditionally what AWS did. It's a Unix ethos, which is we give you the tools, you put 'em together, 'cause you then have the maximum choice and maximum power. So what the hyperscalers are doing is they're taking their key value stores, in the case of ASW it's DynamoDB, in the case of Azure it's Cosmos DB, and each are putting a graph query engine on top of those. So they have a unified storage and graph database engine, like all the data would be collected in the key value store. Then you have a graph database, that's how they're going to be presenting a foundation for building these data apps. Dbt labs is putting a semantic layer on top of data lakes and data warehouses and as we'll talk about, I'm sure in the future, that makes it easier to swap out the underlying data platform or swap in new ones for specialized use cases. Snowflake, what they're doing, they're so strong in data management and with their transactional tables, what they're trying to do is take in the operational data that used to be in the province of many state stores like MongoDB and say, "If you manage that data with us, it'll be connected to your analytic data without having to send it through a pipeline." And that's hugely valuable. Relational.ai is the wildcard, 'cause what they're trying to do, it's almost like a holy grail where you're trying to take the expressiveness of connecting all your data in a graph but making it as easy to query as you've always had it in a SQL database or I should say, in a relational database. And if they do that, it's sort of like, it'll be as easy to program these data apps as a spreadsheet was compared to procedural languages, like BASIC or Pascal. That's the implications of Relational.ai. >> Yeah, and again, we talked before, why can't you just throw this all in memory? We're talking in that example of really getting down to differences in how you lay the data out on disk in really, new database architecture, correct? >> Yes. And that's why it's not clear that you could take a data lake or even a Snowflake and why you can't put a relational knowledge graph on those. You could potentially put a graph database, but it'll be compromised because to really do what Relational.ai has done, which is the ease of Relational on top of the power of graph, you actually need to change how you're storing your data on disk or even in memory. So you can't, in other words, it's not like, oh we can add graph support to Snowflake, 'cause if you did that, you'd have to change, or in your data lake, you'd have to change how the data is physically laid out. And then that would break all the tools that talk to that currently. >> What in your estimation, is the timeframe where this becomes critical for a Databricks and potentially Snowflake and others? I mentioned earlier midterm, are we talking three to five years here? Are we talking end of decade? What's your radar say? >> I think something surprising is going on that's going to sort of come up the tailpipe and take everyone by storm. All the hype around business intelligence metrics, which is what we used to put in our dashboards where bookings, billings, revenue, customer, those things, those were the key artifacts that used to live in definitions in your BI tools, and DBT has basically created a standard for defining those so they live in your data pipeline or they're defined in their data pipeline and executed in the data warehouse or data lake in a shared way, so that all tools can use them. This sounds like a digression, it's not. All this stuff about data mesh, data fabric, all that's going on is we need a semantic layer and the business intelligence metrics are defining common semantics for your data. And I think we're going to find by the end of this year, that metrics are how we annotate all our analytic data to start adding common semantics to it. And we're going to find this semantic layer, it's not three to five years off, it's going to be staring us in the face by the end of this year. >> Interesting. And of course SVB today was shut down. We're seeing serious tech headwinds, and oftentimes in these sort of downturns or flat turns, which feels like this could be going on for a while, we emerge with a lot of new players and a lot of new technology. George, we got to leave it there. Thank you to George Gilbert for excellent insights and input for today's episode. I want to thank Alex Myerson who's on production and manages the podcast, of course Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at Siliconangle.com, he does some great editing. Remember all these episodes, they're available as podcasts. Wherever you listen, all you got to do is search Breaking Analysis Podcast, we publish each week on wikibon.com and siliconangle.com, or you can email me at David.Vellante@siliconangle.com, or DM me @DVellante. Comment on our LinkedIn post, and please do check out ETR.ai, great survey data, enterprise tech focus, phenomenal. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.

Published Date : Mar 10 2023

SUMMARY :

bringing you data-driven core elements of the Databricks portfolio and pervasiveness in the data and that was where you went for data. and Cloudera set out to fix that. the reason you see and the robustness of Databricks and their big challenge and the data locked into in the real world and decisions Yes, and the mission of that is propelling the likes that the way you manage that data, is the fundamental problem because the joins are difficult and slow. and connects the data and the issue with that is the fourth bullet, expressiveness and it spits out the and the threat that may loom. because in the past with Snowflake, Think of that as the refinery So once the data lake was in place, George, the call out threat here But the key point is, in sort of the same context. and the company that put One is re-architect the platform and architect the components some of the players to watch. in the case of ASW it's DynamoDB, and why you can't put a relational and executed in the data and manages the podcast, of

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
David Floyer	PERSON	0.99+
Mike Olson	PERSON	0.99+
2014	DATE	0.99+
George Gilbert	PERSON	0.99+
Dave Vellante	PERSON	0.99+
George	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Erik Bradley	PERSON	0.99+
Dave	PERSON	0.99+
Uber	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Sun Microsystems	ORGANIZATION	0.99+
50 years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Bob Muglia	PERSON	0.99+
Gartner	ORGANIZATION	0.99+
Airbnb	ORGANIZATION	0.99+
60 years	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Ali Ghodsi	PERSON	0.99+
2010	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Kristin Martin	PERSON	0.99+
Rob Hof	PERSON	0.99+
three	QUANTITY	0.99+
15 years	QUANTITY	0.99+
Databricks'	ORGANIZATION	0.99+
two places	QUANTITY	0.99+
Boston	LOCATION	0.99+
Tristan Handy	PERSON	0.99+
M&A	ORGANIZATION	0.99+
Frank Quattrone	PERSON	0.99+
second element	QUANTITY	0.99+
Daren Brabham	PERSON	0.99+
TechAlpha Partners	ORGANIZATION	0.99+
third element	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
50 year	QUANTITY	0.99+
40%	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
five years	QUANTITY	0.99+

Adam Wenchel & John Dickerson, Arthur | AWS Startup Showcase S3 E1

(upbeat music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase AI Machine Learning Top Startups Building Generative AI on AWS. This is season 3, episode 1 of the ongoing series covering the exciting startup from the AWS ecosystem to talk about AI and machine learning. I'm your host, John Furrier. I'm joined by two great guests here, Adam Wenchel, who's the CEO of Arthur, and Chief Scientist of Arthur, John Dickerson. Talk about how they help people build better LLM AI systems to get them into the market faster. Gentlemen, thank you for coming on. >> Yeah, thanks for having us, John. >> Well, I got to say I got to temper my enthusiasm because the last few months explosion of interest in LLMs with ChatGPT, has opened the eyes to everybody around the reality of that this is going next gen, this is it, this is the moment, this is the the point we're going to look back and say, this is the time where AI really hit the scene for real applications. So, a lot of Large Language Models, also known as LLMs, foundational models, and generative AI is all booming. This is where all the alpha developers are going. This is where everyone's focusing their business model transformations on. This is where developers are seeing action. So it's all happening, the wave is here. So I got to ask you guys, what are you guys seeing right now? You're in the middle of it, it's hitting you guys right on. You're in the front end of this massive wave. >> Yeah, John, I don't think you have to temper your enthusiasm at all. I mean, what we're seeing every single day is, everything from existing enterprise customers coming in with new ways that they're rethinking, like business things that they've been doing for many years that they can now do an entirely different way, as well as all manner of new companies popping up, applying LLMs to everything from generating code and SQL statements to generating health transcripts and just legal briefs. Everything you can imagine. And when you actually sit down and look at these systems and the demos we get of them, the hype is definitely justified. It's pretty amazing what they're going to do. And even just internally, we built, about a month ago in January, we built an Arthur chatbot so customers could ask questions, technical questions from our, rather than read our product documentation, they could just ask this LLM a particular question and get an answer. And at the time it was like state of the art, but then just last week we decided to rebuild it because the tooling has changed so much that we, last week, we've completely rebuilt it. It's now way better, built on an entirely different stack. And the tooling has undergone a full generation worth of change in six weeks, which is crazy. So it just tells you how much energy is going into this and how fast it's evolving right now. >> John, weigh in as a chief scientist. I mean, you must be blown away. Talk about kid in the candy store. I mean, you must be looking like this saying, I mean, she must be super busy to begin with, but the change, the acceleration, can you scope the kind of change you're seeing and be specific around the areas you're seeing movement and highly accelerated change? >> Yeah, definitely. And it is very, very exciting actually, thinking back to when ChatGPT was announced, that was a night our company was throwing an event at NeurIPS, which is maybe the biggest machine learning conference out there. And the hype when that happened was palatable and it was just shocking to see how well that performed. And then obviously over the last few months since then, as LLMs have continued to enter the market, we've seen use cases for them, like Adam mentioned all over the place. And so, some things I'm excited about in this space are the use of LLMs and more generally, foundation models to redesign traditional operations, research style problems, logistics problems, like auctions, decisioning problems. So moving beyond the already amazing news cases, like creating marketing content into more core integration and a lot of the bread and butter companies and tasks that drive the American ecosystem. And I think we're just starting to see some of that. And in the next 12 months, I think we're going to see a lot more. If I had to make other predictions, I think we're going to continue seeing a lot of work being done on managing like inference time costs via shrinking models or distillation. And I don't know how to make this prediction, but at some point we're going to be seeing lots of these very large scale models operating on the edge as well. So the time scales are extremely compressed, like Adam mentioned, 12 months from now, hard to say. >> We were talking on theCUBE prior to this session here. We had theCUBE conversation here and then the Wall Street Journal just picked up on the same theme, which is the printing press moment created the enlightenment stage of the history. Here we're in the whole nother automating intellect efficiency, doing heavy lifting, the creative class coming back, a whole nother level of reality around the corner that's being hyped up. The question is, is this justified? Is there really a breakthrough here or is this just another result of continued progress with AI? Can you guys weigh in, because there's two schools of thought. There's the, "Oh my God, we're entering a new enlightenment tech phase, of the equivalent of the printing press in all areas. Then there's, Ah, it's just AI (indistinct) inch by inch. What's your guys' opinion? >> Yeah, I think on the one hand when you're down in the weeds of building AI systems all day, every day, like we are, it's easy to look at this as an incremental progress. Like we have customers who've been building on foundation models since we started the company four years ago, particular in computer vision for classification tasks, starting with pre-trained models, things like that. So that part of it doesn't feel real new, but what does feel new is just when you apply these things to language with all the breakthroughs and computational efficiency, algorithmic improvements, things like that, when you actually sit down and interact with ChatGPT or one of the other systems that's out there that's building on top of LLMs, it really is breathtaking, like, the level of understanding that they have and how quickly you can accelerate your development efforts and get an actual working system in place that solves a really important real world problem and makes people way faster, way more efficient. So I do think there's definitely something there. It's more than just incremental improvement. This feels like a real trajectory inflection point for the adoption of AI. >> John, what's your take on this? As people come into the field, I'm seeing a lot of people move from, hey, I've been coding in Python, I've been doing some development, I've been a software engineer, I'm a computer science student. I'm coding in C++ old school, OG systems person. Where do they come in? Where's the focus, where's the action? Where are the breakthroughs? Where are people jumping in and rolling up their sleeves and getting dirty with this stuff? >> Yeah, all over the place. And it's funny you mentioned students in a different life. I wore a university professor hat and so I'm very, very familiar with the teaching aspects of this. And I will say toward Adam's point, this really is a leap forward in that techniques like in a co-pilot for example, everybody's using them right now and they really do accelerate the way that we develop. When I think about the areas where people are really, really focusing right now, tooling is certainly one of them. Like you and I were chatting about LangChain right before this interview started, two or three people can sit down and create an amazing set of pipes that connect different aspects of the LLM ecosystem. Two, I would say is in engineering. So like distributed training might be one, or just understanding better ways to even be able to train large models, understanding better ways to then distill them or run them. So like this heavy interaction now between engineering and what I might call traditional machine learning from 10 years ago where you had to know a lot of math, you had to know calculus very well, things like that. Now you also need to be, again, a very strong engineer, which is exciting. >> I interviewed Swami when he talked about the news. He's ahead of Amazon's machine learning and AI when they announced Hugging Face announcement. And I reminded him how Amazon was easy to get into if you were developing a startup back in 2007,8, and that the language models had that similar problem. It's step up a lot of content and a lot of expense to get provisioned up, now it's easy. So this is the next wave of innovation. So how do you guys see that from where we are right now? Are we at that point where it's that moment where it's that cloud-like experience for LLMs and large language models? >> Yeah, go ahead John. >> I think the answer is yes. We see a number of large companies that are training these and serving these, some of which are being co-interviewed in this episode. I think we're at that. Like, you can hit one of these with a simple, single line of Python, hitting an API, you can boot this up in seconds if you want. It's easy. >> Got it. >> So I (audio cuts out). >> Well let's take a step back and talk about the company. You guys being featured here on the Showcase. Arthur, what drove you to start the company? How'd this all come together? What's the origination story? Obviously you got a big customers, how'd get started? What are you guys doing? How do you make money? Give a quick overview. >> Yeah, I think John and I come at it from slightly different angles, but for myself, I have been a part of a number of technology companies. I joined Capital One, they acquired my last company and shortly after I joined, they asked me to start their AI team. And so even though I've been doing AI for a long time, I started my career back in DARPA. It was the first time I was really working at scale in AI at an organization where there were hundreds of millions of dollars in revenue at stake with the operation of these models and that they were impacting millions of people's financial livelihoods. And so it just got me hyper-focused on these issues around making sure that your AI worked well and it worked well for your company and it worked well for the people who were being affected by it. At the time when I was doing this 2016, 2017, 2018, there just wasn't any tooling out there to support this production management model monitoring life phase of the life cycle. And so we basically left to start the company that I wanted. And John has a his own story. I'll let let you share that one, John. >> Go ahead John, you're up. >> Yeah, so I'm coming at this from a different world. So I'm on leave now from a tenured role in academia where I was leading a large lab focusing on the intersection of machine learning and economics. And so questions like fairness or the response to the dynamism on the underlying environment have been around for quite a long time in that space. And so I've been thinking very deeply about some of those more like R and D style questions as well as having deployed some automation code across a couple of different industries, some in online advertising, some in the healthcare space and so on, where concerns of, again, fairness come to bear. And so Adam and I connected to understand the space of what that might look like in the 2018 20 19 realm from a quantitative and from a human-centered point of view. And so booted things up from there. >> Yeah, bring that applied engineering R and D into the Capital One, DNA that he had at scale. I could see that fit. I got to ask you now, next step, as you guys move out and think about LLMs and the recent AI news around the generative models and the foundational models like ChatGPT, how should we be looking at that news and everyone watching might be thinking the same thing. I know at the board level companies like, we should refactor our business, this is the future. It's that kind of moment, and the tech team's like, okay, boss, how do we do this again? Or are they prepared? How should we be thinking? How should people watching be thinking about LLMs? >> Yeah, I think they really are transformative. And so, I mean, we're seeing companies all over the place. Everything from large tech companies to a lot of our large enterprise customers are launching significant projects at core parts of their business. And so, yeah, I would be surprised, if you're serious about becoming an AI native company, which most leading companies are, then this is a trend that you need to be taking seriously. And we're seeing the adoption rate. It's funny, I would say the AI adoption in the broader business world really started, let's call it four or five years ago, and it was a relatively slow adoption rate, but I think all that kind of investment in and scaling the maturity curve has paid off because the rate at which people are adopting and deploying systems based on this is tremendous. I mean, this has all just happened in the few months and we're already seeing people get systems into production. So, now there's a lot of things you have to guarantee in order to put these in production in a way that basically is added into your business and doesn't cause more headaches than it solves. And so that's where we help customers is where how do you put these out there in a way that they're going to represent your company well, they're going to perform well, they're going to do their job and do it properly. >> So in the use case, as a customer, as I think about this, there's workflows. They might have had an ML AI ops team that's around IT. Their inference engines are out there. They probably don't have a visibility on say how much it costs, they're kicking the tires. When you look at the deployment, there's a cost piece, there's a workflow piece, there's fairness you mentioned John, what should be, I should be thinking about if I'm going to be deploying stuff into production, I got to think about those things. What's your opinion? >> Yeah, I'm happy to dive in on that one. So monitoring in general is extremely important once you have one of these LLMs in production, and there have been some changes versus traditional monitoring that we can dive deeper into that LLMs are really accelerated. But a lot of that bread and butter style of things you should be looking out for remain just as important as they are for what you might call traditional machine learning models. So the underlying environment of data streams, the way users interact with these models, these are all changing over time. And so any performance metrics that you care about, traditional ones like an accuracy, if you can define that for an LLM, ones around, for example, fairness or bias. If that is a concern for your particular use case and so on. Those need to be tracked. Now there are some interesting changes that LLMs are bringing along as well. So most ML models in production that we see are relatively static in the sense that they're not getting flipped in more than maybe once a day or once a week or they're just set once and then not changed ever again. With LLMs, there's this ongoing value alignment or collection of preferences from users that is often constantly updating the model. And so that opens up all sorts of vectors for, I won't say attack, but for problems to arise in production. Like users might learn to use your system in a different way and thus change the way those preferences are getting collected and thus change your system in ways that you never intended. So maybe that went through governance already internally at the company and now it's totally, totally changed and it's through no fault of your own, but you need to be watching over that for sure. >> Talk about the reinforced learnings from human feedback. How's that factoring in to the LLMs? Is that part of it? Should people be thinking about that? Is that a component that's important? >> It certainly is, yeah. So this is one of the big tweaks that happened with InstructGPT, which is the basis model behind ChatGPT and has since gone on to be used all over the place. So value alignment I think is through RLHF like you mentioned is a very interesting space to get into and it's one that you need to watch over. Like, you're asking humans for feedback over outputs from a model and then you're updating the model with respect to that human feedback. And now you've thrown humans into the loop here in a way that is just going to complicate things. And it certainly helps in many ways. You can ask humans to, let's say that you're deploying an internal chat bot at an enterprise, you could ask humans to align that LLM behind the chatbot to, say company values. And so you're listening feedback about these company values and that's going to scoot that chatbot that you're running internally more toward the kind of language that you'd like to use internally on like a Slack channel or something like that. Watching over that model I think in that specific case, that's a compliance and HR issue as well. So while it is part of the greater LLM stack, you can also view that as an independent bit to watch over. >> Got it, and these are important factors. When people see the Bing news, they freak out how it's doing great. Then it goes off the rails, it goes big, fails big. (laughing) So these models people see that, is that human interaction or is that feedback, is that not accepting it or how do people understand how to take that input in and how to build the right apps around LLMs? This is a tough question. >> Yeah, for sure. So some of the examples that you'll see online where these chatbots go off the rails are obviously humans trying to break the system, but some of them clearly aren't. And that's because these are large statistical models and we don't know what's going to pop out of them all the time. And even if you're doing as much in-house testing at the big companies like the Go-HERE's and the OpenAI's of the world, to try to prevent things like toxicity or racism or other sorts of bad content that might lead to bad pr, you're never going to catch all of these possible holes in the model itself. And so, again, it's very, very important to keep watching over that while it's in production. >> On the business model side, how are you guys doing? What's the approach? How do you guys engage with customers? Take a minute to explain the customer engagement. What do they need? What do you need? How's that work? >> Yeah, I can talk a little bit about that. So it's really easy to get started. It's literally a matter of like just handing out an API key and people can get started. And so we also offer alternative, we also offer versions that can be installed on-prem for models that, we find a lot of our customers have models that deal with very sensitive data. So you can run it in your cloud account or use our cloud version. And so yeah, it's pretty easy to get started with this stuff. We find people start using it a lot of times during the validation phase 'cause that way they can start baselining performance models, they can do champion challenger, they can really kind of baseline the performance of, maybe they're considering different foundation models. And so it's a really helpful tool for understanding differences in the way these models perform. And then from there they can just flow that into their production inferencing, so that as these systems are out there, you have really kind of real time monitoring for anomalies and for all sorts of weird behaviors as well as that continuous feedback loop that helps you make make your product get better and observability and you can run all sorts of aggregated reports to really understand what's going on with these models when they're out there deciding. I should also add that we just today have another way to adopt Arthur and that is we are in the AWS marketplace, and so we are available there just to make it that much easier to use your cloud credits, skip the procurement process, and get up and running really quickly. >> And that's great 'cause Amazon's got SageMaker, which handles a lot of privacy stuff, all kinds of cool things, or you can get down and dirty. So I got to ask on the next one, production is a big deal, getting stuff into production. What have you guys learned that you could share to folks watching? Is there a cost issue? I got to monitor, obviously you brought that up, we talked about the even reinforcement issues, all these things are happening. What is the big learnings that you could share for people that are going to put these into production to watch out for, to plan for, or be prepared for, hope for the best plan for the worst? What's your advice? >> I can give a couple opinions there and I'm sure Adam has. Well, yeah, the big one from my side is, again, I had mentioned this earlier, it's just the input data streams because humans are also exploring how they can use these systems to begin with. It's really, really hard to predict the type of inputs you're going to be seeing in production. Especially, we always talk about chatbots, but then any generative text tasks like this, let's say you're taking in news articles and summarizing them or something like that, it's very hard to get a good sampling even of the set of news articles in such a way that you can really predict what's going to pop out of that model. So to me, it's, adversarial maybe isn't the word that I would use, but it's an unnatural shifting input distribution of like prompts that you might see for these models. That's certainly one. And then the second one that I would talk about is, it can be hard to understand the costs, the inference time costs behind these LLMs. So the pricing on these is always changing as the models change size, it might go up, it might go down based on model size, based on energy cost and so on, but your pricing per token or per a thousand tokens and that I think can be difficult for some clients to wrap their head around. Again, you don't know how these systems are going to be used after all so it can be tough. And so again that's another metric that really should be tracked. >> Yeah, and there's a lot of trade off choices in there with like, how many tokens do you want at each step and in the sequence and based on, you have (indistinct) and you reject these tokens and so based on how your system's operating, that can make the cost highly variable. And that's if you're using like an API version that you're paying per token. A lot of people also choose to run these internally and as John mentioned, the inference time on these is significantly higher than a traditional classifi, even NLP classification model or tabular data model, like orders of magnitude higher. And so you really need to understand how that, as you're constantly iterating on these models and putting out new versions and new features in these models, how that's affecting the overall scale of that inference cost because you can use a lot of computing power very quickly with these profits. >> Yeah, scale, performance, price all come together. I got to ask while we're here on the secret sauce of the company, if you had to describe to people out there watching, what's the secret sauce of the company? What's the key to your success? >> Yeah, so John leads our research team and they've had a number of really cool, I think AI as much as it's been hyped for a while, it's still commercial AI at least is really in its infancy. And so the way we're able to pioneer new ways to think about performance for computer vision NLP LLMs is probably the thing that I'm proudest about. John and his team publish papers all the time at Navs and other places. But I think it's really being able to define what performance means for basically any kind of model type and give people really powerful tools to understand that on an ongoing basis. >> John, secret sauce, how would you describe it? You got all the action happening all around you. >> Yeah, well I going to appreciate Adam talking me up like that. No, I. (all laughing) >> Furrier: Robs to you. >> I would also say a couple of other things here. So we have a very strong engineering team and so I think some early hires there really set the standard at a very high bar that we've maintained as we've grown. And I think that's really paid dividends as scalabilities become even more of a challenge in these spaces, right? And so that's not just scalability when it comes to LLMs, that's scalability when it comes to millions of inferences per day, that kind of thing as well in traditional ML models. And I think that's compared to potential competitors, that's really... Well, it's made us able to just operate more efficiently and pass that along to the client. >> Yeah, and I think the infancy comment is really important because it's the beginning. You really is a long journey ahead. A lot of change coming, like I said, it's a huge wave. So I'm sure you guys got a lot of plannings at the foundation even for your own company, so I appreciate the candid response there. Final question for you guys is, what should the top things be for a company in 2023? If I'm going to set the agenda and I'm a customer moving forward, putting the pedal to the metal, so to speak, what are the top things I should be prioritizing or I need to do to be successful with AI in 2023? >> Yeah, I think, so number one, as we talked about, we've been talking about this entire episode, the things are changing so quickly and the opportunities for business transformation and really disrupting different applications, different use cases, is almost, I don't think we've even fully comprehended how big it is. And so really digging in to your business and understanding where I can apply these new sets of foundation models is, that's a top priority. The interesting thing is I think there's another force at play, which is the macroeconomic conditions and a lot of places are, they're having to work harder to justify budgets. So in the past, couple years ago maybe, they had a blank check to spend on AI and AI development at a lot of large enterprises that was limited primarily by the amount of talent they could scoop up. Nowadays these expenditures are getting scrutinized more. And so one of the things that we really help our customers with is like really calculating the ROI on these things. And so if you have models out there performing and you have a new version that you can put out that lifts the performance by 3%, how many tens of millions of dollars does that mean in business benefit? Or if I want to go to get approval from the CFO to spend a few million dollars on this new project, how can I bake in from the beginning the tools to really show the ROI along the way? Because I think in these systems when done well for a software project, the ROI can be like pretty spectacular. Like we see over a hundred percent ROI in the first year on some of these projects. And so, I think in 2023, you just need to be able to show what you're getting for that spend. >> It's a needle moving moment. You see it all the time with some of these aha moments or like, whoa, blown away. John, I want to get your thoughts on this because one of the things that comes up a lot for companies that I talked to, that are on my second wave, I would say coming in, maybe not, maybe the front wave of adopters is talent and team building. You mentioned some of the hires you got were game changing for you guys and set the bar high. As you move the needle, new developers going to need to come in. What's your advice given that you've been a professor, you've seen students, I know a lot of computer science people want to shift, they might not be yet skilled in AI, but they're proficient in programming, is that's going to be another opportunity with open source when things are happening. How do you talk to that next level of talent that wants to come in to this market to supplement teams and be on teams, lead teams? Any advice you have for people who want to build their teams and people who are out there and want to be a coder in AI? >> Yeah, I've advice, and this actually works for what it would take to be a successful AI company in 2023 as well, which is, just don't be afraid to iterate really quickly with these tools. The space is still being explored on what they can be used for. A lot of the tasks that they're used for now right? like creating marketing content using a machine learning is not a new thing to do. It just works really well now. And so I'm excited to see what the next year brings in terms of folks from outside of core computer science who are, other engineers or physicists or chemists or whatever who are learning how to use these increasingly easy to use tools to leverage LLMs for tasks that I think none of us have really thought about before. So that's really, really exciting. And so toward that I would say iterate quickly. Build things on your own, build demos, show them the friends, host them online and you'll learn along the way and you'll have somebody to show for it. And also you'll help us explore that space. >> Guys, congratulations with Arthur. Great company, great picks and shovels opportunities out there for everybody. Iterate fast, get in quickly and don't be afraid to iterate. Great advice and thank you for coming on and being part of the AWS showcase, thanks. >> Yeah, thanks for having us on John. Always a pleasure. >> Yeah, great stuff. Adam Wenchel, John Dickerson with Arthur. Thanks for coming on theCUBE. I'm John Furrier, your host. Generative AI and AWS. Keep it right there for more action with theCUBE. Thanks for watching. (upbeat music)

Published Date : Mar 9 2023

SUMMARY :

of the AWS Startup Showcase has opened the eyes to everybody and the demos we get of them, but the change, the acceleration, And in the next 12 months, of the equivalent of the printing press and how quickly you can accelerate As people come into the field, aspects of the LLM ecosystem. and that the language models in seconds if you want. and talk about the company. of the life cycle. in the 2018 20 19 realm I got to ask you now, next step, in the broader business world So in the use case, as a the way users interact with these models, How's that factoring in to that LLM behind the chatbot and how to build the Go-HERE's and the OpenAI's What's the approach? differences in the way that are going to put So the pricing on these is always changing and in the sequence What's the key to your success? And so the way we're able to You got all the action Yeah, well I going to appreciate Adam and pass that along to the client. so I appreciate the candid response there. get approval from the CFO to spend You see it all the time with some of A lot of the tasks that and being part of the Yeah, thanks for having us Generative AI and AWS.

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Adam Wenchel	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Adam	PERSON	0.99+
John Furrier	PERSON	0.99+
two	QUANTITY	0.99+
John Dickerson	PERSON	0.99+
2016	DATE	0.99+
2018	DATE	0.99+
2023	DATE	0.99+
3%	QUANTITY	0.99+
2017	DATE	0.99+
Capital One	ORGANIZATION	0.99+
last week	DATE	0.99+
AWS	ORGANIZATION	0.99+
Arthur	PERSON	0.99+
Python	TITLE	0.99+
millions	QUANTITY	0.99+
Two	QUANTITY	0.99+
each step	QUANTITY	0.99+
2018 20 19	DATE	0.99+
two schools	QUANTITY	0.99+
couple years ago	DATE	0.99+
once a week	QUANTITY	0.99+
one	QUANTITY	0.98+
first year	QUANTITY	0.98+
Swami	PERSON	0.98+
four years ago	DATE	0.98+
four	DATE	0.98+
first time	QUANTITY	0.98+
Arthur	ORGANIZATION	0.98+
two great guests	QUANTITY	0.98+
next year	DATE	0.98+
once a day	QUANTITY	0.98+
six weeks	QUANTITY	0.97+
10 years ago	DATE	0.97+
ChatGPT	TITLE	0.97+
second one	QUANTITY	0.96+
three people	QUANTITY	0.96+
front	EVENT	0.95+
second wave	EVENT	0.95+
January	DATE	0.95+
hundreds of millions of dollars	QUANTITY	0.95+
five years ago	DATE	0.94+
about a month ago	DATE	0.94+
tens of millions	QUANTITY	0.93+
today	DATE	0.92+
next 12 months	DATE	0.91+
LangChain	ORGANIZATION	0.91+
over a hundred percent	QUANTITY	0.91+
million dollars	QUANTITY	0.89+
millions of inferences	QUANTITY	0.89+
theCUBE	ORGANIZATION	0.88+

Jay Marshall, Neural Magic | AWS Startup Showcase S3E1

(upbeat music) >> Hello, everyone, and welcome to theCUBE's presentation of the "AWS Startup Showcase." This is season three, episode one. The focus of this episode is AI/ML: Top Startups Building Foundational Models, Infrastructure, and AI. It's great topics, super-relevant, and it's part of our ongoing coverage of startups in the AWS ecosystem. I'm your host, John Furrier, with theCUBE. Today, we're excited to be joined by Jay Marshall, VP of Business Development at Neural Magic. Jay, thanks for coming on theCUBE. >> Hey, John, thanks so much. Thanks for having us. >> We had a great CUBE conversation with you guys. This is very much about the company focuses. It's a feature presentation for the "Startup Showcase," and the machine learning at scale is the topic, but in general, it's more, (laughs) and we should call it "Machine Learning and AI: How to Get Started," because everybody is retooling their business. Companies that aren't retooling their business right now with AI first will be out of business, in my opinion. You're seeing massive shift. This is really truly the beginning of the next-gen machine learning AI trend. It's really seeing ChatGPT. Everyone sees that. That went mainstream. But this is just the beginning. This is scratching the surface of this next-generation AI with machine learning powering it, and with all the goodness of cloud, cloud scale, and how horizontally scalable it is. The resources are there. You got the Edge. Everything's perfect for AI 'cause data infrastructure's exploding in value. AI is just the applications. This is a super topic, so what do you guys see in this general area of opportunities right now in the headlines? And I'm sure you guys' phone must be ringing off the hook, metaphorically speaking, or emails and meetings and Zooms. What's going on over there at Neural Magic? >> No, absolutely, and you pretty much nailed most of it. I think that, you know, my background, we've seen for the last 20-plus years. Even just getting enterprise applications kind of built and delivered at scale, obviously, amazing things with AWS and the cloud to help accelerate that. And we just kind of figured out in the last five or so years how to do that productively and efficiently, kind of from an operations perspective. Got development and operations teams. We even came up with DevOps, right? But now, we kind of have this new kind of persona and new workload that developers have to talk to, and then it has to be deployed on those ITOps solutions. And so you pretty much nailed it. Folks are saying, "Well, how do I do this?" These big, generational models or foundational models, as we're calling them, they're great, but enterprises want to do that with their data, on their infrastructure, at scale, at the edge. So for us, yeah, we're helping enterprises accelerate that through optimizing models and then delivering them at scale in a more cost-effective fashion. >> Yeah, and I think one of the things, the benefits of OpenAI we saw, was not only is it open source, then you got also other models that are more proprietary, is that it shows the world that this is really happening, right? It's a whole nother level, and there's also new landscape kind of maps coming out. You got the generative AI, and you got the foundational models, large LLMs. Where do you guys fit into the landscape? Because you guys are in the middle of this. How do you talk to customers when they say, "I'm going down this road. I need help. I'm going to stand this up." This new AI infrastructure and applications, where do you guys fit in the landscape? >> Right, and really, the answer is both. I think today, when it comes to a lot of what for some folks would still be considered kind of cutting edge around computer vision and natural language processing, a lot of our optimization tools and our runtime are based around most of the common computer vision and natural language processing models. So your YOLOs, your BERTs, you know, your DistilBERTs and what have you, so we work to help optimize those, again, who've gotten great performance and great value for customers trying to get those into production. But when you get into the LLMs, and you mentioned some of the open source components there, our research teams have kind of been right in the trenches with those. So kind of the GPT open source equivalent being OPT, being able to actually take, you know, a multi-$100 billion parameter model and sparsify that or optimize that down, shaving away a ton of parameters, and being able to run it on smaller infrastructure. So I think the evolution here, you know, all this stuff came out in the last six months in terms of being turned loose into the wild, but we're staying in the trenches with folks so that we can help optimize those as well and not require, again, the heavy compute, the heavy cost, the heavy power consumption as those models evolve as well. So we're staying right in with everybody while they're being built, but trying to get folks into production today with things that help with business value today. >> Jay, I really appreciate you coming on theCUBE, and before we came on camera, you said you just were on a customer call. I know you got a lot of activity. What specific things are you helping enterprises solve? What kind of problems? Take us through the spectrum from the beginning, people jumping in the deep end of the pool, some people kind of coming in, starting out slow. What are the scale? Can you scope the kind of use cases and problems that are emerging that people are calling you for? >> Absolutely, so I think if I break it down to kind of, like, your startup, or I maybe call 'em AI native to kind of steal from cloud native years ago, that group, it's pretty much, you know, part and parcel for how that group already runs. So if you have a data science team and an ML engineering team, you're building models, you're training models, you're deploying models. You're seeing firsthand the expense of starting to try to do that at scale. So it's really just a pure operational efficiency play. They kind of speak natively to our tools, which we're doing in the open source. So it's really helping, again, with the optimization of the models they've built, and then, again, giving them an alternative to expensive proprietary hardware accelerators to have to run them. Now, on the enterprise side, it varies, right? You have some kind of AI native folks there that already have these teams, but you also have kind of, like, AI curious, right? Like, they want to do it, but they don't really know where to start, and so for there, we actually have an open source toolkit that can help you get into this optimization, and then again, that runtime, that inferencing runtime, purpose-built for CPUs. It allows you to not have to worry, again, about do I have a hardware accelerator available? How do I integrate that into my application stack? If I don't already know how to build this into my infrastructure, does my ITOps teams, do they know how to do this, and what does that runway look like? How do I cost for this? How do I plan for this? When it's just x86 compute, we've been doing that for a while, right? So it obviously still requires more, but at least it's a little bit more predictable. >> It's funny you mentioned AI native. You know, born in the cloud was a phrase that was out there. Now, you have startups that are born in AI companies. So I think you have this kind of cloud kind of vibe going on. You have lift and shift was a big discussion. Then you had cloud native, kind of in the cloud, kind of making it all work. Is there a existing set of things? People will throw on this hat, and then what's the difference between AI native and kind of providing it to existing stuff? 'Cause we're a lot of people take some of these tools and apply it to either existing stuff almost, and it's not really a lift and shift, but it's kind of like bolting on AI to something else, and then starting with AI first or native AI. >> Absolutely. It's a- >> How would you- >> It's a great question. I think that probably, where I'd probably pull back to kind of allow kind of retail-type scenarios where, you know, for five, seven, nine years or more even, a lot of these folks already have data science teams, you know? I mean, they've been doing this for quite some time. The difference is the introduction of these neural networks and deep learning, right? Those kinds of models are just a little bit of a paradigm shift. So, you know, I obviously was trying to be fun with the term AI native, but I think it's more folks that kind of came up in that neural network world, so it's a little bit more second nature, whereas I think for maybe some traditional data scientists starting to get into neural networks, you have the complexity there and the training overhead, and a lot of the aspects of getting a model finely tuned and hyperparameterization and all of these aspects of it. It just adds a layer of complexity that they're just not as used to dealing with. And so our goal is to help make that easy, and then of course, make it easier to run anywhere that you have just kind of standard infrastructure. >> Well, the other point I'd bring out, and I'd love to get your reaction to, is not only is that a neural network team, people who have been focused on that, but also, if you look at some of the DataOps lately, AIOps markets, a lot of data engineering, a lot of scale, folks who have been kind of, like, in that data tsunami cloud world are seeing, they kind of been in this, right? They're, like, been experiencing that. >> No doubt. I think it's funny the data lake concept, right? And you got data oceans now. Like, the metaphors just keep growing on us, but where it is valuable in terms of trying to shift the mindset, I've always kind of been a fan of some of the naming shift. I know with AWS, they always talk about purpose-built databases. And I always liked that because, you know, you don't have one database that can do everything. Even ones that say they can, like, you still have to do implementation detail differences. So sitting back and saying, "What is my use case, and then which database will I use it for?" I think it's kind of similar here. And when you're building those data teams, if you don't have folks that are doing data engineering, kind of that data harvesting, free processing, you got to do all that before a model's even going to care about it. So yeah, it's definitely a central piece of this as well, and again, whether or not you're going to be AI negative as you're making your way to kind of, you know, on that journey, you know, data's definitely a huge component of it. >> Yeah, you would have loved our Supercloud event we had. Talk about naming and, you know, around data meshes was talked about a lot. You're starting to see the control plane layers of data. I think that was the beginning of what I saw as that data infrastructure shift, to be horizontally scalable. So I have to ask you, with Neural Magic, when your customers and the people that are prospects for you guys, they're probably asking a lot of questions because I think the general thing that we see is, "How do I get started? Which GPU do I use?" I mean, there's a lot of things that are kind of, I won't say technical or targeted towards people who are living in that world, but, like, as the mainstream enterprises come in, they're going to need a playbook. What do you guys see, what do you guys offer your clients when they come in, and what do you recommend? >> Absolutely, and I think where we hook in specifically tends to be on the training side. So again, I've built a model. Now, I want to really optimize that model. And then on the runtime side when you want to deploy it, you know, we run that optimized model. And so that's where we're able to provide. We even have a labs offering in terms of being able to pair up our engineering teams with a customer's engineering teams, and we can actually help with most of that pipeline. So even if it is something where you have a dataset and you want some help in picking a model, you want some help training it, you want some help deploying that, we can actually help there as well. You know, there's also a great partner ecosystem out there, like a lot of folks even in the "Startup Showcase" here, that extend beyond into kind of your earlier comment around data engineering or downstream ITOps or the all-up MLOps umbrella. So we can absolutely engage with our labs, and then, of course, you know, again, partners, which are always kind of key to this. So you are spot on. I think what's happened with the kind of this, they talk about a hockey stick. This is almost like a flat wall now with the rate of innovation right now in this space. And so we do have a lot of folks wanting to go straight from curious to native. And so that's definitely where the partner ecosystem comes in so hard 'cause there just isn't anybody or any teams out there that, I literally do from, "Here's my blank database, and I want an API that does all the stuff," right? Like, that's a big chunk, but we can definitely help with the model to delivery piece. >> Well, you guys are obviously a featured company in this space. Talk about the expertise. A lot of companies are like, I won't say faking it till they make it. You can't really fake security. You can't really fake AI, right? So there's going to be a learning curve. They'll be a few startups who'll come out of the gate early. You guys are one of 'em. Talk about what you guys have as expertise as a company, why you're successful, and what problems do you solve for customers? >> No, appreciate that. Yeah, we actually, we love to tell the story of our founder, Nir Shavit. So he's a 20-year professor at MIT. Actually, he was doing a lot of work on kind of multicore processing before there were even physical multicores, and actually even did a stint in computational neurobiology in the 2010s, and the impetus for this whole technology, has a great talk on YouTube about it, where he talks about the fact that his work there, he kind of realized that the way neural networks encode and how they're executed by kind of ramming data layer by layer through these kind of HPC-style platforms, actually was not analogous to how the human brain actually works. So we're on one side, we're building neural networks, and we're trying to emulate neurons. We're not really executing them that way. So our team, which one of the co-founders, also an ex-MIT, that was kind of the birth of why can't we leverage this super-performance CPU platform, which has those really fat, fast caches attached to each core, and actually start to find a way to break that model down in a way that I can execute things in parallel, not having to do them sequentially? So it is a lot of amazing, like, talks and stuff that show kind of the magic, if you will, a part of the pun of Neural Magic, but that's kind of the foundational layer of all the engineering that we do here. And in terms of how we're able to bring it to reality for customers, I'll give one customer quote where it's a large retailer, and it's a people-counting application. So a very common application. And that customer's actually been able to show literally double the amount of cameras being run with the same amount of compute. So for a one-to-one perspective, two-to-one, business leaders usually like that math, right? So we're able to show pure cost savings, but even performance-wise, you know, we have some of the common models like your ResNets and your YOLOs, where we can actually even perform better than hardware-accelerated solutions. So we're trying to do, I need to just dumb it down to better, faster, cheaper, but from a commodity perspective, that's where we're accelerating. >> That's not a bad business model. Make things easier to use, faster, and reduce the steps it takes to do stuff. So, you know, that's always going to be a good market. Now, you guys have DeepSparse, which we've talked about on our CUBE conversation prior to this interview, delivers ML models through the software so the hardware allows for a decoupling, right? >> Yep. >> Which is going to drive probably a cost advantage. Also, it's also probably from a deployment standpoint it must be easier. Can you share the benefits? Is it a cost side? Is it more of a deployment? What are the benefits of the DeepSparse when you guys decouple the software from the hardware on the ML models? >> No you actually, you hit 'em both 'cause that really is primarily the value. Because ultimately, again, we're so early. And I came from this world in a prior life where I'm doing Java development, WebSphere, WebLogic, Tomcat open source, right? When we were trying to do innovation, we had innovation buckets, 'cause everybody wanted to be on the web and have their app and a browser, right? We got all the money we needed to build something and show, hey, look at the thing on the web, right? But when you had to get in production, that was the challenge. So to what you're speaking to here, in this situation, we're able to show we're just a Python package. So whether you just install it on the operating system itself, or we also have a containerized version you can drop on any container orchestration platform, so ECS or EKS on AWS. And so you get all the auto-scaling features. So when you think about that kind of a world where you have everything from real-time inferencing to kind of after hours batch processing inferencing, the fact that you can auto scale that hardware up and down and it's CPU based, so you're paying by the minute instead of maybe paying by the hour at a lower cost shelf, it does everything from pure cost to, again, I can have my standard IT team say, "Hey, here's the Kubernetes in the container," and it just runs on the infrastructure we're already managing. So yeah, operational, cost and again, and many times even performance. (audio warbles) CPUs if I want to. >> Yeah, so that's easier on the deployment too. And you don't have this kind of, you know, blank check kind of situation where you don't know what's on the backend on the cost side. >> Exactly. >> And you control the actual hardware and you can manage that supply chain. >> And keep in mind, exactly. Because the other thing that sometimes gets lost in the conversation, depending on where a customer is, some of these workloads, like, you know, you and I remember a world where even like the roundtrip to the cloud and back was a problem for folks, right? We're used to extremely low latency. And some of these workloads absolutely also adhere to that. But there's some workloads where the latency isn't as important. And we actually even provide the tuning. Now, if we're giving you five milliseconds of latency and you don't need that, you can tune that back. So less CPU, lower cost. Now, throughput and other things come into play. But that's the kind of configurability and flexibility we give for operations. >> All right, so why should I call you if I'm a customer or prospect Neural Magic, what problem do I have or when do I know I need you guys? When do I call you in and what does my environment look like? When do I know? What are some of the signals that would tell me that I need Neural Magic? >> No, absolutely. So I think in general, any neural network, you know, the process I mentioned before called sparcification, it's, you know, an optimization process that we specialize in. Any neural network, you know, can be sparcified. So I think if it's a deep-learning neural network type model. If you're trying to get AI into production, you have cost concerns even performance-wise. I certainly hate to be too generic and say, "Hey, we'll talk to everybody." But really in this world right now, if it's a neural network, it's something where you're trying to get into production, you know, we are definitely offering, you know, kind of an at-scale performant deployable solution for deep learning models. >> So neural network you would define as what? Just devices that are connected that need to know about each other? What's the state-of-the-art current definition of neural network for customers that may think they have a neural network or might not know they have a neural network architecture? What is that definition for neural network? >> That's a great question. So basically, machine learning models that fall under this kind of category, you hear about transformers a lot, or I mentioned about YOLO, the YOLO family of computer vision models, or natural language processing models like BERT. If you have a data science team or even developers, some even regular, I used to call myself a nine to five developer 'cause I worked in the enterprise, right? So like, hey, we found a new open source framework, you know, I used to use Spring back in the day and I had to go figure it out. There's developers that are pulling these models down and they're figuring out how to get 'em into production, okay? So I think all of those kinds of situations, you know, if it's a machine learning model of the deep learning variety that's, you know, really specifically where we shine. >> Okay, so let me pretend I'm a customer for a minute. I have all these videos, like all these transcripts, I have all these people that we've interviewed, CUBE alumnis, and I say to my team, "Let's AI-ify, sparcify theCUBE." >> Yep. >> What do I do? I mean, do I just like, my developers got to get involved and they're going to be like, "Well, how do I upload it to the cloud? Do I use a GPU?" So there's a thought process. And I think a lot of companies are going through that example of let's get on this AI, how can it help our business? >> Absolutely. >> What does that progression look like? Take me through that example. I mean, I made up theCUBE example up, but we do have a lot of data. We have large data models and we have people and connect to the internet and so we kind of seem like there's a neural network. I think every company might have a neural network in place. >> Well, and I was going to say, I think in general, you all probably do represent even the standard enterprise more than most. 'Cause even the enterprise is going to have a ton of video content, a ton of text content. So I think it's a great example. So I think that that kind of sea or I'll even go ahead and use that term data lake again, of data that you have, you're probably going to want to be setting up kind of machine learning pipelines that are going to be doing all of the pre-processing from kind of the raw data to kind of prepare it into the format that say a YOLO would actually use or let's say BERT for natural language processing. So you have all these transcripts, right? So we would do a pre-processing path where we would create that into the file format that BERT, the machine learning model would know how to train off of. So that's kind of all the pre-processing steps. And then for training itself, we actually enable what's called sparse transfer learning. So that's transfer learning is a very popular method of doing training with existing models. So we would be able to retrain that BERT model with your transcript data that we have now done the pre-processing with to get it into the proper format. And now we have a BERT natural language processing model that's been trained on your data. And now we can deploy that onto DeepSparse runtime so that now you can ask that model whatever questions, or I should say pass, you're not going to ask it those kinds of questions ChatGPT, although we can do that too. But you're going to pass text through the BERT model and it's going to give you answers back. It could be things like sentiment analysis or text classification. You just call the model, and now when you pass text through it, you get the answers better, faster or cheaper. I'll use that reference again. >> Okay, we can create a CUBE bot to give us questions on the fly from the the AI bot, you know, from our previous guests. >> Well, and I will tell you using that as an example. So I had mentioned OPT before, kind of the open source version of ChatGPT. So, you know, typically that requires multiple GPUs to run. So our research team, I may have mentioned earlier, we've been able to sparcify that over 50% already and run it on only a single GPU. And so in that situation, you could train OPT with that corpus of data and do exactly what you say. Actually we could use Alexa, we could use Alexa to actually respond back with voice. How about that? We'll do an API call and we'll actually have an interactive Alexa-enabled bot. >> Okay, we're going to be a customer, let's put it on the list. But this is a great example of what you guys call software delivered AI, a topic we chatted about on theCUBE conversation. This really means this is a developer opportunity. This really is the convergence of the data growth, the restructuring, how data is going to be horizontally scalable, meets developers. So this is an AI developer model going on right now, which is kind of unique. >> It is, John, I will tell you what's interesting. And again, folks don't always think of it this way, you know, the AI magical goodness is now getting pushed in the middle where the developers and IT are operating. And so it again, that paradigm, although for some folks seem obvious, again, if you've been around for 20 years, that whole all that plumbing is a thing, right? And so what we basically help with is when you deploy the DeepSparse runtime, we have a very rich API footprint. And so the developers can call the API, ITOps can run it, or to your point, it's developer friendly enough that you could actually deploy our off-the-shelf models. We have something called the SparseZoo where we actually publish pre-optimized or pre-sparcified models. And so developers could literally grab those right off the shelf with the training they've already had and just put 'em right into their applications and deploy them as containers. So yeah, we enable that for sure as well. >> It's interesting, DevOps was infrastructure as code and we had a last season, a series on data as code, which we kind of coined. This is data as code. This is a whole nother level of opportunity where developers just want to have programmable data and apps with AI. This is a whole new- >> Absolutely. >> Well, absolutely great, great stuff. Our news team at SiliconANGLE and theCUBE said you guys had a little bit of a launch announcement you wanted to make here on the "AWS Startup Showcase." So Jay, you have something that you want to launch here? >> Yes, and thank you John for teeing me up. So I'm going to try to put this in like, you know, the vein of like an AWS, like main stage keynote launch, okay? So we're going to try this out. So, you know, a lot of our product has obviously been built on top of x86. I've been sharing that the past 15 minutes or so. And with that, you know, we're seeing a lot of acceleration for folks wanting to run on commodity infrastructure. But we've had customers and prospects and partners tell us that, you know, ARM and all of its kind of variance are very compelling, both cost performance-wise and also obviously with Edge. And wanted to know if there was anything we could do from a runtime perspective with ARM. And so we got the work and, you know, it's a hard problem to solve 'cause the instructions set for ARM is very different than the instruction set for x86, and our deep tensor column technology has to be able to work with that lower level instruction spec. But working really hard, the engineering team's been at it and we are happy to announce here at the "AWS Startup Showcase," that DeepSparse inference now has, or inference runtime now has support for AWS Graviton instances. So it's no longer just x86, it is also ARM and that obviously also opens up the door to Edge and further out the stack so that optimize once run anywhere, we're not going to open up. So it is an early access. So if you go to neuralmagic.com/graviton, you can sign up for early access, but we're excited to now get into the ARM side of the fence as well on top of Graviton. >> That's awesome. Our news team is going to jump on that news. We'll get it right up. We get a little scoop here on the "Startup Showcase." Jay Marshall, great job. That really highlights the flexibility that you guys have when you decouple the software from the hardware. And again, we're seeing open source driving a lot more in AI ops now with with machine learning and AI. So to me, that makes a lot of sense. And congratulations on that announcement. Final minute or so we have left, give a summary of what you guys are all about. Put a plug in for the company, what you guys are looking to do. I'm sure you're probably hiring like crazy. Take the last few minutes to give a plug for the company and give a summary. >> No, I appreciate that so much. So yeah, joining us out neuralmagic.com, you know, part of what we didn't spend a lot of time here, our optimization tools, we are doing all of that in the open source. It's called SparseML and I mentioned SparseZoo briefly. So we really want the data scientists community and ML engineering community to join us out there. And again, the DeepSparse runtime, it's actually free to use for trial purposes and for personal use. So you can actually run all this on your own laptop or on an AWS instance of your choice. We are now live in the AWS marketplace. So push button, deploy, come try us out and reach out to us on neuralmagic.com. And again, sign up for the Graviton early access. >> All right, Jay Marshall, Vice President of Business Development Neural Magic here, talking about performant, cost effective machine learning at scale. This is season three, episode one, focusing on foundational models as far as building data infrastructure and AI, AI native. I'm John Furrier with theCUBE. Thanks for watching. (bright upbeat music)

Published Date : Mar 9 2023

SUMMARY :

of the "AWS Startup Showcase." Thanks for having us. and the machine learning and the cloud to help accelerate that. and you got the foundational So kind of the GPT open deep end of the pool, that group, it's pretty much, you know, So I think you have this kind It's a- and a lot of the aspects of and I'd love to get your reaction to, And I always liked that because, you know, that are prospects for you guys, and you want some help in picking a model, Talk about what you guys have that show kind of the magic, if you will, and reduce the steps it takes to do stuff. when you guys decouple the the fact that you can auto And you don't have this kind of, you know, the actual hardware and you and you don't need that, neural network, you know, of situations, you know, CUBE alumnis, and I say to my team, and they're going to be like, and connect to the internet and it's going to give you answers back. you know, from our previous guests. and do exactly what you say. of what you guys call enough that you could actually and we had a last season, that you want to launch here? And so we got the work and, you know, flexibility that you guys have So you can actually run Vice President of Business

ENTITIES

Entity	Category	Confidence
Jay	PERSON	0.99+
Jay Marshall	PERSON	0.99+
John Furrier	PERSON	0.99+
John	PERSON	0.99+
AWS	ORGANIZATION	0.99+
five	QUANTITY	0.99+
Nir Shavit	PERSON	0.99+
20-year	QUANTITY	0.99+
Alexa	TITLE	0.99+
2010s	DATE	0.99+
seven	QUANTITY	0.99+
Python	TITLE	0.99+
MIT	ORGANIZATION	0.99+
each core	QUANTITY	0.99+
Neural Magic	ORGANIZATION	0.99+
Java	TITLE	0.99+
YouTube	ORGANIZATION	0.99+
Today	DATE	0.99+
nine years	QUANTITY	0.98+
both	QUANTITY	0.98+
BERT	TITLE	0.98+
theCUBE	ORGANIZATION	0.98+
ChatGPT	TITLE	0.98+
20 years	QUANTITY	0.98+
over 50%	QUANTITY	0.97+
second nature	QUANTITY	0.96+
today	DATE	0.96+
ARM	ORGANIZATION	0.96+
one	QUANTITY	0.95+
DeepSparse	TITLE	0.94+
neuralmagic.com/graviton	OTHER	0.94+
SiliconANGLE	ORGANIZATION	0.94+
WebSphere	TITLE	0.94+
nine	QUANTITY	0.94+
first	QUANTITY	0.93+
Startup Showcase	EVENT	0.93+
five milliseconds	QUANTITY	0.92+
AWS Startup Showcase	EVENT	0.91+
two	QUANTITY	0.9+
YOLO	ORGANIZATION	0.89+
CUBE	ORGANIZATION	0.88+
OPT	TITLE	0.88+
last six months	DATE	0.88+
season three	QUANTITY	0.86+
double	QUANTITY	0.86+
one customer	QUANTITY	0.86+
Supercloud	EVENT	0.86+
one side	QUANTITY	0.85+
Vice	PERSON	0.85+
x86	OTHER	0.83+
AI/ML: Top Startups Building Foundational Models	TITLE	0.82+
ECS	TITLE	0.81+
$100 billion	QUANTITY	0.81+
DevOps	TITLE	0.81+
WebLogic	TITLE	0.8+
EKS	TITLE	0.8+
a minute	QUANTITY	0.8+
neuralmagic.com	OTHER	0.79+

Steven Hillion & Jeff Fletcher, Astronomer | AWS Startup Showcase S3E1

(upbeat music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase AI/ML Top Startups Building Foundation Model Infrastructure. This is season three, episode one of our ongoing series covering exciting startups from the AWS ecosystem to talk about data and analytics. I'm your host, Lisa Martin and today we're excited to be joined by two guests from Astronomer. Steven Hillion joins us, it's Chief Data Officer and Jeff Fletcher, it's director of ML. They're here to talk about machine learning and data orchestration. Guys, thank you so much for joining us today. >> Thank you. >> It's great to be here. >> Before we get into machine learning let's give the audience an overview of Astronomer. Talk about what that is, Steven. Talk about what you mean by data orchestration. >> Yeah, let's start with Astronomer. We're the Airflow company basically. The commercial developer behind the open-source project, Apache Airflow. I don't know if you've heard of Airflow. It's sort of de-facto standard these days for orchestrating data pipelines, data engineering pipelines, and as we'll talk about later, machine learning pipelines. It's really is the de-facto standard. I think we're up to about 12 million downloads a month. That's actually as a open-source project. I think at this point it's more popular by some measures than Slack. Airflow was created by Airbnb some years ago to manage all of their data pipelines and manage all of their workflows and now it powers the data ecosystem for organizations as diverse as Electronic Arts, Conde Nast is one of our big customers, a big user of Airflow. And also not to mention the biggest banks on Wall Street use Airflow and Astronomer to power the flow of data throughout their organizations. >> Talk about that a little bit more, Steven, in terms of the business impact. You mentioned some great customer names there. What is the business impact or outcomes that a data orchestration strategy enables businesses to achieve? >> Yeah, I mean, at the heart of it is quite simply, scheduling and managing data pipelines. And so if you have some enormous retailer who's managing the flow of information throughout their organization they may literally have thousands or even tens of thousands of data pipelines that need to execute every day to do things as simple as delivering metrics for the executives to consume at the end of the day, to producing on a weekly basis new machine learning models that can be used to drive product recommendations. One of our customers, for example, is a British food delivery service. And you get those recommendations in your application that says, "Well, maybe you want to have samosas with your curry." That sort of thing is powered by machine learning models that they train on a regular basis to reflect changing conditions in the market. And those are produced through Airflow and through the Astronomer platform, which is essentially a managed platform for running airflow. So at its simplest it really is just scheduling and managing those workflows. But that's easier said than done of course. I mean if you have 10 thousands of those things then you need to make sure that they all run that they all have sufficient compute resources. If things fail, how do you track those down across those 10,000 workflows? How easy is it for an average data scientist or data engineer to contribute their code, their Python notebooks or their SQL code into a production environment? And then you've got reproducibility, governance, auditing, like managing data flows across an organization which we think of as orchestrating them is much more than just scheduling. It becomes really complicated pretty quickly. >> I imagine there's a fair amount of complexity there. Jeff, let's bring you into the conversation. Talk a little bit about Astronomer through your lens, data orchestration and how it applies to MLOps. >> So I come from a machine learning background and for me the interesting part is that machine learning requires the expansion into orchestration. A lot of the same things that you're using to go and develop and build pipelines in a standard data orchestration space applies equally well in a machine learning orchestration space. What you're doing is you're moving data between different locations, between different tools, and then tasking different types of tools to act on that data. So extending it made logical sense from a implementation perspective. And a lot of my focus at Astronomer is really to explain how Airflow can be used well in a machine learning context. It is being used well, it is being used a lot by the customers that we have and also by users of the open source version. But it's really being able to explain to people why it's a natural extension for it and how well it fits into that. And a lot of it is also extending some of the infrastructure capabilities that Astronomer provides to those customers for them to be able to run some of the more platform specific requirements that come with doing machine learning pipelines. >> Let's get into some of the things that make Astronomer unique. Jeff, sticking with you, when you're in customer conversations, what are some of the key differentiators that you articulate to customers? >> So a lot of it is that we are not specific to one cloud provider. So we have the ability to operate across all of the big cloud providers. I know, I'm certain we have the best developers that understand how best practices implementations for data orchestration works. So we spend a lot of time talking to not just the business outcomes and the business users of the product, but also also for the technical people, how to help them better implement things that they may have come across on a Stack Overflow article or not necessarily just grown with how the product has migrated. So it's the ability to run it wherever you need to run it and also our ability to help you, the customer, better implement and understand those workflows that I think are two of the primary differentiators that we have. >> Lisa: Got it. >> I'll add another one if you don't mind. >> You can go ahead, Steven. >> Is lineage and dependencies between workflows. One thing we've done is to augment core Airflow with Lineage services. So using the Open Lineage framework, another open source framework for tracking datasets as they move from one workflow to another one, team to another, one data source to another is a really key component of what we do and we bundle that within the service so that as a developer or as a production engineer, you really don't have to worry about lineage, it just happens. Jeff, may show us some of this later that you can actually see as data flows from source through to a data warehouse out through a Python notebook to produce a predictive model or a dashboard. Can you see how those data products relate to each other? And when something goes wrong, figure out what upstream maybe caused the problem, or if you're about to change something, figure out what the impact is going to be on the rest of the organization. So Lineage is a big deal for us. >> Got it. >> And just to add on to that, the other thing to think about is that traditional Airflow is actually a complicated implementation. It required quite a lot of time spent understanding or was almost a bespoke language that you needed to be able to develop in two write these DAGs, which is like fundamental pipelines. So part of what we are focusing on is tooling that makes it more accessible to say a data analyst or a data scientist who doesn't have or really needs to gain the necessary background in how the semantics of Airflow DAGs works to still be able to get the benefit of what Airflow can do. So there is new features and capabilities built into the astronomer cloud platform that effectively obfuscates and removes the need to understand some of the deep work that goes on. But you can still do it, you still have that capability, but we are expanding it to be able to have orchestrated and repeatable processes accessible to more teams within the business. >> In terms of accessibility to more teams in the business. You talked about data scientists, data analysts, developers. Steven, I want to talk to you, as the chief data officer, are you having more and more conversations with that role and how is it emerging and evolving within your customer base? >> Hmm. That's a good question, and it is evolving because I think if you look historically at the way that Airflow has been used it's often from the ground up. You have individual data engineers or maybe single data engineering teams who adopt Airflow 'cause it's very popular. Lots of people know how to use it and they bring it into an organization and say, "Hey, let's use this to run our data pipelines." But then increasingly as you turn from pure workflow management and job scheduling to the larger topic of orchestration you realize it gets pretty complicated, you want to have coordination across teams, and you want to have standardization for the way that you manage your data pipelines. And so having a managed service for Airflow that exists in the cloud is easy to spin up as you expand usage across the organization. And thinking long term about that in the context of orchestration that's where I think the chief data officer or the head of analytics tends to get involved because they really want to think of this as a strategic investment that they're making. Not just per team individual Airflow deployments, but a network of data orchestrators. >> That network is key. Every company these days has to be a data company. We talk about companies being data driven. It's a common word, but it's true. It's whether it is a grocer or a bank or a hospital, they've got to be data companies. So talk to me a little bit about Astronomer's business model. How is this available? How do customers get their hands on it? >> Jeff, go ahead. >> Yeah, yeah. So we have a managed cloud service and we have two modes of operation. One, you can bring your own cloud infrastructure. So you can say here is an account in say, AWS or Azure and we can go and deploy the necessary infrastructure into that, or alternatively we can host everything for you. So it becomes a full SaaS offering. But we then provide a platform that connects at the backend to your internal IDP process. So however you are authenticating users to make sure that the correct people are accessing the services that they need with role-based access control. From there we are deploying through Kubernetes, the different services and capabilities into either your cloud account or into an account that we host. And from there Airflow does what Airflow does, which is its ability to then reach to different data systems and data platforms and to then run the orchestration. We make sure we do it securely, we have all the necessary compliance certifications required for GDPR in Europe and HIPAA based out of the US, and a whole bunch host of others. So it is a secure platform that can run in a place that you need it to run, but it is a managed Airflow that includes a lot of the extra capabilities like the cloud developer environment and the open lineage services to enhance the overall airflow experience. >> Enhance the overall experience. So Steven, going back to you, if I'm a Conde Nast or another organization, what are some of the key business outcomes that I can expect? As one of the things I think we've learned during the pandemic is access to realtime data is no longer a nice to have for organizations. It's really an imperative. It's that demanding consumer that wants to have that personalized, customized, instant access to a product or a service. So if I'm a Conde Nast or I'm one of your customers, what can I expect my business to be able to achieve as a result of data orchestration? >> Yeah, I think in a nutshell it's about providing a reliable, scalable, and easy to use service for developing and running data workflows. And talking of demanding customers, I mean, I'm actually a customer myself, as you mentioned, I'm the head of data for Astronomer. You won't be surprised to hear that we actually use Astronomer and Airflow to run all of our data pipelines. And so I can actually talk about my experience. When I started I was of course familiar with Airflow, but it always seemed a little bit unapproachable to me if I was introducing that to a new team of data scientists. They don't necessarily want to have to think about learning something new. But I think because of the layers that Astronomer has provided with our Astro service around Airflow it was pretty easy for me to get up and running. Of course I've got an incentive for doing that. I work for the Airflow company, but we went from about, at the beginning of last year, about 500 data tasks that we were running on a daily basis to about 15,000 every day. We run something like a million data operations every month within my team. And so as one outcome, just the ability to spin up new production workflows essentially in a single day you go from an idea in the morning to a new dashboard or a new model in the afternoon, that's really the business outcome is just removing that friction to operationalizing your machine learning and data workflows. >> And I imagine too, oh, go ahead, Jeff. >> Yeah, I think to add to that, one of the things that becomes part of the business cycle is a repeatable capabilities for things like reporting, for things like new machine learning models. And the impediment that has existed is that it's difficult to take that from a team that's an analyst team who then provide that or a data science team that then provide that to the data engineering team who have to work the workflow all the way through. What we're trying to unlock is the ability for those teams to directly get access to scheduling and orchestrating capabilities so that a business analyst can have a new report for C-suite execs that needs to be done once a week, but the time to repeatability for that report is much shorter. So it is then immediately in the hands of the person that needs to see it. It doesn't have to go into a long list of to-dos for a data engineering team that's already overworked that they eventually get it to it in a month's time. So that is also a part of it is that the realizing, orchestration I think is fairly well and a lot of people get the benefit of being able to orchestrate things within a business, but it's having more people be able to do it and shorten the time that that repeatability is there is one of the main benefits from good managed orchestration. >> So a lot of workforce productivity improvements in what you're doing to simplify things, giving more people access to data to be able to make those faster decisions, which ultimately helps the end user on the other end to get that product or the service that they're expecting like that. Jeff, I understand you have a demo that you can share so we can kind of dig into this. >> Yeah, let me take you through a quick look of how the whole thing works. So our starting point is our cloud infrastructure. This is the login. You go to the portal. You can see there's a a bunch of workspaces that are available. Workspaces are like individual places for people to operate in. I'm not going to delve into all the deep technical details here, but starting point for a lot of our data science customers is we have what we call our Cloud IDE, which is a web-based development environment for writing and building out DAGs without actually having to know how the underpinnings of Airflow work. This is an internal one, something that we use. You have a notebook-like interface that lets you write python code and SQL code and a bunch of specific bespoke type of blocks if you want. They all get pulled together and create a workflow. So this is a workflow, which gets compiled to something that looks like a complicated set of Python code, which is the DAG. I then have a CICD process pipeline where I commit this through to my GitHub repo. So this comes to a repo here, which is where these DAGs that I created in the previous step exist. I can then go and say, all right, I want to see how those particular DAGs have been running. We then get to the actual Airflow part. So this is the managed Airflow component. So we add the ability for teams to fairly easily bring up an Airflow instance and write code inside our notebook-like environment to get it into that instance. So you can see it's been running. That same process that we built here that graph ends up here inside this, but you don't need to know how the fundamentals of Airflow work in order to get this going. Then we can run one of these, it runs in the background and we can manage how it goes. And from there, every time this runs, it's emitting to a process underneath, which is the open lineage service, which is the lineage integration that allows me to come in here and have a look and see this was that actual, that same graph that we built, but now it's the historic version. So I know where things started, where things are going, and how it ran. And then I can also do a comparison. So if I want to see how this particular run worked compared to one historically, I can grab one from a previous date and it will show me the comparison between the two. So that combination of managed Airflow, getting Airflow up and running very quickly, but the Cloud IDE that lets you write code and know how to get something into a repeatable format get that into Airflow and have that attached to the lineage process adds what is a complete end-to-end orchestration process for any business looking to get the benefit from orchestration. >> Outstanding. Thank you so much Jeff for digging into that. So one of my last questions, Steven is for you. This is exciting. There's a lot that you guys are enabling organizations to achieve here to really become data-driven companies. So where can folks go to get their hands on this? >> Yeah, just go to astronomer.io and we have plenty of resources. If you're new to Airflow, you can read our documentation, our guides to getting started. We have a CLI that you can download that is really I think the easiest way to get started with Airflow. But you can actually sign up for a trial. You can sign up for a guided trial where our teams, we have a team of experts, really the world experts on getting Airflow up and running. And they'll take you through that trial and allow you to actually kick the tires and see how this works with your data. And I think you'll see pretty quickly that it's very easy to get started with Airflow, whether you're doing that from the command line or doing that in our cloud service. And all of that is available on our website >> astronomer.io. Jeff, last question for you. What are you excited about? There's so much going on here. What are some of the things, maybe you can give us a sneak peek coming down the road here that prospects and existing customers should be excited about? >> I think a lot of the development around the data awareness components, so one of the things that's traditionally been complicated with orchestration is you leave your data in the place that you're operating on and we're starting to have more data processing capability being built into Airflow. And from a Astronomer perspective, we are adding more capabilities around working with larger datasets, doing bigger data manipulation with inside the Airflow process itself. And that lends itself to better machine learning implementation. So as we start to grow and as we start to get better in the machine learning context, well, in the data awareness context, it unlocks a lot more capability to do and implement proper machine learning pipelines. >> Awesome guys. Exciting stuff. Thank you so much for talking to me about Astronomer, machine learning, data orchestration, and really the value in it for your customers. Steve and Jeff, we appreciate your time. >> Thank you. >> My pleasure, thanks. >> And we thank you for watching. This is season three, episode one of our ongoing series covering exciting startups from the AWS ecosystem. I'm your host, Lisa Martin. You're watching theCUBE, the leader in live tech coverage. (upbeat music)

Published Date : Mar 9 2023

SUMMARY :

of the AWS Startup Showcase let's give the audience and now it powers the data ecosystem What is the business impact or outcomes for the executives to consume how it applies to MLOps. and for me the interesting that you articulate to customers? So it's the ability to run it if you don't mind. that you can actually see as data flows the other thing to think about to more teams in the business. about that in the context of orchestration So talk to me a little bit at the backend to your So Steven, going back to you, just the ability to spin up but the time to repeatability a demo that you can share that allows me to come There's a lot that you guys We have a CLI that you can download What are some of the things, in the place that you're operating on and really the value in And we thank you for watching.

ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Jeff Fletcher	PERSON	0.99+
Steven	PERSON	0.99+
Steve	PERSON	0.99+
Steven Hillion	PERSON	0.99+
Lisa	PERSON	0.99+
Europe	LOCATION	0.99+
Conde Nast	ORGANIZATION	0.99+
US	LOCATION	0.99+
thousands	QUANTITY	0.99+
two	QUANTITY	0.99+
HIPAA	TITLE	0.99+
AWS	ORGANIZATION	0.99+
two guests	QUANTITY	0.99+
Airflow	ORGANIZATION	0.99+
Airbnb	ORGANIZATION	0.99+
10 thousands	QUANTITY	0.99+
One	QUANTITY	0.99+
Electronic Arts	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Python	TITLE	0.99+
two modes	QUANTITY	0.99+
Airflow	TITLE	0.98+
10,000 workflows	QUANTITY	0.98+
about 500 data tasks	QUANTITY	0.98+
today	DATE	0.98+
one outcome	QUANTITY	0.98+
tens of thousands	QUANTITY	0.98+
GDPR	TITLE	0.97+
SQL	TITLE	0.97+
GitHub	ORGANIZATION	0.96+
astronomer.io	OTHER	0.94+
Slack	ORGANIZATION	0.94+
Astronomer	ORGANIZATION	0.94+
some years ago	DATE	0.92+
once a week	QUANTITY	0.92+
Astronomer	TITLE	0.92+
theCUBE	ORGANIZATION	0.92+
last year	DATE	0.91+
Kubernetes	TITLE	0.88+
single day	QUANTITY	0.87+
about 15,000 every day	QUANTITY	0.87+
one cloud	QUANTITY	0.86+
IDE	TITLE	0.86+

Robert Nishihara, Anyscale | AWS Startup Showcase S3 E1

(upbeat music) >> Hello everyone. Welcome to theCube's presentation of the "AWS Startup Showcase." The topic this episode is AI and machine learning, top startups building foundational model infrastructure. This is season three, episode one of the ongoing series covering exciting startups from the AWS ecosystem. And this time we're talking about AI and machine learning. I'm your host, John Furrier. I'm excited I'm joined today by Robert Nishihara, who's the co-founder and CEO of a hot startup called Anyscale. He's here to talk about Ray, the open source project, Anyscale's infrastructure for foundation as well. Robert, thank you for joining us today. >> Yeah, thanks so much as well. >> I've been following your company since the founding pre pandemic and you guys really had a great vision scaled up and in a perfect position for this big wave that we all see with ChatGPT and OpenAI that's gone mainstream. Finally, AI has broken out through the ropes and now gone mainstream, so I think you guys are really well positioned. I'm looking forward to to talking with you today. But before we get into it, introduce the core mission for Anyscale. Why do you guys exist? What is the North Star for Anyscale? >> Yeah, like you mentioned, there's a tremendous amount of excitement about AI right now. You know, I think a lot of us believe that AI can transform just every different industry. So one of the things that was clear to us when we started this company was that the amount of compute needed to do AI was just exploding. Like to actually succeed with AI, companies like OpenAI or Google or you know, these companies getting a lot of value from AI, were not just running these machine learning models on their laptops or on a single machine. They were scaling these applications across hundreds or thousands or more machines and GPUs and other resources in the Cloud. And so to actually succeed with AI, and this has been one of the biggest trends in computing, maybe the biggest trend in computing in, you know, in recent history, the amount of compute has been exploding. And so to actually succeed with that AI, to actually build these scalable applications and scale the AI applications, there's a tremendous software engineering lift to build the infrastructure to actually run these scalable applications. And that's very hard to do. So one of the reasons many AI projects and initiatives fail is that, or don't make it to production, is the need for this scale, the infrastructure lift, to actually make it happen. So our goal here with Anyscale and Ray, is to make that easy, is to make scalable computing easy. So that as a developer or as a business, if you want to do AI, if you want to get value out of AI, all you need to know is how to program on your laptop. Like, all you need to know is how to program in Python. And if you can do that, then you're good to go. Then you can do what companies like OpenAI or Google do and get value out of machine learning. >> That programming example of how easy it is with Python reminds me of the early days of Cloud, when infrastructure as code was talked about was, it was just code the infrastructure programmable. That's super important. That's what AI people wanted, first program AI. That's the new trend. And I want to understand, if you don't mind explaining, the relationship that Anyscale has to these foundational models and particular the large language models, also called LLMs, was seen with like OpenAI and ChatGPT. Before you get into the relationship that you have with them, can you explain why the hype around foundational models? Why are people going crazy over foundational models? What is it and why is it so important? >> Yeah, so foundational models and foundation models are incredibly important because they enable businesses and developers to get value out of machine learning, to use machine learning off the shelf with these large models that have been trained on tons of data and that are useful out of the box. And then, of course, you know, as a business or as a developer, you can take those foundational models and repurpose them or fine tune them or adapt them to your specific use case and what you want to achieve. But it's much easier to do that than to train them from scratch. And I think there are three, for people to actually use foundation models, there are three main types of workloads or problems that need to be solved. One is training these foundation models in the first place, like actually creating them. The second is fine tuning them and adapting them to your use case. And the third is serving them and actually deploying them. Okay, so Ray and Anyscale are used for all of these three different workloads. Companies like OpenAI or Cohere that train large language models. Or open source versions like GPTJ are done on top of Ray. There are many startups and other businesses that fine tune, that, you know, don't want to train the large underlying foundation models, but that do want to fine tune them, do want to adapt them to their purposes, and build products around them and serve them, those are also using Ray and Anyscale for that fine tuning and that serving. And so the reason that Ray and Anyscale are important here is that, you know, building and using foundation models requires a huge scale. It requires a lot of data. It requires a lot of compute, GPUs, TPUs, other resources. And to actually take advantage of that and actually build these scalable applications, there's a lot of infrastructure that needs to happen under the hood. And so you can either use Ray and Anyscale to take care of that and manage the infrastructure and solve those infrastructure problems. Or you can build the infrastructure and manage the infrastructure yourself, which you can do, but it's going to slow your team down. It's going to, you know, many of the businesses we work with simply don't want to be in the business of managing infrastructure and building infrastructure. They want to focus on product development and move faster. >> I know you got a keynote presentation we're going to go to in a second, but I think you hit on something I think is the real tipping point, doing it yourself, hard to do. These are things where opportunities are and the Cloud did that with data centers. Turned a data center and made it an API. The heavy lifting went away and went to the Cloud so people could be more creative and build their product. In this case, build their creativity. Is that kind of what's the big deal? Is that kind of a big deal happening that you guys are taking the learnings and making that available so people don't have to do that? >> That's exactly right. So today, if you want to succeed with AI, if you want to use AI in your business, infrastructure work is on the critical path for doing that. To do AI, you have to build infrastructure. You have to figure out how to scale your applications. That's going to change. We're going to get to the point, and you know, with Ray and Anyscale, we're going to remove the infrastructure from the critical path so that as a developer or as a business, all you need to focus on is your application logic, what you want the the program to do, what you want your application to do, how you want the AI to actually interface with the rest of your product. Now the way that will happen is that Ray and Anyscale will still, the infrastructure work will still happen. It'll just be under the hood and taken care of by Ray in Anyscale. And so I think something like this is really necessary for AI to reach its potential, for AI to have the impact and the reach that we think it will, you have to make it easier to do. >> And just for clarification to point out, if you don't mind explaining the relationship of Ray and Anyscale real quick just before we get into the presentation. >> So Ray is an open source project. We created it. We were at Berkeley doing machine learning. We started Ray so that, in order to provide an easy, a simple open source tool for building and running scalable applications. And Anyscale is the managed version of Ray, basically we will run Ray for you in the Cloud, provide a lot of tools around the developer experience and managing the infrastructure and providing more performance and superior infrastructure. >> Awesome. I know you got a presentation on Ray and Anyscale and you guys are positioning as the infrastructure for foundational models. So I'll let you take it away and then when you're done presenting, we'll come back, I'll probably grill you with a few questions and then we'll close it out so take it away. >> Robert: Sounds great. So I'll say a little bit about how companies are using Ray and Anyscale for foundation models. The first thing I want to mention is just why we're doing this in the first place. And the underlying observation, the underlying trend here, and this is a plot from OpenAI, is that the amount of compute needed to do machine learning has been exploding. It's been growing at something like 35 times every 18 months. This is absolutely enormous. And other people have written papers measuring this trend and you get different numbers. But the point is, no matter how you slice and dice it, it' a astronomical rate. Now if you compare that to something we're all familiar with, like Moore's Law, which says that, you know, the processor performance doubles every roughly 18 months, you can see that there's just a tremendous gap between the needs, the compute needs of machine learning applications, and what you can do with a single chip, right. So even if Moore's Law were continuing strong and you know, doing what it used to be doing, even if that were the case, there would still be a tremendous gap between what you can do with the chip and what you need in order to do machine learning. And so given this graph, what we've seen, and what has been clear to us since we started this company, is that doing AI requires scaling. There's no way around it. It's not a nice to have, it's really a requirement. And so that led us to start Ray, which is the open source project that we started to make it easy to build these scalable Python applications and scalable machine learning applications. And since we started the project, it's been adopted by a tremendous number of companies. Companies like OpenAI, which use Ray to train their large models like ChatGPT, companies like Uber, which run all of their deep learning and classical machine learning on top of Ray, companies like Shopify or Spotify or Instacart or Lyft or Netflix, ByteDance, which use Ray for their machine learning infrastructure. Companies like Ant Group, which makes Alipay, you know, they use Ray across the board for fraud detection, for online learning, for detecting money laundering, you know, for graph processing, stream processing. Companies like Amazon, you know, run Ray at a tremendous scale and just petabytes of data every single day. And so the project has seen just enormous adoption since, over the past few years. And one of the most exciting use cases is really providing the infrastructure for building training, fine tuning, and serving foundation models. So I'll say a little bit about, you know, here are some examples of companies using Ray for foundation models. Cohere trains large language models. OpenAI also trains large language models. You can think about the workloads required there are things like supervised pre-training, also reinforcement learning from human feedback. So this is not only the regular supervised learning, but actually more complex reinforcement learning workloads that take human input about what response to a particular question, you know is better than a certain other response. And incorporating that into the learning. There's open source versions as well, like GPTJ also built on top of Ray as well as projects like Alpa coming out of UC Berkeley. So these are some of the examples of exciting projects in organizations, training and creating these large language models and serving them using Ray. Okay, so what actually is Ray? Well, there are two layers to Ray. At the lowest level, there's the core Ray system. This is essentially low level primitives for building scalable Python applications. Things like taking a Python function or a Python class and executing them in the cluster setting. So Ray core is extremely flexible and you can build arbitrary scalable applications on top of Ray. So on top of Ray, on top of the core system, what really gives Ray a lot of its power is this ecosystem of scalable libraries. So on top of the core system you have libraries, scalable libraries for ingesting and pre-processing data, for training your models, for fine tuning those models, for hyper parameter tuning, for doing batch processing and batch inference, for doing model serving and deployment, right. And a lot of the Ray users, the reason they like Ray is that they want to run multiple workloads. They want to train and serve their models, right. They want to load their data and feed that into training. And Ray provides common infrastructure for all of these different workloads. So this is a little overview of what Ray, the different components of Ray. So why do people choose to go with Ray? I think there are three main reasons. The first is the unified nature. The fact that it is common infrastructure for scaling arbitrary workloads, from data ingest to pre-processing to training to inference and serving, right. This also includes the fact that it's future proof. AI is incredibly fast moving. And so many people, many companies that have built their own machine learning infrastructure and standardized on particular workflows for doing machine learning have found that their workflows are too rigid to enable new capabilities. If they want to do reinforcement learning, if they want to use graph neural networks, they don't have a way of doing that with their standard tooling. And so Ray, being future proof and being flexible and general gives them that ability. Another reason people choose Ray in Anyscale is the scalability. This is really our bread and butter. This is the reason, the whole point of Ray, you know, making it easy to go from your laptop to running on thousands of GPUs, making it easy to scale your development workloads and run them in production, making it easy to scale, you know, training to scale data ingest, pre-processing and so on. So scalability and performance, you know, are critical for doing machine learning and that is something that Ray provides out of the box. And lastly, Ray is an open ecosystem. You can run it anywhere. You can run it on any Cloud provider. Google, you know, Google Cloud, AWS, Asure. You can run it on your Kubernetes cluster. You can run it on your laptop. It's extremely portable. And not only that, it's framework agnostic. You can use Ray to scale arbitrary Python workloads. You can use it to scale and it integrates with libraries like TensorFlow or PyTorch or JAX or XG Boost or Hugging Face or PyTorch Lightning, right, or Scikit-learn or just your own arbitrary Python code. It's open source. And in addition to integrating with the rest of the machine learning ecosystem and these machine learning frameworks, you can use Ray along with all of the other tooling in the machine learning ecosystem. That's things like weights and biases or ML flow, right. Or you know, different data platforms like Databricks, you know, Delta Lake or Snowflake or tools for model monitoring for feature stores, all of these integrate with Ray. And that's, you know, Ray provides that kind of flexibility so that you can integrate it into the rest of your workflow. And then Anyscale is the scalable compute platform that's built on top, you know, that provides Ray. So Anyscale is a managed Ray service that runs in the Cloud. And what Anyscale does is it offers the best way to run Ray. And if you think about what you get with Anyscale, there are fundamentally two things. One is about moving faster, accelerating the time to market. And you get that by having the managed service so that as a developer you don't have to worry about managing infrastructure, you don't have to worry about configuring infrastructure. You also, it provides, you know, optimized developer workflows. Things like easily moving from development to production, things like having the observability tooling, the debug ability to actually easily diagnose what's going wrong in a distributed application. So things like the dashboards and the other other kinds of tooling for collaboration, for monitoring and so on. And then on top of that, so that's the first bucket, developer productivity, moving faster, faster experimentation and iteration. The second reason that people choose Anyscale is superior infrastructure. So this is things like, you know, cost deficiency, being able to easily take advantage of spot instances, being able to get higher GPU utilization, things like faster cluster startup times and auto scaling. Things like just overall better performance and faster scheduling. And so these are the kinds of things that Anyscale provides on top of Ray. It's the managed infrastructure. It's fast, it's like the developer productivity and velocity as well as performance. So this is what I wanted to share about Ray in Anyscale. >> John: Awesome. >> Provide that context. But John, I'm curious what you think. >> I love it. I love the, so first of all, it's a platform because that's the platform architecture right there. So just to clarify, this is an Anyscale platform, not- >> That's right. >> Tools. So you got tools in the platform. Okay, that's key. Love that managed service. Just curious, you mentioned Python multiple times, is that because of PyTorch and TensorFlow or Python's the most friendly with machine learning or it's because it's very common amongst all developers? >> That's a great question. Python is the language that people are using to do machine learning. So it's the natural starting point. Now, of course, Ray is actually designed in a language agnostic way and there are companies out there that use Ray to build scalable Java applications. But for the most part right now we're focused on Python and being the best way to build these scalable Python and machine learning applications. But, of course, down the road there always is that potential. >> So if you're slinging Python code out there and you're watching that, you're watching this video, get on Anyscale bus quickly. Also, I just, while you were giving the presentation, I couldn't help, since you mentioned OpenAI, which by the way, congratulations 'cause they've had great scale, I've noticed in their rapid growth 'cause they were the fastest company to the number of users than anyone in the history of the computer industry, so major successor, OpenAI and ChatGPT, huge fan. I'm not a skeptic at all. I think it's just the beginning, so congratulations. But I actually typed into ChatGPT, what are the top three benefits of Anyscale and came up with scalability, flexibility, and ease of use. Obviously, scalability is what you guys are called. >> That's pretty good. >> So that's what they came up with. So they nailed it. Did you have an inside prompt training, buy it there? Only kidding. (Robert laughs) >> Yeah, we hard coded that one. >> But that's the kind of thing that came up really, really quickly if I asked it to write a sales document, it probably will, but this is the future interface. This is why people are getting excited about the foundational models and the large language models because it's allowing the interface with the user, the consumer, to be more human, more natural. And this is clearly will be in every application in the future. >> Absolutely. This is how people are going to interface with software, how they're going to interface with products in the future. It's not just something, you know, not just a chat bot that you talk to. This is going to be how you get things done, right. How you use your web browser or how you use, you know, how you use Photoshop or how you use other products. Like you're not going to spend hours learning all the APIs and how to use them. You're going to talk to it and tell it what you want it to do. And of course, you know, if it doesn't understand it, it's going to ask clarifying questions. You're going to have a conversation and then it'll figure it out. >> This is going to be one of those things, we're going to look back at this time Robert and saying, "Yeah, from that company, that was the beginning of that wave." And just like AWS and Cloud Computing, the folks who got in early really were in position when say the pandemic came. So getting in early is a good thing and that's what everyone's talking about is getting in early and playing around, maybe replatforming or even picking one or few apps to refactor with some staff and managed services. So people are definitely jumping in. So I have to ask you the ROI cost question. You mentioned some of those, Moore's Law versus what's going on in the industry. When you look at that kind of scale, the first thing that jumps out at people is, "Okay, I love it. Let's go play around." But what's it going to cost me? Am I going to be tied to certain GPUs? What's the landscape look like from an operational standpoint, from the customer? Are they locked in and the benefit was flexibility, are you flexible to handle any Cloud? What is the customers, what are they looking at? Basically, that's my question. What's the customer looking at? >> Cost is super important here and many of the companies, I mean, companies are spending a huge amount on their Cloud computing, on AWS, and on doing AI, right. And I think a lot of the advantage of Anyscale, what we can provide here is not only better performance, but cost efficiency. Because if we can run something faster and more efficiently, it can also use less resources and you can lower your Cloud spending, right. We've seen companies go from, you know, 20% GPU utilization with their current setup and the current tools they're using to running on Anyscale and getting more like 95, you know, 100% GPU utilization. That's something like a five x improvement right there. So depending on the kind of application you're running, you know, it's a significant cost savings. We've seen companies that have, you know, processing petabytes of data every single day with Ray going from, you know, getting order of magnitude cost savings by switching from what they were previously doing to running their application on Ray. And when you have applications that are spending, you know, potentially $100 million a year and getting a 10 X cost savings is just absolutely enormous. So these are some of the kinds of- >> Data infrastructure is super important. Again, if the customer, if you're a prospect to this and thinking about going in here, just like the Cloud, you got infrastructure, you got the platform, you got SaaS, same kind of thing's going to go on in AI. So I want to get into that, you know, ROI discussion and some of the impact with your customers that are leveraging the platform. But first I hear you got a demo. >> Robert: Yeah, so let me show you, let me give you a quick run through here. So what I have open here is the Anyscale UI. I've started a little Anyscale Workspace. So Workspaces are the Anyscale concept for interactive developments, right. So here, imagine I'm just, you want to have a familiar experience like you're developing on your laptop. And here I have a terminal. It's not on my laptop. It's actually in the cloud running on Anyscale. And I'm just going to kick this off. This is going to train a large language model, so OPT. And it's doing this on 32 GPUs. We've got a cluster here with a bunch of CPU cores, bunch of memory. And as that's running, and by the way, if I wanted to run this on instead of 32 GPUs, 64, 128, this is just a one line change when I launch the Workspace. And what I can do is I can pull up VS code, right. Remember this is the interactive development experience. I can look at the actual code. Here it's using Ray train to train the torch model. We've got the training loop and we're saying that each worker gets access to one GPU and four CPU cores. And, of course, as I make the model larger, this is using deep speed, as I make the model larger, I could increase the number of GPUs that each worker gets access to, right. And how that is distributed across the cluster. And if I wanted to run on CPUs instead of GPUs or a different, you know, accelerator type, again, this is just a one line change. And here we're using Ray train to train the models, just taking my vanilla PyTorch model using Hugging Face and then scaling that across a bunch of GPUs. And, of course, if I want to look at the dashboard, I can go to the Ray dashboard. There are a bunch of different visualizations I can look at. I can look at the GPU utilization. I can look at, you know, the CPU utilization here where I think we're currently loading the model and running that actual application to start the training. And some of the things that are really convenient here about Anyscale, both I can get that interactive development experience with VS code. You know, I can look at the dashboards. I can monitor what's going on. It feels, I have a terminal, it feels like my laptop, but it's actually running on a large cluster. And I can, with however many GPUs or other resources that I want. And so it's really trying to combine the best of having the familiar experience of programming on your laptop, but with the benefits, you know, being able to take advantage of all the resources in the Cloud to scale. And it's like when, you know, you're talking about cost efficiency. One of the biggest reasons that people waste money, one of the silly reasons for wasting money is just forgetting to turn off your GPUs. And what you can do here is, of course, things will auto terminate if they're idle. But imagine you go to sleep, I have this big cluster. You can turn it off, shut off the cluster, come back tomorrow, restart the Workspace, and you know, your big cluster is back up and all of your code changes are still there. All of your local file edits. It's like you just closed your laptop and came back and opened it up again. And so this is the kind of experience we want to provide for our users. So that's what I wanted to share with you. >> Well, I think that whole, couple of things, lines of code change, single line of code change, that's game changing. And then the cost thing, I mean human error is a big deal. People pass out at their computer. They've been coding all night or they just forget about it. I mean, and then it's just like leaving the lights on or your water running in your house. It's just, at the scale that it is, the numbers will add up. That's a huge deal. So I think, you know, compute back in the old days, there's no compute. Okay, it's just compute sitting there idle. But you know, data cranking the models is doing, that's a big point. >> Another thing I want to add there about cost efficiency is that we make it really easy to use, if you're running on Anyscale, to use spot instances and these preemptable instances that can just be significantly cheaper than the on-demand instances. And so when we see our customers go from what they're doing before to using Anyscale and they go from not using these spot instances 'cause they don't have the infrastructure around it, the fault tolerance to handle the preemption and things like that, to being able to just check a box and use spot instances and save a bunch of money. >> You know, this was my whole, my feature article at Reinvent last year when I met with Adam Selipsky, this next gen Cloud is here. I mean, it's not auto scale, it's infrastructure scale. It's agility. It's flexibility. I think this is where the world needs to go. Almost what DevOps did for Cloud and what you were showing me that demo had this whole SRE vibe. And remember Google had site reliability engines to manage all those servers. This is kind of like an SRE vibe for data at scale. I mean, a similar kind of order of magnitude. I mean, I might be a little bit off base there, but how would you explain it? >> It's a nice analogy. I mean, what we are trying to do here is get to the point where developers don't think about infrastructure. Where developers only think about their application logic. And where businesses can do AI, can succeed with AI, and build these scalable applications, but they don't have to build, you know, an infrastructure team. They don't have to develop that expertise. They don't have to invest years in building their internal machine learning infrastructure. They can just focus on the Python code, on their application logic, and run the stuff out of the box. >> Awesome. Well, I appreciate the time. Before we wrap up here, give a plug for the company. I know you got a couple websites. Again, go, Ray's got its own website. You got Anyscale. You got an event coming up. Give a plug for the company looking to hire. Put a plug in for the company. >> Yeah, absolutely. Thank you. So first of all, you know, we think AI is really going to transform every industry and the opportunity is there, right. We can be the infrastructure that enables all of that to happen, that makes it easy for companies to succeed with AI, and get value out of AI. Now we have, if you're interested in learning more about Ray, Ray has been emerging as the standard way to build scalable applications. Our adoption has been exploding. I mentioned companies like OpenAI using Ray to train their models. But really across the board companies like Netflix and Cruise and Instacart and Lyft and Uber, you know, just among tech companies. It's across every industry. You know, gaming companies, agriculture, you know, farming, robotics, drug discovery, you know, FinTech, we see it across the board. And all of these companies can get value out of AI, can really use AI to improve their businesses. So if you're interested in learning more about Ray and Anyscale, we have our Ray Summit coming up in September. This is going to highlight a lot of the most impressive use cases and stories across the industry. And if your business, if you want to use LLMs, you want to train these LLMs, these large language models, you want to fine tune them with your data, you want to deploy them, serve them, and build applications and products around them, give us a call, talk to us. You know, we can really take the infrastructure piece, you know, off the critical path and make that easy for you. So that's what I would say. And, you know, like you mentioned, we're hiring across the board, you know, engineering, product, go-to-market, and it's an exciting time. >> Robert Nishihara, co-founder and CEO of Anyscale, congratulations on a great company you've built and continuing to iterate on and you got growth ahead of you, you got a tailwind. I mean, the AI wave is here. I think OpenAI and ChatGPT, a customer of yours, have really opened up the mainstream visibility into this new generation of applications, user interface, roll of data, large scale, how to make that programmable so we're going to need that infrastructure. So thanks for coming on this season three, episode one of the ongoing series of the hot startups. In this case, this episode is the top startups building foundational model infrastructure for AI and ML. I'm John Furrier, your host. Thanks for watching. (upbeat music)

Published Date : Mar 9 2023

SUMMARY :

episode one of the ongoing and you guys really had and other resources in the Cloud. and particular the large language and what you want to achieve. and the Cloud did that with data centers. the point, and you know, if you don't mind explaining and managing the infrastructure and you guys are positioning is that the amount of compute needed to do But John, I'm curious what you think. because that's the platform So you got tools in the platform. and being the best way to of the computer industry, Did you have an inside prompt and the large language models and tell it what you want it to do. So I have to ask you and you can lower your So I want to get into that, you know, and you know, your big cluster is back up So I think, you know, the on-demand instances. and what you were showing me that demo and run the stuff out of the box. I know you got a couple websites. and the opportunity is there, right. and you got growth ahead

ENTITIES

Entity	Category	Confidence
Robert Nishihara	PERSON	0.99+
John	PERSON	0.99+
Robert	PERSON	0.99+
John Furrier	PERSON	0.99+
Netflix	ORGANIZATION	0.99+
35 times	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
$100 million	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Ant Group	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Python	TITLE	0.99+
20%	QUANTITY	0.99+
32 GPUs	QUANTITY	0.99+
Lyft	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
tomorrow	DATE	0.99+
Anyscale	ORGANIZATION	0.99+
three	QUANTITY	0.99+
128	QUANTITY	0.99+
September	DATE	0.99+
today	DATE	0.99+
Moore's Law	TITLE	0.99+
Adam Selipsky	PERSON	0.99+
PyTorch	TITLE	0.99+
Ray	ORGANIZATION	0.99+
second reason	QUANTITY	0.99+
64	QUANTITY	0.99+
each worker	QUANTITY	0.99+
each worker	QUANTITY	0.99+
Photoshop	TITLE	0.99+
UC Berkeley	ORGANIZATION	0.99+
Java	TITLE	0.99+
Shopify	ORGANIZATION	0.99+
OpenAI	ORGANIZATION	0.99+
Anyscale	PERSON	0.99+
third	QUANTITY	0.99+
two things	QUANTITY	0.99+
ByteDance	ORGANIZATION	0.99+
Spotify	ORGANIZATION	0.99+
One	QUANTITY	0.99+
95	QUANTITY	0.99+
Asure	ORGANIZATION	0.98+
one line	QUANTITY	0.98+
one GPU	QUANTITY	0.98+
ChatGPT	TITLE	0.98+
TensorFlow	TITLE	0.98+
last year	DATE	0.98+
first bucket	QUANTITY	0.98+
both	QUANTITY	0.98+
two layers	QUANTITY	0.98+
Cohere	ORGANIZATION	0.98+
Alipay	ORGANIZATION	0.98+
Ray	PERSON	0.97+
one	QUANTITY	0.97+
Instacart	ORGANIZATION	0.97+

Jacqueline Kuo, Dataiku | WiDS 2023

(upbeat music) >> Morning guys and girls, welcome back to theCUBE's live coverage of Women in Data Science WIDS 2023 live at Stanford University. Lisa Martin here with my co-host for this segment, Tracy Zhang. We're really excited to be talking with a great female rockstar. You're going to learn a lot from her next, Jacqueline Kuo, solutions engineer at Dataiku. Welcome, Jacqueline. Great to have you. >> Thank you so much. >> Thank for being here. >> I'm so excited to be here. >> So one of the things I have to start out with, 'cause my mom Kathy Dahlia is watching, she's a New Yorker. You are a born and raised New Yorker and I learned from my mom and others. If you're born in New York no matter how long you've moved away, you are a New Yorker. There's you guys have like a secret club. (group laughs) >> I am definitely very proud of being born and raised in New York. My family immigrated to New York, New Jersey from Taiwan. So very proud Taiwanese American as well. But I absolutely love New York and I can't imagine living anywhere else. >> Yeah, yeah. >> I love it. >> So you studied, I was doing some research on you you studied mechanical engineering at MIT. >> Yes. >> That's huge. And you discovered your passion for all things data-related. You worked at IBM as an analytics consultant. Talk to us a little bit about your career path. Were you always interested in engineering STEM-related subjects from the time you were a child? >> I feel like my interests were ranging in many different things and I ended up landing in engineering, 'cause I felt like I wanted to gain a toolkit like a toolset to make some sort of change with or use my career to make some sort of change in this world. And I landed on engineering and mechanical engineering specifically, because I felt like I got to, in my undergrad do a lot of hands-on projects, learn every part of the engineering and design process to build products which is super-transferable and transferable skills sort of is like the trend in my career so far. Where after undergrad I wanted to move back to New York and mechanical engineering jobs are kind of few and fall far in between in the city. And I ended up landing at IBM doing analytics consulting, because I wanted to understand how to use data. I knew that data was really powerful and I knew that working with it could allow me to tell better stories to influence people across different industries. And that's also how I kind of landed at Dataiku to my current role, because it really does allow me to work across different industries and work on different problems that are just interesting. >> Yeah, I like the way that, how you mentioned building a toolkit when doing your studies at school. Do you think a lot of skills are still very relevant to your job at Dataiku right now? >> I think that at the core of it is just problem solving and asking questions and continuing to be curious or trying to challenge what is is currently given to you. And I think in an engineering degree you get a lot of that. >> Yeah, I'm sure. >> But I think that we've actually seen that a lot in the panels today already, that you get that through all different types of work and research and that kind of thoughtfulness comes across in all different industries too. >> Talk a little bit about some of the challenges, that data science is solving, because every company these days, whether it's an enterprise in manufacturing or a small business in retail, everybody has to be data-driven, because the end user, the end customer, whoever that is whether it's a person, an individual, a company, a B2B, expects to have a personalized custom experience and that comes from data. But you have to be able to understand that data treated properly, responsibly. Talk about some of the interesting projects that you're doing at Dataiku or maybe some that you've done in the past that are really kind of transformative across things climate change or police violence, some of the things that data science really is impacting these days. >> Yeah, absolutely. I think that what I love about coming to these conferences is that you hear about those really impactful social impact projects that I think everybody who's in data science wants to be working on. And I think at Dataiku what's great is that we do have this program called Ikig.AI where we work with nonprofits and we support them in their data and analytics projects. And so, a project I worked on was with the Clean Water, oh my goodness, the Ocean Cleanup project, Ocean Cleanup organization, which was amazing, because it was sort of outside of my day-to-day and it allowed me to work with them and help them understand better where plastic is being aggregated across the world and where it appears, whether that's on beaches or in lakes and rivers. So using data to help them better understand that. I feel like from a day-to-day though, we, in terms of our customers, they're really looking at very basic problems with data. And I say basic, not to diminish it, but really just to kind of say that it's high impact, but basic problems around how do they forecast sales better? That's a really kind of, sort of basic problem, but it's actually super-complex and really impactful for people, for companies when it comes to forecasting how much headcount they need to have in the next year or how much inventory to have if they're retail. And all of those are going to, especially for smaller companies, make a huge impact on whether they make profit or not. And so, what's great about working at Dataiku is you get to work on these high-impact projects and oftentimes I think from my perspective, I work as a solutions engineer on the commercial team. So it's just, we work generally with smaller customers and sometimes talking to them, me talking to them is like their first introduction to what data science is and what they can do with that data. And sort of using our platform to show them what the possibilities are and help them build a strategy around how they can implement data in their day-to-day. >> What's the difference? You were a data scientist by title and function, now you're a solutions engineer. Talk about the ascendancy into that and also some of the things that you and Tracy will talk about as those transferable, those transportable skills that probably maybe you learned in engineering, you brought data science now you're bringing to solutions engineering. >> Yeah, absolutely. So data science, I love working with data. I love getting in the weeds of things and I love, oftentimes that means debugging things or looking line by line at your code and trying to make it better. I found that on in the data science role, while those things I really loved, sometimes it also meant that I didn't, couldn't see or didn't have visibility into the broader picture of well like, well why are we doing this project? And who is it impacting? And because oftentimes your day-to-day is very much in the weeds. And so, I moved into sales or solutions engineering at Dataiku to get that perspective, because what a sales engineer does is support the sale from a technical perspective. And so, you really truly understand well, what is the customer looking for and what is going to influence them to make a purchase? And how do you tell the story of the impact of data? Because oftentimes they need to quantify well, if I purchase a software like Dataiku then I'm able to build this project and make this X impact on the business. And that is really powerful. That's where the storytelling comes in and that I feel like a lot of what we've been hearing today about connecting data with people who can actually do something with that data. That's really the bridge that we as sales engineers are trying to connect in that sales process. >> It's all about connectivity, isn't it? >> Yeah, definitely. We were talking about this earlier that it's about making impact and it's about people who we are analyzing data is like influencing. And I saw that one of the keywords or one of the biggest thing at Dataiku is everyday AI, so I wanted to just ask, could you please talk more about how does that weave into the problem solving and then day-to-day making an impact process? >> Yes, so I started working on Dataiku around three years ago and I fell in love with the product itself. The product that we have is we allow for people with different backgrounds. If you're coming from a data analyst background, data science, data engineering, maybe you are more of like a business subject matter expert, to all work in one unified central platform, one user interface. And why that's powerful is that when you're working with data, it's not just that data scientist working on their own and their own computer coding. We've heard today that it's all about connecting the data scientists with those business people, with maybe the data engineers and IT people who are actually going to put that model into production or other folks. And so, they all use different languages. Data scientists might use Python and R, your business people are using PowerPoint and Excel, everyone's using different tools. How do we bring them all in one place so that you can have conversations faster? So the business people can understand exactly what you're building with the data and can get their hands on that data and that model prediction faster. So that's what Dataiku does. That's the product that we have. And I completely forgot your question, 'cause I got so invested in talking about this. Oh, everyday AI. Yeah, so the goal of of Dataiku is really to allow for those maybe less technical people with less traditional data science backgrounds. Maybe they're data experts and they understand the data really well and they've been working in SQL for all their career. Maybe they're just subject matter experts and want to get more into working with data. We allow those people to do that through our no and low-code tools within our platform. Platform is very visual as well. And so, I've seen a lot of people learn data science, learn machine learning by working in the tool itself. And that's sort of, that's where everyday AI comes in, 'cause we truly believe that there are a lot of, there's a lot of unutilized expertise out there that we can bring in. And if we did give them access to data, imagine what we could do in the kind of work that they can do and become empowered basically with that. >> Yeah, we're just scratching the surface. I find data science so fascinating, especially when you talk about some of the real world applications, police violence, health inequities, climate change. Here we are in California and I don't know if you know, we're experiencing an atmospheric river again tomorrow. Californians and the rain- >> Storm is coming. >> We are not good... And I'm a native Californian, but we all know about climate change. People probably don't associate all of the data that is helping us understand it, make decisions based on what's coming what's happened in the past. I just find that so fascinating. But I really think we're truly at the beginning of really understanding the impact that being data-driven can actually mean whether you are investigating climate change or police violence or health inequities or your a grocery store that needs to become data-driven, because your consumer is expecting a personalized relevant experience. I want you to offer me up things that I know I was doing online grocery shopping, yesterday, I just got back from Europe and I was so thankful that my grocer is data-driven, because they made the process so easy for me. And but we have that expectation as consumers that it's going to be that easy, it's going to be that personalized. And what a lot of folks don't understand is the data the democratization of data, the AI that's helping make that a possibility that makes our lives easier. >> Yeah, I love that point around data is everywhere and the more we have, the actually the more access we actually are providing. 'cause now compute is cheaper, data is literally everywhere, you can get access to it very easily. And so, I feel like more people are just getting themselves involved and that's, I mean this whole conference around just bringing more women into this industry and more people with different backgrounds from minority groups so that we get their thoughts, their opinions into the work is so important and it's becoming a lot easier with all of the technology and tools just being open source being easier to access, being cheaper. And that I feel really hopeful about in this field. >> That's good. Hope is good, isn't it? >> Yes, that's all we need. But yeah, I'm glad to see that we're working towards that direction. I'm excited to see what lies in the future. >> We've been talking about numbers of women, percentages of women in technical roles for years and we've seen it hover around 25%. I was looking at some, I need to AnitaB.org stats from 2022 was just looking at this yesterday and the numbers are going up. I think the number was 26, 27.6% of women in technical roles. So we're seeing a growth there especially over pre-pandemic levels. Definitely the biggest challenge that still seems to be one of the biggest that remains is attrition. I would love to get your advice on what would you tell your younger self or the previous prior generation in terms of having the confidence and the courage to pursue engineering, pursue data science, pursue a technical role, and also stay in that role so you can be one of those females on stage that we saw today? >> Yeah, that's the goal right there one day. I think it's really about finding other people to lift and mentor and support you. And I talked to a bunch of people today who just found this conference through Googling it, and the fact that organizations like this exist really do help, because those are the people who are going to understand the struggles you're going through as a woman in this industry, which can get tough, but it gets easier when you have a community to share that with and to support you. And I do want to definitely give a plug to the WIDS@Dataiku team. >> Talk to us about that. >> Yeah, I was so fortunate to be a WIDS ambassador last year and again this year with Dataiku and I was here last year as well with Dataiku, but we have grown the WIDS effort so much over the last few years. So the first year we had two events in New York and also in London. Our Dataiku's global. So this year we additionally have one in the west coast out here in SF and another one in Singapore which is incredible to involve that team. But what I love is that everyone is really passionate about just getting more women involved in this industry. But then also what I find fortunate too at Dataiku is that we have a strong female, just a lot of women. >> Good. >> Yeah. >> A lot of women working as data scientists, solutions engineer and sales and all across the company who even if they aren't doing data work in a day-to-day, they are super-involved and excited to get more women in the technical field. And so. that's like our Empower group internally that hosts events and I feel like it's a really nice safe space for all of us to speak about challenges that we encounter and feel like we're not alone in that we have a support system to make it better. So I think from a nutrition standpoint every organization should have a female ERG to just support one another. >> Absolutely. There's so much value in a network in the community. I was talking to somebody who I'm blanking on this may have been in Barcelona last week, talking about a stat that showed that a really high percentage, 78% of people couldn't identify a female role model in technology. Of course, Sheryl Sandberg's been one of our role models and I thought a lot of people know Sheryl who's leaving or has left. And then a whole, YouTube influencers that have no idea that the CEO of YouTube for years has been a woman, who has- >> And she came last year to speak at WIDS. >> Did she? >> Yeah. >> Oh, I missed that. It must have been, we were probably filming. But we need more, we need to be, and it sounds like Dataiku was doing a great job of this. Tracy, we've talked about this earlier today. We need to see what we can be. And it sounds like Dataiku was pioneering that with that ERG program that you talked about. And I completely agree with you. That should be a standard program everywhere and women should feel empowered to raise their hand ask a question, or really embrace, "I'm interested in engineering, I'm interested in data science." Then maybe there's not a lot of women in classes. That's okay. Be the pioneer, be that next Sheryl Sandberg or the CTO of ChatGPT, Mira Murati, who's a female. We need more people that we can see and lean into that and embrace it. I think you're going to be one of them. >> I think so too. Just so that young girls like me like other who's so in school, can see, can look up to you and be like, "She's my role model and I want to be like her. And I know that there's someone to listen to me and to support me if I have any questions in this field." So yeah. >> Yeah, I mean that's how I feel about literally everyone that I'm surrounded by here. I find that you find role models and people to look up to in every conversation whenever I'm speaking with another woman in tech, because there's a journey that has had happen for you to get to that place. So it's incredible, this community. >> It is incredible. WIDS is a movement we're so proud of at theCUBE to have been a part of it since the very beginning, since 2015, I've been covering it since 2017. It's always one of my favorite events. It's so inspiring and it just goes to show the power that data can have, the influence, but also just that we're at the beginning of uncovering so much. Jacqueline's been such a pleasure having you on theCUBE. Thank you. >> Thank you. >> For sharing your story, sharing with us what Dataiku was doing and keep going. More power to you girl. We're going to see you up on that stage one of these years. >> Thank you so much. Thank you guys. >> Our pleasure. >> Our pleasure. >> For our guests and Tracy Zhang, this is Lisa Martin, you're watching theCUBE live at WIDS '23. #EmbraceEquity is this year's International Women's Day theme. Stick around, our next guest joins us in just a minute. (upbeat music)

Published Date : Mar 8 2023

SUMMARY :

We're really excited to be talking I have to start out with, and I can't imagine living anywhere else. So you studied, I was the time you were a child? and I knew that working Yeah, I like the way and continuing to be curious that you get that through and that comes from data. And I say basic, not to diminish it, and also some of the I found that on in the data science role, And I saw that one of the keywords so that you can have conversations faster? Californians and the rain- that it's going to be that easy, and the more we have, Hope is good, isn't it? I'm excited to see what and also stay in that role And I talked to a bunch of people today is that we have a strong and all across the company that have no idea that the And she came last and lean into that and embrace it. And I know that there's I find that you find role models but also just that we're at the beginning We're going to see you up on Thank you so much. #EmbraceEquity is this year's

ENTITIES

Entity	Category	Confidence
Sheryl	PERSON	0.99+
Mira Murati	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Tracy Zhang	PERSON	0.99+
Tracy	PERSON	0.99+
Jacqueline	PERSON	0.99+
Kathy Dahlia	PERSON	0.99+
Jacqueline Kuo	PERSON	0.99+
California	LOCATION	0.99+
Europe	LOCATION	0.99+
Dataiku	ORGANIZATION	0.99+
New York	LOCATION	0.99+
Singapore	LOCATION	0.99+
London	LOCATION	0.99+
last year	DATE	0.99+
Sheryl Sandberg	PERSON	0.99+
YouTube	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Barcelona	LOCATION	0.99+
2022	DATE	0.99+
Taiwan	LOCATION	0.99+
2015	DATE	0.99+
last week	DATE	0.99+
two events	QUANTITY	0.99+
26, 27.6%	QUANTITY	0.99+
last year	DATE	0.99+
PowerPoint	TITLE	0.99+
Excel	TITLE	0.99+
this year	DATE	0.99+
yesterday	DATE	0.99+
Python	TITLE	0.99+
Dataiku	PERSON	0.99+
New York, New Jersey	LOCATION	0.99+
tomorrow	DATE	0.99+
2017	DATE	0.99+
SF	LOCATION	0.99+
MIT	ORGANIZATION	0.99+
today	DATE	0.98+
78%	QUANTITY	0.98+
ChatGPT	ORGANIZATION	0.98+
one	QUANTITY	0.98+
Ocean Cleanup	ORGANIZATION	0.98+
SQL	TITLE	0.98+
next year	DATE	0.98+
International Women's Day	EVENT	0.97+
R	TITLE	0.97+
around 25%	QUANTITY	0.96+
Californians	PERSON	0.95+
Women in Data Science	TITLE	0.94+
one day	QUANTITY	0.92+
theCUBE	ORGANIZATION	0.91+
WIDS	ORGANIZATION	0.89+
first introduction	QUANTITY	0.88+
Stanford University	LOCATION	0.87+
one place	QUANTITY	0.87+

Robert Nishihara, Anyscale | CUBE Conversation

(upbeat instrumental) >> Hello and welcome to this CUBE conversation. I'm John Furrier, host of theCUBE, here in Palo Alto, California. Got a great conversation with Robert Nishihara who's the co-founder and CEO of Anyscale. Robert, great to have you on this CUBE conversation. It's great to see you. We did your first Ray Summit a couple years ago and congratulations on your venture. Great to have you on. >> Thank you. Thanks for inviting me. >> So you're first time CEO out of Berkeley in Data. You got the Databricks is coming out of there. You got a bunch of activity coming from Berkeley. It's like a, it really is kind of like where a lot of innovations going on data. Anyscale has been one of those startups that has risen out of that scene. Right? You look at the success of what the Data lakes are now. Now you've got the generative AI. This has been a really interesting innovation market. This new wave is coming. Tell us what's going on with Anyscale right now, as you guys are gearing up and getting some growth. What's happening with the company? >> Yeah, well one of the most exciting things that's been happening in computing recently, is the rise of AI and the excitement about AI, and the potential for AI to really transform every industry. Now of course, one of the of the biggest challenges to actually making that happen is that doing AI, that AI is incredibly computationally intensive, right? To actually succeed with AI to actually get value out of AI. You're typically not just running it on your laptop, you're often running it and scaling it across thousands of machines, or hundreds of machines or GPUs, and to, so organizations and companies and businesses that do AI often end up building a large infrastructure team to manage the distributed systems, the computing to actually scale these applications. And that's a, that's a, a huge software engineering lift, right? And so, one of the goals for Anyscale is really to make that easy. To get to the point where, developers and teams and companies can succeed with AI. Can build these scalable AI applications, without really you know, without a huge investment in infrastructure with a lot of, without a lot of expertise in infrastructure, where really all they need to know is how to program on their laptop, how to program in Python. And if you have that, then that's really all you need to succeed with AI. So that's what we've been focused on. We're building Ray, which is an open source project that's been starting to get adopted by tons of companies, to actually train these models, to deploy these models, to do inference with these models, you know, to ingest and pre-process their data. And our goals, you know, here with the company are really to make Ray successful. To grow the Ray community, and then to build a great product around it and simplify the development and deployment, and productionization of machine learning for, for all these businesses. >> It's a great trend. Everyone wants developer productivity seeing that, clearly right now. And plus, developers are voting literally on what standards become. As you look at how the market is open source driven, a lot of that I love the model, love the Ray project love the, love the Anyscale value proposition. How big are you guys now, and how is that value proposition of Ray and Anyscale and foundational models coming together? Because it seems like you guys are in a perfect storm situation where you guys could get a real tailwind and draft off the the mega trend that everyone's getting excited. The new toy is ChatGPT. So you got to look at that and say, hey, I mean, come on, you guys did all the heavy lifting. >> Absolutely. >> You know how many people you are, and what's the what's the proposition for you guys these days? >> You know our company's about a hundred people, that a bit larger than that. Ray's been going really quickly. It's been, you know, companies using, like OpenAI uses Ray to train their models, like ChatGPT. Companies like Uber run all their deep learning you know, and classical machine learning on top of Ray. Companies like Shopify, Spotify, Netflix, Cruise, Lyft, Instacart, you know, Bike Dance. A lot of these companies are investing heavily in Ray for their machine learning infrastructure. And I think it's gotten to the point where, if you're one of these, you know type of businesses, and you're looking to revamp your machine learning infrastructure. If you're looking to enable new capabilities, you know make your teams more productive, increase, speed up the experimentation cycle, you know make it more performance, like build, you know, run applications that are more scalable, run them faster, run them in a more cost efficient way. All of these types of companies are at least evaluating Ray and Ray is an increasingly common choice there. I think if they're not using Ray, if many of these companies that end up not using Ray, they often end up building their own infrastructure. So Ray has been, the growth there has been incredibly exciting over the, you know we had our first in-person Ray Summit just back in August, and planning the next one for, for coming September. And so when you asked about the value proposition, I think there's there's really two main things, when people choose to go with Ray and Anyscale. One reason is about moving faster, right? It's about developer productivity, it's about speeding up the experimentation cycle, easily getting their models in production. You know, we hear many companies say that they, you know they, once they prototype a model, once they develop a model, it's another eight weeks, or 12 weeks to actually get that model in production. And that's a reason they talk to us. We hear companies say that, you know they've been training their models and, and doing inference on a single machine, and they've been sort of scaling vertically, like using bigger and bigger machines. But they, you know, you can only do that for so long, and at some point you need to go beyond a single machine and that's when they start talking to us. Right? So one of the main value propositions is around moving faster. I think probably the phrase I hear the most is, companies saying that they don't want their machine learning people to have to spend all their time configuring infrastructure. All this is about productivity. >> Yeah. >> The other. >> It's the big brains in the company. That are being used to do remedial tasks that should be automated right? I mean that's. >> Yeah, and I mean, it's hard stuff, right? It's also not these people's area of expertise, and or where they're adding the most value. So all of this is around developer productivity, moving faster, getting to market faster. The other big value prop and the reason people choose Ray and choose Anyscale, is around just providing superior infrastructure. This is really, can we scale more? You know, can we run it faster, right? Can we run it in a more cost effective way? We hear people saying that they're not getting good GPU utilization with the existing tools they're using, or they can't scale beyond a certain point, or you know they don't have a way to efficiently use spot instances to save costs, right? Or their clusters, you know can't auto scale up and down fast enough, right? These are all the kinds of things that Ray and Anyscale, where Ray and Anyscale add value and solve these kinds of problems. >> You know, you bring up great points. Auto scaling concept, early days, it was easy getting more compute. Now it's complicated. They're built into more integrated apps in the cloud. And you mentioned those companies that you're working with, that's impressive. Those are like the big hardcore, I call them hardcore. They have a good technical teams. And as the wave starts to move from these companies that were hyper scaling up all the time, the mainstream are just developers, right? So you need an interface in, so I see the dots connecting with you guys and I want to get your reaction. Is that how you see it? That you got the alphas out there kind of kicking butt, building their own stuff, alpha developers and infrastructure. But mainstream just wants programmability. They want that heavy lifting taken care of for them. Is that kind of how you guys see it? I mean, take us through that. Because to get crossover to be democratized, the automation's got to be there. And for developer productivity to be in, it's got to be coding and programmability. >> That's right. Ultimately for AI to really be successful, and really you know, transform every industry in the way we think it has the potential to. It has to be easier to use, right? And that is, and being easier to use, there's many dimensions to that. But an important one is that as a developer to do AI, you shouldn't have to be an expert in distributed systems. You shouldn't have to be an expert in infrastructure. If you do have to be, that's going to really limit the number of people who can do this, right? And I think there are so many, all of the companies we talk to, they don't want to be in the business of building and managing infrastructure. It's not that they can't do it. But it's going to slow them down, right? They want to allocate their time and their energy toward building their product, right? To building a better product, getting their product to market faster. And if we can take the infrastructure work off of the critical path for them, that's going to speed them up, it's going to simplify their lives. And I think that is critical for really enabling all of these companies to succeed with AI. >> Talk about the customers you guys are talking to right now, and how that translates over. Because I think you hit a good thread there. Data infrastructure is critical. Managed services are coming online, open sources continuing to grow. You have these people building their own, and then if they abandon it or don't scale it properly, there's kind of consequences. 'Cause it's a system you mentioned, it's a distributed system architecture. It's not as easy as standing up a monolithic app these days. So when you guys go to the marketplace and talk to customers, put the customers in buckets. So you got the ones that are kind of leaning in, that are pretty peaked, probably working with you now, open source. And then what's the customer profile look like as you go mainstream? Are they looking to manage service, looking for more architectural system, architecture approach? What's the, Anyscale progression? How do you engage with your customers? What are they telling you? >> Yeah, so many of these companies, yes, they're looking for managed infrastructure 'cause they want to move faster, right? Now the kind of these profiles of these different customers, they're three main workloads that companies run on Anyscale, run with Ray. It's training related workloads, and it is serving and deployment related workloads, like actually deploying your models, and it's batch processing, batch inference related workloads. Like imagine you want to do computer vision on tons and tons of, of images or videos, or you want to do natural language processing on millions of documents or audio, or speech or things like that, right? So the, I would say the, there's a pretty large variety of use cases, but the most common you know, we see tons of people working with computer vision data, you know, computer vision problems, natural language processing problems. And it's across many different industries. We work with companies doing drug discovery, companies doing you know, gaming or e-commerce, right? Companies doing robotics or agriculture. So there's a huge variety of the types of industries that can benefit from AI, and can really get a lot of value out of AI. And, but the, but the problems are the same problems that they all want to solve. It's like how do you make your team move faster, you know succeed with AI, be more productive, speed up the experimentation, and also how do you do this in a more performant way, in a faster, cheaper, in a more cost efficient, more scalable way. >> It's almost like the cloud game is coming back to AI and these foundational models, because I was just on a podcast, we recorded our weekly podcast, and I was just riffing with Dave Vellante, my co-host on this, were like, hey, in the early days of Amazon, if you want to build an app, you just, you have to build a data center, and then you go to now you go to the cloud, cloud's easier, pay a little money, penny's on the dollar, you get your app up and running. Cloud computing is born. With foundation models in generative AI. The old model was hard, heavy lifting, expensive, build out, before you get to do anything, as you mentioned time. So I got to think that you're pretty much in a good position with this foundational model trend in generative AI because I just looked at the foundation map, foundation models, map of the ecosystem. You're starting to see layers of, you got the tooling, you got platform, you got cloud. It's filling out really quickly. So why is Anyscale important to this new trend? How do you talk to people when they ask you, you know what does ChatGPT mean for Anyscale? And how does the financial foundational model growth, fit into your plan? >> Well, foundational models are hugely important for the industry broadly. Because you're going to have these really powerful models that are trained that you know, have been trained on tremendous amounts of data. tremendous amounts of computes, and that are useful out of the box, right? That people can start to use, and query, and get value out of, without necessarily training these huge models themselves. Now Ray fits in and Anyscale fit in, in a number of places. First of all, they're useful for creating these foundation models. Companies like OpenAI, you know, use Ray for this purpose. Companies like Cohere use Ray for these purposes. You know, IBM. If you look at, there's of course also open source versions like GPTJ, you know, created using Ray. So a lot of these large language models, large foundation models benefit from training on top of Ray. And, but of course for every company training and creating these huge foundation models, you're going to have many more that are fine tuning these models with their own data. That are deploying and serving these models for their own applications, that are building other application and business logic around these models. And that's where Ray also really shines, because Ray you know, is, can provide common infrastructure for all of these workloads. The training, the fine tuning, the serving, the data ingest and pre-processing, right? The hyper parameter tuning, the and and so on. And so where the reason Ray and Anyscale are important here, is that, again, foundation models are large, foundation models are compute intensive, doing you know, using both creating and using these foundation models requires tremendous amounts of compute. And there there's a big infrastructure lift to make that happen. So either you are using Ray and Anyscale to do this, or you are building the infrastructure and managing the infrastructure yourself. Which you can do, but it's, it's hard. >> Good luck with that. I always say good luck with that. I mean, I think if you really need to do, build that hardened foundation, you got to go all the way. And I think this, this idea of composability is interesting. How is Ray working with OpenAI for instance? Take, take us through that. Because I think you're going to see a lot of people talking about, okay I got trained models, but I'm going to have not one, I'm going to have many. There's big debate that OpenAI is going to be the mother of all LLMs, but now, but really people are also saying that to be many more, either purpose-built or specific. The fusion and these things come together there's like a blending of data, and that seems to be a value proposition. How does Ray help these guys get their models up? Can you take, take us through what Ray's doing for say OpenAI and others, and how do you see the models interacting with each other? >> Yeah, great question. So where, where OpenAI uses Ray right now, is for the training workloads. Training both to create ChatGPT and models like that. There's both a supervised learning component, where you're pre-training this model on doing supervised pre-training with example data. There's also a reinforcement learning component, where you are fine-tuning the model and continuing to train the model, but based on human feedback, based on input from humans saying that, you know this response to this question is better than this other response to this question, right? And so Ray provides the infrastructure for scaling the training across many, many GPUs, many many machines, and really running that in an efficient you know, performance fault tolerant way, right? And so, you know, open, this is not the first version of OpenAI's infrastructure, right? They've gone through iterations where they did start with building the infrastructure themselves. They were using tools like MPI. But at some point, you know, given the complexity, given the scale of what they're trying to do, you hit a wall with MPI and that's going to happen with a lot of other companies in this space. And at that point you don't have many other options other than to use Ray or to build your own infrastructure. >> That's awesome. And then your vision on this data interaction, because the old days monolithic models were very rigid. You couldn't really interface with them. But we're kind of seeing this future of data fusion, data interaction, data blending at large scale. What's your vision? How do you, what's your vision of where this goes? Because if this goes the way people think. You can have this data chemistry kind of thing going on where people are integrating all kinds of data with each other at large scale. So you need infrastructure, intelligence, reasoning, a lot of code. Is this something that you see? What's your vision in all this? Take us through. >> AI is going to be used everywhere right? It's, we see this as a technology that's going to be ubiquitous, and is going to transform every business. I mean, imagine you make a product, maybe you were making a tool like Photoshop or, or whatever the, you know, tool is. The way that people are going to use your tool, is not by investing, you know, hundreds of hours into learning all of the different, you know specific buttons they need to press and workflows they need to go through it. They're going to talk to it, right? They're going to say, ask it to do the thing they want it to do right? And it's going to do it. And if it, if it doesn't know what it's want, what it's, what's being asked of it. It's going to ask clarifying questions, right? And then you're going to clarify, and you're going to have a conversation. And this is going to make many many many kinds of tools and technology and products easier to use, and lower the barrier to entry. And so, and this, you know, many companies fit into this category of trying to build products that, and trying to make them easier to use, this is just one kind of way it can, one kind of way that AI will will be used. But I think it's, it's something that's pretty ubiquitous. >> Yeah. It'll be efficient, it'll be efficiency up and down the stack, and will change the productivity equation completely. You just highlighted one, I don't want to fill out forms, just stand up my environment for me. And then start coding away. Okay well this is great stuff. Final word for the folks out there watching, obviously new kind of skill set for hiring. You guys got engineers, give a plug for the company, for Anyscale. What are you looking for? What are you guys working on? Give a, take the last minute to put a plug in for the company. >> Yeah well if you're interested in AI and if you think AI is really going to be transformative, and really be useful for all these different industries. We are trying to provide the infrastructure to enable that to happen, right? So I think there's the potential here, to really solve an important problem, to get to the point where developers don't need to think about infrastructure, don't need to think about distributed systems. All they think about is their application logic, and what they want their application to do. And I think if we can achieve that, you know we can be the foundation or the platform that enables all of these other companies to succeed with AI. So that's where we're going. I think something like this has to happen if AI is going to achieve its potential, we're looking for, we're hiring across the board, you know, great engineers, on the go-to-market side, product managers, you know people who want to really, you know, make this happen. >> Awesome well congratulations. I know you got some good funding behind you. You're in a good spot. I think this is happening. I think generative AI and foundation models is going to be the next big inflection point, as big as the pc inter-networking, internet and smartphones. This is a whole nother application framework, a whole nother set of things. So this is the ground floor. Robert, you're, you and your team are right there. Well done. >> Thank you so much. >> All right. Thanks for coming on this CUBE conversation. I'm John Furrier with theCUBE. Breaking down a conversation around AI and scaling up in this new next major inflection point. This next wave is foundational models, generative AI. And thanks to ChatGPT, the whole world's now knowing about it. So it really is changing the game and Anyscale is right there, one of the hot startups, that is in good position to ride this next wave. Thanks for watching. (upbeat instrumental)

Published Date : Feb 24 2023

SUMMARY :

Robert, great to have you Thanks for inviting me. as you guys are gearing up and the potential for AI to a lot of that I love the and at some point you need It's the big brains in the company. and the reason people the automation's got to be there. and really you know, and talk to customers, put but the most common you know, and then you go to now that are trained that you know, and that seems to be a value proposition. And at that point you don't So you need infrastructure, and lower the barrier to entry. What are you guys working on? and if you think AI is really is going to be the next And thanks to ChatGPT,

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Robert Nishihara	PERSON	0.99+
John Furrier	PERSON	0.99+
12 weeks	QUANTITY	0.99+
Robert	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Lyft	ORGANIZATION	0.99+
Shopify	ORGANIZATION	0.99+
eight weeks	QUANTITY	0.99+
Spotify	ORGANIZATION	0.99+
Netflix	ORGANIZATION	0.99+
August	DATE	0.99+
September	DATE	0.99+
Palo Alto, California	LOCATION	0.99+
Cruise	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Instacart	ORGANIZATION	0.99+
Anyscale	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Photoshop	TITLE	0.99+
One reason	QUANTITY	0.99+
Bike Dance	ORGANIZATION	0.99+
Ray	ORGANIZATION	0.99+
Python	TITLE	0.99+
thousands of machines	QUANTITY	0.99+
Berkeley	LOCATION	0.99+
two main things	QUANTITY	0.98+
single machine	QUANTITY	0.98+
Cohere	ORGANIZATION	0.98+
Ray and Anyscale	ORGANIZATION	0.98+
millions of documents	QUANTITY	0.98+
both	QUANTITY	0.98+
one kind	QUANTITY	0.96+
first version	QUANTITY	0.95+
CUBE	ORGANIZATION	0.95+
about a hundred people	QUANTITY	0.95+
hundreds of machines	QUANTITY	0.95+
one	QUANTITY	0.95+
OpenAI	ORGANIZATION	0.94+
First	QUANTITY	0.94+
hundreds of hours	QUANTITY	0.93+
first time	QUANTITY	0.93+
Databricks	ORGANIZATION	0.91+
Ray and Anyscale	ORGANIZATION	0.9+
tons	QUANTITY	0.89+
couple years ago	DATE	0.88+
Ray and	ORGANIZATION	0.86+
ChatGPT	TITLE	0.81+
tons of people	QUANTITY	0.8+

How to Make a Data Fabric Smart A Technical Demo With Jess Jowdy

(inspirational music) (music ends) >> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities, including data exploration business intelligence, natural language processing and machine learning directly within the fabric makes it faster and easier for organizations to gain new insights and power intelligence predictive and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi, yeah, thank you so much for having me. And so for this demo, we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements, and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo, and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see, and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. This could be, for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software, or adverse reaction warnings from a clinical risk grouping application, and so much more. So I'm really going to be simulating a patient logging in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here, and I'm going to be looking for information where the last name of this patient is Simmons, and their medical record number or their patient identifier in the system is 32345. And so as you can see, I have this single JSON payload that showed up here of, just, relevant clinical information for my patient whose last name is Simmons, all within a single response. So fantastic, right? Typically though, when we see responses that look like this there is an assumption that this service is interacting with a single backend system, and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture, we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. And in the middle here, we have our data fabric coordinator which is going to be in charge of this refinement and analysis, those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service, and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end, we do also support full life cycle API management within this platform. When you're dealing with APIs, I always like to make a little shout out on this, that you really want to make sure you have enough, like a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what context. >> Can I just interrupt you for a second, Jess? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean you could have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry, and API securities are like, really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So, there's interactions with the API itself. So can I even see this API and leverage it? And then within an API call, you then have to deal with all right, which end points or what kind of interactions within that API am I allowed to do? What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So, the way that we handle that is, like I said, same thing at different layers. There is access to a particular API, which can happen within the IRIS product, and also we see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So, that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of the security. >> And that's been designed in, it's not a bolt on as they like to say. >> Absolutely. >> Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly, each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like Fire. Interactions with a homegrown enterprise data warehouse for instance, may use SQL. For a cloud-based solutions managed by a vendor, they may only allow you to use web service calls to pull data. So it's really important that your data fabric platform that you're using has the flexibility to connect to all of these different systems and applications. And I'm about to log out, so I'm going to (chuckles) keep my session going here. So therefore it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources, and all these different kinds of formats and over all of these different kinds of protocols. So let's think back on our example here. I had four different applications that I was requesting information for to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is, it has an embedded interoperability platform. So there's all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST, or SOAP, or SQL, or FTP, regardless of that protocol, there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as in healthcare we have HL7, we have Fire, we have CCDs, across the industry, JSON is, you know, really hitting a market strong now, and XML payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these when I select a particular connection on the right side panel, I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example, communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application I'm using a SQL based connection. When I'm connecting to my EMR, I'm leveraging a standard healthcare messaging format known as Fire, which is a REST based protocol. And then when I'm working with my health record management system, I'm leveraging a standard HTTP adapter. So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly, and be able to do it in a reliable and quick way. Because if you think about it, you could have hundreds of these different kinds of applications built out and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance my patient's last name and their MRN, and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. So why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond what's an out-of-the box or black box approach to be able to develop things that are specific to their data fabric, or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you not only get an opportunity to view how we're establishing these connections or how we're building out these processes, but you have the opportunity to inject your own kind of processing, your own kinds of pipelines into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up and coming language, right? We see more and more developers turning towards Python to do their development. So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see actually calls out to our turnkey adapters. So we see a combination of out-of-the-box code that is provided in this data fabric platform from IRIS, combined with organization specific or user specific customizations that are included in this Python method. So it's a nice little combination of how do we bring the developer experience in and mix it with out-of-the-box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. (laughs) >> It's a lot here. You know, actually- >> I can pause. >> If I could, if we just want to sort of play that back. So we went to the connect and the collect phase. >> Yes, we're going into refine. So it's a good place to stop. >> So before we get there, so we heard a lot about fine grain security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know, the ability to bring in different dev tools. We heard about Fire, which of course big in healthcare. And that's the standard, and then SQL for traditional kind of structured data, and then web services like HTTP you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely. And I think that's really important when you're dealing with a smart data fabric because what you're effectively doing is you're consolidating all of your processing, all of your collection, into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refinement. >> We're going into refinement. Exciting. (chuckles) So how do we actually do refinement? Where does refinement happen? And how does this whole thing end up being performant? Well the key to all of that is this SDF coordinator, or stands for Smart Data Fabric coordinator. And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's aggregating, it's collecting that information, it's aggregating it, and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like. And as you can see, it follows a flow chart like structure. So there's a start, there is an end, and then there are these different arrows that point to different activities throughout the business process. And so there's all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and we're consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL logic into this or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. So like I said, we have a really very simple process here that's just calling out, grabbing all those different data elements from all those different data sources and consolidating it. We'll look back at this coordinator in a second when we introduce, or we make this data fabric a bit smarter, and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen 'cause I always like to go back and take a look at everything that we've seen. We have our initial API connection, we have our connections to our individual data sources and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There's all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging, 'cause you need to be able to know, you know, if there was an issue, where did that issue happen in which connected process, and how did it affect the other processes that are related to it? In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request from when it initially came into the smart data fabric, to when data was sent back out from that smart data fabric. So I didn't record the time, but I bet if you recorded the time it was this time that we sent that request in and you can see my patient's name and their medical record number here, and you can see that that instigated four different calls to four different systems, and they're represented by these arrows going out. So we sent something to chart script, to our health record management system, to our clinical risk grouping application, into my EMR through their Fire server. So every request, every outbound application gets a request and we pull back all of those individual pieces of information from all of those different systems, and we bundle them together. And from my Fire lovers, here's our Fire bundle that we got back from our Fire server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now we change this into a JSON format before we deliver it, but this is those data elements brought together. And this screen would also be used for being able to see things like error trapping, or errors that were thrown, alerts, warnings, developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one stop shop for understanding what's happening behind the scenes with your data fabric. >> Sure, who did what when where, what did the machine do what went wrong, and where did that go wrong? Right at your fingertips. >> Right. And I'm a visual person so a bunch of log files to me is not the most helpful. While being able to see this happened at this time in this location, gives me that understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How people are using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric, is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? It's bringing data in, it's analyzing that information it's transforming that data, in a format that your consumer's not going to understand. It's doing any additional injection of custom logic. So really your coordinator or that orchestrator that sits in the middle is the brains behind your smart data fabric. >> And this is available today? It all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well, we can keep going. I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this, but essentially if we go back to our coordinator here, we can see here's that original, that pipeline that we saw where we're pulling data from all these different systems and we're collecting it and we're sending it out. But then we see two more at the end here, which involves getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric, but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at, and we're running it through a machine learning model that exists within the smart data fabric pipeline, and producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days. Which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world, is we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. So there's no shuffling of data, there's no external connections to make this happen. And it doesn't really require having a PhD in data science to understand how to do that. It leverages all really basic SQL-like syntax to be able to construct and execute these predictions. So, it's going one step further than the traditional data fabric example to introduce this ability to define actionable insights to our users based on the data that we've brought together. >> Well that readmission probability is huge, right? Because it directly affects the cost for the provider and the patient, you know. So if you can anticipate the probability of readmission and either do things at that moment, or, you know, as an outpatient perhaps, to minimize the probability then that's huge. That drops right to the bottom line. >> Absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day, which is what makes this so exciting. >> Awesome demo. >> Thank you! >> Jess, are you cool if people want to get in touch with you? Can they do that? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy, and we'd love to hear from you. I always love talking about this topic so we'd be happy to engage on that. >> Great stuff. Thank you Jessica, appreciate it. >> Thank you so much. >> Okay, don't go away because in the next segment, we're going to dig into the use cases where data fabric is driving business value. Stay right there. (inspirational music) (music fades)

Published Date : Feb 22 2023

SUMMARY :

and she's going to show And to that end, we do also So you were showing hundreds of these APIs depending in the healthcare industry, So can I even see this as they like to say. that are specific to their data fabric, Yeah, I'll pause. It's a lot here. So we went to the connect So it's a good place to stop. So before we get So that platform needs to All right, so now we're that are related to it? Right at your fingertips. I need to actually troubleshoot a problem. of being able to create of clients that are using this technology Anything else you want to show us? So in this scenario, we're and the patient, you know. And that really brings So you can find me on Thank you Jessica, appreciate it. in the next segment,

ENTITIES

Entity	Category	Confidence
Joe Lichtenberg	PERSON	0.99+
Jessica Jowdy	PERSON	0.99+
Jessica	PERSON	0.99+
Jess Jowdy	PERSON	0.99+
InterSystems	ORGANIZATION	0.99+
Scott	PERSON	0.99+
Python	TITLE	0.99+
Simmons	PERSON	0.99+
Jess	PERSON	0.99+
32345	OTHER	0.99+
hundreds	QUANTITY	0.99+
IRIS	ORGANIZATION	0.99+
each	QUANTITY	0.99+
today	DATE	0.99+
LinkedIn	ORGANIZATION	0.99+
third segment	QUANTITY	0.98+
Fire	COMMERCIAL_ITEM	0.98+
SQL	TITLE	0.98+
single platform	QUANTITY	0.97+
each data	QUANTITY	0.97+
one	QUANTITY	0.97+
single	QUANTITY	0.95+
single response	QUANTITY	0.94+
single backend system	QUANTITY	0.92+
two more	QUANTITY	0.92+
four different segments	QUANTITY	0.89+
APIs	QUANTITY	0.88+
one step	QUANTITY	0.88+
four	QUANTITY	0.85+
Healthcare Field Engineering	ORGANIZATION	0.82+
JSON	TITLE	0.8+
single payload	QUANTITY	0.8+
second	QUANTITY	0.79+
one payload	QUANTITY	0.76+
next 30 days	DATE	0.76+
IRIS	TITLE	0.75+
Fire	TITLE	0.72+
Postman	TITLE	0.71+
every	QUANTITY	0.68+
four different calls	QUANTITY	0.66+
Jes	PERSON	0.66+
a second	QUANTITY	0.61+
services	QUANTITY	0.6+
evelopers	PERSON	0.58+
Postman	ORGANIZATION	0.54+
HL7	OTHER	0.4+

How to Make a Data Fabric "Smart": A Technical Demo With Jess Jowdy

>> Okay, so now that we've heard Scott talk about smart data fabrics, it's time to see this in action. Right now we're joined by Jess Jowdy, who's the manager of Healthcare Field Engineering at InterSystems. She's going to give a demo of how smart data fabrics actually work, and she's going to show how embedding a wide range of analytics capabilities including data exploration, business intelligence natural language processing, and machine learning directly within the fabric, makes it faster and easier for organizations to gain new insights and power intelligence, predictive and prescriptive services and applications. Now, according to InterSystems, smart data fabrics are applicable across many industries from financial services to supply chain to healthcare and more. Jess today is going to be speaking through the lens of a healthcare focused demo. Don't worry, Joe Lichtenberg will get into some of the other use cases that you're probably interested in hearing about. That will be in our third segment, but for now let's turn it over to Jess. Jess, good to see you. >> Hi. Yeah, thank you so much for having me. And so for this demo we're really going to be bucketing these features of a smart data fabric into four different segments. We're going to be dealing with connections, collections, refinements and analysis. And so we'll see that throughout the demo as we go. So without further ado, let's just go ahead and jump into this demo and you'll see my screen pop up here. I actually like to start at the end of the demo. So I like to begin by illustrating what an end user's going to see and don't mind the screen 'cause I gave you a little sneak peek of what's about to happen. But essentially what I'm going to be doing is using Postman to simulate a call from an external application. So we talked about being in the healthcare industry. This could be for instance, a mobile application that a patient is using to view an aggregated summary of information across that patient's continuity of care or some other kind of application. So we might be pulling information in this case from an electronic medical record. We might be grabbing clinical history from that. We might be grabbing clinical notes from a medical transcription software or adverse reaction warnings from a clinical risk grouping application and so much more. So I'm really going to be assimilating a patient logging on in on their phone and retrieving this information through this Postman call. So what I'm going to do is I'm just going to hit send, I've already preloaded everything here and I'm going to be looking for information where the last name of this patient is Simmons and their medical record number their patient identifier in the system is 32345. And so as you can see I have this single JSON payload that showed up here of just relevant clinical information for my patient whose last name is Simmons all within a single response. So fantastic, right? Typically though when we see responses that look like this there is an assumption that this service is interacting with a single backend system and that single backend system is in charge of packaging that information up and returning it back to this caller. But in a smart data fabric architecture we're able to expand the scope to handle information across different, in this case, clinical applications. So how did this actually happen? Let's peel back another layer and really take a look at what happened in the background. What you're looking at here is our mission control center for our smart data fabric. On the left we have our APIs that allow users to interact with particular services. On the right we have our connections to our different data silos. And in the middle here we have our data fabric coordinator which is going to be in charge of this refinement and analysis those key pieces of our smart data fabric. So let's look back and think about the example we just showed. I received an inbound request for information for a patient whose last name is Simmons. My end user is requesting to connect to that service and that's happening here at my patient data retrieval API location. Users can define any number of different services and APIs depending on their use cases. And to that end we do also support full lifecycle API management within this platform. When you're dealing with APIs I always like to make a little shout out on this that you really want to make sure you have enough like a granular enough security model to handle and limit which APIs and which services a consumer can interact with. In this IRIS platform, which we're talking about today we have a very granular role-based security model that allows you to handle that, but it's really important in a smart data fabric to consider who's accessing your data and in what contact. >> Can I just interrupt you for a second? >> Yeah, please. >> So you were showing on the left hand side of the demo a couple of APIs. I presume that can be a very long list. I mean, what do you see as typical? >> I mean you can have hundreds of these APIs depending on what services an organization is serving up for their consumers. So yeah, we've seen hundreds of these services listed here. >> So my question is, obviously security is critical in the healthcare industry and API securities are really hot topic these days. How do you deal with that? >> Yeah, and I think API security is interesting 'cause it can happen at so many layers. So there's interactions with the API itself. So can I even see this API and leverage it? And then within an API call, you then have to deal with all right, which end points or what kind of interactions within that API am I allowed to do? What data am I getting back? And with healthcare data, the whole idea of consent to see certain pieces of data is critical. So the way that we handle that is, like I said, same thing at different layers. There is access to a particular API, which can happen within the IRIS product and also we see it happening with an API management layer, which has become a really hot topic with a lot of organizations. And then when it comes to data security, that really happens under the hood within your smart data fabric. So that role-based access control becomes very important in assigning, you know, roles and permissions to certain pieces of information. Getting that granular becomes the cornerstone of security. >> And that's been designed in, >> Absolutely, yes. it's not a bolt-on as they like to say. Okay, can we get into collect now? >> Of course, we're going to move on to the collection piece at this point in time, which involves pulling information from each of my different data silos to create an overall aggregated record. So commonly each data source requires a different method for establishing connections and collecting this information. So for instance, interactions with an EMR may require leveraging a standard healthcare messaging format like FIRE, interactions with a homegrown enterprise data warehouse for instance may use SQL for a cloud-based solutions managed by a vendor. They may only allow you to use web service calls to pull data. So it's really important that your data fabric platform that you're using has the flexibility to connect to all of these different systems and and applications. And I'm about to log out so I'm going to keep my session going here. So therefore it's incredibly important that your data fabric has the flexibility to connect to all these different kinds of applications and data sources and all these different kinds of formats and over all of these different kinds of protocols. So let's think back on our example here. I had four different applications that I was requesting information for to create that payload that we saw initially. Those are listed here under this operations section. So these are going out and connecting to downstream systems to pull information into my smart data fabric. What's great about the IRIS platform is it has an embedded interoperability platform. So there's all of these native adapters that can support these common connections that we see for different kinds of applications. So using REST or SOAP or SQL or FTP regardless of that protocol there's an adapter to help you work with that. And we also think of the types of formats that we typically see data coming in as, in healthcare we have H7, we have FIRE we have CCDs across the industry. JSON is, you know, really hitting a market strong now and XML, payloads, flat files. We need to be able to handle all of these different kinds of formats over these different kinds of protocols. So to illustrate that, if I click through these when I select a particular connection on the right side panel I'm going to see the different settings that are associated with that particular connection that allows me to collect information back into my smart data fabric. In this scenario, my connection to my chart script application in this example communicates over a SOAP connection. When I'm grabbing information from my clinical risk grouping application I'm using a SQL based connection. When I'm connecting to my EMR I'm leveraging a standard healthcare messaging format known as FIRE, which is a rest based protocol. And then when I'm working with my health record management system I'm leveraging a standard HTTP adapter. So you can see how we can be flexible when dealing with these different kinds of applications and systems. And then it becomes important to be able to validate that you've established those connections correctly and be able to do it in a reliable and quick way. Because if you think about it, you could have hundreds of these different kinds of applications built out and you want to make sure that you're maintaining and understanding those connections. So I can actually go ahead and test one of these applications and put in, for instance my patient's last name and their MRN and make sure that I'm actually getting data back from that system. So it's a nice little sanity check as we're building out that data fabric to ensure that we're able to establish these connections appropriately. So turnkey adapters are fantastic, as you can see we're leveraging them all here, but sometimes these connections are going to require going one step further and building something really specific for an application. So let's, why don't we go one step further here and talk about doing something custom or doing something innovative. And so it's important for users to have the ability to develop and go beyond what's an out of the box or black box approach to be able to develop things that are specific to their data fabric or specific to their particular connection. In this scenario, the IRIS data platform gives users access to the entire underlying code base. So you cannot, you not only get an opportunity to view how we're establishing these connections or how we're building out these processes but you have the opportunity to inject your own kind of processing your own kinds of pipelines into this. So as an example, you can leverage any number of different programming languages right within this pipeline. And so I went ahead and I injected Python. So Python is a very up and coming language, right? We see more and more developers turning towards Python to do their development. So it's important that your data fabric supports those kinds of developers and users that have standardized on these kinds of programming languages. This particular script here, as you can see actually calls out to our turnkey adapters. So we see a combination of out of the box code that is provided in this data fabric platform from IRIS combined with organization specific or user specific customizations that are included in this Python method. So it's a nice little combination of how do we bring the developer experience in and mix it with out of the box capabilities that we can provide in a smart data fabric. >> Wow. >> Yeah, I'll pause. >> It's a lot here. You know, actually, if I could >> I can pause. >> If I just want to sort of play that back. So we went through the connect and the collect phase. >> And the collect, yes, we're going into refine. So it's a good place to stop. >> Yeah, so before we get there, so we heard a lot about fine grain security, which is crucial. We heard a lot about different data types, multiple formats. You've got, you know the ability to bring in different dev tools. We heard about FIRE, which of course big in healthcare. >> Absolutely. >> And that's the standard and then SQL for traditional kind of structured data and then web services like HTTP you mentioned. And so you have a rich collection of capabilities within this single platform. >> Absolutely, and I think that's really important when you're dealing with a smart data fabric because what you're effectively doing is you're consolidating all of your processing, all of your collection into a single platform. So that platform needs to be able to handle any number of different kinds of scenarios and technical challenges. So you've got to pack that platform with as many of these features as you can to consolidate that processing. >> All right, so now we're going into refine. >> We're going into refinement, exciting. So how do we actually do refinement? Where does refinement happen and how does this whole thing end up being performant? Well the key to all of that is this SDF coordinator or stands for smart data fabric coordinator. And what this particular process is doing is essentially orchestrating all of these calls to all of these different downstream systems. It's aggregating, it's collecting that information it's aggregating it and it's refining it into that single payload that we saw get returned to the user. So really this coordinator is the main event when it comes to our data fabric. And in the IRIS platform we actually allow users to build these coordinators using web-based tool sets to make it intuitive. So we can take a sneak peek at what that looks like and as you can see it follows a flow chart like structure. So there's a start, there is an end and then there are these different arrows that point to different activities throughout the business process. And so there's all these different actions that are being taken within our coordinator. You can see an action for each of the calls to each of our different data sources to go retrieve information. And then we also have the sync call at the end that is in charge of essentially making sure that all of those responses come back before we package them together and send them out. So this becomes really crucial when we're creating that data fabric. And you know, this is a very simple data fabric example where we're just grabbing data and we're consolidating it together. But you can have really complex orchestrators and coordinators that do any number of different things. So for instance, I could inject SQL Logic into this or SQL code, I can have conditional logic, I can do looping, I can do error trapping and handling. So we're talking about a whole number of different features that can be included in this coordinator. So like I said, we have a really very simple process here that's just calling out, grabbing all those different data elements from all those different data sources and consolidating it. We'll look back at this coordinator in a second when we introduce or we make this data fabric a bit smarter and we start introducing that analytics piece to it. So this is in charge of the refinement. And so at this point in time we've looked at connections, collections, and refinements. And just to summarize what we've seen 'cause I always like to go back and take a look at everything that we've seen. We have our initial API connection we have our connections to our individual data sources and we have our coordinators there in the middle that are in charge of collecting the data and refining it into a single payload. As you can imagine, there's a lot going on behind the scenes of a smart data fabric, right? There's all these different processes that are interacting. So it's really important that your smart data fabric platform has really good traceability, really good logging 'cause you need to be able to know, you know, if there was an issue, where did that issue happen, in which connected process and how did it affect the other processes that are related to it. In IRIS, we have this concept called a visual trace. And what our clients use this for is basically to be able to step through the entire history of a request from when it initially came into the smart data fabric to when data was sent back out from that smart data fabric. So I didn't record the time but I bet if you recorded the time it was this time that we sent that request in. And you can see my patient's name and their medical record number here and you can see that that instigated four different calls to four different systems and they're represented by these arrows going out. So we sent something to chart script to our health record management system, to our clinical risk grouping application into my EMR through their FIRE server. So every request, every outbound application gets a request and we pull back all of those individual pieces of information from all of those different systems and we bundle them together. And for my FIRE lovers, here's our FIRE bundle that we got back from our FIRE server. So this is a really good way of being able to validate that I am appropriately grabbing the data from all these different applications and then ultimately consolidating it into one payload. Now we change this into a JSON format before we deliver it, but this is those data elements brought together. And this screen would also be used for being able to see things like error trapping or errors that were thrown alerts, warnings, developers might put log statements in just to validate that certain pieces of code are executing. So this really becomes the one stop shop for understanding what's happening behind the scenes with your data fabric. >> Etcher, who did what, when, where what did the machine do? What went wrong and where did that go wrong? >> Exactly. >> Right in your fingertips. >> Right, and I'm a visual person so a bunch of log files to me is not the most helpful. Well, being able to see this happened at this time in this location gives me that understanding I need to actually troubleshoot a problem. >> This business orchestration piece, can you say a little bit more about that? How people are using it? What's the business impact of the business orchestration? >> The business orchestration, especially in the smart data fabric is really that crucial part of being able to create a smart data fabric. So think of your business orchestrator as doing the heavy lifting of any kind of processing that involves data, right? It's bringing data in, it's analyzing that information, it's transforming that data, in a format that your consumer's not going to understand it's doing any additional injection of custom logic. So really your coordinator or that orchestrator that sits in the middle is the brains behind your smart data fabric. >> And this is available today? This all works? >> It's all available today. Yeah, it all works. And we have a number of clients that are using this technology to support these kinds of use cases. >> Awesome demo. Anything else you want to show us? >> Well we can keep going. 'Cause right now, I mean we can, oh, we're at 18 minutes. God help us. You can cut some of this. (laughs) I have a lot to say, but really this is our data fabric. The core competency of IRIS is making it smart, right? So I won't spend too much time on this but essentially if we go back to our coordinator here we can see here's that original that pipeline that we saw where we're pulling data from all these different systems and we're collecting it and we're sending it out. But then we see two more at the end here which involves getting a readmission prediction and then returning a prediction. So we can not only deliver data back as part of a smart data fabric but we can also deliver insights back to users and consumers based on data that we've aggregated as part of a smart data fabric. So in this scenario, we're actually taking all that data that we just looked at and we're running it through a machine learning model that exists within the smart data fabric pipeline and producing a readmission score to determine if this particular patient is at risk for readmission within the next 30 days. Which is a typical problem that we see in the healthcare space. So what's really exciting about what we're doing in the IRIS world is we're bringing analytics close to the data with integrated ML. So in this scenario we're actually creating the model, training the model, and then executing the model directly within the IRIS platform. So there's no shuffling of data, there's no external connections to make this happen. And it doesn't really require having a PhD in data science to understand how to do that. It leverages all really basic SQL like syntax to be able to construct and execute these predictions. So it's going one step further than the traditional data fabric example to introduce this ability to define actionable insights to our users based on the data that we've brought together. >> Well that readmission probability is huge. >> Yes. >> Right, because it directly affects the cost of for the provider and the patient, you know. So if you can anticipate the probability of readmission and either do things at that moment or you know, as an outpatient perhaps to minimize the probability then that's huge. That drops right to the bottom line. >> Absolutely, absolutely. And that really brings us from that data fabric to that smart data fabric at the end of the day which is what makes this so exciting. >> Awesome demo. >> Thank you. >> Fantastic people, are you cool? If people want to get in touch with you? >> Oh yes, absolutely. So you can find me on LinkedIn, Jessica Jowdy and we'd love to hear from you. I always love talking about this topic, so would be happy to engage on that. >> Great stuff, thank you Jess, appreciate it. >> Thank you so much. >> Okay, don't go away because in the next segment we're going to dig into the use cases where data fabric is driving business value. Stay right there.

Published Date : Feb 15 2023

SUMMARY :

for organizations to gain new insights And to that end we do also So you were showing hundreds of these APIs in the healthcare industry So the way that we handle that it's not a bolt-on as they like to say. that data fabric to ensure that we're able It's a lot here. So we went through the So it's a good place to stop. the ability to bring And so you have a rich collection So that platform needs to we're going into refine. that are related to it. so a bunch of log files to of being able to create this technology to support Anything else you want to show us? So in this scenario, we're Well that readmission and the patient, you know. to that smart data fabric So you can find me on you Jess, appreciate it. because in the next segment

ENTITIES

Entity	Category	Confidence
Jessica Jowdy	PERSON	0.99+
Joe Lichtenberg	PERSON	0.99+
InterSystems	ORGANIZATION	0.99+
Jess Jowdy	PERSON	0.99+
Scott	PERSON	0.99+
Jess	PERSON	0.99+
18 minutes	QUANTITY	0.99+
hundreds	QUANTITY	0.99+
32345	OTHER	0.99+
Python	TITLE	0.99+
Simmons	PERSON	0.99+
each	QUANTITY	0.99+
IRIS	ORGANIZATION	0.99+
third segment	QUANTITY	0.99+
Etcher	ORGANIZATION	0.99+
today	DATE	0.99+
LinkedIn	ORGANIZATION	0.98+
SQL	TITLE	0.98+
single platform	QUANTITY	0.98+
one	QUANTITY	0.98+
JSON	TITLE	0.96+
each data source	QUANTITY	0.96+
single	QUANTITY	0.95+
one step	QUANTITY	0.94+
one step	QUANTITY	0.94+
single backend	QUANTITY	0.92+
single response	QUANTITY	0.9+
two more	QUANTITY	0.85+
single payload	QUANTITY	0.84+
SQL Logic	TITLE	0.84+
a second	QUANTITY	0.83+
IRIS	TITLE	0.83+
four different segments	QUANTITY	0.82+
Postman	PERSON	0.78+
FIRE	TITLE	0.77+
SOAP	TITLE	0.76+
four different applications	QUANTITY	0.74+
one stop	QUANTITY	0.74+
Postman	TITLE	0.73+
one payload	QUANTITY	0.72+
each of	QUANTITY	0.71+
REST	TITLE	0.7+
Healthcare Field Engineering	ORGANIZATION	0.67+
next 30 days	DATE	0.65+
four	QUANTITY	0.63+
these APIs	QUANTITY	0.62+
second	QUANTITY	0.54+
God	PERSON	0.53+
every	QUANTITY	0.53+
services	QUANTITY	0.51+
H7	COMMERCIAL_ITEM	0.5+
application	QUANTITY	0.48+
FIRE	ORGANIZATION	0.38+
XML	TITLE	0.38+

Brian Stevens, Neural Magic | Cube Conversation

>> John: Hello and welcome to this cube conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. We got a great conversation on making machine learning easier and more affordable in an era where everybody wants more machine learning and AI. We're featuring Neural Magic with the CEO is also Cube alumni, Brian Steve. CEO, Great to see you Brian. Thanks for coming on this cube conversation. Talk about machine learning. >> Brian: Hey John, happy to be here again. >> John: What a buzz that's going on right now? Machine learning, one of the hottest topics, AI front and center, kind of going mainstream. We're seeing the success of the, of the kind of NextGen capabilities in the enterprise and in apps. It's a really exciting time. So perfect timing. Great, great to have this conversation. Let's start with taking a minute to explain what you guys are doing over there at Neural Magic. I know there's some history there, neural networks, MIT. But the, the convergence of what's going on, this big wave hitting, it's an exciting time for you guys. Take a minute to explain the company and your mission. >> Brian: Sure, sure, sure. So, as you said, the company's Neural Magic and spun out at MIT four plus years ago, along with some people and, and some intellectual property. And you summarize it better than I can cause you said, we're just trying to make, you know, AI that much easier. And so, but like another level of specificity around it is. You know, in the world you have a lot of like data scientists really focusing on making AI work for whatever their use case is. And then the next phase of that, then they're looking at optimizing the models that they built. And then it's not good enough just to work on models. You got to put 'em into production. So, what we do is we make it easier to optimize the models that have been developed and trained and then trying to make it super simple when it comes time to deploying those in production and managing them. >> Brian: You know, we've seen this movie before with the cloud. You start to see abstractions come out. Data science we saw like was like the, the secret art of being like a data scientist now democratization of data. You're kind of seeing a similar wave with machine learning models, foundational models, some call it developers are getting involved. Model complexity's still there, but, but it's getting easier. There's almost like the democratization happening. You got complexity, you got deployment, it's challenges, cost, you got developers involved. So it's like how do you grow it? How do you get more horsepower? And then how do you make developers productive, right? So like, this seems to be the thread. So, so where, where do you see this going? Because there's going to be a massive demand for, I want to do more with my machine learning. But what's the data source? What's the formatting? This kind of a stack develop, what, what are you guys doing to address this? Can you take us through and demystify this, this wave that's hitting, that everyone's seeing? >> Brian: Yeah. Now like you said, like, you know, the democratization of all of it. And that brings me all the way back to like the roots of open source, right? When you think about like, like back in the day you had to build your own tech stack yourself. A lot of people probably probably don't remember that. And then you went, you're building, you're always starting on a body of code or a module that was out there with open source. And I think that's what I equate to where AI has gotten to with what you were talking about the foundational models that didn't really exist years ago. So you really were like putting the layers of your models together in the formulas and it was a lot of heavy lifting. And so there was so much time spent on development. With far too few success cases, you know, to get into production to solve like a business stereo technical need. But as these, what's happening is as these models are becoming foundational. It's meaning people don't have to start from scratch. They're actually able to, you know, the avant-garde now is start with existing model that almost does what you want, but then applying your data set to it. So it's, you know, it's really the industry moving forward. And then we, you know, and, and the best thing about it is open source plays a new dimension, but this time, you know, in the, in the realm of AI. And so to us though, like, you know, I've been like, I spent a career focusing on, I think on like the, not just the technical side, but the consumption of the technology and how it's still way too hard for somebody to actually like, operationalize technology that all those vendors throw at them. So I've always been like empathetic the user around like, you know what their job is once you give them great technology. And so it's still too difficult even with the foundational models because what happens is there's really this impedance mismatch between the development of the model and then where, where the model has to live and run and be deployed and the life cycle of the model, if you will. And so what we've done in our research is we've developed techniques to introduce what's known as sparsity into a machine learning model. It's already been developed and trained. And what that sparsity does is that unlocks by making that model so much smaller. So in many cases we can make a model 90 to 95% smaller, even smaller than that in research. So, and, and so by doing that, we do that in a way that preserves all the accuracy out of the foundational model as you talked about. So now all of a sudden you get this much smaller model just as accurate. And then the even more exciting part about it is we developed a software-based engine called Deep Source. And what that, what the Inference Runtime does is takes that now sparsified model and it runs it, but because you sparsified it, it only needs a fraction of the compute that it, that it would've needed otherwise. So what we've done is make these models much faster, much smaller, and then by pairing that with an inference runtime, you now can actually deploy that model anywhere you want on commodity hardware, right? So X 86 in the cloud, X 86 in the data center arm at the edge, it's like this massive unlock that happens because you get the, the state-of-the-art models, but you get 'em, you know, on the IT assets and the commodity infrastructure. That is where all the applications are running today. >> John: I want to get into the inference piece and the deep sparse you mentioned, but I first have to ask, you mentioned open source, Dave and I with some fellow cube alumnis. We're having a chat about, you know, the iPhone and Android moment where you got proprietary versus open source. You got a similar thing happening with some of these machine learning modules where there's a lot of proprietary things happening and there's open source movement is growing. So is there a balance there? Are they all trying to do the same thing? Is it more like a chip, you know, silicons involved, all kinds of things going on that are really fascinating from a science. What's your, what's your reaction to that? >> Brian: I think it's like anything that, you know, the way we talk about AI you think had been around for decades, but the reality is it's been some of the deep learning models. When we first, when we first started taking models that the brain team was working on at Google and billing APIs around them on Google Cloud where the first cloud to even have AI services was 2015, 2016. So when you think about it, it's really been what, 6 years since like this thing is even getting lift off. So I think with that, everybody's throwing everything at it. You know, there's tons of funded hardware thrown at specialty for training or inference new companies. There's legacy companies that are getting into like AI now and whether it's a, you know, a CPU company that's now building specialized ASEX for training. There's new tech stacks proprietary software and there's a ton of asset service. So it really is, you know, what's gone from nascent 8 years ago is the wild, wild west out there. So there's a, there's a little bit of everything right now and I think that makes sense because at the early part of any industry it really becomes really specialized. And that's the, you know, showing my age of like, you know, the early pilot of the two thousands, you know, red Hat people weren't running X 86 in enterprise back then and they thought it was a toy and they certainly weren't running open source, but you really, and it made sense that they weren't because it didn't deliver what they needed to at that time. So they needed specialty stacks, they needed expensive, they needed expensive hardware that did what an Oracle database needed to do. They needed proprietary software. But what happens is that commoditizes through both hardware and through open source and the same thing's really just starting with with AI. >> John: Yeah. And I think that's a great point before we to call that out because in any industry timing's everything, right? I mean I remember back in the 80s, late 80s and 90s, AI, you know, stuff was going on and it just wasn't, there wasn't enough horsepower, there wasn't enough tech. >> Brian: Yep. >> John: You mentioned some of the processing. So AI is this industry that has all these experts who have been itch scratching that itch for decades. And now with cloud and custom silicon. The tech fundamental at the lower end of the stack, if you will, on the performance side is significantly more performant. It's there you got more capabilities. >> Brian: Yeah. >> John: Now you're kicking into more software, faster software. So it just seems like we're at a tipping point where finally it's here, like that AI moment or machine learning and now data is, is involved. So this is where organizations I see really jumping in with the CEO mandate. Hey team, make ML work for us. Go figure it out. It's got to be an advantage for us. >> Brian: Yeah. >> John: So now they go, okay boss, we will. So what, what do they do? What's the steps does an enterprise take to get machine learning into their organizations? Cause you know, it's coming down from the boards, you know, how does this work for rob? >> Brian: Yeah. Like the, you know, the, what we're seeing is it's like anything, like it's, whether that was source adoption or whether that was cloud adoption, it always starts usually with one person. And increasingly it is the CEO, which realizes they're getting further behind the competition because they're not leaning in, you know, faster. But typically it really comes down to like a really strong practitioner that's inside the organization, right? And, that realizes that the number one goal isn't doing more and just training more models and and necessarily being proprietary about it. It's really around understanding the art of the possible. Something that's grounded in the art of the possible, what, what deep learning can do today and what business outcomes you can deliver, you know, if you can employ. And then there's well proven paths through that. It's just that because of where it's been, it's not that industrialized today. It's very much, you know, you see ML project by ML project is very snowflakey, right? And that was kind of the early days of open source as well. And so, we're just starting to get to the point where it's getting easier, it's getting more industrialized, there's less steps, there's less burdensome on developers, there's less burdensome on, on the deployment side. And we're trying to bring that, that whole last mile by saying, you know what? Deploying deep learning and AI models should be as easy as the as to deploy your application, right? You shouldn't have to take an extra step to deploy an AI model. It shouldn't have to require a new hardware, it shouldn't require a new process, a new DevOps model. It should be as simple as what you're already doing. >> John: What is the best practice for companies to effectively bring an acceptable level of machine learning and performance into their organizations? >> Brian: Yeah, I think like the, the number one start is like what you hinted at before is they, they have to know the use case. They have to, in most cases, you're going to find across every industry you know, that that problem's been tackled by some company, right? And then you have to have the best practice around fine-tuning the models already exist. So fine tuning that existing model. That foundational model on your unique dataset. You, you know, if you are in medical instruments, it's not good enough to identify that it's a medical instrument in the picture. You got to know what type of medical instrument. So there's always a fine tuning step. And so we've created open source tools that make it easy for you to do two things at once. You can fine tune that existing foundational model, whether that's in the language space or whether that's in the vision space. You can fine tune that on your dataset. And at the same time you get an optimized model that comes out the other end. So you get kind of both things. So you, you no longer have to worry about you're, we're freeing you from worrying about the complexity of that transfer learning, if you will. And we're freeing you from worrying about, well where am I going to deploy the model? Where does it need to be? Does it need to be on a device, an edge, a data center, a cloud edge? What kind of hardware is it? Is there enough hardware there? We're liberating you from all of that. Because what you want, what you can count on is there'll always be commodity capability, commodity CPUs where you want to deploy in abundance cause that's where your application is. And so all of a sudden we're just freeing you of that, of that whole step. >> John: Okay. Let's get into deep sparse because you mentioned that earlier. What inspired the creation of deep sparse and how does it differ from any other solutions in the market that are out there? >> Brian: Sure. So, so where unique is it? It starts by, by two things. One is what the industry's pretty good at from the optimization side is they're good at like this thing called quantization, which turns like, you know, big numbers into small numbers, lower precision. So a 32 bit representation of a, of AI weight into a bit. And they're good at like cutting out layers, which also takes away accuracy. What we've figured out is to take those, the industry techniques for those that are best practice, but we combined it with unstructured varsity. So by reducing that model by 90 to 95% in size, that's great because it's made it smaller. But we've taken that when it's the deep sparse engine, when you deploy it that looks at that model and says, because it's so much smaller, I no longer have to run the part of the model that's been essentially sparsified. So what that's done is, it's meant that you no longer need a supercomputer to run models because there's not nearly as much math and processing as there was before the model was optimized. So now what happens is, every CPU platform out there has, has an enormous amount of compute because we've sparsified the rest of it away. So you can pick a, you can pick your, your laptop and you have enough compute to run state-of-the-art models. The second thing that, and you need a software engine to do that cause it ignores the parts of the models. It doesn't need to run, which is what like specialized hardware can't do. The second part is it's then turned into a memory efficiency problem. So it's really around just getting memory, getting the models loaded into the cash of the computer and keeping it there. Never having to go back out to memory. So, so our techniques are both, we reduce the model size and then we only run the part of the model that matters and then we keep it all in cash. And so what that does is it gets us to like these, these low, low latency faster and we're able to increase, you know, the CPU processing by an order magnitude. >> John: Yeah. That low latency is key. And you got developers, you know, co coding super fast. We'll get to the developer angle in a second. I want to just follow up on this, this motivation behind the, the deep sparse because you know, as we were talking earlier before we came on camera about the old days, I mean, not too long ago, virtualization and VMware abstracted away the os from, from the hardware rights and the server virtualization changed the game. >> Brian: Yeah. >> John: And that basically invented cloud computing as we know it today. So, so we see that abstraction. >> Brian: Yeah. >> John: There seems to be a motivation behind abstracting the way the machine learning models away from the hardware. And that seems to be bringing advantages to the AI growth. Can you elaborate on, is that true? And it's, what's your comment? >> Brian: It's true. I think it's true for us. I don't think the industry's there yet, honestly. Cause I think the industry still is of that mindset that if I took, if it took these expensive GPUs to train my model, then I want to run my model on those same expensive GPUs. Because there's often like not a separation between the people that are developing AI and the people that have to manage and deploy at where you need it. So the reality is, is that that's everything that we're after. Like, do we decrease the cost? Yes. Do we make the models smaller? Yes. Do we make them faster? A yes. But I think the most amazing power is that we've turned AI into a docker based microservice. And so like who in the industry wants to deploy their apps the old way on a os without virtualization, without docker, without Kubernetes, without microservices, without service mesh without serverless. You want all those tools for your apps by converting AI models. So they can be run inside a docker container with no apologies around latency and performance cause it's faster. You get the best of that whole world that you just talked about, which is, you know, what we're calling, you know, software delivered AI. So now the AI lives in the same world. Organizations that have gone through that digital cloud transformation with their app infrastructure. AI fits into that world. >> John: And this is where the abstraction concepts matter. When you have these inflection points, the convergence of compute data, machine learning that powers AI, it really becomes a developer opportunity. Because now applications and businesses, when they actually go through the digital transformation, their businesses are completely transformed. There is no IT. Developers are the application. They are the company, right? So AI will be part of whatever business or app will be out there. So there is a application developer angle here. Brian, can you explain >> Brian: Oh completely. >> John: how they're going to use this? Because you mentioned docker container microservice, I mean this really is an insane flipping of the script for developers. >> Brian: Yeah. >> John: So what's that look like? >> Brian: Well speak, it's because like AI's kind of, I mean, again, like it's come so fast. So you figure there's my app team and here's my AI team, right? And they're in different places and the AI team is dragging in specialized infrastructure in support of that as well. And that's not how app developers think. Like they've ran on fungible infrastructure that subtracted and virtualized forever, right? And so what we've done is we've, in addition to fitting into that world that they, that they like, we've also made it simple for them for they don't have to be a machine learning engineer to be able to experiment with these foundational models and transfer learning 'em. We've done that. So they can do that in a couple of commands and it has a simple API that they can either link to their application directly as a library to make difference calls or they can stand it up as a standalone, you know, scale up, scale out inference server. They get two choices. But it really fits into that, you know, you know that world that the modern developer, whether they're just using Python or C or otherwise, we made it just simple. So as opposed to like Go learn something else, they kind of don't have to. So in a way though, it's made it. It's almost made it hard because people expect when we talk to 'em for the first time to be the old way. Like, how do you look like a piece of hardware? Are you compatible with my existing hardware that runs ML? Like, no, we're, we're not. Because you don't need that stack anymore. All you need is a library called to make your prediction and that's it. That's it. >> John: Well, I mean, we were joking on Twitter the other day with someone saying, is AI a pet or a cattle? Right? Because they love their, their AI bots right now. So, so I'd say pet there. But you look at a lot of, there's going to be a lot of AI. So on a more serious note, you mentioned in microservices, will deep sparse have an API for developers? And how does that look like? What do I do? >> Brian: Yeah. >> John: tell me what my, as a developer, what's the roadmap look like? What's the >> Brian: Yeah, it, it really looks, it really can go in both modes. It can go in a standalone server mode where it handles, you know, rest API and it can scale out with ES as the workload comes up and scale back and like try to make hardware do that. Hardware may scale back, but it's just sitting there dormant, you know, so with this, it scales the same way your application needs to. And then for a developer, they basically just, they just, the PIP install de sparse, you know, has one commanded to do an install, and then they do two calls, really. The first call is a library call that the app makes to create the model. And models really already trained, but they, it's called a model create call. And the second command they do is they make a call to do a prediction. And it's as simple as that. So it's, it's AI's as simple as using any other library that the developers are already using, which I, which sounds hard to fathom because it is just so simplified. >> John: Software delivered AI. Okay, that's a cool thing. I believe in it personally. I think that's the way to go. I think there's going to be plenty of hardware options if you look at the advances of cloud players that got more silicon coming out. Yeah. More GPU. I mean, there's more instance, I mean, everything's out there right now. So the question is how does that evolve in your mind? Because that's seems to be key. You have open source projects emerging. What, what path does this take? Is there a parallel mental model that you see, Brian, that is similar? You mentioned open source earlier. Is it more like a VMware virtualization thing or is it more of a cloud thing? Is there Yeah. Is it going to evolve in a, in a trajectory that looks similar to what we might've seen in the past? >> Brian: Yeah, we're, you know, when I, when when I got involved with the company, what I, when I thought about it and I was reasoning about it, like, do you, you know, you want to, like, we all do when you want to join something full-time. I thought about it and said, where will the industry eventually get to? Right? To fully realize the value of, of deep learning and what's plausible as it evolves. And to me, like I, I know it's the old adage of, you know, you know, software, its hardware, cloudy software. But it truly was like, you know, we can solve these problems in software. Like there's nothing special that's happening at the hardware layer and the processing AI. The reality is that it's just early in the industry. So the view that that we had was like, this is eventually the best place where the industry will be, is the liberation of being able to run AI anywhere. Like you're really not democratizing, you democratize the model. But if you can't run the model anywhere you want because these models are getting bigger and bigger with these large language models, then you're kind of not democratizing. And if you got to go and like by a cluster to run this thing on. So the democratization comes by if all of a sudden that model can be consumed anywhere on demand without planning, without provisioning, wherever infrastructure is. And so I think that's with or without Neural Magic, that's where the industry will go and will get to. I think we're the leaders, leaders in getting it there. It's right because we're more advanced on these techniques. >> John: Yeah. And your background too. You've seen OpenStack, pre-cloud, you saw open source grow and still exponentially growing. And so you have the same similar dynamic with machine learning models growing. And they're also segmenting into almost a, an ML stack or foundational model as we talk about. So you're starting to see the formation of tooling inference. So a lot of components coming. It's almost a stack, it's almost a, it literally is like an operating system problem space, you know? How do you run things, how do you link things? How do you bring things together? Is that what's going on here? Is this like a data modeling operating environment kind of red hat type thing going on? Like. >> Brian: Yeah. Yeah. Like I think there is, you know, I thought about that too. And I think there is the role of like distribution, because the industrialization not happening fast enough of this. Like, can I go back to like every customers, every, every user does it in their own kind of way. Like it's not, everyone's a little bit of a snowflake. And I think that's okay. There's definitely plenty of companies that want to come in and say, well, this is the way it's going to be and we industrialize it as long as you do it our way. The reality is technology doesn't get industrialized by one company just saying, do it our way. And so that's why like we've taken the approach through open source by saying like, Hey, you haven't really industrialized it if you said. We made it simple, but you always got to run AI here. Yeah, right. You only like really industrialize it if you break it down into components that are simple to use and they work integrated in the stack the way you want them to. And so to me, that first principles was getting thing into microservices and dockers that could be run on VMware, OpenShare on the cloud in the edge. And so that's the, that's the real part that we're happening with. The other part, like I do agree, like I think it's going to quickly move into less about the model. Less about the training of the model and the transfer learning, you know, the data set of the model. We're taking away the complexity of optimization. Giving liberating deployment to be anywhere. And I think the last mile, John is going to be around the ML ops around that. Because it's easy to think of like soft now that it's just a software problem, we've turned it into a software problem. So it's easy to think of software as like kind of a point release, but that's not the reality, right? It's a life cycle. And it's, and so I think ML very much brings in the what is the lifecycle of that deployment? And, you know, you get into more interesting conversations, to be honest than like, once you've deployed in a docking container is around like model drift and accuracy and the dataset changes and the user changes is how do you become from an ML perspective of where of that sending signal back retraining. And, and that's where I think a lot of the, in more of the innovation's going to start to move there. >> John: Yeah. And software also, the software problem, the software opportunity as well is developer focused. And if you look at the cloud native landscape now, similar stacks developing a lot of components. A lot of things to, to stitch together a lot of things that are automating under the hood. A lot of developer productivity conversations. I think this is going to go down that same road. I want to get your thoughts because developers will set the pace. And this is something that's clear in this next wave developer productivity. They're the defacto standards bodies. They will decide what microservices check, API check. Now, skill gap is going to be a problem because it's relatively new. So model sprawl, model sizes, proprietary versus open. There has to be a way to kind of crunch that down into a, like a DevOps, like just make it, get the developer out of the, the muck. So what's your view? Are we early days like that? Or what's the young kid in college studying CS or whatever degree who comes into this with, with both feet? What are they doing? >> Brian: I'll probably say like the, the non-popular answer to that. A little bit is it's happening so fast that it's going to get kind of boring fast. Meaning like, yeah, you could go to school and go to MIT, right? Sorry. Like, and you could get a hold through end like becoming a model architect, like inventing the next model, right? And the layers and combining 'em and et cetera, et cetera. And then what operators and, and building a model that's bigger than the last one and trains faster, right? And there will be those people, right? That actually, like they're building the engines the same way. You know, I grew up as an infrastructure software developer. There's not a lot of companies that hire those anymore because they're all sitting inside of three big clouds. Yeah. Right? So you better be a good app developer, but I think what you're going to see is before you had to be everything, you had to be the, if you were going to use infrastructure, you had to know how to build infrastructure. And I think the same thing's true around is quickly exiting ML is to be able to use ML in your company, you better be like, great at every aspect of ML, including every intricacy inside of the model and every operation's doing, that's quickly changing. Like, you're going to start with a starting point. You know, in the future you're not going to be like cracking open these GPT models, you're going to just be pulling them off the shelf, fine tuning 'em and go. You don't have to invent it. You don't have to understand it. And I think that's going to be a pivot point, you know, in the industry between, you know, what's the future? What's, what's the future of a, a data scientist? ML engineer researcher look like? >> John: I think that's, the outcome's going to be determined. I mean, you mentioned, you know, doing it yourself what an SRE is for a Google with the servers scale's huge. So yeah, it might have to, at the beginning get boring, you get obsolete quickly, but that means it's progressing. So, The scale becomes huge. And that's where I think it's going to be interesting when we see that scale. >> Brian: Yep. Yeah, I think that's right. I think that's right. And we always, and, and what I've always said, and much the, again, the distribute into my ML team is that I want every developer to be as adept at being able take advantage of ML as non ML engineer, right? It's got to be that simple. And I think, I think it's getting there. I really do. >> John: Well, Brian, great, great to have you on theCUBE here on this cube conversation. As part of the startup showcase that's coming up. You're going to be featured. Or your company would featured on the upcoming ABRA startup showcase on making machine learning easier and more affordable as more machine learning models come in. You guys got deep sparse and some great technology. We're going to dig into that next time. I'll give you the final word right now. What do you see for the company? What are you guys looking for? Give a plug for the company right now. >> Brian: Oh, give a plug that I haven't already doubled in as the plug. >> John: You're hiring engineers, I assume from MIT and other places. >> Brian: Yep. I think like the, the biggest thing is like, like we're on the developer side. We're here to make this easy. The majority of inference today is, is on CPUs already, believe it or not, as much as kind of, we like to talk about hardware and specialized hardware. The majority is already on CPUs. We're basically bringing 95% cost savings to CPUs through this acceleration. So, but we're trying to do it in a way that makes it community first. So I think the, the shout out would be come find the Neural Magic community and engage with us and you'll find, you know, a thousand other like-minded people in Slack that are willing to help you as well as our engineers. And, and let's, let's go take on some successful AI deployments. >> John: Exciting times. This is, I think one of the pivotal moments, NextGen data, machine learning, and now starting to see AI not be that chat bot, just, you know, customer support or some basic natural language processing thing. You're starting to see real innovation. Brian Stevens, CEO of Neural Magic, bringing the magic here. Thanks for the time. Great conversation. >> Brian: Thanks John. >> John: Thanks for joining me. >> Brian: Cheers. Thank you. >> John: Okay. I'm John Furrier, host of theCUBE here in Palo Alto, California for this cube conversation with Brian Stevens. Thanks for watching.

Published Date : Feb 13 2023

SUMMARY :

CEO, Great to see you Brian. happy to be here again. minute to explain what you guys in the world you have a lot So it's like how do you grow it? like back in the day you had and the deep sparse you And that's the, you know, late 80s and 90s, AI, you know, It's there you got more capabilities. the CEO mandate. Cause you know, it's coming the as to deploy your application, right? And at the same time you get in the market that are out meant that you no longer need a the deep sparse because you know, John: And that basically And that seems to be bringing and the people that have to the convergence of compute data, insane flipping of the script But it really fits into that, you know, But you look at a lot of, call that the app makes to model that you see, Brian, the old adage of, you know, And so you have the same the way you want them to. And if you look at the to see is before you had to be I mean, you mentioned, you know, the distribute into my ML team great to have you on theCUBE already doubled in as the plug. and other places. the biggest thing is like, of the pivotal moments, Brian: Cheers. host of theCUBE here in Palo Alto,

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Brian	PERSON	0.99+
Brian Stevens	PERSON	0.99+
Dave	PERSON	0.99+
95%	QUANTITY	0.99+
2015	DATE	0.99+
John Furrier	PERSON	0.99+
90	QUANTITY	0.99+
2016	DATE	0.99+
32 bit	QUANTITY	0.99+
Neural Magic	ORGANIZATION	0.99+
Brian Steve	PERSON	0.99+
Neural Magic	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
two calls	QUANTITY	0.99+
both things	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
Palo Alto, California	LOCATION	0.99+
second thing	QUANTITY	0.99+
both	QUANTITY	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Python	TITLE	0.99+
MIT	ORGANIZATION	0.99+
first call	QUANTITY	0.99+
two things	QUANTITY	0.99+
second part	QUANTITY	0.99+
One	QUANTITY	0.99+
both feet	QUANTITY	0.98+
Oracle	ORGANIZATION	0.98+
both modes	QUANTITY	0.98+
today	DATE	0.98+
80s	DATE	0.98+
first	QUANTITY	0.98+
second command	QUANTITY	0.98+

Tia Dubuisson | Special Program Series: Women of the Cloud

(upbeat music) >> Welcome to this special program series by theCUBE, "Women of the Cloud", brought to you by AWS. I'm your host, Lisa Martin. Very pleased to welcome my next guest, Tia Dubuisson, president, co-founder of Belle Fleur Technologies. Tia, welcome to the program. It's great to have you. >> Thank you so much, Lisa. I'm very happy to be here, thank you. >> Tell me a little bit about you, a little bit about Belle Fleur Technologies and your current role. >> Yeah, so myself, a little bit about me. I'm actually a former microbiologist, so we'll talk a little bit more about that and my journey into tech and shifting over into helping others, right? Belle Fleur technologies was birthed basically after I got a wake up call in the lab and saw that data was really going to be driving a lot of decision making, you know, in not so near future, which we're seeing now, and that was probably a good 11 years ago and you're seeing almost data driven everything or at least conversations about that now. So I have to say that was a good shift. And how we help customers, we're a consulting partner, and so helping them to make a journey that maybe I made on maybe more of an individual level to shift into you know, what does that look like? You have data but gaining the value out of it to actually make, you know, decisions. And so helping our customers to actually do the assessment around the type of data that they have taking them through a process all the way through to insights that they could then look at how can we monetize that is where we actually play and that is our specialty. >> Okay, my heart skipped a beat when you said you were a microbiologist because that's what I studied in undergrad. Oh my gosh, isn't that crazy? That's what we have in common. >> I'm super excited about that, yes. >> Yes, and I got segued into tech as well so we could chat for hours about that I'm sure of it. But, you know, you bring up such a great point, especially science being so data driven. Every industry is data driven. Every company has to be a data company and to help organizations really understand where their data is it's growing obviously continuously, exponentiation how to extract value from it is where a lot of organizations really struggle. So it sounds like that Belle Fleur comes in and really helps organizations to tackle that challenge so that they can extract value from the data that will give them that competitive advantage that they're looking for. >> Absolutely, absolutely, Lisa. >> So talk to me a little bit about your career path was zig-zaggy which I love, so is mine. What are some recommendations that you would have for others watching this program that are really looking to step that ladder in tech from a career perspective? >> Well, I think, you know if I pull from my own individual experience, I would definitely say when you have that aha moment, try to investigate a little bit more about that. I was blessed in the sense that I was married to a computer scientist, so I was able to go home and kind of tell him, hey, I just saw a demonstration that blew me away. We were doing drug discovery work, and we were going to be able to use a computer program to basically help us to narrow the focus of our drug discovery work to see which drugs would be most active before we even synthesized them. And so that was going to save us a lot of money, a lot of time. Drug discovery work is a guessing game, itty house. So if a computer can actually make a million different compounds in a month, I knew that was way more than me and the whole team could make at the bench and then order them by activity. So I came home and I told him that and he said, oh, in 10 years everything will be data driven, no doubt. And we started to have these conversations. And so then I started to then investigate a little more. I started taking courses, dusting off my Python, my R trying to see, you know, where else is data, you know, king. And basically it was everywhere. I wasn't seeing a lot of people at that time really using their data. There was really dark data still, right? They were collecting it but not really using it. And so I said, I think this is something I can help companies do. And I was really excited to really learn more about that. So I started to go learn, pick up certification. So then I'm starting to reinvest in myself. I would really highly advise you once you find that this is part of your passion. You know, find a mentor. I was, thankfully I was already married to a mentor, but there are other mentors and he wasn't my only mentor. There were others, right, to help you along this journey 'cause no one person rules, I think rules at all, right? When you're trying to make this journey and try to make this shift because it is complex, and so you want to make sure you have your tribe, right? That's going to get you there and you want to make sure that you can contribute to the tribe. So I always tried to find ways that I could actually contribute to different projects, right? Even if they're open, you know, projects, hackathons go to boot camps, a lot of them are free, some of them not so free but pretty close. And I think it's, you know, kind of lowers that bar to access where you can kind of take a little peek and you can even go to some that are, you know, driven from an area that you're interested in. If you're interested in healthcare, do a hack for good around healthcare. You know, try to get involved. You'll meet a lot of good people that I think will be very happy to help steward you along the way as you try to navigate these waters, 'cause there is no straight path, right? There is no A plus B gets you to C. You really kind of have to navigate those waters. But I would definitely say get the exposure, make a decision around your passion, meet, you know, nice people at boot camps, you know, workshops, hackathons and then go for some of those industry certifications. Do an do an online search, you know and find out what are the top 10 certifications that would help to support a role that you're looking for, right, in the area that you're passionate about. And then invest in yourself, study for it go for those things, make plans, right? And bounce those off of your mentor. I think they'll be very impressed that you laid out plans and you're actually meeting those goals. They'll be more inclined to actually invest back in you, as well. >> Absolutely, and I love how you said invest in yourself. You laid out some really great tactical recommendations and guidelines. There's very few paths do I come across in tech that have been linear. Most of them have been like yours and mine very zig-zaggy. But the most important thing is investing in yourself. And sometimes I'll hear people say things like create personal board of directors and that kind of reminded me of some of the things that you said, to have those mentors, have those sponsors. To your point, after you invest in you and have those folks invest in you as well. That's great advice, Tia. >> Awesome, thank you so much. Yeah, absolutely, we have to invest in each other. I think that that's the better together story here, right? >> I do too, it's got to be symbiotic. I'll bring up a a biology word for you, symbiotic. (laughs) >> (laughing) Yes, symbiotic. >> Yes, let's talk a little bit now about some of the specific projects where you've helped either internal customers or external customers solve problems related to cloud. >> Yeah, so I would say from an internal customer standpoint, that's what we call our employees, our our BFFs, right, our Belle Fleur friends. We want to make sure that we're investing in them just as much as we do our external customers. If you have happy internal customers, you're definitely without a shadow of a doubt going to be able to solution and really have happy external customers. So you got, you know, everything starts at home first, right? So far as you know, success stories, I would say from the internal customers is really looking at how to upscale and reskill not just junior talent but senior talent. Probably over the last two and a half years, we've been working very closely with a couple of non-profits, community colleges that now have cloud computing certificates that you can get, and also bachelor's degrees, and actually creating a talent pipeline, a playbook for a talent pipeline, to reskill and upskill, to make sure that people have the skill sets that are in market today. We were seeing that there was a gap between classroom and industry as we were trying to hire. And so we wanted to be a part of the narrative not just point out the problem. But how can we really dig in there? And so, it's been tested, tried and true this playbook over 300 different interns, as well as apprentices. So we're super excited to actually have a playbook that, you know, we're able to pull from that we're now sharing with our external customers. They are also struggling with the talent pipeline. They said, hey, you come and you build these solutions so, you know, internally we need to be reskilled and we need to be skilled up and how can we work alongside you and your team not just to build out the solution but for the longer term? How can we actually build out a bench that's healthy, right? That can keep up with the pace, right? That cutting edge pace of innovation and get right in there. And so it's been really great to work with a good majority of our customers are very quite interested in the how. They maybe don't have that playbook internally or that process internally, which tends to be a challenge. So I would say, so far as cloud computing, in addition to just solving, you know, technical problems that is something in parallel that you equally have to give a lot of respect to, right? >> Yeah, absolutely. Speaking of the talent pipeline, I want to get your thoughts on where we are with respect to diversity. We talk about DEI a lot in technology but there's still challenges there. What are some of those challenges that you see and how can organizations really correct those challenges to build a diverse talent pipeline? >> That's a great question. I would say the challenges, I would call 'em the three A's, access, acceleration, and acceptance. And I think what we found with just doing this journey in the last two and a half years really documenting what are those challenges and how can we, you know, iterate to kind of just get past those challenges and just blow right through the doors and say, hey, there's ways that we can introduce access. And so joining forces, like we said with those nonprofits and those community colleges that are already, I think we all have different pieces of the puzzle, and I think we're all trying to give different pieces of access, but how do we draw a thread through it? And I think that's what our playbook attempts to do. I mean because when we say in tech that is so vast and so even within tech we say, okay, within tech these are the areas where we play, right? We have a playbook around data and analytics and we're now working from (indistinct) machine learning. And so we're looking at individuals that are coming from backgrounds that maybe are not typical, right? Maybe not a computer science degree, maybe they're biologists, like ourselves, you know, maybe that's how they started. Maybe they're psychologists. We have a few psychologists on the team. We have accountants on the team. And so what happens is that we're able to go into these different groups that we're partnered with and actually showcase to them from an access standpoint, how is tech really intersecting. I don't like to use the word disrupting, but intersecting with, you know, the traditional accounting degree, with a traditional biology degree. Did you know that this was happening? You know, and try to peak their interests and if they're interested in learning more taking them through that process. A very similar process that I had to make that decision you know, over a decade ago to really, you know, look at ways to reskill myself. And so we've put together different programs with those nonprofits and the colleges and other partners as well to make sure that we're moving them along the way and the path of access, and then, you know, also giving, you know some acceleration around some of the different programs. Some of the colleges are giving scholarships, which is awesome, with some of their partners to accelerate some of the people through our program to actually get some of those skill sets that are very applicable. Helping them to understand how their psychology background actually plays a part in that. So really not using random examples but really examples from their traditional learning and saying, you know, this is how this applies in the tech world. And so then it really helps to lower that bar, right? You know, so that they can really, not only have access, but really accelerate because now it's applied. And so when you are able to then apply it, show them how it can be applied in other industries, right? Whether they're similar or not, we all have data and data takes a very similar path in an interesting way. So once they're able to dive in there and then the acceptance, so then making those partnerships with our customers and, you know, other industries that maybe don't have this talent pipeline but would like to have that. They partner with us for the pipeline and so making sure that either they land with us or with one of our customers where they can now showcase what they've learned. They can go in and be more, maybe more junior at those companies, but they're able to grow over a two year cycle with that company that has an agreement that they're actually going to nurture that talent and really, you know, invest back in people who have invested in themselves. >> I love what you just described as four A's. It's so intentional and I think that's what a lot of organizations miss with respect to diversity is it's not, and it's not done with intention and interest as it should be, but it sounds like what you've developed is a fantastic playbook to provide access, to provide that ability to accelerate, to be able to apply their skills. Really kudos to that because my cheeks were hurting from smiling with what you were describing. It's just, it's so needed. There's so much opportunity out there, especially for people who might be on a zig or a zag and not sure where what to do next. Showing them, giving them the access, showing them what they can do and how it applies to their industry with data that's where the world is going. So I love that, very exciting. Last couple questions for you as we wrap up our time here. What are some of the things that you see next in cloud that are evolving that excite you about where we're going? >> I'm super excited. It brings you back to the A's. I think that companies of all sorts, right, have already gotten a lot of access because they can build a, you know, they can build a not a server farm, but necessarily they can have the power of the same computing, right, as some of the larger enterprises, whether you're a startup or, you know, smaller, medium-sized business. So I'm super excited that it's going from, I think more of a solution conversation where you're a lot closer to the end goal even from the first assessment conversation and less of an infrastructure kind of conversation where you're talking about the different services around cloud computing and, you know, inside those. And so I'm super excited about that. I think, you'll see a lot of solutions being kind of more or less pre-baked ready for those buy versus build conversations. You'll still have to configure. You'll still have to integrate, but I think we're going to all live around the API. I see a lot of APIs, you know, driving some really great SaaS applications that are really then connecting data to everything. And then it's not just about having that data that can then be shared across the organization, but even organizational units across the enterprise can self-serve from those analytics and those insights instead of, you know, I think back on one of our customers, they were a manufacturer and really it was their accounting team that brought us in and they said, listen, we need to get insights during a manufacturing run to make decisions if we're profitable or not. Right now, we're manually trying to wrangle the data as accountants across different, you know, even different states, right, to get this information and we're not getting the insights, and we're scratching the surface 'cause we don't have that time until a month after it's already shipped. There's really at that point you can't make a decision. And so they really wanted to change that. They really wanted to look at profitability. They really wanted to look at how can we go back to just being accountants? Like we don't want to be data wranglers. >> Right. >> And I think a lot of our customers are in that boat. They don't want to manually wrangle data. How can you help us to at least make it to where it's more of a self-service, and we're consuming, not the data, but the insights, right? So we can be actionable on the insights. And that's what I'm super excited about, and that's what I think you'll see become easier and easier for companies to be able to do with cloud computing. >> Which is so exciting because the frontier is endless but as every company, whether it's a retailer, or a manufacturer, or a life sciences organization have to be a data company these days. There's no choice. You have to be able to serve customers 'cause of course we have the demand as consumers in our personal lives and our business lives. We want that data to deliver relevant content to us. And so organizations have to work with folks like you to be able to do that. Tia, it's been such a pleasure having you on the program. Thank you so much for giving us some of your time walking us through your interesting background and some of the great techniques that you're employing at your company to really help drive organizations to be successful with with the talent pipeline, with the cloud. We really appreciate your insights. >> Thank you so much, Lisa. Appreciate you, theCUBE, AWS as well, thank you. >> Yeah, you're very welcome. For Tia Dubuisson, I'm Lisa Martin. You're watching theCube's special program series, "Women of the Cloud", brought to you by AWS. Thanks for watching. (upbeat music)

Published Date : Feb 9 2023

SUMMARY :

brought to you by AWS. Thank you so much, Lisa. and your current role. and so helping them to make beat when you said you were and to help organizations that you would have some that are, you know, of the things that you said, Awesome, thank you so much. I do too, it's got to be symbiotic. problems related to cloud. in addition to just solving, you know, challenges that you see ago to really, you know, that excite you about where we're going? and those insights instead of, you know, to do with cloud computing. And so organizations have to work Thank you so much, Lisa. brought to you by AWS.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Tia Dubuisson	PERSON	0.99+
Lisa Martin	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Tia dubuson	PERSON	0.99+
Lisa	PERSON	0.99+
Bellflower Technologies	ORGANIZATION	0.99+
Belle Fleur Technologies	ORGANIZATION	0.99+
Women of the Cloud	TITLE	0.99+
Tia	PERSON	0.99+
Python	TITLE	0.99+
one	QUANTITY	0.99+
Bellflower Technologies P.O	ORGANIZATION	0.99+
11 years ago	DATE	0.99+
10 years	QUANTITY	0.99+
over 300 different interns	QUANTITY	0.98+
Belle Fleur	ORGANIZATION	0.98+
Belfair	ORGANIZATION	0.97+
theCUBE	ORGANIZATION	0.97+
python	TITLE	0.96+
over 300 different interns	QUANTITY	0.96+
over a two-year cycle	QUANTITY	0.94+
first	QUANTITY	0.93+
first assessment	QUANTITY	0.92+
Playbook	COMMERCIAL_ITEM	0.92+
two year	QUANTITY	0.91+
today	DATE	0.91+
a million different compounds	QUANTITY	0.9+
over a decade ago	DATE	0.86+
first assessment	QUANTITY	0.86+
lot of people	QUANTITY	0.85+
couple questions	QUANTITY	0.83+
over a decade ago	DATE	0.83+
president	PERSON	0.82+
and a half years	QUANTITY	0.8+
10 certifications	QUANTITY	0.78+
a month	QUANTITY	0.77+
four	QUANTITY	0.77+
a million different compounds	QUANTITY	0.76+
three	QUANTITY	0.76+
lot of money	QUANTITY	0.76+
a month	QUANTITY	0.75+
top 10 certifications	QUANTITY	0.73+
two and a half years	QUANTITY	0.72+
couple	QUANTITY	0.68+
over	QUANTITY	0.68+
money	QUANTITY	0.67+
two and a half years	QUANTITY	0.65+
theCube	ORGANIZATION	0.63+
last	DATE	0.63+
Playbook	TITLE	0.61+
Bell	ORGANIZATION	0.61+
of people	QUANTITY	0.6+
lot	QUANTITY	0.57+
last two	DATE	0.5+
Belle	PERSON	0.42+
Fleur	ORGANIZATION	0.39+
Three	ORGANIZATION	0.37+
Cube	PERSON	0.36+

Breaking Analysis: Enterprise Technology Predictions 2023

(upbeat music beginning) >> From the Cube Studios in Palo Alto and Boston, bringing you data-driven insights from the Cube and ETR, this is "Breaking Analysis" with Dave Vellante. >> Making predictions about the future of enterprise tech is more challenging if you strive to lay down forecasts that are measurable. In other words, if you make a prediction, you should be able to look back a year later and say, with some degree of certainty, whether the prediction came true or not, with evidence to back that up. Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this breaking analysis, we aim to do just that, with predictions about the macro IT spending environment, cost optimization, security, lots to talk about there, generative AI, cloud, and of course supercloud, blockchain adoption, data platforms, including commentary on Databricks, snowflake, and other key players, automation, events, and we may even have some bonus predictions around quantum computing, and perhaps some other areas. To make all this happen, we welcome back, for the third year in a row, my colleague and friend Eric Bradley from ETR. Eric, thanks for all you do for the community, and thanks for being part of this program. Again. >> I wouldn't miss it for the world. I always enjoy this one. Dave, good to see you. >> Yeah, so let me bring up this next slide and show you, actually come back to me if you would. I got to show the audience this. These are the inbounds that we got from PR firms starting in October around predictions. They know we do prediction posts. And so they'll send literally thousands and thousands of predictions from hundreds of experts in the industry, technologists, consultants, et cetera. And if you bring up the slide I can show you sort of the pattern that developed here. 40% of these thousands of predictions were from cyber. You had AI and data. If you combine those, it's still not close to cyber. Cost optimization was a big thing. Of course, cloud, some on DevOps, and software. Digital... Digital transformation got, you know, some lip service and SaaS. And then there was other, it's kind of around 2%. So quite remarkable, when you think about the focus on cyber, Eric. >> Yeah, there's two reasons why I think it makes sense, though. One, the cybersecurity companies have a lot of cash, so therefore the PR firms might be working a little bit harder for them than some of their other clients. (laughs) And then secondly, as you know, for multiple years now, when we do our macro survey, we ask, "What's your number one spending priority?" And again, it's security. It just isn't going anywhere. It just stays at the top. So I'm actually not that surprised by that little pie chart there, but I was shocked that SaaS was only 5%. You know, going back 10 years ago, that would've been the only thing anyone was talking about. >> Yeah. So true. All right, let's get into it. First prediction, we always start with kind of tech spending. Number one is tech spending increases between four and 5%. ETR has currently got it at 4.6% coming into 2023. This has been a consistently downward trend all year. We started, you know, much, much higher as we've been reporting. Bottom line is the fed is still in control. They're going to ease up on tightening, is the expectation, they're going to shoot for a soft landing. But you know, my feeling is this slingshot economy is going to continue, and it's going to continue to confound, whether it's supply chains or spending. The, the interesting thing about the ETR data, Eric, and I want you to comment on this, the largest companies are the most aggressive to cut. They're laying off, smaller firms are spending faster. They're actually growing at a much larger, faster rate as are companies in EMEA. And that's a surprise. That's outpacing the US and APAC. Chime in on this, Eric. >> Yeah, I was surprised on all of that. First on the higher level spending, we are definitely seeing it coming down, but the interesting thing here is headlines are making it worse. The huge research shop recently said 0% growth. We're coming in at 4.6%. And just so everyone knows, this is not us guessing, we asked 1,525 IT decision-makers what their budget growth will be, and they came in at 4.6%. Now there's a huge disparity, as you mentioned. The Fortune 500, global 2000, barely at 2% growth, but small, it's at 7%. So we're at a situation right now where the smaller companies are still playing a little bit of catch up on digital transformation, and they're spending money. The largest companies that have the most to lose from a recession are being more trepidatious, obviously. So they're playing a "Wait and see." And I hope we don't talk ourselves into a recession. Certainly the headlines and some of their research shops are helping it along. But another interesting comment here is, you know, energy and utilities used to be called an orphan and widow stock group, right? They are spending more than anyone, more than financials insurance, more than retail consumer. So right now it's being driven by mid, small, and energy and utilities. They're all spending like gangbusters, like nothing's happening. And it's the rest of everyone else that's being very cautious. >> Yeah, so very unpredictable right now. All right, let's go to number two. Cost optimization remains a major theme in 2023. We've been reporting on this. You've, we've shown a chart here. What's the primary method that your organization plans to use? You asked this question of those individuals that cited that they were going to reduce their spend and- >> Mhm. >> consolidating redundant vendors, you know, still leads the way, you know, far behind, cloud optimization is second, but it, but cloud continues to outpace legacy on-prem spending, no doubt. Somebody, it was, the guy's name was Alexander Feiglstorfer from Storyblok, sent in a prediction, said "All in one becomes extinct." Now, generally I would say I disagree with that because, you know, as we know over the years, suites tend to win out over, you know, individual, you know, point products. But I think what's going to happen is all in one is going to remain the norm for these larger companies that are cutting back. They want to consolidate redundant vendors, and the smaller companies are going to stick with that best of breed and be more aggressive and try to compete more effectively. What's your take on that? >> Yeah, I'm seeing much more consolidation in vendors, but also consolidation in functionality. We're seeing people building out new functionality, whether it's, we're going to talk about this later, so I don't want to steal too much of our thunder right now, but data and security also, we're seeing a functionality creep. So I think there's further consolidation happening here. I think niche solutions are going to be less likely, and platform solutions are going to be more likely in a spending environment where you want to reduce your vendors. You want to have one bill to pay, not 10. Another thing on this slide, real quick if I can before I move on, is we had a bunch of people write in and some of the answer options that aren't on this graph but did get cited a lot, unfortunately, is the obvious reduction in staff, hiring freezes, and delaying hardware, were three of the top write-ins. And another one was offshore outsourcing. So in addition to what we're seeing here, there were a lot of write-in options, and I just thought it would be important to state that, but essentially the cost optimization is by and far the highest one, and it's growing. So it's actually increased in our citations over the last year. >> And yeah, specifically consolidating redundant vendors. And so I actually thank you for bringing that other up, 'cause I had asked you, Eric, is there any evidence that repatriation is going on and we don't see it in the numbers, we don't see it even in the other, there was, I think very little or no mention of cloud repatriation, even though it might be happening in this in a smattering. >> Not a single mention, not one single mention. I went through it for you. Yep. Not one write-in. >> All right, let's move on. Number three, security leads M&A in 2023. Now you might say, "Oh, well that's a layup," but let me set this up Eric, because I didn't really do a great job with the slide. I hid the, what you've done, because you basically took, this is from the emerging technology survey with 1,181 responses from November. And what we did is we took Palo Alto and looked at the overlap in Palo Alto Networks accounts with these vendors that were showing on this chart. And Eric, I'm going to ask you to explain why we put a circle around OneTrust, but let me just set it up, and then have you comment on the slide and take, give us more detail. We're seeing private company valuations are off, you know, 10 to 40%. We saw a sneak, do a down round, but pretty good actually only down 12%. We've seen much higher down rounds. Palo Alto Networks we think is going to get busy. Again, they're an inquisitive company, they've been sort of quiet lately, and we think CrowdStrike, Cisco, Microsoft, Zscaler, we're predicting all of those will make some acquisitions and we're thinking that the targets are somewhere in this mess of security taxonomy. Other thing we're predicting AI meets cyber big time in 2023, we're going to probably going to see some acquisitions of those companies that are leaning into AI. We've seen some of that with Palo Alto. And then, you know, your comment to me, Eric, was "The RSA conference is going to be insane, hopping mad, "crazy this April," (Eric laughing) but give us your take on this data, and why the red circle around OneTrust? Take us back to that slide if you would, Alex. >> Sure. There's a few things here. First, let me explain what we're looking at. So because we separate the public companies and the private companies into two separate surveys, this allows us the ability to cross-reference that data. So what we're doing here is in our public survey, the tesis, everyone who cited some spending with Palo Alto, meaning they're a Palo Alto customer, we then cross-reference that with the private tech companies. Who also are they spending with? So what you're seeing here is an overlap. These companies that we have circled are doing the best in Palo Alto's accounts. Now, Palo Alto went and bought Twistlock a few years ago, which this data slide predicted, to be quite honest. And so I don't know if they necessarily are going to go after Snyk. Snyk, sorry. They already have something in that space. What they do need, however, is more on the authentication space. So I'm looking at OneTrust, with a 45% overlap in their overall net sentiment. That is a company that's already existing in their accounts and could be very synergistic to them. BeyondTrust as well, authentication identity. This is something that Palo needs to do to move more down that zero trust path. Now why did I pick Palo first? Because usually they're very inquisitive. They've been a little quiet lately. Secondly, if you look at the backdrop in the markets, the IPO freeze isn't going to last forever. Sooner or later, the IPO markets are going to open up, and some of these private companies are going to tap into public equity. In the meantime, however, cash funding on the private side is drying up. If they need another round, they're not going to get it, and they're certainly not going to get it at the valuations they were getting. So we're seeing valuations maybe come down where they're a touch more attractive, and Palo knows this isn't going to last forever. Cisco knows that, CrowdStrike, Zscaler, all these companies that are trying to make a push to become that vendor that you're consolidating in, around, they have a chance now, they have a window where they need to go make some acquisitions. And that's why I believe leading up to RSA, we're going to see some movement. I think it's going to pretty, a really exciting time in security right now. >> Awesome. Thank you. Great explanation. All right, let's go on the next one. Number four is, it relates to security. Let's stay there. Zero trust moves from hype to reality in 2023. Now again, you might say, "Oh yeah, that's a layup." A lot of these inbounds that we got are very, you know, kind of self-serving, but we always try to put some meat in the bone. So first thing we do is we pull out some commentary from, Eric, your roundtable, your insights roundtable. And we have a CISO from a global hospitality firm says, "For me that's the highest priority." He's talking about zero trust because it's the best ROI, it's the most forward-looking, and it enables a lot of the business transformation activities that we want to do. CISOs tell me that they actually can drive forward transformation projects that have zero trust, and because they can accelerate them, because they don't have to go through the hurdle of, you know, getting, making sure that it's secure. Second comment, zero trust closes that last mile where once you're authenticated, they open up the resource to you in a zero trust way. That's a CISO of a, and a managing director of a cyber risk services enterprise. Your thoughts on this? >> I can be here all day, so I'm going to try to be quick on this one. This is not a fluff piece on this one. There's a couple of other reasons this is happening. One, the board finally gets it. Zero trust at first was just a marketing hype term. Now the board understands it, and that's why CISOs are able to push through it. And what they finally did was redefine what it means. Zero trust simply means moving away from hardware security, moving towards software-defined security, with authentication as its base. The board finally gets that, and now they understand that this is necessary and it's being moved forward. The other reason it's happening now is hybrid work is here to stay. We weren't really sure at first, large companies were still trying to push people back to the office, and it's going to happen. The pendulum will swing back, but hybrid work's not going anywhere. By basically on our own data, we're seeing that 69% of companies expect remote and hybrid to be permanent, with only 30% permanent in office. Zero trust works for a hybrid environment. So all of that is the reason why this is happening right now. And going back to our previous prediction, this is why we're picking Palo, this is why we're picking Zscaler to make these acquisitions. Palo Alto needs to be better on the authentication side, and so does Zscaler. They're both fantastic on zero trust network access, but they need the authentication software defined aspect, and that's why we think this is going to happen. One last thing, in that CISO round table, I also had somebody say, "Listen, Zscaler is incredible. "They're doing incredibly well pervading the enterprise, "but their pricing's getting a little high," and they actually think Palo Alto is well-suited to start taking some of that share, if Palo can make one move. >> Yeah, Palo Alto's consolidation story is very strong. Here's my question and challenge. Do you and me, so I'm always hardcore about, okay, you've got to have evidence. I want to look back at these things a year from now and say, "Did we get it right? Yes or no?" If we got it wrong, we'll tell you we got it wrong. So how are we going to measure this? I'd say a couple things, and you can chime in. One is just the number of vendors talking about it. That's, but the marketing always leads the reality. So the second part of that is we got to get evidence from the buying community. Can you help us with that? >> (laughs) Luckily, that's what I do. I have a data company that asks thousands of IT decision-makers what they're adopting and what they're increasing spend on, as well as what they're decreasing spend on and what they're replacing. So I have snapshots in time over the last 11 years where I can go ahead and compare and contrast whether this adoption is happening or not. So come back to me in 12 months and I'll let you know. >> Now, you know, I will. Okay, let's bring up the next one. Number five, generative AI hits where the Metaverse missed. Of course everybody's talking about ChatGPT, we just wrote last week in a breaking analysis with John Furrier and Sarjeet Joha our take on that. We think 2023 does mark a pivot point as natural language processing really infiltrates enterprise tech just as Amazon turned the data center into an API. We think going forward, you're going to be interacting with technology through natural language, through English commands or other, you know, foreign language commands, and investors are lining up, all the VCs are getting excited about creating something competitive to ChatGPT, according to (indistinct) a hundred million dollars gets you a seat at the table, gets you into the game. (laughing) That's before you have to start doing promotion. But he thinks that's what it takes to actually create a clone or something equivalent. We've seen stuff from, you know, the head of Facebook's, you know, AI saying, "Oh, it's really not that sophisticated, ChatGPT, "it's kind of like IBM Watson, it's great engineering, "but you know, we've got more advanced technology." We know Google's working on some really interesting stuff. But here's the thing. ETR just launched this survey for the February survey. It's in the field now. We circle open AI in this category. They weren't even in the survey, Eric, last quarter. So 52% of the ETR survey respondents indicated a positive sentiment toward open AI. I added up all the sort of different bars, we could double click on that. And then I got this inbound from Scott Stevenson of Deep Graham. He said "AI is recession-proof." I don't know if that's the case, but it's a good quote. So bring this back up and take us through this. Explain this chart for us, if you would. >> First of all, I like Scott's quote better than the Facebook one. I think that's some sour grapes. Meta just spent an insane amount of money on the Metaverse and that's a dud. Microsoft just spent money on open AI and it is hot, undoubtedly hot. We've only been in the field with our current ETS survey for a week. So my caveat is it's preliminary data, but I don't care if it's preliminary data. (laughing) We're getting a sneak peek here at what is the number one net sentiment and mindshare leader in the entire machine-learning AI sector within a week. It's beating Data- >> 600. 600 in. >> It's beating Databricks. And we all know Databricks is a huge established enterprise company, not only in machine-learning AI, but it's in the top 10 in the entire survey. We have over 400 vendors in this survey. It's number eight overall, already. In a week. This is not hype. This is real. And I could go on the NLP stuff for a while. Not only here are we seeing it in open AI and machine-learning and AI, but we're seeing NLP in security. It's huge in email security. It's completely transforming that area. It's one of the reasons I thought Palo might take Abnormal out. They're doing such a great job with NLP in this email side, and also in the data prep tools. NLP is going to take out data prep tools. If we have time, I'll discuss that later. But yeah, this is, to me this is a no-brainer, and we're already seeing it in the data. >> Yeah, John Furrier called, you know, the ChatGPT introduction. He said it reminded him of the Netscape moment, when we all first saw Netscape Navigator and went, "Wow, it really could be transformative." All right, number six, the cloud expands to supercloud as edge computing accelerates and CloudFlare is a big winner in 2023. We've reported obviously on cloud, multi-cloud, supercloud and CloudFlare, basically saying what multi-cloud should have been. We pulled this quote from Atif Kahn, who is the founder and CTO of Alkira, thanks, one of the inbounds, thank you. "In 2023, highly distributed IT environments "will become more the norm "as organizations increasingly deploy hybrid cloud, "multi-cloud and edge settings..." Eric, from one of your round tables, "If my sources from edge computing are coming "from the cloud, that means I have my workloads "running in the cloud. "There is no one better than CloudFlare," That's a senior director of IT architecture at a huge financial firm. And then your analysis shows CloudFlare really growing in pervasion, that sort of market presence in the dataset, dramatically, to near 20%, leading, I think you had told me that they're even ahead of Google Cloud in terms of momentum right now. >> That was probably the biggest shock to me in our January 2023 tesis, which covers the public companies in the cloud computing sector. CloudFlare has now overtaken GCP in overall spending, and I was shocked by that. It's already extremely pervasive in networking, of course, for the edge networking side, and also in security. This is the number one leader in SaaSi, web access firewall, DDoS, bot protection, by your definition of supercloud, which we just did a couple of weeks ago, and I really enjoyed that by the way Dave, I think CloudFlare is the one that fits your definition best, because it's bringing all of these aspects together, and most importantly, it's cloud agnostic. It does not need to rely on Azure or AWS to do this. It has its own cloud. So I just think it's, when we look at your definition of supercloud, CloudFlare is the poster child. >> You know, what's interesting about that too, is a lot of people are poo-pooing CloudFlare, "Ah, it's, you know, really kind of not that sophisticated." "You don't have as many tools," but to your point, you're can have those tools in the cloud, Cloudflare's doing serverless on steroids, trying to keep things really simple, doing a phenomenal job at, you know, various locations around the world. And they're definitely one to watch. Somebody put them on my radar (laughing) a while ago and said, "Dave, you got to do a breaking analysis on CloudFlare." And so I want to thank that person. I can't really name them, 'cause they work inside of a giant hyperscaler. But- (Eric laughing) (Dave chuckling) >> Real quickly, if I can from a competitive perspective too, who else is there? They've already taken share from Akamai, and Fastly is their really only other direct comp, and they're not there. And these guys are in poll position and they're the only game in town right now. I just, I don't see it slowing down. >> I thought one of your comments from your roundtable I was reading, one of the folks said, you know, CloudFlare, if my workloads are in the cloud, they are, you know, dominant, they said not as strong with on-prem. And so Akamai is doing better there. I'm like, "Okay, where would you want to be?" (laughing) >> Yeah, which one of those two would you rather be? >> Right? Anyway, all right, let's move on. Number seven, blockchain continues to look for a home in the enterprise, but devs will slowly begin to adopt in 2023. You know, blockchains have got a lot of buzz, obviously crypto is, you know, the killer app for blockchain. Senior IT architect in financial services from your, one of your insight roundtables said quote, "For enterprises to adopt a new technology, "there have to be proven turnkey solutions. "My experience in talking with my peers are, "blockchain is still an open-source component "where you have to build around it." Now I want to thank Ravi Mayuram, who's the CTO of Couchbase sent in, you know, one of the predictions, he said, "DevOps will adopt blockchain, specifically Ethereum." And he referenced actually in his email to me, Solidity, which is the programming language for Ethereum, "will be in every DevOps pro's playbook, "mirroring the boom in machine-learning. "Newer programming languages like Solidity "will enter the toolkits of devs." His point there, you know, Solidity for those of you don't know, you know, Bitcoin is not programmable. Solidity, you know, came out and that was their whole shtick, and they've been improving that, and so forth. But it, Eric, it's true, it really hasn't found its home despite, you know, the potential for smart contracts. IBM's pushing it, VMware has had announcements, and others, really hasn't found its way in the enterprise yet. >> Yeah, and I got to be honest, I don't think it's going to, either. So when we did our top trends series, this was basically chosen as an anti-prediction, I would guess, that it just continues to not gain hold. And the reason why was that first comment, right? It's very much a niche solution that requires a ton of custom work around it. You can't just plug and play it. And at the end of the day, let's be very real what this technology is, it's a database ledger, and we already have database ledgers in the enterprise. So why is this a priority to move to a different database ledger? It's going to be very niche cases. I like the CTO comment from Couchbase about it being adopted by DevOps. I agree with that, but it has to be a DevOps in a very specific use case, and a very sophisticated use case in financial services, most likely. And that's not across the entire enterprise. So I just think it's still going to struggle to get its foothold for a little bit longer, if ever. >> Great, thanks. Okay, let's move on. Number eight, AWS Databricks, Google Snowflake lead the data charge with Microsoft. Keeping it simple. So let's unpack this a little bit. This is the shared accounts peer position for, I pulled data platforms in for analytics, machine-learning and AI and database. So I could grab all these accounts or these vendors and see how they compare in those three sectors. Analytics, machine-learning and database. Snowflake and Databricks, you know, they're on a crash course, as you and I have talked about. They're battling to be the single source of truth in analytics. They're, there's going to be a big focus. They're already started. It's going to be accelerated in 2023 on open formats. Iceberg, Python, you know, they're all the rage. We heard about Iceberg at Snowflake Summit, last summer or last June. Not a lot of people had heard of it, but of course the Databricks crowd, who knows it well. A lot of other open source tooling. There's a company called DBT Labs, which you're going to talk about in a minute. George Gilbert put them on our radar. We just had Tristan Handy, the CEO of DBT labs, on at supercloud last week. They are a new disruptor in data that's, they're essentially making, they're API-ifying, if you will, KPIs inside the data warehouse and dramatically simplifying that whole data pipeline. So really, you know, the ETL guys should be shaking in their boots with them. Coming back to the slide. Google really remains focused on BigQuery adoption. Customers have complained to me that they would like to use Snowflake with Google's AI tools, but they're being forced to go to BigQuery. I got to ask Google about that. AWS continues to stitch together its bespoke data stores, that's gone down that "Right tool for the right job" path. David Foyer two years ago said, "AWS absolutely is going to have to solve that problem." We saw them start to do it in, at Reinvent, bringing together NoETL between Aurora and Redshift, and really trying to simplify those worlds. There's going to be more of that. And then Microsoft, they're just making it cheap and easy to use their stuff, you know, despite some of the complaints that we hear in the community, you know, about things like Cosmos, but Eric, your take? >> Yeah, my concern here is that Snowflake and Databricks are fighting each other, and it's allowing AWS and Microsoft to kind of catch up against them, and I don't know if that's the right move for either of those two companies individually, Azure and AWS are building out functionality. Are they as good? No they're not. The other thing to remember too is that AWS and Azure get paid anyway, because both Databricks and Snowflake run on top of 'em. So (laughing) they're basically collecting their toll, while these two fight it out with each other, and they build out functionality. I think they need to stop focusing on each other, a little bit, and think about the overall strategy. Now for Databricks, we know they came out first as a machine-learning AI tool. They were known better for that spot, and now they're really trying to play catch-up on that data storage compute spot, and inversely for Snowflake, they were killing it with the compute separation from storage, and now they're trying to get into the MLAI spot. I actually wouldn't be surprised to see them make some sort of acquisition. Frank Slootman has been a little bit quiet, in my opinion there. The other thing to mention is your comment about DBT Labs. If we look at our emerging technology survey, last survey when this came out, DBT labs, number one leader in that data integration space, I'm going to just pull it up real quickly. It looks like they had a 33% overall net sentiment to lead data analytics integration. So they are clearly growing, it's fourth straight survey consecutively that they've grown. The other name we're seeing there a little bit is Cribl, but DBT labs is by far the number one player in this space. >> All right. Okay, cool. Moving on, let's go to number nine. With Automation mixer resurgence in 2023, we're showing again data. The x axis is overlap or presence in the dataset, and the vertical axis is shared net score. Net score is a measure of spending momentum. As always, you've seen UI path and Microsoft Power Automate up until the right, that red line, that 40% line is generally considered elevated. UI path is really separating, creating some distance from Automation Anywhere, they, you know, previous quarters they were much closer. Microsoft Power Automate came on the scene in a big way, they loom large with this "Good enough" approach. I will say this, I, somebody sent me a results of a (indistinct) survey, which showed UiPath actually had more mentions than Power Automate, which was surprising, but I think that's not been the case in the ETR data set. We're definitely seeing a shift from back office to front soft office kind of workloads. Having said that, software testing is emerging as a mainstream use case, we're seeing ML and AI become embedded in end-to-end automations, and low-code is serving the line of business. And so this, we think, is going to increasingly have appeal to organizations in the coming year, who want to automate as much as possible and not necessarily, we've seen a lot of layoffs in tech, and people... You're going to have to fill the gaps with automation. That's a trend that's going to continue. >> Yep, agreed. At first that comment about Microsoft Power Automate having less citations than UiPath, that's shocking to me. I'm looking at my chart right here where Microsoft Power Automate was cited by over 60% of our entire survey takers, and UiPath at around 38%. Now don't get me wrong, 38% pervasion's fantastic, but you know you're not going to beat an entrenched Microsoft. So I don't really know where that comment came from. So UiPath, looking at it alone, it's doing incredibly well. It had a huge rebound in its net score this last survey. It had dropped going through the back half of 2022, but we saw a big spike in the last one. So it's got a net score of over 55%. A lot of people citing adoption and increasing. So that's really what you want to see for a name like this. The problem is that just Microsoft is doing its playbook. At the end of the day, I'm going to do a POC, why am I going to pay more for UiPath, or even take on another separate bill, when we know everyone's consolidating vendors, if my license already includes Microsoft Power Automate? It might not be perfect, it might not be as good, but what I'm hearing all the time is it's good enough, and I really don't want another invoice. >> Right. So how does UiPath, you know, and Automation Anywhere, how do they compete with that? Well, the way they compete with it is they got to have a better product. They got a product that's 10 times better. You know, they- >> Right. >> they're not going to compete based on where the lowest cost, Microsoft's got that locked up, or where the easiest to, you know, Microsoft basically give it away for free, and that's their playbook. So that's, you know, up to UiPath. UiPath brought on Rob Ensslin, I've interviewed him. Very, very capable individual, is now Co-CEO. So he's kind of bringing that adult supervision in, and really tightening up the go to market. So, you know, we know this company has been a rocket ship, and so getting some control on that and really getting focused like a laser, you know, could be good things ahead there for that company. Okay. >> One of the problems, if I could real quick Dave, is what the use cases are. When we first came out with RPA, everyone was super excited about like, "No, UiPath is going to be great for super powerful "projects, use cases." That's not what RPA is being used for. As you mentioned, it's being used for mundane tasks, so it's not automating complex things, which I think UiPath was built for. So if you were going to get UiPath, and choose that over Microsoft, it's going to be 'cause you're doing it for more powerful use case, where it is better. But the problem is that's not where the enterprise is using it. The enterprise are using this for base rote tasks, and simply, Microsoft Power Automate can do that. >> Yeah, it's interesting. I've had people on theCube that are both Microsoft Power Automate customers and UiPath customers, and I've asked them, "Well you know, "how do you differentiate between the two?" And they've said to me, "Look, our users and personal productivity users, "they like Power Automate, "they can use it themselves, and you know, "it doesn't take a lot of, you know, support on our end." The flip side is you could do that with UiPath, but like you said, there's more of a focus now on end-to-end enterprise automation and building out those capabilities. So it's increasingly a value play, and that's going to be obviously the challenge going forward. Okay, my last one, and then I think you've got some bonus ones. Number 10, hybrid events are the new category. Look it, if I can get a thousand inbounds that are largely self-serving, I can do my own here, 'cause we're in the events business. (Eric chuckling) Here's the prediction though, and this is a trend we're seeing, the number of physical events is going to dramatically increase. That might surprise people, but most of the big giant events are going to get smaller. The exception is AWS with Reinvent, I think Snowflake's going to continue to grow. So there are examples of physical events that are growing, but generally, most of the big ones are getting smaller, and there's going to be many more smaller intimate regional events and road shows. These micro-events, they're going to be stitched together. Digital is becoming a first class citizen, so people really got to get their digital acts together, and brands are prioritizing earned media, and they're beginning to build their own news networks, going direct to their customers. And so that's a trend we see, and I, you know, we're right in the middle of it, Eric, so you know we're going to, you mentioned RSA, I think that's perhaps going to be one of those crazy ones that continues to grow. It's shrunk, and then it, you know, 'cause last year- >> Yeah, it did shrink. >> right, it was the last one before the pandemic, and then they sort of made another run at it last year. It was smaller but it was very vibrant, and I think this year's going to be huge. Global World Congress is another one, we're going to be there end of Feb. That's obviously a big big show, but in general, the brands and the technology vendors, even Oracle is going to scale down. I don't know about Salesforce. We'll see. You had a couple of bonus predictions. Quantum and maybe some others? Bring us home. >> Yeah, sure. I got a few more. I think we touched upon one, but I definitely think the data prep tools are facing extinction, unfortunately, you know, the Talons Informatica is some of those names. The problem there is that the BI tools are kind of including data prep into it already. You know, an example of that is Tableau Prep Builder, and then in addition, Advanced NLP is being worked in as well. ThoughtSpot, Intelius, both often say that as their selling point, Tableau has Ask Data, Click has Insight Bot, so you don't have to really be intelligent on data prep anymore. A regular business user can just self-query, using either the search bar, or even just speaking into what it needs, and these tools are kind of doing the data prep for it. I don't think that's a, you know, an out in left field type of prediction, but it's the time is nigh. The other one I would also state is that I think knowledge graphs are going to break through this year. Neo4j in our survey is growing in pervasion in Mindshare. So more and more people are citing it, AWS Neptune's getting its act together, and we're seeing that spending intentions are growing there. Tiger Graph is also growing in our survey sample. I just think that the time is now for knowledge graphs to break through, and if I had to do one more, I'd say real-time streaming analytics moves from the very, very rich big enterprises to downstream, to more people are actually going to be moving towards real-time streaming, again, because the data prep tools and the data pipelines have gotten easier to use, and I think the ROI on real-time streaming is obviously there. So those are three that didn't make the cut, but I thought deserved an honorable mention. >> Yeah, I'm glad you did. Several weeks ago, we did an analyst prediction roundtable, if you will, a cube session power panel with a number of data analysts and that, you know, streaming, real-time streaming was top of mind. So glad you brought that up. Eric, as always, thank you very much. I appreciate the time you put in beforehand. I know it's been crazy, because you guys are wrapping up, you know, the last quarter survey in- >> Been a nuts three weeks for us. (laughing) >> job. I love the fact that you're doing, you know, the ETS survey now, I think it's quarterly now, right? Is that right? >> Yep. >> Yep. So that's phenomenal. >> Four times a year. I'll be happy to jump on with you when we get that done. I know you were really impressed with that last time. >> It's unbelievable. This is so much data at ETR. Okay. Hey, that's a wrap. Thanks again. >> Take care Dave. Good seeing you. >> All right, many thanks to our team here, Alex Myerson as production, he manages the podcast force. Ken Schiffman as well is a critical component of our East Coast studio. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hoof is our editor-in-chief. He's at siliconangle.com. He's just a great editing for us. Thank you all. Remember all these episodes that are available as podcasts, wherever you listen, podcast is doing great. Just search "Breaking analysis podcast." Really appreciate you guys listening. I publish each week on wikibon.com and siliconangle.com, or you can email me directly if you want to get in touch, david.vellante@siliconangle.com. That's how I got all these. I really appreciate it. I went through every single one with a yellow highlighter. It took some time, (laughing) but I appreciate it. You could DM me at dvellante, or comment on our LinkedIn post and please check out etr.ai. Its data is amazing. Best survey data in the enterprise tech business. This is Dave Vellante for theCube Insights, powered by ETR. Thanks for watching, and we'll see you next time on "Breaking Analysis." (upbeat music beginning) (upbeat music ending)

Published Date : Jan 29 2023

SUMMARY :

insights from the Cube and ETR, do for the community, Dave, good to see you. actually come back to me if you would. It just stays at the top. the most aggressive to cut. that have the most to lose What's the primary method still leads the way, you know, So in addition to what we're seeing here, And so I actually thank you I went through it for you. I'm going to ask you to explain and they're certainly not going to get it to you in a zero trust way. So all of that is the One is just the number of So come back to me in 12 So 52% of the ETR survey amount of money on the Metaverse and also in the data prep tools. the cloud expands to the biggest shock to me "Ah, it's, you know, really and Fastly is their really the folks said, you know, for a home in the enterprise, Yeah, and I got to be honest, in the community, you know, and I don't know if that's the right move and the vertical axis is shared net score. So that's really what you want Well, the way they compete So that's, you know, One of the problems, if and that's going to be obviously even Oracle is going to scale down. and the data pipelines and that, you know, Been a nuts three I love the fact I know you were really is so much data at ETR. and we'll see you next time

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
Eric	PERSON	0.99+
Eric Bradley	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Rob Hoof	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
10	QUANTITY	0.99+
Ravi Mayuram	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
George Gilbert	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Tristan Handy	PERSON	0.99+
Dave	PERSON	0.99+
Atif Kahn	PERSON	0.99+
November	DATE	0.99+
Frank Slootman	PERSON	0.99+
APAC	ORGANIZATION	0.99+
Zscaler	ORGANIZATION	0.99+
Palo	ORGANIZATION	0.99+
David Foyer	PERSON	0.99+
February	DATE	0.99+
January 2023	DATE	0.99+
DBT Labs	ORGANIZATION	0.99+
October	DATE	0.99+
Rob Ensslin	PERSON	0.99+
Scott Stevenson	PERSON	0.99+
John Furrier	PERSON	0.99+
69%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
CrowdStrike	ORGANIZATION	0.99+
4.6%	QUANTITY	0.99+
10 times	QUANTITY	0.99+
2023	DATE	0.99+
Scott	PERSON	0.99+
1,181 responses	QUANTITY	0.99+
Palo Alto	ORGANIZATION	0.99+
third year	QUANTITY	0.99+
Boston	LOCATION	0.99+
Alex	PERSON	0.99+
thousands	QUANTITY	0.99+
OneTrust	ORGANIZATION	0.99+
45%	QUANTITY	0.99+
33%	QUANTITY	0.99+
Databricks	ORGANIZATION	0.99+
two reasons	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
last year	DATE	0.99+
BeyondTrust	ORGANIZATION	0.99+
7%	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+

Oracle Aspires to be the Netflix of AI | Cube Conversation

(gentle music playing) >> For centuries, we've been captivated by the concept of machines doing the job of humans. And over the past decade or so, we've really focused on AI and the possibility of intelligent machines that can perform cognitive tasks. Now in the past few years, with the popularity of machine learning models ranging from recent ChatGPT to Bert, we're starting to see how AI is changing the way we interact with the world. How is AI transforming the way we do business? And what does the future hold for us there. At theCube, we've covered Oracle's AI and ML strategy for years, which has really been used to drive automation into Oracle's autonomous database. We've talked a lot about MySQL HeatWave in database machine learning, and AI pushed into Oracle's business apps. Oracle, it tends to lead in AI, but not competing as a direct AI player per se, but rather embedding AI and machine learning into its portfolio to enhance its existing products, and bring new services and offerings to the market. Now, last October at Cloud World in Las Vegas, Oracle partnered with Nvidia, which is the go-to AI silicon provider for vendors. And they announced an investment, a pretty significant investment to deploy tens of thousands more Nvidia GPUs to OCI, the Oracle Cloud Infrastructure and build out Oracle's infrastructure for enterprise scale AI. Now, Oracle CEO, Safra Catz said something to the effect of this alliance is going to help customers across industries from healthcare, manufacturing, telecoms, and financial services to overcome the multitude of challenges they face. Presumably she was talking about just driving more automation and more productivity. Now, to learn more about Oracle's plans for AI, we'd like to welcome in Elad Ziklik, who's the vice president of AI services at Oracle. Elad, great to see you. Welcome to the show. >> Thank you. Thanks for having me. >> You're very welcome. So first let's talk about Oracle's path to AI. I mean, it's the hottest topic going for years you've been incorporating machine learning into your products and services, you know, could you tell us what you've been working on, how you got here? >> So great question. So as you mentioned, I think most of the original four-way into AI was on embedding AI and using AI to make our applications, and databases better. So inside mySQL HeatWave, inside our autonomous database in power, we've been driving AI, all of course are SaaS apps. So Fusion, our large enterprise business suite for HR applications and CRM and ELP, and whatnot has built in AI inside it. Most recently, NetSuite, our small medium business SaaS suite started using AI for things like automated invoice processing and whatnot. And most recently, over the last, I would say two years, we've started exposing and bringing these capabilities into the broader OCI Oracle Cloud infrastructure. So the developers, and ISVs and customers can start using our AI capabilities to make their apps better and their experiences and business workflow better, and not just consume these as embedded inside Oracle. And this recent partnership that you mentioned with Nvidia is another step in bringing the best AI infrastructure capabilities into this platform so you can actually build any type of machine learning workflow or AI model that you want on Oracle Cloud. >> So when I look at the market, I see companies out there like DataRobot or C3 AI, there's maybe a half dozen that sort of pop up on my radar anyway. And my premise has always been that most customers, they don't want to become AI experts, they want to buy applications and have AI embedded or they want AI to manage their infrastructure. So my question to you is, how does Oracle help its OCI customers support their business with AI? >> So it's a great question. So I think what most customers want is business AI. They want AI that works for the business. They want AI that works for the enterprise. I call it the last mile of AI. And they want this thing to work. The majority of them don't want to hire a large and expensive data science teams to go and build everything from scratch. They just want the business problem solved by applying AI to it. My best analogy is Lego. So if you think of Lego, Lego has these millions Lego blocks that you can use to build anything that you want. But the majority of people like me or like my kids, they want the Lego death style kit or the Lego Eiffel Tower thing. They want a thing that just works, and it's very easy to use. And still Lego blocks, you still need to build some things together, which just works for the scenario that you're looking for. So that's our focus. Our focus is making it easy for customers to apply AI where they need to, in the right business context. So whether it's embedding it inside the business applications, like adding forecasting capabilities to your supply chain management or financial planning software, whether it's adding chat bots into the line of business applications, integrating these things into your analytics dashboard, even all the way to, we have a new platform piece we call ML applications that allows you to take a machine learning model, and scale it for the thousands of tenants that you would be. 'Cause this is a big problem for most of the ML use cases. It's very easy to build something for a proof of concept or a pilot or a demo. But then if you need to take this and then deploy it across your thousands of customers or your thousands of regions or facilities, then it becomes messy. So this is where we spend our time making it easy to take these things into production in the context of your business application or your business use case that you're interested in right now. >> So you mentioned chat bots, and I want to talk about ChatGPT, but my question here is different, we'll talk about that in a minute. So when you think about these chat bots, the ones that are conversational, my experience anyway is they're just meh, they're not that great. But the ones that actually work pretty well, they have a conditioned response. Now they're limited, but they say, which of the following is your problem? And then if that's one of the following is your problem, you can maybe solve your problem. But this is clearly a trend and it helps the line of business. How does Oracle think about these use cases for your customers? >> Yeah, so I think the key here is exactly what you said. It's about task completion. The general purpose bots are interesting, but as you said, like are still limited. They're getting much better, I'm sure we'll talk about ChatGPT. But I think what most enterprises want is around task completion. I want to automate my expense report processing. So today inside Oracle we have a chat bot where I submit my expenses the bot ask a couple of question, I answer them, and then I'm done. Like I don't need to go to our fancy application, and manually submit an expense report. I do this via Slack. And the key is around managing the right expectations of what this thing is capable of doing. Like, I have a story from I think five, six years ago when technology was much inferior than it is today. Well, one of the telco providers I was working with wanted to roll a chat bot that does realtime translation. So it was for a support center for of the call centers. And what they wanted do is, Hey, we have English speaking employees, whatever, 24/7, if somebody's calling, and the native tongue is different like Hebrew in my case, or Chinese or whatnot, then we'll give them a chat bot that they will interact with and will translate this on the fly and everything would work. And when they rolled it out, the feedback from customers was horrendous. Customers said, the technology sucks. It's not good. I hate it, I hate your company, I hate your support. And what they've done is they've changed the narrative. Instead of, you go to a support center, and you assume you're going to talk to a human, and instead you get a crappy chat bot, they're like, Hey, if you want to talk to a Hebrew speaking person, there's a four hour wait, please leave your phone and we'll call you back. Or you can try a new amazing Hebrew speaking AI powered bot and it may help your use case. Do you want to try it out? And some people said, yeah, let's try it out. Plus one to try it out. And the feedback, even though it was the exact same technology was amazing. People were like, oh my God, this is so innovative, this is great. Even though it was the exact same experience that they hated a few weeks earlier on. So I think the key lesson that I picked from this experience is it's all about setting the right expectations, and working around the right use case. If you are replacing a human, the level is different than if you are just helping or augmenting something that otherwise would take a lot of time. And I think this is the focus that we are doing, picking up the tasks that people want to accomplish or that enterprise want to accomplish for the customers, for the employees. And using chat bots to make those specific ones better rather than, hey, this is going to replace all humans everywhere, and just be better than that. >> Yeah, I mean, to the point you mentioned expense reports. I'm in a Twitter thread and one guy says, my favorite part of business travel is filling out expense reports. It's an hour of excitement to figure out which receipts won't scan. We can all relate to that. It's just the worst. When you think about companies that are building custom AI driven apps, what can they do on OCI? What are the best options for them? Do they need to hire an army of machine intelligence experts and AI specialists? Help us understand your point of view there. >> So over the last, I would say the two or three years we've developed a full suite of machine learning and AI services for, I would say probably much every use case that you would expect right now from applying natural language processing to understanding customer support tickets or social media, or whatnot to computer vision platforms or computer vision services that can understand and detect objects, and count objects on shelves or detect cracks in the pipe or defecting parts, all the way to speech services. It can actually transcribe human speech. And most recently we've launched a new document AI service. That can actually look at unstructured documents like receipts or invoices or government IDs or even proprietary documents, loan application, student application forms, patient ingestion and whatnot and completely automate them using AI. So if you want to do one of the things that are, I would say common bread and butter for any industry, whether it's financial services or healthcare or manufacturing, we have a suite of services that any developer can go, and use easily customized with their own data. You don't need to be an expert in deep learning or large language models. You could just use our automobile capabilities, and build your own version of the models. Just go ahead and use them. And if you do have proprietary complex scenarios that you need customer from scratch, we actually have the most cost effective platform for that. So we have the OCI data science as well as built-in machine learning platform inside the databases inside the Oracle database, and mySQL HeatWave that allow data scientists, python welding people that actually like to build and tweak and control and improve, have everything that they need to go and build the machine learning models from scratch, deploy them, monitor and manage them at scale in production environment. And most of it is brand new. So we did not have these technologies four or five years ago and we've started building them and they're now at enterprise scale over the last couple of years. >> So what are some of the state-of-the-art tools, that AI specialists and data scientists need if they're going to go out and develop these new models? >> So I think it's on three layers. I think there's an infrastructure layer where the Nvidia's of the world come into play. For some of these things, you want massively efficient, massively scaled infrastructure place. So we are the most cost effective and performant large scale GPU training environment today. We're going to be first to onboard the new Nvidia H100s. These are the new super powerful GPU's for large language model training. So we have that covered for you in case you need this 'cause you want to build these ginormous things. You need a data science platform, a platform where you can open a Python notebook, and just use all these fancy open source frameworks and create the models that you want, and then click on a button and deploy it. And it infinitely scales wherever you need it. And in many cases you just need the, what I call the applied AI services. You need the Lego sets, the Lego death style, Lego Eiffel Tower. So we have a suite of these sets for typical scenarios, whether it's cognitive services of like, again, understanding images, or documents all the way to solving particular business problems. So an anomaly detection service, demand focusing service that will be the equivalent of these Lego sets. So if this is the business problem that you're looking to solve, we have services out there where we can bring your data, call an API, train a model, get the model and use it in your production environment. So wherever you want to play, all the way into embedding this thing, inside this applications, obviously, wherever you want to play, we have the tools for you to go and engage from infrastructure to SaaS at the top, and everything in the middle. >> So when you think about the data pipeline, and the data life cycle, and the specialized roles that came out of kind of the (indistinct) era if you will. I want to focus on two developers and data scientists. So the developers, they hate dealing with infrastructure and they got to deal with infrastructure. Now they're being asked to secure the infrastructure, they just want to write code. And a data scientist, they're spending all their time trying to figure out, okay, what's the data quality? And they're wrangling data and they don't spend enough time doing what they want to do. So there's been a lack of collaboration. Have you seen that change, are these approaches allowing collaboration between data scientists and developers on a single platform? Can you talk about that a little bit? >> Yeah, that is a great question. One of the biggest set of scars that I have on my back from for building these platforms in other companies is exactly that. Every persona had a set of tools, and these tools didn't talk to each other and the handoff was painful. And most of the machine learning things evaporate or die on the floor because of this problem. It's very rarely that they are unsuccessful because the algorithm wasn't good enough. In most cases it's somebody builds something, and then you can't take it to production, you can't integrate it into your business application. You can't take the data out, train, create an endpoint and integrate it back like it's too painful. So the way we are approaching this is focused on this problem exactly. We have a single set of tools that if you publish a model as a data scientist and developers, and even business analysts that are seeing a inside of business application could be able to consume it. We have a single model store, a single feature store, a single management experience across the various personas that need to play in this. And we spend a lot of time building, and borrowing a word that cellular folks used, and I really liked it, building inside highways to make it easier to bring these insights into where you need them inside applications, both inside our applications, inside our SaaS applications, but also inside custom third party and even first party applications. And this is where a lot of our focus goes to just because we have dealt with so much pain doing this inside our own SaaS that we now have built the tools, and we're making them available for others to make this process of building a machine learning outcome driven insight in your app easier. And it's not just the model development, and it's not just the deployment, it's the entire journey of taking the data, building the model, training it, deploying it, looking at the real data that comes from the app, and creating this feedback loop in a more efficient way. And that's our focus area. Exactly this problem. >> Well thank you for that. So, last week we had our super cloud two event, and I had Juan Loza on and he spent a lot of time talking about how open Oracle is in its philosophy, and I got a lot of feedback. They were like, Oracle open, I don't really think, but the truth is if you think about database Oracle database, it never met a hardware platform that it didn't like. So in that sense it's open. So, but my point is, a big part of of machine learning and AI is driven by open source tools, frameworks, what's your open source strategy? What do you support from an open source standpoint? >> So I'm a strong believer that you don't actually know, nobody knows where the next slip fog or the next industry shifting innovation in AI is going to come from. If you look six months ago, nobody foreseen Dali, the magical text to image generation and the exploding brought into just art and design type of experiences. If you look six weeks ago, I don't think anybody's seen ChatGPT, and what it can do for a whole bunch of industries. So to me, assuming that a customer or partner or developer would want to lock themselves into only the tools that a specific vendor can produce is ridiculous. 'Cause nobody knows, if anybody claims that they know where the innovation is going to come from in a year or two, let alone in five or 10, they're just wrong or lying. So our strategy for Oracle is to, I call this the Netflix of AI. So if you think about Netflix, they produced a bunch of high quality shows on their own. A few years ago it was House of Cards. Last month my wife and I binge watched Ginny and Georgie, but they also curated a lot of shows that they found around the world and bought them to their customers. So it started with things like Seinfeld or Friends and most recently it was Squid games and those are famous Israeli TV series called Founder that Netflix bought in, and they bought it as is and they gave it the Netflix value. So you have captioning and you have the ability to speed the movie and you have it inside your app, and you can download it and watch it offline and everything, but nobody Netflix was involved in the production of these first seasons. Now if these things hunt and they're great, then the third season or the fourth season will get the full Netflix production value, high value budget, high value location shooting or whatever. But you as a customer, you don't care whether the producer and director, and screenplay writing is a Netflix employee or is somebody else's employee. It is fulfilled by Netflix. I believe that we will become, or we are looking to become the Netflix of AI. We are building a bunch of AI in a bunch of places where we think it's important and we have some competitive advantage like healthcare with Acellular partnership or whatnot. But I want to bring the best AI software and hardware to OCI and do a fulfillment by Oracle on that. So you'll get the Oracle security and identity and single bill and everything you'd expect from a company like Oracle. But we don't have to be building the data science, and the models for everything. So this means both open source recently announced a partnership with Anaconda, the leading provider of Python distribution in the data science ecosystem where we are are doing a joint strategic partnership of bringing all the goodness into Oracle customers as well as in the process of doing the same with Nvidia, and all those software libraries, not just the Hubble, both for other stuff like Triton, but also for healthcare specific stuff as well as other ISVs, other AI leading ISVs that we are in the process of partnering with to get their stuff into OCI and into Oracle so that you can truly consume the best AI hardware, and the best AI software in the world on Oracle. 'Cause that is what I believe our customers would want the ability to choose from any open source engine, and honestly from any ISV type of solution that is AI powered and they want to use it in their experiences. >> So you mentioned ChatGPT, I want to talk about some of the innovations that are coming. As an AI expert, you see ChatGPT on the one hand, I'm sure you weren't surprised. On the other hand, maybe the reaction in the market, and the hype is somewhat surprising. You know, they say that we tend to under or over-hype things in the early stages and under hype them long term, you kind of use the internet as example. What's your take on that premise? >> So. I think that this type of technology is going to be an inflection point in how software is being developed. I truly believe this. I think this is an internet style moment, and the way software interfaces, software applications are being developed will dramatically change over the next year two or three because of this type of technologies. I think there will be industries that will be shifted. I think education is a good example. I saw this thing opened on my son's laptop. So I think education is going to be transformed. Design industry like images or whatever, it's already been transformed. But I think that for mass adoption, like beyond the hype, beyond the peak of inflected expectations, if I'm using Gartner terminology, I think certain things need to go and happen. One is this thing needs to become more reliable. So right now it is a complete black box that sometimes produce magic, and sometimes produce just nonsense. And it needs to have better explainability and better lineage to, how did you get to this answer? 'Cause I think enterprises are going to really care about the things that they surface with the customers or use internally. So I think that is one thing that's going to come out. And the other thing that's going to come out is I think it's going to come industry specific large language models or industry specific ChatGPTs. Something like how OpenAI did co-pilot for writing code. I think we will start seeing this type of apps solving for specific business problems, understanding contracts, understanding healthcare, writing doctor's notes on behalf of doctors so they don't have to spend time manually recording and analyzing conversations. And I think that would become the sweet spot of this thing. There will be companies, whether it's OpenAI or Microsoft or Google or hopefully Oracle that will use this type of technology to solve for specific very high value business needs. And I think this will change how interfaces happen. So going back to your expense report, the world of, I'm going to go into an app, and I'm going to click on seven buttons in order to get some job done like this world is gone. Like I'm going to say, hey, please do this and that. And I expect an answer to come out. I've seen a recent demo about, marketing in sales. So a customer sends an email that is interested in something and then a ChatGPT powered thing just produces the answer. I think this is how the world is going to evolve. Like yes, there's a ton of hype, yes, it looks like magic and right now it is magic, but it's not yet productive for most enterprise scenarios. But in the next 6, 12, 24 months, this will start getting more dependable, and it's going to change how these industries are being managed. Like I think it's an internet level revolution. That's my take. >> It's very interesting. And it's going to change the way in which we have. Instead of accessing the data center through APIs, we're going to access it through natural language processing and that opens up technology to a huge audience. Last question, is a two part question. And the first part is what you guys are working on from the futures, but the second part of the question is, we got data scientists and developers in our audience. They love the new shiny toy. So give us a little glimpse of what you're working on in the future, and what would you say to them to persuade them to check out Oracle's AI services? >> Yep. So I think there's two main things that we're doing, one is around healthcare. With a new recent acquisition, we are spending a significant effort around revolutionizing healthcare with AI. Of course many scenarios from patient care using computer vision and cameras through automating, and making better insurance claims to research and pharma. We are making the best models from leading organizations, and internal available for hospitals and researchers, and insurance providers everywhere. And we truly are looking to become the leader in AI for healthcare. So I think that's a huge focus area. And the second part is, again, going back to the enterprise AI angle. Like we want to, if you have a business problem that you want to apply here to solve, we want to be your platform. Like you could use others if you want to build everything complicated and whatnot. We have a platform for that as well. But like, if you want to apply AI to solve a business problem, we want to be your platform. We want to be the, again, the Netflix of AI kind of a thing where we are the place for the greatest AI innovations accessible to any developer, any business analyst, any user, any data scientist on Oracle Cloud. And we're making a significant effort on these two fronts as well as developing a lot of the missing pieces, and building blocks that we see are needed in this space to make truly like a great experience for developers and data scientists. And what would I recommend? Get started, try it out. We actually have a shameless sales plug here. We have a free deal for all of our AI services. So it typically cost you nothing. I would highly recommend to just go, and try these things out. Go play with it. If you are a python welding developer, and you want to try a little bit of auto mail, go down that path. If you're not even there and you're just like, hey, I have these customer feedback things and I want to try out, if I can understand them and apply AI and visualize, and do some cool stuff, we have services for that. My recommendation is, and I think ChatGPT got us 'cause I see people that have nothing to do with AI, and can't even spell AI going and trying it out. I think this is the time. Go play with these things, go play with these technologies and find what AI can do to you or for you. And I think Oracle is a great place to start playing with these things. >> Elad, thank you. Appreciate you sharing your vision of making Oracle the Netflix of AI. Love that and really appreciate your time. >> Awesome. Thank you. Thank you for having me. >> Okay. Thanks for watching this Cube conversation. This is Dave Vellante. We'll see you next time. (gentle music playing)

Published Date : Jan 24 2023

SUMMARY :

AI and the possibility Thanks for having me. I mean, it's the hottest So the developers, So my question to you is, and scale it for the thousands So when you think about these chat bots, and the native tongue It's just the worst. So over the last, and create the models that you want, of the (indistinct) era if you will. So the way we are approaching but the truth is if you the movie and you have it inside your app, and the hype is somewhat surprising. and the way software interfaces, and what would you say to them and you want to try a of making Oracle the Netflix of AI. Thank you for having me. We'll see you next time.

ENTITIES

Entity	Category	Confidence
Netflix	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Nvidia	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Elad Ziklik	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Safra Catz	PERSON	0.99+
Elad	PERSON	0.99+
thousands	QUANTITY	0.99+
Anaconda	ORGANIZATION	0.99+
two part	QUANTITY	0.99+
fourth season	QUANTITY	0.99+
House of Cards	TITLE	0.99+
Lego	ORGANIZATION	0.99+
second part	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
first seasons	QUANTITY	0.99+
Seinfeld	TITLE	0.99+
Last month	DATE	0.99+
third season	QUANTITY	0.99+
four hour	QUANTITY	0.99+
last week	DATE	0.99+
Hebrew	OTHER	0.99+
Las Vegas	LOCATION	0.99+
last October	DATE	0.99+
OCI	ORGANIZATION	0.99+
three years	QUANTITY	0.99+
both	QUANTITY	0.99+
two fronts	QUANTITY	0.99+
first part	QUANTITY	0.99+
Juan Loza	PERSON	0.99+
Founder	TITLE	0.99+
four	DATE	0.99+
six weeks ago	DATE	0.99+
today	DATE	0.99+
two years	QUANTITY	0.99+
python	TITLE	0.99+
five	QUANTITY	0.99+
a year	QUANTITY	0.99+
six months ago	DATE	0.99+
two developers	QUANTITY	0.99+
first	QUANTITY	0.98+
Python	TITLE	0.98+
H100s	COMMERCIAL_ITEM	0.98+
five years ago	DATE	0.98+
one	QUANTITY	0.98+
Friends	TITLE	0.98+
one guy	QUANTITY	0.98+
10	QUANTITY	0.97+

Analyst Predictions 2023: The Future of Data Management

(upbeat music) >> Hello, this is Dave Valente with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analyst, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speaks) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What's your thoughts on here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2023, in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloud Era, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating end users. It's very cutting edge. I'd say the top, leading edge, 5% of of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these as aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers, glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Bear, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long at the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see they have the Informatica's, and all the other players there in Fivetrans have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leads it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are of most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that we're data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra elastic search or open search stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. So, there's some good moves in this direction. I expect to see more than this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardizing the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for, but in other words, get two products for the price of two services or for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL heat wave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to a Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success is certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability-based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had had to take the... Your debating in the class and you were forced to take one side and defend it. So, I was was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs acid compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse is being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those, data scientists who do know Python, who knows Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills based in SQL. Analytics have been built on SQL. They came with lakehouse and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for acid. And what was the best way to do it? It was through a relational table structure. So, the whole idea of acid in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring up back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structure data that we have, so that we can track what it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? I'm actually in right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's a associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is a very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just no notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, we're promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL heat wave. There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Valente for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023

SUMMARY :

and pleased to tell you (Tony and Dave faintly speaks) that led them to their conclusion. down, the funding in VC IPO market. And I like how the fact And I happened to have tripped across I talked to Walmart in the prediction of graph databases. But I stand by the idea and maybe to the edge. You can apply graphs to great And so, it's going to streaming data permeates the landscape. and to be honest, I like the tough grading the next 20 to 25% of and of course, the degree of difficulty. that sits on the side, Thank you for that. And I have to disagree. So, the catalog becomes Do you have any stats for just the reasons that And a lot of those catalogs about the modern data stack. and more, the data lakehouse. and the application stack, So, the alternative is to have metadata that SQL is the killer app for big data. but in the perception of the marketplace, and I had to take the NoSQL, being up on stage with Curt Monash. (group laughing) is that the core need in the data lake, And your prediction is the and examine derivatives of the data to optimize around a set of KPIs. that folks in the content world (Dave and Carl laughing) going to say this... shifts the conversation to the consumers And essentially, one of the things (group laughing) the term that we'll remember today, to your last year's prediction, is headed to embedding. and going off to separate happening in the business, so that the analytics didn't And the thing that we're waiting for and that deep modeling. that the system can of decision has to be relevant And the fact that we're But in the end We see that's all over the You cache that data, and improvement of the and I love how it shapes the outcome here Thank you for watching.

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Doug Henschen	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Doug	PERSON	0.99+
Carl	PERSON	0.99+
Carl Olofson	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Tony Baer	PERSON	0.99+
Tony	PERSON	0.99+
Dave Valente	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
Curt Monash	PERSON	0.99+
Sanjeev Mohan	PERSON	0.99+
Christian Kleinerman	PERSON	0.99+
Dave Valente	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Sanjeev	PERSON	0.99+
Constellation Research	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Ventana Research	ORGANIZATION	0.99+
2022	DATE	0.99+
Hazelcast	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Tony Bear	PERSON	0.99+
25%	QUANTITY	0.99+
2021	DATE	0.99+
last year	DATE	0.99+
65%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
today	DATE	0.99+
five-year	QUANTITY	0.99+
TigerGraph	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two services	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
David	PERSON	0.99+
RisingWave Labs	ORGANIZATION	0.99+

Bob Muglia, George Gilbert & Tristan Handy | How Supercloud will Support a new Class of Data Apps

(upbeat music) >> Hello, everybody. This is Dave Vellante. Welcome back to Supercloud2, where we're exploring the intersection of data analytics and the future of cloud. In this segment, we're going to look at how the Supercloud will support a new class of applications, not just work that runs on multiple clouds, but rather a new breed of apps that can orchestrate things in the real world. Think Uber for many types of businesses. These applications, they're not about codifying forms or business processes. They're about orchestrating people, places, and things in a business ecosystem. And I'm pleased to welcome my colleague and friend, George Gilbert, former Gartner Analyst, Wiki Bond market analyst, former equities analyst as my co-host. And we're thrilled to have Tristan Handy, who's the founder and CEO of DBT Labs and Bob Muglia, who's the former President of Microsoft's Enterprise business and former CEO of Snowflake. Welcome all, gentlemen. Thank you for coming on the program. >> Good to be here. >> Thanks for having us. >> Hey, look, I'm going to start actually with the SuperCloud because both Tristan and Bob, you've read the definition. Thank you for doing that. And Bob, you have some really good input, some thoughts on maybe some of the drawbacks and how we can advance this. So what are your thoughts in reading that definition around SuperCloud? >> Well, I thought first of all that you did a very good job of laying out all of the characteristics of it and helping to define it overall. But I do think it can be tightened a bit, and I think it's helpful to do it in as short a way as possible. And so in the last day I've spent a little time thinking about how to take it and write a crisp definition. And here's my go at it. This is one day old, so gimme a break if it's going to change. And of course we have to follow the industry, and so that, and whatever the industry decides, but let's give this a try. So in the way I think you're defining it, what I would say is a SuperCloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. >> Boom. Nice. Okay, great. I'm going to go back and read the script on that one and tighten that up a bit. Thank you for spending the time thinking about that. Tristan, would you add anything to that or what are your thoughts on the whole SuperCloud concept? >> So as I read through this, I fully realize that we need a word for this thing because I have experienced the inability to talk about it as well. But for many of us who have been living in the Confluence, Snowflake, you know, this world of like new infrastructure, this seems fairly uncontroversial. Like I read through this, and I'm just like, yeah, this is like the world I've been living in for years now. And I noticed that you called out Snowflake for being an example of this, but I think that there are like many folks, myself included, for whom this world like fully exists today. >> Yeah, I think that's a fair, I dunno if it's criticism, but people observe, well, what's the big deal here? It's just kind of what we're living in today. It reminds me of, you know, Tim Burns Lee saying, well, this is what the internet was supposed to be. It was supposed to be Web 2.0, so maybe this is what multi-cloud was supposed to be. Let's turn our attention to apps. Bob first and then go to Tristan. Bob, what are data apps to you? When people talk about data products, is that what they mean? Are we talking about something more, different? What are data apps to you? >> Well, to understand data apps, it's useful to contrast them to something, and I just use the simple term people apps. I know that's a little bit awkward, but it's clear. And almost everything we work with, almost every application that we're familiar with, be it email or Salesforce or any consumer app, those are applications that are targeted at responding to people. You know, in contrast, a data application reacts to changes in data and uses some set of analytic services to autonomously take action. So where applications that we're familiar with respond to people, data apps respond to changes in data. And they both do something, but they do it for different reasons. >> Got it. You know, George, you and I were talking about, you know, it comes back to SuperCloud, broad definition, narrow definition. Tristan, how do you see it? Do you see it the same way? Do you have a different take on data apps? >> Oh, geez. This is like a conversation that I don't know has an end. It's like been, I write a substack, and there's like this little community of people who all write substack. We argue with each other about these kinds of things. Like, you know, as many different takes on this question as you can find, but the way that I think about it is that data products are atomic units of functionality that are fundamentally data driven in nature. So a data product can be as simple as an interactive dashboard that is like actually had design thinking put into it and serves a particular user group and has like actually gone through kind of a product development life cycle. And then a data app or data application is a kind of cohesive end-to-end experience that often encompasses like many different data products. So from my perspective there, this is very, very related to the way that these things are produced, the kinds of experiences that they're provided, that like data innovates every product that we've been building in, you know, software engineering for, you know, as long as there have been computers. >> You know, Jamak Dagani oftentimes uses the, you know, she doesn't name Spotify, but I think it's Spotify as that kind of example she uses. But I wonder if we can maybe try to take some examples. If you take, like George, if you take a CRM system today, you're inputting leads, you got opportunities, it's driven by humans, they're really inputting the data, and then you got this system that kind of orchestrates the business process, like runs a forecast. But in this data driven future, are we talking about the app itself pulling data in and automatically looking at data from the transaction systems, the call center, the supply chain and then actually building a plan? George, is that how you see it? >> I go back to the example of Uber, may not be the most sophisticated data app that we build now, but it was like one of the first where you do have users interacting with their devices as riders trying to call a car or driver. But the app then looks at the location of all the drivers in proximity, and it matches a driver to a rider. It calculates an ETA to the rider. It calculates an ETA then to the destination, and it calculates a price. Those are all activities that are done sort of autonomously that don't require a human to type something into a form. The application is using changes in data to calculate an analytic product and then to operationalize that, to assign the driver to, you know, calculate a price. Those are, that's an example of what I would think of as a data app. And my question then I guess for Tristan is if we don't have all the pieces in place for sort of mainstream companies to build those sorts of apps easily yet, like how would we get started? What's the role of a semantic layer in making that easier for mainstream companies to build? And how do we get started, you know, say with metrics? How does that, how does that take us down that path? >> So what we've seen in the past, I dunno, decade or so, is that one of the most successful business models in infrastructure is taking hard things and rolling 'em up behind APIs. You take messaging, you take payments, and you all of a sudden increase the capability of kind of your median application developer. And you say, you know, previously you were spending all your time being focused on how do you accept credit cards, how do you send SMS payments, and now you can focus on your business logic, and just create the thing. One of, interestingly, one of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that, you know, you would imagine that the business would be able to create applications around very easily, but in fact that's not the case. It's actually quite challenging to, and involves a lot of data engineering pipeline and all this work to make these available. And so if you really want to make it very easy to create some of these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to. >> So how rich can that API layer grow if you start with metric definitions that you've defined? And DBT has, you know, the metric, the dimensions, the time grain, things like that, that's a well scoped sort of API that people can work within. How much can you extend that to say non-calculated business rules or governance information like data reliability rules, things like that, or even, you know, features for an AIML feature store. In other words, it starts, you started pragmatically, but how far can you grow? >> Bob is waiting with bated breath to answer this question. I'm, just really quickly, I think that we as a company and DBT as a product tend to be very pragmatic. We try to release the simplest possible version of a thing, get it out there, and see if people use it. But the idea that, the concept of a metric is really just a first landing pad. The really, there is a physical manifestation of the data and then there's a logical manifestation of the data. And what we're trying to do here is make it very easy to access the logical manifestation of the data, and metric is a way to look at that. Maybe an entity, a customer, a user is another way to look at that. And I'm sure that there will be more kind of logical structures as well. >> So, Bob, chime in on this. You know, what's your thoughts on the right architecture behind this, and how do we get there? >> Yeah, well first of all, I think one of the ways we get there is by what companies like DBT Labs and Tristan is doing, which is incrementally taking and building on the modern data stack and extending that to add a semantic layer that describes the data. Now the way I tend to think about this is a fairly major shift in the way we think about writing applications, which is today a code first approach to moving to a world that is model driven. And I think that's what the big change will be is that where today we think about data, we think about writing code, and we use that to produce APIs as Tristan said, which encapsulates those things together in some form of services that are useful for organizations. And that idea of that encapsulation is never going to go away. It's very, that concept of an API is incredibly useful and will exist well into the future. But what I think will happen is that in the next 10 years, we're going to move to a world where organizations are defining models first of their data, but then ultimately of their business process, their entire business process. Now the concept of a model driven world is a very old concept. I mean, I first started thinking about this and playing around with some early model driven tools, probably before Tristan was born in the early 1980s. And those tools didn't work because the semantics associated with executing the model were too complex to be written in anything other than a procedural language. We're now reaching a time where that is changing, and you see it everywhere. You see it first of all in the world of machine learning and machine learning models, which are taking over more and more of what applications are doing. And I think that's an incredibly important step. And learned models are an important part of what people will do. But if you look at the world today, I will claim that we've always been modeling. Modeling has existed in computers since there have been integrated circuits and any form of computers. But what we do is what I would call implicit modeling, which means that it's the model is written on a whiteboard. It's in a bunch of Slack messages. It's on a set of napkins in conversations that happen and during Zoom. That's where the model gets defined today. It's implicit. There is one in the system. It is hard coded inside application logic that exists across many applications with humans being the glue that connects those models together. And really there is no central place you can go to understand the full attributes of the business, all of the business rules, all of the business logic, the business data. That's going to change in the next 10 years. And we'll start to have a world where we can define models about what we're doing. Now in the short run, the most important models to build are data models and to describe all of the attributes of the data and their relationships. And that's work that DBT Labs is doing. A number of other companies are doing that. We're taking steps along that way with catalogs. People are trying to build more complete ontologies associated with that. The underlying infrastructure is still super, super nascent. But what I think we'll see is this infrastructure that exists today that's building learned models in the form of machine learning programs. You know, some of these incredible machine learning programs in foundation models like GPT and DALL-E and all of the things that are happening in these global scale models, but also all of that needs to get applied to the domains that are appropriate for a business. And I think we'll see the infrastructure developing for that, that can take this concept of learned models and put it together with more explicitly defined models. And this is where the concept of knowledge graphs come in and then the technology that underlies that to actually implement and execute that, which I believe are relational knowledge graphs. >> Oh, oh wow. There's a lot to unpack there. So let me ask the Colombo question, Tristan, we've been making fun of your youth. We're just, we're just jealous. Colombo, I'll explain it offline maybe. >> I watch Colombo. >> Okay. All right, good. So but today if you think about the application stack and the data stack, which is largely an analytics pipeline. They're separate. Do they, those worlds, do they have to come together in order to achieve Bob's vision? When I talk to practitioners about that, they're like, well, I don't want to complexify the application stack cause the data stack today is so, you know, hard to manage. But but do those worlds have to come together? And you know, through that model, I guess abstraction or translation that Bob was just describing, how do you guys think about that? Who wants to take that? >> I think it's inevitable that data and AI are going to become closer together? I think that the infrastructure there has been moving in that direction for a long time. Whether you want to use the Lakehouse portmanteau or not. There's also, there's a next generation of data tech that is still in the like early stage of being developed. There's a company that I love that is essentially Cross Cloud Lambda, and it's just a wonderful abstraction for computing. So I think that, you know, people have been predicting that these worlds are going to come together for awhile. A16Z wrote a great post on this back in I think 2020, predicting this, and I've been predicting this since since 2020. But what's not clear is the timeline, but I think that this is still just as inevitable as it's been. >> Who's that that does Cross Cloud? >> Let me follow up on. >> Who's that, Tristan, that does Cross Cloud Lambda? Can you name names? >> Oh, they're called Modal Labs. >> Modal Labs, yeah, of course. All right, go ahead, George. >> Let me ask about this vision of trying to put the semantics or the code that represents the business with the data. It gets us to a world that's sort of more data centric, where data's not locked inside or behind the APIs of different applications so that we don't have silos. But at the same time, Bob, I've heard you talk about building the semantics gradually on top of, into a knowledge graph that maybe grows out of a data catalog. And the vision of getting to that point, essentially the enterprise's metadata and then the semantics you're going to add onto it are really stored in something that's separate from the underlying operational and analytic data. So at the same time then why couldn't we gradually build semantics beyond the metric definitions that DBT has today? In other words, you build more and more of the semantics in some layer that DBT defines and that sits above the data management layer, but any requests for data have to go through the DBT layer. Is that a workable alternative? Or where, what type of limitations would you face? >> Well, I think that it is the way the world will evolve is to start with the modern data stack and, you know, which is operational applications going through a data pipeline into some form of data lake, data warehouse, the Lakehouse, whatever you want to call it. And then, you know, this wide variety of analytics services that are built together. To the point that Tristan made about machine learning and data coming together, you see that in every major data cloud provider. Snowflake certainly now supports Python and Java. Databricks is of course building their data warehouse. Certainly Google, Microsoft and Amazon are doing very, very similar things in terms of building complete solutions that bring together an analytics stack that typically supports languages like Python together with the data stack and the data warehouse. I mean, all of those things are going to evolve, and they're not going to go away because that infrastructure is relatively new. It's just being deployed by companies, and it solves the problem of working with petabytes of data if you need to work with petabytes of data, and nothing will do that for a long time. What's missing is a layer that understands and can model the semantics of all of this. And if you need to, if you want to model all, if you want to talk about all the semantics of even data, you need to think about all of the relationships. You need to think about how these things connect together. And unfortunately, there really is no platform today. None of our existing platforms are ultimately sufficient for this. It was interesting, I was just talking to a customer yesterday, you know, a large financial organization that is building out these semantic layers. They're further along than many companies are. And you know, I asked what they're building it on, and you know, it's not surprising they're using a, they're using combinations of some form of search together with, you know, textual based search together with a document oriented database. In this case it was Cosmos. And that really is kind of the state of the art right now. And yet those products were not built for this. They don't really, they can't manage the complicated relationships that are required. They can't issue the queries that are required. And so a new generation of database needs to be developed. And fortunately, you know, that is happening. The world is developing a new set of relational algorithms that will be able to work with hundreds of different relations. If you look at a SQL database like Snowflake or a big query, you know, you get tens of different joins coming together, and that query is going to take a really long time. Well, fortunately, technology is evolving, and it's possible with new join algorithms, worst case, optimal join algorithms they're called, where you can join hundreds of different relations together and run semantic queries that you simply couldn't run. Now that technology is nascent, but it's really important, and I think that will be a requirement to have this semantically reach its full potential. In the meantime, Tristan can do a lot of great things by building up on what he's got today and solve some problems that are very real. But in the long run I think we'll see a new set of databases to support these models. >> So Tristan, you got to respond to that, right? You got to, so take the example of Snowflake. We know it doesn't deal well with complex joins, but they're, they've got big aspirations. They're building an ecosystem to really solve some of these problems. Tristan, you guys are part of that ecosystem, and others, but please, your thoughts on what Bob just shared. >> Bob, I'm curious if, I would have no idea what you were talking about except that you introduced me to somebody who gave me a demo of a thing and do you not want to go there right now? >> No, I can talk about it. I mean, we can talk about it. Look, the company I've been working with is Relational AI, and they're doing this work to actually first of all work across the industry with academics and research, you know, across many, many different, over 20 different research institutions across the world to develop this new set of algorithms. They're all fully published, just like SQL, the underlying algorithms that are used by SQL databases are. If you look today, every single SQL database uses a similar set of relational algorithms underneath that. And those algorithms actually go back to system R and what IBM developed in the 1970s. We're just, there's an opportunity for us to build something new that allows you to take, for example, instead of taking data and grouping it together in tables, treat all data as individual relations, you know, a key and a set of values and then be able to perform purely relational operations on it. If you go back to what, to Codd, and what he wrote, he defined two things. He defined a relational calculus and relational algebra. And essentially SQL is a query language that is translated by the query processor into relational algebra. But however, the calculus of SQL is not even close to the full semantics of the relational mathematics. And it's possible to have systems that can do everything and that can store all of the attributes of the data model or ultimately the business model in a form that is much more natural to work with. >> So here's like my short answer to this. I think that we're dealing in different time scales. I think that there is actually a tremendous amount of work to do in the semantic layer using the kind of technology that we have on the ground today. And I think that there's, I don't know, let's say five years of like really solid work that there is to do for the entire industry, if not more. But the wonderful thing about DBT is that it's independent of what the compute substrate is beneath it. And so if we develop new platforms, new capabilities to describe semantic models in more fine grain detail, more procedural, then we're going to support that too. And so I'm excited about all of it. >> Yeah, so interpreting that short answer, you're basically saying, cause Bob was just kind of pointing to you as incremental, but you're saying, yeah, okay, we're applying it for incremental use cases today, but we can accommodate a much broader set of examples in the future. Is that correct, Tristan? >> I think you're using the word incremental as if it's not good, but I think that incremental is great. We have always been about applying incremental improvement on top of what exists today, but allowing practitioners to like use different workflows to actually make use of that technology. So yeah, yeah, we are a very incremental company. We're going to continue being that way. >> Well, I think Bob was using incremental as a pejorative. I mean, I, but to your point, a lot. >> No, I don't think so. I want to stop that. No, I don't think it's pejorative at all. I think incremental, incremental is usually the most successful path. >> Yes, of course. >> In my experience. >> We agree, we agree on that. >> Having tried many, many moonshot things in my Microsoft days, I can tell you that being incremental is a good thing. And I'm a very big believer that that's the way the world's going to go. I just think that there is a need for us to build something new and that ultimately that will be the solution. Now you can argue whether it's two years, three years, five years, or 10 years, but I'd be shocked if it didn't happen in 10 years. >> Yeah, so we all agree that incremental is less disruptive. Boom, but Tristan, you're, I think I'm inferring that you believe you have the architecture to accommodate Bob's vision, and then Bob, and I'm inferring from Bob's comments that maybe you don't think that's the case, but please. >> No, no, no. I think that, so Bob, let me put words into your mouth and you tell me if you disagree, DBT is completely useless in a world where a large scale cloud data warehouse doesn't exist. We were not able to bring the power of Python to our users until these platforms started supporting Python. Like DBT is a layer on top of large scale computing platforms. And to the extent that those platforms extend their functionality to bring more capabilities, we will also service those capabilities. >> Let me try and bridge the two. >> Yeah, yeah, so Bob, Bob, Bob, do you concur with what Tristan just said? >> Absolutely, I mean there's nothing to argue with in what Tristan just said. >> I wanted. >> And it's what he's doing. It'll continue to, I believe he'll continue to do it, and I think it's a very good thing for the industry. You know, I'm just simply saying that on top of that, I would like to provide Tristan and all of those who are following similar paths to him with a new type of database that can actually solve these problems in a much more architected way. And when I talk about Cosmos with something like Mongo or Cosmos together with Elastic, you're using Elastic as the join engine, okay. That's the purpose of it. It becomes a poor man's join engine. And I kind of go, I know there's a better answer than that. I know there is, but that's kind of where we are state of the art right now. >> George, we got to wrap it. So give us the last word here. Go ahead, George. >> Okay, I just, I think there's a way to tie together what Tristan and Bob are both talking about, and I want them to validate it, which is for five years we're going to be adding or some number of years more and more semantics to the operational and analytic data that we have, starting with metric definitions. My question is for Bob, as DBT accumulates more and more of those semantics for different enterprises, can that layer not run on top of a relational knowledge graph? And what would we lose by not having, by having the knowledge graph store sort of the joins, all the complex relationships among the data, but having the semantics in the DBT layer? >> Well, I think this, okay, I think first of all that DBT will be an environment where many of these semantics are defined. The question we're asking is how are they stored and how are they processed? And what I predict will happen is that over time, as companies like DBT begin to build more and more richness into their semantic layer, they will begin to experience challenges that customers want to run queries, they want to ask questions, they want to use this for things where the underlying infrastructure becomes an obstacle. I mean, this has happened in always in the history, right? I mean, you see major advances in computer science when the data model changes. And I think we're on the verge of a very significant change in the way data is stored and structured, or at least metadata is stored and structured. Again, I'm not saying that anytime in the next 10 years, SQL is going to go away. In fact, more SQL will be written in the future than has been written in the past. And those platforms will mature to become the engines, the slicer dicers of data. I mean that's what they are today. They're incredibly powerful at working with large amounts of data, and that infrastructure is maturing very rapidly. What is not maturing is the infrastructure to handle all of the metadata and the semantics that that requires. And that's where I say knowledge graphs are what I believe will be the solution to that. >> But Tristan, bring us home here. It sounds like, let me put pause at this, is that whatever happens in the future, we're going to leverage the vast system that has become cloud that we're talking about a supercloud, sort of where data lives irrespective of physical location. We're going to have to tap that data. It's not necessarily going to be in one place, but give us your final thoughts, please. >> 100% agree. I think that the data is going to live everywhere. It is the responsibility for both the metadata systems and the data processing engines themselves to make sure that we can join data across cloud providers, that we can join data across different physical regions and that we as practitioners are going to kind of start forgetting about details like that. And we're going to start thinking more about how we want to arrange our teams, how does the tooling that we use support our team structures? And that's when data mesh I think really starts to get very, very critical as a concept. >> Guys, great conversation. It was really awesome to have you. I can't thank you enough for spending time with us. Really appreciate it. >> Thanks a lot. >> All right. This is Dave Vellante for George Gilbert, John Furrier, and the entire Cube community. Keep it right there for more content. You're watching SuperCloud2. (upbeat music)

Published Date : Jan 4 2023

SUMMARY :

and the future of cloud. And Bob, you have some really and I think it's helpful to do it I'm going to go back and And I noticed that you is that what they mean? that we're familiar with, you know, it comes back to SuperCloud, is that data products are George, is that how you see it? that don't require a human to is that one of the most And DBT has, you know, the And I'm sure that there will be more on the right architecture is that in the next 10 years, So let me ask the Colombo and the data stack, which is that is still in the like Modal Labs, yeah, of course. and that sits above the and that query is going to So Tristan, you got to and that can store all of the that there is to do for the pointing to you as incremental, but allowing practitioners to I mean, I, but to your point, a lot. the most successful path. that that's the way the that you believe you have the architecture and you tell me if you disagree, there's nothing to argue with And I kind of go, I know there's George, we got to wrap it. and more of those semantics and the semantics that that requires. is that whatever happens in the future, and that we as practitioners I can't thank you enough John Furrier, and the

ENTITIES

Entity	Category	Confidence
Tristan	PERSON	0.99+
George Gilbert	PERSON	0.99+
John	PERSON	0.99+
George	PERSON	0.99+
Steve Mullaney	PERSON	0.99+
Katie	PERSON	0.99+
David Floyer	PERSON	0.99+
Charles	PERSON	0.99+
Mike Dooley	PERSON	0.99+
Peter Burris	PERSON	0.99+
Chris	PERSON	0.99+
Tristan Handy	PERSON	0.99+
Bob	PERSON	0.99+
Maribel Lopez	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Mike Wolf	PERSON	0.99+
VMware	ORGANIZATION	0.99+
Merim	PERSON	0.99+
Adrian Cockcroft	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Brian	PERSON	0.99+
Brian Rossi	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Chris Wegmann	PERSON	0.99+
Whole Foods	ORGANIZATION	0.99+
Eric	PERSON	0.99+
Chris Hoff	PERSON	0.99+
Jamak Dagani	PERSON	0.99+
Jerry Chen	PERSON	0.99+
Caterpillar	ORGANIZATION	0.99+
John Walls	PERSON	0.99+
Marianna Tessel	PERSON	0.99+
Josh	PERSON	0.99+
Europe	LOCATION	0.99+
Jerome	PERSON	0.99+
Google	ORGANIZATION	0.99+
Lori MacVittie	PERSON	0.99+
2007	DATE	0.99+
Seattle	LOCATION	0.99+
10	QUANTITY	0.99+
five	QUANTITY	0.99+
Ali Ghodsi	PERSON	0.99+
Peter McKee	PERSON	0.99+
Nutanix	ORGANIZATION	0.99+
Eric Herzog	PERSON	0.99+
India	LOCATION	0.99+
Mike	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Kit Colbert	PERSON	0.99+
Peter	PERSON	0.99+
Dave	PERSON	0.99+
Tanuja Randery	PERSON	0.99+

Itamar Ankorion, Qlik & Peter MacDonald, Snowflake | AWS re:Invent 2022

(upbeat music) >> Hello, welcome back to theCUBE's AWS RE:Invent 2022 Coverage. I'm John Furrier, host of theCUBE. Got a great lineup here, Itamar Ankorion SVP Technology Alliance at Qlik and Peter McDonald, vice President, cloud partnerships and business development Snowflake. We're going to talk about bringing SAP data to life, for joint Snowflake, Qlik and AWS Solution. Gentlemen, thanks for coming on theCUBE Really appreciate it. >> Thank you. >> Thank you, great meeting you John. >> Just to get started, introduce yourselves to the audience, then going to jump into what you guys are doing together, unique relationship here, really compelling solution in cloud. Big story about applications and scale this year. Let's introduce yourselves. Peter, we'll start with you. >> Great. I'm Peter MacDonald. I am vice president of Cloud Partners and business development here at Snowflake. On the Cloud Partner side, that means I manage AWS relationship along with Microsoft and Google Cloud. What we do together in terms of complimentary products, GTM, co-selling, things like that. Importantly, working with other third parties like Qlik for joint solutions. On business development, it's negotiating custom commercial partnerships, large companies like Salesforce and Dell, smaller companies at most for our venture portfolio. >> Thanks Peter and hi John. It's great to be back here. So I'm Itamar Ankorion and I'm the senior vice president responsible for technology alliances here at Qlik. With that, own strategic alliances, including our key partners in the cloud, including Snowflake and AWS. I've been in the data and analytics enterprise software market for 20 plus years, and my main focus is product management, marketing, alliances, and business development. I joined Qlik about three and a half years ago through the acquisition of Attunity, which is now the foundation for Qlik data integration. So again, we focus in my team on creating joint solution alignment with our key partners to provide more value to our customers. >> Great to have both you guys, senior executives in the industry on theCUBE here, talking about data, obviously bringing SAP data to life is the theme of this segment, but this reinvent, it's all about the data, big data end-to-end story, a lot about data being intrinsic as the CEO says on stage around in the organizations in all aspects. Take a minute to explain what you guys are doing as from a company standpoint. Snowflake and Qlik and the solutions, why here at AWS? Peter, we'll start with you at Snowflake, what you guys do as a company, your mission, your focus. >> That was great, John. Yeah, so here at Snowflake, we focus on the data platform and until recently, data platforms required expensive on-prem hardware appliances. And despite all that expense, customers had capacity constraints, inexpensive maintenance, and had limited functionality that all impeded these organizations from reaching their goals. Snowflake is a cloud native SaaS platform, and we've become so successful because we've addressed these pain points and have other new special features. For example, securely sharing data across both the organization and the value chain without copying the data, support for new data types such as JSON and structured data, and also advance in database data governance. Snowflake integrates with complimentary AWS services and other partner products. So we can enable holistic solutions that include, for example, here, both Qlik and AWS SageMaker, and comprehend and bring those to joint customers. Our customers want to convert data into insights along with advanced analytics platforms in AI. That is how they make holistic data-driven solutions that will give them competitive advantage. With Snowflake, our approach is to focus on customer solutions that leverage data from existing systems such as SAP, wherever they are in the cloud or on-premise. And to do this, we leverage partners like Qlik native US to help customers transform their businesses. We provide customers with a premier data analytics platform as a result. Itamar, why don't you talk about Qlik a little bit and then we can dive into the specific SAP solution here and some trends >> Sounds great, Peter. So Qlik provides modern data integration and analytics software used by over 38,000 customers worldwide. Our focus is to help our customers turn data into value and help them close the gap between data all the way through insight and action. We offer click data integration and click data analytics. Click data integration helps to automate the data pipelines to deliver data to where they want to use them in real-time and make the data ready for analytics and then Qlik data analytics is a robust platform for analytics and business intelligence has been a leader in the Gartner Magic Quadrant for over 11 years now in the market. And both of these come together into what we call Qlik Cloud, which is our SaaS based platform. So providing a more seamless way to consume all these services and accelerate time to value with customer solutions. In terms of partnerships, both Snowflake and AWS are very strategic to us here at Qlik, so we have very comprehensive investment to ensure strong joint value proposition to we can bring to our mutual customers, everything from aligning our roadmaps through optimizing and validating integrations, collaborating on best practices, packaging joint solutions like the one we'll talk about today. And with that investment, we are an elite level, top level partner with Snowflake. We fly that our technology is Snowflake-ready across the entire product set and we have hundreds of joint customers together and with AWS we've also partnered for a long time. We're here to reinvent. We've been here with the first reinvent since the inaugural one, so it kind of gives you an idea for how long we've been working with AWS. We provide very comprehensive integration with AWS data analytics services, and we have several competencies ranging from data analytics to migration and modernization. So that's our focus and again, we're excited about working with Snowflake and AWS to bring solutions together to market. >> Well, I'm looking forward to unpacking the solutions specifically, and congratulations on the continued success of both your companies. We've been following them obviously for a very long time and seeing the platform evolve beyond just SaaS and a lot more going on in cloud these days, kind of next generation emerging. You know, we're seeing a lot of macro trends that are going to be powering some of the things we're going to get into real quickly. But before we get into the solution, what are some of those power dynamics in the industry that you're seeing in trends specifically that are impacting your customers that are taking us down this road of getting more out of the data and specifically the SAP, but in general trends and dynamics. What are you hearing from your customers? Why do they care? Why are they going down this road? Peter, we'll start with you. >> Yeah, I'll go ahead and start. Thanks. Yeah, I'd say we continue to see customers being, being very eager to transform their businesses and they know they need to leverage technology and data to do so. They're also increasingly depending upon the cloud to bring that agility, that elasticity, new functionality necessary to react in real-time to every evolving customer needs. You look at what's happened over the last three years, and boy, the macro environment customers, it's all changing so fast. With our partnerships with AWS and Qlik, we've been able to bring to market innovative solutions like the one we're announcing today that spans all three companies. It provides a holistic solution and an integrated solution for our customer. >> Itamar let's get into it, you've been with theCUBE, you've seen the journey, you have your own journey, many, many years, you've seen the waves. What's going on now? I mean, what's the big wave? What's the dynamic powering this trend? >> Yeah, in a nutshell I'll call it, it's all about time. You know, it's time to value and it's about real-time data. I'll kind of talk about that a bit. So, I mean, you hear a lot about the data being the new oil, but it's definitely, we see more and more customers seeing data as their critical enabler for innovation and digital transformation. They look for ways to monetize data. They look as the data as the way in which they can innovate and bring different value to the customers. So we see customers want to use more data so to get more value from data. We definitely see them wanting to do it faster, right, than before. And we definitely see them looking for agility and automation as ways to accelerate time to value, and also reduce overall costs. I did mention real-time data, so we definitely see more and more customers, they want to be able to act and make decisions based on fresh data. So yesterday's data is just not good enough. >> John: Yeah. >> It's got to be down to the hour, down to the minutes and sometimes even lower than that. And then I think we're also seeing customers look to their core business systems where they have a lot of value, like the SAP, like mainframe and thinking, okay, our core data is there, how can we get more value from this data? So that's key things we see all the time with customers. >> Yeah, we did a big editorial segment this year on, we called data as code. Data as code is kind of a riff on infrastructure as code and you start to see data becoming proliferating into all aspects, fresh data. It's not just where you store it, it's how you share it, it's how you turn it into an application intrinsically involved in all aspects. This is the big theme this year and that's driving all the conversations here at RE:Invent. And I'm guaranteeing you, it's going to happen for another five and 10 years. It's not stopping. So I got to get into the solution, you guys mentioned SAP and you've announced the solution by Qlik, Snowflake and AWS for your customers using SAP. Can you share more about this solution? What's unique about it? Why is it important and why now? Peter, Itamar, we'll start with you first. >> Let me jump in, this is really, I'll jump because I'm excited. We're very excited about this solution and it's also a solution by the way and again, we've seen proven customer success with it. So to your point, it's ready to scale, it's starting, I think we're going to see a lot of companies doing this over the next few years. But before we jump to the solution, let me maybe take a few minutes just to clarify the need, why we're seeing, why we're seeing customers jump to do this. So customers that use SAP, they use it to manage the core of their business. So think order processing, management, finance, inventory, supply chain, and so much more. So if you're running SAP in your company, that data creates a great opportunity for you to drive innovation and modernization. So what we see customers want to do, they want to do more with their data and more means they want to take SAP with non-SAP data and use it together to drive new insights. They want to use real-time data to drive real-time analytics, which they couldn't do to date. They want to bring together descriptive with predictive analytics. So adding machine learning in AI to drive more value from the data. And naturally they want to do it faster. So find ways to iterate faster on their solutions, have freedom with the data and agility. And I think this is really where cloud data platforms like Snowflake and AWS, you know, bring that value to be able to drive that. Now to do that you need to unlock the SAP data, which is a lot of also where Qlik comes in because typical challenges these customers run into is the complexity, inherent in SAP data. Tens of thousands of tables, proprietary formats, complex data models, licensing restrictions, and more than, you have performance issues, they usually run into how do we handle the throughput, the volumes while maintaining lower latency and impact. Where do we find knowledge to really understand how to get all this done? So these are the things we've looked at when we came together to create a solution and make it unique. So when you think about its uniqueness, because we put together a lot, and I'll go through three, four key things that come together to make this unique. First is about data delivery. How do you have the SAP data delivery? So how do you get it from ECC, from HANA from S/4HANA, how do you deliver the data and the metadata and how that integration well into Snowflake. And what we've done is we've focused a lot on optimizing that process and the continuous ingestion, so the real-time ingestion of the data in a way that works really well with the Snowflake system, data cloud. Second thing is we looked at SAP data transformation, so once the data arrives at Snowflake, how do we turn it into being analytics ready? So that's where data transformation and data worth automation come in. And these are all elements of this solution. So creating derivative datasets, creating data marts, and all of that is done by again, creating an optimized integration that pushes down SQL based transformations, so they can be processed inside Snowflake, leveraging its powerful engine. And then the third element is bringing together data visualization analytics that can also take all the data now that in organizing inside Snowflake, bring other data in, bring machine learning from SageMaker, and then you go to create a seamless integration to bring analytic applications to life. So these are all things we put together in the solution. And maybe the last point is we actually took the next step with this and we created something we refer to as solution accelerators, which we're really, really keen about. Think about this as prepackaged templates for common business analytic needs like order to cash, finance, inventory. And we can either dig into that a little more later, but this gets the next level of value to the customers all built into this joint solution. >> Yeah, I want to get to the accelerators, but real quick, Peter, your reaction to the solution, what's unique about it? And obviously Snowflake, we've been seeing the progression data applications, more developers developing on top of Snowflake, data as code kind of implies developer ecosystem. This is kind of interesting. I mean, you got partnering with Qlik and AWS, it's kind of a developer-like thinking real solution. What's unique about this SAP solution that's, that's different than what customers can get anywhere else or not? >> Yeah, well listen, I think first of all, you have to start with the idea of the solution. This are three companies coming together to build a holistic solution that is all about, you know, creating a great opportunity to turn SAP data into value this is Itamar was talking about, that's really what we're talking about here and there's a lot of technology underneath it. I'll talk more about the Snowflake technology, what's involved here, and then cover some of the AWS pieces as well. But you know, we're focusing on getting that value out and accelerating time to value for our joint customers. As Itamar was saying, you know, there's a lot of complexity with the SAP data and a lot of value there. How can we manage that in a prepackaged way, bringing together best of breed solutions with proven capabilities and bringing this to market quickly for our joint customers. You know, Snowflake and AWS have been strong partners for a number of years now, and that's not only on how Snowflake runs on top of AWS, but also how we integrate with their complementary analytics and then all products. And so, you know, we want to be able to leverage those in addition to what Qlik is bringing in terms of the data transformations, bringing data out of SAP in the visualization as well. All very critical. And then we want to bring in the predictive analytics, AWS brings and what Sage brings. We'll talk about that a little bit later on. Some of the technologies that we're leveraging are some of our latest cutting edge technologies that really make things easier for both our partners and our customers. For example, Qlik leverages Snowflakes recently released Snowpark for Python functionality to push down those data transformations from clicking the Snowflake that Itamar's mentioning. And while we also leverage Snowpark for integrations with Amazon SageMaker, but there's a lot of great new technology that just makes this easy and compelling for customers. >> I think that's the big word, easy button here for what may look like a complex kind of integration, kind of turnkey, really, really compelling example of the modern era we're living in, as we always say in theCUBE. You mentioned accelerators, SAP accelerators. Can you give an example of how that works with the technology from the third party providers to deliver this business value Itamar, 'cause that was an interesting comment. What's the example? Give an example of this acceleration. >> Yes, certainly. I think this is something that really makes this truly, truly unique in the industry and again, a great opportunity for customers. So we kind talked earlier about there's a lot of things that need to be done with SP data to turn it to value. And these accelerator, as the name suggests, are designed to do just that, to kind of jumpstart the process and reduce the time and the risk involved in such project. So again, these are pre-packaged templates. We basically took a lot of knowledge, and a lot of configurations, best practices about to get things done and we put 'em together. So think about all the steps, it includes things like data extraction, so already knowing which tables, all the relevant tables that you need to get data from in the contexts of the solution you're looking for, say like order to cash, we'll get back to that one. How do you continuously deliver that data into Snowflake in an in efficient manner, handling things like data type mappings, metadata naming conventions and transformations. The data models you build all the way to data mart definitions and all the transformations that the data needs to go through moving through steps until it's fully analytics ready. And then on top of that, even adding a library of comprehensive analytic dashboards and integrations through machine learning and AI and put all of that in a way that's in pre-integrated and tested to work with Snowflake and AWS. So this is where again, you get this entire recipe that's ready. So take for example, I think I mentioned order to cash. So again, all these things I just talked about, I mean, for those who are not familiar, I mean order to cash is a critical business process for every organization. So especially if you're in retail, manufacturing, enterprise, it's a big... This is where, you know, starting with booking a sales order, following by fulfilling the order, billing the customer, then managing the accounts receivable when the customer actually pays, right? So this all process, you got sales order fulfillment and the billing impacts customer satisfaction, you got receivable payments, you know, the impact's working capital, cash liquidity. So again, as a result this order to cash process is a lifeblood for many businesses and it's critical to optimize and understand. So the solution accelerator we created specifically for order to cash takes care of understanding all these aspects and the data that needs to come with it. So everything we outline before to make the data available in Snowflake in a way that's really useful for downstream analytics, along with dashboards that are already common for that, for that use case. So again, this enables customers to gain real-time visibility into their sales orders, fulfillment, accounts receivable performance. That's what the Excel's are all about. And very similarly, we have another one for example, for finance analytics, right? So this will optimize financial data reporting, helps customers get insights into P&L, financial risk of stability or inventory analytics that helps with, you know, improve planning and inventory management, utilization, increased efficiencies, you know, so in supply chain. So again, these accelerators really help customers get a jumpstart and move faster with their solutions. >> Peter, this is the easy button we just talked about, getting things going, you know, get the ball rolling, get some acceleration. Big part of this are the three companies coming together doing this. >> Yeah, and to build on what Itamar just said that the SAP data obviously has tremendous value. Those sales orders, distribution data, financial data, bringing that into Snowflake makes it easily accessible, but also it enables it to be combined with other data too, is one of the things that Snowflake does so well. So you can get a full view of the end-to-end process and the business overall. You know, for example, I'll just take one, you know, one example that, that may not come to mind right away, but you know, looking at the impact of weather conditions on supply chain logistics is relevant and material and have interest to our customers. How do you bring those different data sets together in an easy way, bringing the data out of SAP, bringing maybe other data out of other systems through Qlik or through Snowflake, directly bringing data in from our data marketplace and bring that all together to make it work. You know, fundamentally organizational silos and the data fragmentation exist otherwise make it really difficult to drive modern analytics projects. And that in turn limits the value that our customers are getting from SAP data and these other data sets. We want to enable that and unleash. >> Yeah, time for value. This is great stuff. Itamar final question, you know, what are customers using this? What do you have? I'm sure you have customers examples already using the solution. Can you share kind of what these examples look like in the use cases and the value? >> Oh yeah, absolutely. Thank you. Happy to. We have customers across different, different sectors. You see manufacturing, retail, energy, oil and gas, CPG. So again, customers in those segments, typically sectors typically have SAP. So we have customers in all of them. A great example is like Siemens Energy. Siemens Energy is a global provider of gas par services. You know, over what, 28 billion, 30 billion in revenue. 90,000 employees. They operate globally in over 90 countries. So they've used SAP HANA as a core system, so it's running on premises, multiple locations around the world. And what they were looking for is a way to bring all these data together so they can innovate with it. And the thing is, Peter mentioned earlier, not just the SAP data, but also bring other data from other systems to bring it together for more value. That includes finance data, these logistics data, these customer CRM data. So they bring data from over 20 different SAP systems. Okay, with Qlik data integration, feeding that into Snowflake in under 20 minutes, 24/7, 365, you know, days a year. Okay, they get data from over 20,000 tables, you know, over million, hundreds of millions of records daily going in. So it is a great example of the type of scale, scalability, agility and speed that they can get to drive these kind of innovation. So that's a great example with Siemens. You know, another one comes to mind is a global manufacturer. Very similar scenario, but you know, they're using it for real-time executive reporting. So it's more like feasibility to the production data as well as for financial analytics. So think, think, think about everything from audit to texts to innovate financial intelligence because all the data's coming from SAP. >> It's a great time to be in the data business again. It keeps getting better and better. There's more data coming. It's not stopping, you know, it's growing so fast, it keeps coming. Every year, it's the same story, Peter. It's like, doesn't stop coming. As we wrap up here, let's just get customers some information on how to get started. I mean, obviously you're starting to see the accelerators, it's a great program there. What a great partnership between the two companies and AWS. How can customers get started to learn about the solution and take advantage of it, getting more out of their SAP data, Peter? >> Yeah, I think the first place to go to is talk to Snowflake, talk to AWS, talk to our account executives that are assigned to your account. Reach out to them and they will be able to educate you on the solution. We have packages up very nicely and can be deployed very, very quickly. >> Well gentlemen, thank you so much for coming on. Appreciate the conversation. Great overview of the partnership between, you know, Snowflake and Qlik and AWS on a joint solution. You know, getting more out of the SAP data. It's really kind of a key, key solution, bringing SAP data to life. Thanks for coming on theCUBE. Appreciate it. >> Thank you. >> Thank you John. >> Okay, this is theCUBE coverage here at RE:Invent 2022. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)

Published Date : Dec 1 2022

SUMMARY :

bringing SAP data to life, great meeting you John. then going to jump into what On the Cloud Partner side, and I'm the senior vice and the solutions, and the value chain and accelerate time to value that are going to be powering and data to do so. What's the dynamic powering this trend? You know, it's time to value all the time with customers. and that's driving all the and it's also a solution by the way I mean, you got partnering and bringing this to market of the modern era we're living in, that the data needs to go through getting things going, you know, Yeah, and to build in the use cases and the value? agility and speed that they can get It's a great time to be to educate you on the solution. key solution, bringing SAP data to life. Okay, this is theCUBE

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Peter	PERSON	0.99+
Dell	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Siemens	ORGANIZATION	0.99+
Peter MacDonald	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Peter McDonald	PERSON	0.99+
Qlik	ORGANIZATION	0.99+
28 billion	QUANTITY	0.99+
two companies	QUANTITY	0.99+
Tens	QUANTITY	0.99+
three companies	QUANTITY	0.99+
Siemens Energy	ORGANIZATION	0.99+
20 plus years	QUANTITY	0.99+
yesterday	DATE	0.99+
Snowflake	ORGANIZATION	0.99+
Itamar Ankorion	PERSON	0.99+
third element	QUANTITY	0.99+
First	QUANTITY	0.99+
three	QUANTITY	0.99+
Itamar	PERSON	0.99+
over 20,000 tables	QUANTITY	0.99+
both	QUANTITY	0.99+
90,000 employees	QUANTITY	0.99+
first	QUANTITY	0.99+
Salesforce	ORGANIZATION	0.99+
Cloud Partners	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
over 38,000 customers	QUANTITY	0.99+
under 20 minutes	QUANTITY	0.99+
10 years	QUANTITY	0.99+
five	QUANTITY	0.99+
Excel	TITLE	0.99+
one	QUANTITY	0.99+
over 11 years	QUANTITY	0.98+
Snowpark	TITLE	0.98+
Second thing	QUANTITY	0.98+

Robert Nishihara, Anyscale | AWS re:Invent 2022 - Global Startup Program

>>Well, hello everybody. John Walls here and continuing our coverage here at AWS Reinvent 22 on the queue. We continue our segments here in the Global Startup program, which of course is sponsored by AWS Startup Showcase, and with us to talk about any scale as the co-founder and CEO of the company, Robert and n, you are Robert. Good to see you. Thanks for joining us. >>Yeah, great. And thank you. >>You bet. Yeah. Glad to have you aboard here. So let's talk about Annie Scale, first off, for those at home and might not be familiar with what you do. Yeah. Because you've only been around for a short period of time, you're telling me >>Company's about >>Three years now. Three >>Years old, >>Yeah. Yeah. So tell us all about it. Yeah, >>Absolutely. So one of the biggest things happening in computing right now is the proliferation of ai. AI is just spreading throughout every industry has the potential to transform every industry. But the thing about doing AI is that it's incredibly computationally intensive. So if you wanna do do ai, you're not, you're probably not just doing it on your laptop, you're doing it across many machines, many gpu, many compute resources, and that's incredibly hard to do. It requires a lot of software engineering expertise, a lot of infrastructure expertise, a lot of cloud computing expertise to build the software infrastructure and distributed systems to really scale AI across all of the, across the cloud. And to do it in a way where you're really getting value out of ai. And so that is the, the problem statement that AI has tremendous potential. It's incredibly hard to do because of the, the scale required. >>And what we are building at any scale is really trying to make that easy. So trying to get to the point where, as a developer, if you know how to program on your laptop, then if you know how to program saying Python on your laptop, then that's enough, right? Then you can do ai, you can get value out of it, you can scale it, you can build the kinds of, you know, incredibly powerful applica AI applications that companies like Google and, and Facebook and others can build. But you don't have to learn about all of the distributed systems and infrastructure. It just, you know, we'll handle that for you. So that's, if we're successful, you know, that's what we're trying to achieve here. >>Yeah. What, what makes AI so hard to work with? I mean, you talk about the complexity. Yeah. A lot of moving parts. I mean, literally moving parts, but, but what is it in, in your mind that, that gets people's eyes spinning a little bit when they, they look at great potential. Yeah. But also they look at the downside of maybe having to work your way through Pike mere of sorts. >>So, so the potential is definitely there, but it's important to remember that a lot of AI initiatives fail. Like a lot of initiative AI initiatives, something like 80 or 90% don't make it out of, you know, the research or prototyping phase and inter production. Hmm. So, some of the things that are hard about AI and the reasons that AI initiatives can fail, one is the scale required, you know, moving. It's one thing to develop something on your laptop, it's another thing to run it across thousands of machines. So that's scale, right? Another is the transition from development and prototyping to production. Those are very different, have very different requirements. Absolutely. A lot of times it's different teams within a company. They have different tech stacks, different software they're using. You know, we hear companies say that when they move from develop, you know, once they prototype and develop a model, it could take six to 12 weeks to get that model in production. >>And that often involves rewriting a lot of code and handing it off to another team. So the transition from development to production is, is a big challenge. So the scale, the development to production handoff. And then lastly, a big challenge is around flexibility. So AI's a fast moving field, you see new developments, new algorithms, new models coming out all the time. And a lot of teams we work with, you know, they've, they've built infrastructure. They're using products out there to do ai, but they've found that it's sort of locking them into rigid workflows or specific tools, and they don't have the flexibility to adopt new algorithms or new strategies or approaches as they're being developed as they come out. And so they, but their developers want the flexibility to use the latest tools, the latest strategies. And so those are some of the main problems we see. It's really like, how do you scale scalability? How do you move easily from development and production and back? And how do you remain flexible? How do you adapt and, and use the best tools that are coming out? And so those are, yeah, just those are and often reasons that people start to use Ray, which is our open source project in any scale, which is our, our product. So tell >>Me about Ray, right? Yeah. Opensource project. I think you said you worked on it >>At Berkeley. That's right. Yeah. So before this company, I did a PhD in machine learning at Berkeley. And one of the challenges that we were running into ourselves, we were trying to do machine learning. We actually weren't infrastructure or distributed systems people, but we found ourselves in order to do machine learning, we found ourselves building all sorts of tools, ad hoc tools and systems to scale the machine learning, to be able to run it in a reasonable amount of time and to be able to leverage the compute that we needed. And it wasn't just us people all across, you know, machine learning researchers, machine learning practitioners were building their own tooling and infrastructure. And that was one of the things that we felt was really holding back progress. And so that's how we slowly and kind of gradually got into saying, Hey, we could build better tools here. >>We could build, we could try to make this easier to do so that all of these people don't have to build their own infrastructure. They can focus on the actual machine learning applications that they're trying to build. And so we started, Ray started this open source project for basically scaling Python applications and scaling machine learning applications. And, well, initially we were running around Berkeley trying to get all of our friends to try it out and, and adopt it and, you know, and give us feedback. And if it didn't work, we would debug it right away. And that slow, you know, that gradually turned into more companies starting to adopt it, bigger teams starting to adopt it, external contributors starting to, to contribute back to the open source project and make it better. And, you know, before you know it, we were hosting meetups, giving to talks, running tutorials, and the project was just taking off. And so that's a big part of what we continue to develop today at any scale, is like really fostering this open source community, growing the open source user base, making sure Ray is just the best way to scale Python applications and, and machine learning applications. >>So, so this was a graduate school project That's right. You say on, on your way to getting your doctorate and now you commercializing now, right? Yeah. I mean, so you're being able to offer it, first off, what a journey that was, right? I mean, who would've thought Absolutely. I guess you probably did think that at some point, but >>No, you know, when we started, when we were working on Ray, we actually didn't anticipate becoming a company, or we at least just weren't looking that far ahead. We were really excited about solving this problem of making distributed computing easy, you know, getting to the point where developers just don't have to learn about infrastructure and distributed systems, but get all the benefits. And of course, it wasn't until, you know, later on as we were graduating from Berkeley and we wanted to continue really taking this project further and, and really solving this problem that it, we realized it made sense to start a company. >>So help me out, like, like what, what, and I might have missed this, so I apologize if I did, but in terms of, of Ray's that building block and essential for your, your ML or AI work down the road, you know, what, what is it doing for me or what, what will it allow me to do in either one of those realms that I, I can't do now? >>Yeah. And so, so like why use Ray versus not using Ray? Yeah, I think the, the answer is that you, you know, if you're doing ai, you need to scale. It's becoming, if you don't find that to be the case today, you probably will tomorrow, you know, or the day after that. And so it's really increasingly, it's a requirement. It's not an option. And so if you're scaling, if you're trying to build these scalable applications you are building, you're either going to use Ray or, or something like Ray or you're going to build the infrastructure yourself and building the infrastructure yourself, that's a long journey. >>So why take that on, right? >>And many of the companies we work with don't want to be in the business of building and managing infrastructure. No. Because, you know, if they, they want their their best engineers to build their product, right? To, to get their product to market faster. >>I want, I want you to do that for me. >>Right? Exactly. And so, you know, we can really accelerate what these teams can do and, you know, and if we can make the infrastructure something they just don't have to think about, that's, that's why you would choose to use Ray. >>Okay. You know, between a and I and ml are, are they different animals in terms of what you're trying to get done or what Ray can do? >>Yeah, and actually I should say like, it's not just, you know, teams that are new teams that are starting out, that are using Ray, many companies that have built, already built their own infrastructure will then switch to using Ray. And to give you a few examples, like Uber runs all their deep learning on Ray, okay. And, you know, open ai, which is really at the frontier of training large models and, and you know, pushing the boundaries of, of ai, they train their largest models using Ray. You know, companies like Shopify rebuilt their entire machine learning platform using Ray, >>But they started somewhere else. >>They had, this is all, you know, like, it's not like the v1, you know, of their, of their machine learning infrastructure. This is like, they did it a different way before, this is like the second version or the third iteration of of, of how they're doing it. And they realize often it's because, you know, I mean in the case of, of Uber, just to give you one example, they built a system called hova for scaling deep learning on a bunch of GPUs. Right Now, as you scale deep learning on GPUs for them, the bottleneck shifted away from, you know, as you scale GPU's training, the bottleneck shifted away from training and to the data ingest and pre-processing. And they wanted to scale data ingest and pre-processing on CPUs. So now Hova, it's a deep learning framework. It doesn't do the data ingest and pre-processing on CPUs, but you can, if you run Hova on top of Ray, you can scale training on GPUs. >>And then Ray has another library called Ray Data you can, that lets you scale the ingest and pre-processing on CPUs. You can pipeline them together. And that allowed them to train larger models on more data before, just to take one example, ETA prediction, if you get in an Uber, it tells you what time you're supposed to arrive. Sure. That uses a deep learning model called d eta. And before they were able to train on about two weeks worth of data. Now, you know, using Ray and for scaling the data, ingestive pre-processing and training, they can train on much more data. You know, you can get more accurate ETA predictions. So that's just one example of the kind of benefit they were able to get. Right. Also, because it's running on top of, of Ray and Ray has this ecosystem of libraries, you know, they can also use Ray's hyper parameter tuning library to do hyper parameter tuning for their deep learning models. >>They can also use it for inference and you know, because these are all built on top of Ray, they inherit the like, elasticity and fault tolerance of running on top of Ray. So really it simplifies things on the infrastructure side cuz there's just, if you have Ray as common infrastructure for your machine learning workloads, there's just one system to, to kind of manage and operate. And if you are, it simplifies things for the end users like the developers because from their perspective, they're just writing a Python application. They don't have to learn how to use three different distributed systems and stitch them together and all of this. >>So aws, before I let you go, how do they come into play here for you? I mean, are you part of the showcase, a startup showcase? So obviously a major partner and major figure in the offering that you're presenting >>People? Yeah, well you can run. So any scale is a managed ray service. Like any scale is just the best way to run Ray and deploy Ray. And we run on top of aws. So many of our customers are, you know, using Ray through any scale on aws. And so we work very closely together and, and you know, we have, we have joint customers and basically, and you know, a lot of the value that any scale is adding on top of Ray is around the production story. So basically, you know, things like high availability, things like failure handling, retry alerting, persistence, reproducibility, these are a lot of the value, the values of, you know, the value that our platform adds on top of the open source project. A lot of stuff as well around collaboration, you know, imagine you are, you, something goes wrong with your application, your production job, you want to debug it, you can just share the URL with your, your coworker. They can click a button, reproduce the exact same thing, look at the same logs, you know, and, and, and figure out what's going on. And also a lot around, one thing that's, that's important for a lot of our customers is efficiency around cost. And so we >>Support every customer. >>Exactly. A lot of people are spending a lot of money on, on aws. Yeah. Right? And so any scale supports running out of the box on cheaper like spot instances, these preempt instances, which, you know, just reduce costs by quite a bit. And so things like that. >>Well, the company is any scale and you're on the show floor, right? So if you're having a chance to watch this during reinvent, go down and check 'em out. Robert Ashihara joining us here, the co-founder and ceo and Robert, thanks for being with us. Yeah. Here on the cube. Really enjoyed it. Me too. Thanks so much. Boy, three years graduate program and boom, here you are, you know, with off to the enterprise you go. Very nicely done. All right, we're gonna continue our coverage here on the Cube with more here from Las Vegas. We're the Venetian, we're AWS Reinvent 22 and you're watching the Cube, the leader in high tech coverage.

Published Date : Dec 1 2022

SUMMARY :

scale as the co-founder and CEO of the company, Robert and n, you are Robert. And thank you. for those at home and might not be familiar with what you do. Three years now. Yeah, So if you wanna do do ai, you're not, you're probably not just doing it on your laptop, It just, you know, we'll handle that for you. I mean, you talk about the complexity. can fail, one is the scale required, you know, moving. And how do you remain flexible? I think you said you worked on it you know, machine learning researchers, machine learning practitioners were building their own tooling And, you know, before you know it, we were hosting meetups, I guess you probably did think that at some point, distributed computing easy, you know, getting to the point where developers just don't have to learn It's becoming, if you don't find that to be the case today, No. Because, you know, if they, they want their their best engineers to build their product, And so, you know, we can really accelerate what these teams can do to get done or what Ray can do? And to give you a few examples, like Uber runs all their deep learning on Ray, They had, this is all, you know, like, it's not like the v1, And then Ray has another library called Ray Data you can, that lets you scale the ingest and pre-processing on CPUs. And if you are, it simplifies things for the end users reproduce the exact same thing, look at the same logs, you know, and, and, and figure out what's going on. these preempt instances, which, you know, just reduce costs by quite a bit. Boy, three years graduate program and boom, here you are, you know, with off to the enterprise you

ENTITIES

Entity	Category	Confidence
Robert	PERSON	0.99+
Robert Nishihara	PERSON	0.99+
John Walls	PERSON	0.99+
Robert Ashihara	PERSON	0.99+
six	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
Ray	PERSON	0.99+
Las Vegas	LOCATION	0.99+
Annie Scale	PERSON	0.99+
90%	QUANTITY	0.99+
Three	QUANTITY	0.99+
Berkeley	LOCATION	0.99+
80	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Three years	QUANTITY	0.99+
Python	TITLE	0.99+
second version	QUANTITY	0.99+
tomorrow	DATE	0.99+
Facebook	ORGANIZATION	0.99+
Shopify	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
12 weeks	QUANTITY	0.99+
today	DATE	0.99+
third iteration	QUANTITY	0.99+
one system	QUANTITY	0.99+
one example	QUANTITY	0.99+
Ray	ORGANIZATION	0.98+
three years	QUANTITY	0.98+
one	QUANTITY	0.97+
about two weeks	QUANTITY	0.96+
first	QUANTITY	0.96+
thousands of machines	QUANTITY	0.92+
aws	ORGANIZATION	0.91+
one thing	QUANTITY	0.91+
Anyscale	PERSON	0.9+
hova	TITLE	0.84+
Hova	TITLE	0.83+
Venetian	LOCATION	0.81+
money	QUANTITY	0.79+
Reinvent 22	EVENT	0.78+
Invent	EVENT	0.76+
three	QUANTITY	0.74+
Startup Showcase	EVENT	0.71+
Ray	TITLE	0.67+
Reinvent 22	TITLE	0.65+
2022 - Global Startup Program	TITLE	0.63+
things	QUANTITY	0.62+
ceo	PERSON	0.58+
Berkeley	ORGANIZATION	0.55+
v1	TITLE	0.47+
Startup	OTHER	0.38+

Ramesh Prabagaran, Prosimo | AWS re:Invent 2022

(gentle music) >> Hello, beautiful humans and welcome back to fabulous Las Vegas, where we are combating the dry air of the desert and all giggling about the rasp of our voice at this stage. We're theCUBE and we are live from AWS reinvent. I am Savannah Peterson, joined by the fabulous Paul Gillin. Paul, how are you holding up? How are your feet doing? >> My feet are, I can't feel them anymore. (both laugh) >> We can't feel much after these feet. >> Two miles. Just to get from, just to get to to the keynotes this morning. >> Did you do your cross training to prepare >> For, >> Apparently not well enough. (Savannah laughs) Not well enough. >> Well, it's great to have you here >> likewise. and I'm very excited for our next conversation. We've got Ramesh from Prosimo. >> Thank you. >> Savannah: Welcome to the show. How is the show going for you? How's your voice? >> Oh my God. I woke up this morning and I could not hear my own voice. I'm like, this is not me. I think it's the dry air here, so if I cough, I apologize in advance. But no, the show has been great. It's been nonstop at the booth. It's wonderful to see all the customers in one place so you don't have to schedule lots of meetings spread across three, four weeks. So you get to >> Savannah: Right. I, yeah >> So yesterday was like eight to six, nonstop and it was awesome, right? Because you get to meet all these guys. The other important thing is the focus on the right layer, right? Like, I loved the keynote from Adam. It was about applications, services, data. Nowhere in there was there like infrastructure. Like we are infrastructure, right? I actually love that because that's where the focus should be and that's what customers are caring about right? So it's, it's been great so far. >> Yeah. I'm so happy to hear your booth's packed. I know exactly what you mean. I mean, we're going to be talking about optimization. It's a theme, but we also optimize our time here >> Ramesh: Yeah. >> on the show floor by getting to engage with our community. Prosimo's been around for three years just in case folks aren't familiar, give us the pitch. >> Sure. We are in the cloud networking space, solving for two problems. What happens within the cloud as you bring up VPCs, vnet and workloads, how are they able to talk to each other, secure each other, and how to use those access workloads? Those are the two problems that we solve for. It stemmed from really us seeing a complete diversion in what cloud wants versus what network really focuses on. Cloud has been always focused on applications and speed of operations and network has always been about reliability, scalability, and robust architecture. And we didn't really see these things come together. So that's when prosimo was born. >> So what are some of the surprises newcomers to the cloud may encounter with networking, with cloud networking that was not a factor when they were fully on-prem? >> So the first thing is in the cloud, you can't deal with the workload the same way you dealt with in the data center. In the data center, you usually had pools of service. They were all allocated some level of addressing. And it was not about the workload, it was more about the identity, IP addresses and so forth. In the cloud, those things have completely gotten demolished, right? You have to refer to a S3 service as an S3 service. It's not an IP endpoint. IP endpoint comes and goes, right? >> Savannah: Yeah. >> And so you have to completely shift around that, right? >> Now, this actually challenges almost 10 years, 12, 20 years maybe, of networking that we knew about, right? So that's why cloud networking is almost night and day difference compared to regular networking right? And, we're seeing that and that's what we are really helping customers with. >> What are some of the trends that you're seeing? I, well actually, let me ask you this question. Do you, is there an industry or vertical you work with specifically? I would imagine most people across, >> Ramesh: The Yeah, across. >> Yeah. >> Anybody that has workloads in the cloud right? >> Yeah, right. >> Ramesh: That's, >> I mean I can't imagine any companies that would have that. >> Exactly. (Savannah laughs) >> What are some of the trends that you're seeing? I know we talk about time to value. We talk about cost optimization. Is that the top priority for your customers? >> Yeah. Up until end of last year, a lot of the focus was about speed of operations. And so people would look at what are the type of workloads? How do I enable things? How do I empower my development team? So, if I'm the cloud platform team responsible for connecting, securing and making sure my applications can get deployed smooth and fast, that was the primary focus. Fast forward to this year, we started to see this a little bit at the beginning of the year. Now it's in full force. It's about cost control, right? It's about egress charges coming out of the cloud. Suddenly the cloud bill and every single line item on the cloud bill is in focus, right? And so that has a direct impact on what does this mean for networking. Cloud networking for many may not be familiar, it's about 14% of the cloud bill. And so anything that materially moves the needle on the cloud networking costs can actually have a have a big impact, right? And so we have seen the focus on the speed of operations are still there but cloud cost control has become a big part of it. >> So where are the excesses? I mean, it's, it's a big part of the bill. Where can company, where do companies typically waste money in networking costs? >> So, if you bring a person who understands networking and networking architecture really, really well, they'll can build a solid architecture, but they'll not focus on operations and automation. If you bring a 25 year old, they will automate the heck out of it. They know python day in and day out. And so they'll automate the heck out of it but it will not be with a robust architecture, right? And so you, on one hand, you end up wasting because you do things very suboptimally. It's a solid architecture, it's a really good design but it's really bad for operations. In the other hand, with push of a button you can get anything done but underneath the covers, underneath the hood, if you look at it, it's a mess, right? And so you have more competence than necessary. And so, what customers want is really a best of both, right? You need solid architecture that has all the right principles but also you need the automation so that you don't employ four, five people and a whole toolkit in order to make things work, right? And that's where we see most of the efficiencies come from >> You said you were you were super busy at your booth. Do customers understand that this is a problem now? >> So more so now than I would say last year. The last reinvent when we had a session. >> Yeah. >> We had to educate a lot of people on these are the requirements for cloud networking. Thanks to Gartner, thanks to many of the sessions you guys have been doing as well. The focus and the education for what cloud networking requires has started to come about. Now, this is where the savviness of the customer is important, right? Like there are customers in different stages of their journey. Those that have been operating in the cloud for three years plus, know that they've crossed that initial phase, right? Like you have basic hygiene, you have certain things and moving from hundreds of VPCs to maybe about thousand, right? And so at that time, the set of challenges I need to work with are very, very different, right? So now increasingly we are seeing at the booth the challenges are, "Hey, I know how to operate in the cloud". Right? Like, "Don't talk to me about that." Right? "But how do I get from hundred to a thousand?" Because I have a gun to my head. My CIO has said, I need to decommission my data centers in the next couple of years and I need to go all in on cloud. Help me with that, right? And so it's the, I wouldn't call it like massive scale it's the scale from kind of the trivial to the next stage that's actually causing a lot of these problems to surface. >> It's that layer of transformation. >> Ramesh: Yeah. It's when you've made the commitment and now we've got to catch everything up >> [Ramesh} exactly. >> across the company locations and probably a variety of different silos doing different things. >> Ramesh: Exactly. Yeah. >> Super complex. So, how do folks get started with you? >> Yeah, so typically we start with like, even if the customer says, "Here's what my blueprint looks like." We say, "Bring two regions." That's it, two regions, a few workloads. We'll help you set up the connectivity, set up the secure access required, set up the foundational things There's a certain level of automation, right? Let's get to that point because governance is different. The cloud privileges are different so let's work through all of that, right? Usually this takes about a week or so. The actual proof of concept, proof of value can be done in a day, but getting permissions and what not takes about, about a week, right? And once you show two regions then it's actually game on, right? Then you go from 10 VPCs to a hundred to a thousand and it's just like one to one thing after another. So that's usually how we see customers get started. We have a full stack that covers kind of what does this mean for the network to application services to kind of layer seven and so forth. We tell the customer, as much as we want you to focus on the entire stack, let's start with one, right? Start baby steps, start with one. Because for many, cloud itself is, I wouldn't say new but they're in a region that's not comfortable, right? So you wannna, you don't want to throw too much at them. >> Savannah: Right. >> So we help them kind of progressively move towards different types of workplace. >> Savannah: Yeah. >> And you have a multicloud story as well. >> Ramesh: That's correct. >> So when companies begin to cross clouds with workloads, move them between clouds, what kinds of issues emerge then? >> Yeah, so there are two parts for this, right? There is the AWS and data center and then there is the AWS plus other clouds. Two different set of problems, actually, >> Paul: Hm-hmm. Hm-hmm. The AWS plus connectivity, back into my data center almost every single enterprise. We deal with kind of the global 2000. Every single one of them has that, right? And so we kind of, we go through a series of steps, come up with an architecture, deploy a solution. After that, it's, Hey, I have BigQuery in Google that needs to talk back to an S3 bucket out here. Like, no networking solution can help you with that. Like, you need like cloud native principles in order to come into the picture. So increasingly we are seeing requests for, hey I have a distributed workload. It's not, it's not that one single application is spread across multiple clouds, but I have these islands of workloads that all need to talk to each other. >> Paul: Right. And what I don't want to do is actually build highways that actually connect all these things together because that's a waste of time. I actually want to make sure that only these applications that care about the talking to each other, are allowed to talk to each other. So that's kind of one foundational thing that we see. A few others are around compliance and governance. So we say, Hey, if I'm a retailer, I need to have some workloads in Azure some in the GCP and so forth. So it depends on kind of the industry compliance, regulatory requirements and so forth. >> So many different needs >> Ramesh: Exactly. for so many different types of companies. But also, you know, creating that efficiency is so great. >> Ramesh: Yup. >> And especially that time to value tune, cost reduction >> Ramesh: Yup. doing a lot of great things for your customers. There's a note on my run sheet here that you've seen some success with Topgolf and I suspect we have some golfers in the audience. John even used to be a caddy. We had a caddy segment with someone who was a pro caddy. Drew, when we were at Cape Con. Tell us about that story. >> So it was a really wild idea. We said, okay people are going to be walking around 22,000 steps right? >> Savannah: Yeah. >> And so >> Like Paul, >> And, they're going to be talking to people, listening to sessions. So we said, let's, what do most others do? You set up some time in a restaurant, you come, you have a social time, and what not. We said, let's give people something different. So we reserve the Topgolf here and we opened it up. We initially paid for a certain number of things. It's actually gone three x of that right now. So we had in the Topgolf, can you give us like the entire thing? I think people just want to go do something different, right? >> Savannah: Yeah. >> And of course the topic is important but equally important is like, I just want to have a good time, right? >> Yeah. And if you, hit a few And there you go. >> It doesn't have to relate back to network >> Cloud, network. >> Yeah, exactly. And so >> Well, it's all about building community. >> Exactly. >> And especially right now, we all, you know, we're stronger together. >> Ramesh: Yup. We're entering a unique time, we're coming out of a unique time. >> Ramesh: Exactly. >> And, no, I think that's great. And we actually do a swag segment here on theCUBE, differentiating on the show floor. I mean, it's clear because of how thoughtful you are >> Ramesh: Yeah. there's a reason that your, that your booth is so busy. >> Ramesh: That's right. >> So what's next? What can you, can you give us a little sneak preview? What's coming out for you? >> Yeah, so, I'm sensitive and sympathetic to all the macroeconomic conditions that are happening but there's been, we have not skipped a beat. So our business is growing really well. Thanks to all the things that are happening in the cloud. Increasingly, folks are looking at, you know, how how do I move in mass into the cloud? And so a few themes have come about as a result. One, certainly around cost control. How do I, how do I make, how do we make sure that we help our customers in that journey, right? So we have a few things around those lines. Modernization, especially after you go through the first few workloads, the next few that come about are invariably modern workloads. And modern workloads is this sensitive thing where I think the ultra savvy developers know what to do but the infrastructure guys don't know what to do in order to serve, right? And so we have actually developed a set of capabilities to help with that kind of modernization, right? Because it's not enough if your apps are modernized, your infrastructure that serves the apps also need to be modernized. And so those are the, those are the things and certainly, getting our customers less than us. We want to get our customers to talk. And so you'll see quite a bit of that as well. >> I want to ask you about a statement that was in the notes that we were reading, running up this interview. Zero Trust network access is the next solution that will be disrupted. What do you mean by that? >> So, when we started the company about three years ago, zero test network access was there. It was about maybe two, three years old at that time. And so we said, it needs to be done differently in the cloud. Why? Because you are a user. You're trying to access an application in the cloud. Do you care what's in the middle? You really don't, you just want to be able to open up your laptop, go to dub dub something.com and you should be able to access, right? But that's not how the experience is today. There's invariably something that comes, a middle mile solution that comes in the middle, right? And then the guy needs to operationalize all of that. And that now passes on to you. You need to launch a an agent on your thing, connect into something. It just brings a lot of complexity, right? So we looked at that problem and we said, cloud has done really really a few things really, really well, right? It's literally at your doorstep. Cloud presence is literally at your doorstep. So as you open up your browser, connect from your home, I don't need anything in the middle. I am jumping straight into the cloud. And so when you do that, then you actually have the luxury of bringing a few capabilities to the entry point of the cloud so that security can be done better, posture control can be done better and so on and so forth. So we developed those capabilities almost three years ago. We have quite a few large enterprises that have deployed this. And we fundamentally believe on building on top of the hyperscale network because billions of tens of billions of dollars go into the investment here. And we want to be building a layer of value on top, right? And so we've been working closely with our AWS buddies here and actually built capabilities so that the infrastructure presence, the massive reach and also the underlying capabilities for zero trust are provided. But what the customer regains in terms of value is through our platform, right? And so we'll see a whole lot more innovation along these lines. Probably bad news for the Middle Mile provider who sit in the, in the middle because hey AWS is literally at your doorstep, so you have to rethink your strategy. >> Going to be a lot of agility >> Ramesh: Yes, absolutely. >> In a very different context than we normally use it in Nerdland. And no, I think that's great. So we have, it's an exciting time for you as a company. We have a new challenge here at Reinvent. >> Okay. >> On theCUBE. I know you're a venerable alumni. >> Yep. >> You have been on theCUBE multiple times with multiple companies which is very impressive. Which says a lot about you. Although given how fun this interview's been, I'm not surprised. Give us your 30 second, Instagram real highlight, sound bite on the biggest or most important theme or takeaway from this year's show. >> From this show? Yeah, so if you look across the keynotes in all the sessions, the focus is on data, services and the applications. So the biggest takeaway I would offer anybody is focus on that first because that's where the outcome needs to shine. The rest of the stuff is a means to an end. I am an infrastructure guy through and through, I have been for the last 20 years. It hurts me to say infrastructure is a means to end but it is, right. Let the people dealing with the infrastructure deal with the infrastructure. If you are a customer or a client of the service, focus on the outcome, focus on the apps, focus on the services focus on on the data. That would be the biggest takeaway. >> Savannah: I appreciate your >> Paul: Words of wisdom >> Savannah: transparency. Yeah, no, exactly. Words of wisdom and very honest words of wisdom. Really great to talk to you about intelligent infrastructure. >> Absolutely. >> Savannah: Thank you so much for being on the show, Ramesh. >> Thank you. >> Savannah: It's been, it's been awesome. Paul, it's always a pleasure. >> Likewise. Thank you all for tuning in today here live from the show floor at AWS, reinvent in beautiful sin city, in the high desert and the high end dry desert with Paul Gillin. My name is Savannah Peterson and you're watching theCUBE, the leader in high tech coverage. (gentle music)

Published Date : Nov 30 2022

SUMMARY :

of the desert and all My feet are, I can't feel them anymore. Just to get from, just to get to Apparently not well enough. and I'm very excited How is the show going for you? so you don't have to schedule lots Savannah: Right. the focus on the right layer, right? I know exactly what you mean. on the show floor by getting Those are the two problems In the data center, you that we knew about, right? What are some of the companies that would have that. (Savannah laughs) Is that the top priority a lot of the focus was I mean, it's, it's a big part of the bill. And so you have more you were super busy at your booth. So more so now than of the sessions you guys and now we've got to across the company locations and Ramesh: Exactly. how do folks get started with you? for the network to application services So we help them kind And you have a There is the AWS and data center in Google that needs to talk the talking to each other, But also, you know, creating golfers in the audience. people are going to be the entire thing? And there you go. And so Well, it's all about now, we all, you know, of a unique time. on the show floor. that your booth is so busy. are happening in the cloud. is the next solution so that the infrastructure presence, for you as a company. I know you're a venerable alumni. on the biggest or most focus on the apps, focus on the services to you about intelligent infrastructure. much for being on the show, Savannah: It's been, it's been awesome. and the high end dry desert

ENTITIES

Entity	Category	Confidence
Savannah	PERSON	0.99+
Ramesh	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Paul	PERSON	0.99+
Savannah Peterson	PERSON	0.99+
Ramesh Prabagaran	PERSON	0.99+
Paul Gillin	PERSON	0.99+
two problems	QUANTITY	0.99+
John	PERSON	0.99+
12	QUANTITY	0.99+
Two miles	QUANTITY	0.99+
two regions	QUANTITY	0.99+
30 second	QUANTITY	0.99+
last year	DATE	0.99+
Las Vegas	LOCATION	0.99+
two parts	QUANTITY	0.99+
Adam	PERSON	0.99+
three years	QUANTITY	0.99+
Drew	PERSON	0.99+
yesterday	DATE	0.99+
Topgolf	ORGANIZATION	0.99+
hundred	QUANTITY	0.99+
today	DATE	0.98+
five people	QUANTITY	0.98+
four	QUANTITY	0.98+
eight	QUANTITY	0.98+
three	QUANTITY	0.98+
Prosimo	PERSON	0.98+
one	QUANTITY	0.98+
Gartner	ORGANIZATION	0.98+
six	QUANTITY	0.98+
both	QUANTITY	0.98+
first	QUANTITY	0.98+
about a week	QUANTITY	0.97+
python	TITLE	0.97+
a day	QUANTITY	0.97+
first thing	QUANTITY	0.97+
zero trust	QUANTITY	0.97+
almost 10 years	QUANTITY	0.97+
two	QUANTITY	0.96+
end	DATE	0.96+
Reinvent	ORGANIZATION	0.95+
Prosimo	ORGANIZATION	0.95+
around 22,000 steps	QUANTITY	0.95+
billions of tens of billions of dollars	QUANTITY	0.95+
Instagram	ORGANIZATION	0.95+
this morning	DATE	0.94+
20 years	QUANTITY	0.94+

Shireesh Thota, SingleStore & Hemanth Manda, IBM | AWS re:Invent 2022

>>Good evening everyone and welcome back to Sparkly Sin City, Las Vegas, Nevada, where we are here with the cube covering AWS Reinvent for the 10th year in a row. John Furrier has been here for all 10. John, we are in our last session of day one. How does it compare? >>I just graduated high school 10 years ago. It's exciting to be, here's been a long time. We've gotten a lot older. My >>Got your brain is complex. You've been a lot in there. So fast. >>Graduated eight in high school. You know how it's No. All good. This is what's going on. This next segment, wrapping up day one, which is like the the kickoff. The Mondays great year. I mean Tuesdays coming tomorrow big days. The announcements are all around the kind of next gen and you're starting to see partnering and integration is a huge part of this next wave cuz API's at the cloud, next gen cloud's gonna be deep engineering integration and you're gonna start to see business relationships and business transformation scale a horizontally, not only across applications but companies. This has been going on for a while, covering it. This next segment is gonna be one of those things that we're gonna look at as something that's gonna happen more and more on >>Yeah, I think so. It's what we've been talking about all day. Without further ado, I would like to welcome our very exciting guest for this final segment, trust from single store. Thank you for being here. And we also have him on from IBM Data and ai. Y'all are partners. Been partners for about a year. I'm gonna go out on a limb only because their legacy and suspect that a few people, a few more people might know what IBM does versus what a single store does. So why don't you just give us a little bit of background so everybody knows what's going on. >>Yeah, so single store is a relational database. It's a foundational relational systems, but the thing that we do the best is what we call us realtime analytics. So we have these systems that are legacy, which which do operations or analytics. And if you wanted to bring them together, like most of the applications want to, it's really a big hassle. You have to build an ETL pipeline, you'd have to duplicate the data. It's really faulty systems all over the place and you won't get the insights really quickly. Single store is trying to solve that problem elegantly by having an architecture that brings both operational and analytics in one place. >>Brilliant. >>You guys had a big funding now expanding men. Sequel, single store databases, 46 billion again, databases. We've been saying this in the queue for 12 years have been great and recently not one database will rule the world. We know that. That's, everyone knows that databases, data code, cloud scale, this is the convergence now of all that coming together where data, this reinvent is the theme. Everyone will be talking about end to end data, new kinds of specialized services, faster performance, new kinds of application development. This is the big part of why you guys are working together. Explain the relationship, how you guys are partnering and engineering together. >>Yeah, absolutely. I think so ibm, right? I think we are mainly into hybrid cloud and ai and one of the things we are looking at is expanding our ecosystem, right? Because we have gaps and as opposed to building everything organically, we want to partner with the likes of single store, which have unique capabilities that complement what we have. Because at the end of the day, customers are looking for an end to end solution that's also business problems. And they are very good at real time data analytics and hit staff, right? Because we have transactional databases, analytical databases, data lakes, but head staff is a gap that we currently have. And by partnering with them we can essentially address the needs of our customers and also what we plan to do is try to integrate our products and solutions with that so that when we can deliver a solution to our customers, >>This is why I was saying earlier, I think this is a a tell sign of what's coming from a lot of use cases where people are partnering right now you got the clouds, a bunch of building blocks. If you put it together yourself, you can build a durable system, very stable if you want out of the box solution, you can get that pre-built, but you really can't optimize. It breaks, you gotta replace it. High level engineering systems together is a little bit different, not just buying something out of the box. You guys are working together. This is kind of an end to end dynamic that we're gonna hear a lot more about at reinvent from the CEO ofs. But you guys are doing it across companies, not just with aws. Can you guys share this new engineering business model use case? Do you agree with what I'm saying? Do you think that's No, exactly. Do you think John's crazy, crazy? I mean I all discourse, you got out of the box, engineer it yourself, but then now you're, when people do joint engineering project, right? They're different. Yeah, >>Yeah. No, I mean, you know, I think our partnership is a, is a testament to what you just said, right? When you think about how to achieve realtime insights, the data comes into the system and, and the customers and new applications want insights as soon as the data comes into the system. So what we have done is basically build an architecture that enables that we have our own storage and query engine indexing, et cetera. And so we've innovated in our indexing in our database engine, but we wanna go further than that. We wanna be able to exploit the innovation that's happening at ibm. A very good example is, for instance, we have a native connector with Cognos, their BI dashboards right? To reason data very natively. So we build a hyper efficient system that moves the data very efficiently. A very other good example is embedded ai. >>So IBM of course has built AI chip and they have basically advanced quite a bit into the embedded ai, custom ai. So what we have done is, is as a true marriage between the engineering teams here, we make sure that the data in single store can natively exploit that kind of goodness. So we have taken their libraries. So if you have have data in single store, like let's imagine if you have Twitter data, if you wanna do sentiment analysis, you don't have to move the data out model, drain the model outside, et cetera. We just have the pre-built embedded AI libraries already. So it's a, it's a pure engineering manage there that kind of opens up a lot more insights than just simple analytics and >>Cost by the way too. Moving data around >>Another big theme. Yeah. >>And latency and speed is everything about single store and you know, it couldn't have happened without this kind of a partnership. >>So you've been at IBM for almost two decades, don't look it, but at nearly 17 years in how has, and maybe it hasn't, so feel free to educate us. How has, how has IBM's approach to AI and ML evolved as well as looking to involve partnerships in the ecosystem as a, as a collaborative raise the water level together force? >>Yeah, absolutely. So I think when we initially started ai, right? I think we are, if you recollect Watson was the forefront of ai. We started the whole journey. I think our focus was more on end solutions, both horizontal and vertical. Watson Health, which is more vertically focused. We were also looking at Watson Assistant and Watson Discovery, which were more horizontally focused. I think it it, that whole strategy of the world period of time. Now we are trying to be more open. For example, this whole embedable AI that CICE was talking about. Yeah, it's essentially making the guts of our AI libraries, making them available for partners and ISVs to build their own applications and solutions. We've been using it historically within our own products the past few years, but now we are making it available. So that, how >>Big of a shift is that? Do, do you think we're seeing a more open and collaborative ecosystem in the space in general? >>Absolutely. Because I mean if you think about it, in my opinion, everybody is moving towards AI and that's the future. And you have two option. Either you build it on your own, which is gonna require significant amount of time, effort, investment, research, or you partner with the likes of ibm, which has been doing it for a while, right? And it has the ability to scale to the requirements of all the enterprises and partners. So you have that option and some companies are picking to do it on their own, but I believe that there's a huge amount of opportunity where people are looking to partner and source what's already available as opposed to investing from the scratch >>Classic buy versus build analysis for them to figure out, yeah, to get into the game >>And, and, and why reinvent the wheel when we're all trying to do things at, at not just scale but orders of magnitude faster and and more efficiently than we were before. It, it makes sense to share, but it's, it is, it does feel like a bit of a shift almost paradigm shift in, in the culture of competition versus how we're gonna creatively solve these problems. There's room for a lot of players here, I think. And yeah, it's, I don't >>Know, it's really, I wanted to ask if you don't mind me jumping in on that. So, okay, I get that people buy a bill I'm gonna use existing or build my own. The decision point on that is, to your point about the path of getting the path of AI is do I have the core competency skills, gap's a big issue. So, okay, the cube, if you had ai, we'd take it cuz we don't have any AI engineers around yet to build out on all the linguistic data we have. So we might use your ai but I might say this to then and we want to have a core competency. How do companies get that core competency going while using and partnering with, with ai? What you guys, what do you guys see as a way for them to get going? Because I think some people probably want to have core competency of >>Ai. Yeah, so I think, again, I think I, I wanna distinguish between a solution which requires core competency. You need expertise on the use case and you need expertise on your industry vertical and your customers versus the foundational components of ai, which are like, which are agnostic to the core competency, right? Because you take the foundational piece and then you further train it and define it for your specific use case. So we are not saying that we are experts in all the industry verticals. What we are good at is like foundational components, which is what we wanna provide. Got it. >>Yeah, that's the hard deep yes. Heavy lift. >>Yeah. And I can, I can give a color to that question from our perspective, right? When we think about what is our core competency, it's about databases, right? But there's a, some biotic relationship between data and ai, you know, they sort of like really move each other, right? You >>Need, they kind of can't have one without the other. You can, >>Right? And so the, the question is how do we make sure that we expand that, that that relationship where our customers can operationalize their AI applications closer to the data, not move the data somewhere else and do the modeling and then training somewhere else and dealing with multiple systems, et cetera. And this is where this kind of a cross engineering relationship helps. >>Awesome. Awesome. Great. And then I think companies are gonna want to have that baseline foundation and then start hiring in learning. It's like driving the car. You get the keys when you're ready to go. >>Yeah, >>Yeah. Think I'll give you a simple example, right? >>I want that turnkey lifestyle. We all do. Yeah, >>Yeah. Let me, let me just give you a quick analogy, right? For example, you can, you can basically make the engines and the car on your own or you can source the engine and you can make the car. So it's, it's basically an option that you can decide. The same thing with airplanes as well, right? Whether you wanna make the whole thing or whether you wanna source from someone who is already good at doing that piece, right? So that's, >>Or even create a new alloy for that matter. I mean you can take it all the way down in that analogy, >>Right? Is there a structural change and how companies are laying out their architecture in this modern era as we start to see this next let gen cloud emerge, teams, security teams becoming much more focused data teams. Its building into the DevOps into the developer pipeline, seeing that trend. What do you guys see in the modern data stack kind of evolution? Is there a data solutions architect coming? Do they exist yet? Is that what we're gonna see? Is it data as code automation? How do you guys see this landscape of the evolving persona? >>I mean if you look at the modern data stack as it is defined today, it is too detailed, it's too OSes and there are way too many layers, right? There are at least five different layers. You gotta have like a storage you replicate to do real time insights and then there's a query layer, visualization and then ai, right? So you have too many ETL pipelines in between, too many services, too many choke points, too many failures, >>Right? Etl, that's the dirty three letter word. >>Say no to ETL >>Adam Celeste, that's his quote, not mine. We hear that. >>Yeah. I mean there are different names to it. They don't call it etl, we call it replication, whatnot. But the point is hassle >>Data is getting more hassle. More >>Hassle. Yeah. The data is ultimately getting replicated in the modern data stack, right? And that's kind of one of our thesis at single store, which is that you'd have to converge not hyper specialize and conversation and convergence is possible in certain areas, right? When you think about operational analytics as two different aspects of the data pipeline, it is possible to bring them together. And we have done it, we have a lot of proof points to it, our customer stories speak to it and that is one area of convergence. We need to see more of it. The relationship with IBM is sort of another step of convergence wherein the, the final phases, the operation analytics is coming together and can we take analytics visualization with reports and dashboards and AI together. This is where Cognos and embedded AI comes into together, right? So we believe in single store, which is really conversions >>One single path. >>A shocking, a shocking tie >>Back there. So, so obviously, you know one of the things we love to joke about in the cube cuz we like to goof on the old enterprise is they solve complexity by adding more complexity. That's old. Old thinking. The new thinking is put it under the covers, abstract the way the complexities and make it easier. That's right. So how do you guys see that? Because this end to end story is not getting less complicated. It's actually, I believe increasing and complication complexity. However there's opportunities doing >>It >>More faster to put it under the covers or put it under the hood. What do you guys think about the how, how this new complexity gets managed or in this new data world we're gonna be coming in? >>Yeah, so I think you're absolutely right. It's the world is becoming more complex, technology is becoming more complex and I think there is a real need and it's not just from coming from us, it's also coming from the customers to simplify things. So our approach around AI is exactly that because we are essentially providing libraries, just like you have Python libraries, there are libraries now you have AI libraries that you can go infuse and embed deeply within applications and solutions. So it becomes integrated and simplistic for the customer point of view. From a user point of view, it's, it's very simple to consume, right? So that's what we are doing and I think single store is doing that with data, simplifying data and we are trying to do that with the rest of the portfolio, specifically ai. >>It's no wonder there's a lot of synergy between the two companies. John, do you think they're ready for the Instagram >>Challenge? Yes, they're ready. Uhoh >>Think they're ready. So we're doing a bit of a challenge. A little 32nd off the cuff. What's the most important takeaway? This could be your, think of it as your thought leadership sound bite from AWS >>2023 on Instagram reel. I'm scrolling. That's the Instagram, it's >>Your moment to stand out. Yeah, exactly. Stress. You look like you're ready to rock. Let's go for it. You've got that smile, I'm gonna let you go. Oh >>Goodness. You know, there is, there's this quote from astrophysics, space moves matter, a matter tells space how to curve. They have that kind of a relationship. I see the same between AI and data, right? They need to move together. And so AI is possible only with right data and, and data is meaningless without good insights through ai. They really have that kind of relationship and you would see a lot more of that happening in the future. The future of data and AI are combined and that's gonna happen. Accelerate a lot faster. >>Sures, well done. Wow. Thank you. I am very impressed. It's tough hacks to follow. You ready for it though? Let's go. Absolutely. >>Yeah. So just, just to add what is said, right, I think there's a quote from Rob Thomas, one of our leaders at ibm. There's no AI without ia. Essentially there's no AI without information architecture, which essentially data. But I wanna add one more thing. There's a lot of buzz around ai. I mean we are talking about simplicity here. AI in my opinion is three things and three things only. Either you use AI to predict future for forecasting, use AI to automate things. It could be simple, mundane task, it would be complex tasks depending on how exactly you want to use it. And third is to optimize. So predict, automate, optimize. Anything else is buzz. >>Okay. >>Brilliantly said. Honestly, I think you both probably hit the 32nd time mark that we gave you there. And the enthusiasm loved your hunger on that. You were born ready for that kind of pitch. I think they both nailed it for the, >>They nailed it. Nailed it. Well done. >>I I think that about sums it up for us. One last closing note and opportunity for you. You have a V 8.0 product coming out soon, December 13th if I'm not mistaken. You wanna give us a quick 15 second preview of that? >>Super excited about this. This is one of the, one of our major releases. So we are evolving the system on multiple dimensions on enterprise and governance and programmability. So there are certain features that some of our customers are aware of. We have made huge performance gains in our JSON access. We made it easy for people to consume, blossom on OnPrem and hybrid architectures. There are multiple other things that we're gonna put out on, on our site. So it's coming out on December 13th. It's, it's a major next phase of our >>System. And real quick, wasm is the web assembly moment. Correct. And the new >>About, we have pioneers in that we, we be wasm inside the engine. So you could run complex modules that are written in, could be C, could be rushed, could be Python. Instead of writing the the sequel and SQL as a store procedure, you could now run those modules inside. I >>Wanted to get that out there because at coupon we covered that >>Savannah Bay hot topic. Like, >>Like a blanket. We covered it like a blanket. >>Wow. >>On that glowing note, Dre, thank you so much for being here with us on the show. We hope to have both single store and IBM back on plenty more times in the future. Thank all of you for tuning in to our coverage here from Las Vegas in Nevada at AWS Reinvent 2022 with John Furrier. My name is Savannah Peterson. You're watching the Cube, the leader in high tech coverage. We'll see you tomorrow.

Published Date : Nov 29 2022

SUMMARY :

John, we are in our last session of day one. It's exciting to be, here's been a long time. So fast. The announcements are all around the kind of next gen So why don't you just give us a little bit of background so everybody knows what's going on. It's really faulty systems all over the place and you won't get the This is the big part of why you guys are working together. and ai and one of the things we are looking at is expanding our ecosystem, I mean I all discourse, you got out of the box, When you think about how to achieve realtime insights, the data comes into the system and, So if you have have data in single store, like let's imagine if you have Twitter data, if you wanna do sentiment analysis, Cost by the way too. Yeah. And latency and speed is everything about single store and you know, it couldn't have happened without this kind and maybe it hasn't, so feel free to educate us. I think we are, So you have that option and some in, in the culture of competition versus how we're gonna creatively solve these problems. So, okay, the cube, if you had ai, we'd take it cuz we don't have any AI engineers around yet You need expertise on the use case and you need expertise on your industry vertical and Yeah, that's the hard deep yes. you know, they sort of like really move each other, right? You can, And so the, the question is how do we make sure that we expand that, You get the keys when you're ready to I want that turnkey lifestyle. So it's, it's basically an option that you can decide. I mean you can take it all the way down in that analogy, What do you guys see in the modern data stack kind of evolution? I mean if you look at the modern data stack as it is defined today, it is too detailed, Etl, that's the dirty three letter word. We hear that. They don't call it etl, we call it replication, Data is getting more hassle. When you think about operational analytics So how do you guys see that? What do you guys think about the how, is exactly that because we are essentially providing libraries, just like you have Python libraries, John, do you think they're ready for the Instagram Yes, they're ready. A little 32nd off the cuff. That's the Instagram, You've got that smile, I'm gonna let you go. and you would see a lot more of that happening in the future. I am very impressed. I mean we are talking about simplicity Honestly, I think you both probably hit the 32nd time mark that we gave you there. They nailed it. I I think that about sums it up for us. So we are evolving And the new So you could run complex modules that are written in, could be C, We covered it like a blanket. On that glowing note, Dre, thank you so much for being here with us on the show.

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Savannah Peterson	PERSON	0.99+
December 13th	DATE	0.99+
Shireesh Thota	PERSON	0.99+
Las Vegas	LOCATION	0.99+
Adam Celeste	PERSON	0.99+
Rob Thomas	PERSON	0.99+
46 billion	QUANTITY	0.99+
12 years	QUANTITY	0.99+
John Furrier	PERSON	0.99+
three things	QUANTITY	0.99+
15 second	QUANTITY	0.99+
Twitter	ORGANIZATION	0.99+
Python	TITLE	0.99+
10th year	QUANTITY	0.99+
two companies	QUANTITY	0.99+
third	QUANTITY	0.99+
32nd time	QUANTITY	0.99+
both	QUANTITY	0.99+
tomorrow	DATE	0.99+
32nd	QUANTITY	0.99+
single store	QUANTITY	0.99+
Tuesdays	DATE	0.99+
AWS	ORGANIZATION	0.99+
one	QUANTITY	0.98+
10 years ago	DATE	0.98+
SingleStore	ORGANIZATION	0.98+
Single store	QUANTITY	0.98+
Hemanth Manda	PERSON	0.98+
Dre	PERSON	0.97+
eight	QUANTITY	0.96+
two option	QUANTITY	0.96+
day one	QUANTITY	0.96+
one more thing	QUANTITY	0.96+
one database	QUANTITY	0.95+
two different aspects	QUANTITY	0.95+
Mondays	DATE	0.95+
Instagram	ORGANIZATION	0.95+
IBM Data	ORGANIZATION	0.94+
10	QUANTITY	0.94+
about a year	QUANTITY	0.94+
CICE	ORGANIZATION	0.93+
three letter	QUANTITY	0.93+
today	DATE	0.93+
one place	QUANTITY	0.93+
Watson	TITLE	0.93+
One last	QUANTITY	0.92+
Cognos	ORGANIZATION	0.91+
Watson Assistant	TITLE	0.91+
nearly 17 years	QUANTITY	0.9+
Watson Health	TITLE	0.89+
Las Vegas, Nevada	LOCATION	0.89+
aws	ORGANIZATION	0.86+
one area	QUANTITY	0.86+
SQL	TITLE	0.86+
One single path	QUANTITY	0.85+
two decades	QUANTITY	0.8+
five different layers	QUANTITY	0.8+
Invent 2022	EVENT	0.77+
JSON	TITLE	0.77+

Hoshang Chenoy, Meraki & Matthew Scullion, Matillion | AWS re:Invent 2022

(upbeat music) >> Welcome back to Vegas. It's theCUBE live at AWS re:Invent 2022. We're hearing up to 50,000 people here. It feels like if the energy at this show is palpable. I love that. Lisa Martin here with Dave Vellante. Dave, we had the keynote this morning that Adam Selipsky delivered lots of momentum in his first year. One of the things that you said that you were looking in your breaking analysis that was released a few days ago, four trends and one of them, he said under Selipsky's rule in the 2020s, there's going to be a rush of data that will dwarf anything we have ever seen. >> Yeah, it was at least a quarter, maybe a third of his keynote this morning was all about data and the theme is simplifying data and doing better data integration, integrating across different data platforms. And we're excited to talk about that. Always want to simplify data. It's like the rush of data is so fast. It's hard for us to keep up. >> It is hard to keep that up. We're going to be talking with an alumni next about how his company is helping organizations like Cisco Meraki keep up with that data explosion. Please welcome back to the program, Matthew Scullion, the CEO of Matillion and how Hoshang Chenoy joins us, data scientist at Cisco Meraki. Guys, great to have you on the program. >> Thank you. >> Thank you for having us. >> So Matthew, we last saw you just a few months ago in Vegas at Snowflake Summits. >> Matthew: We only meet in Vegas. >> I guess we do, that's okay. Talk to us about some of the things, I know that Matillion is a data transformation solution that was originally introduced for AWS for Redshift. But talk to us about Matillion. What's gone on since we've seen you last? >> Well, I mean it's not that long ago but actually quite a lot. And it's all to do with exactly what you guys were just talking about there. This almost hard to comprehend way the world is changing with the amounts of data that we now can and need to put to work. And our worldview is there's no shortage of data but the choke points certainly one of the choke points. Maybe the choke point is our ability to make that data useful, to make it business ready. And we always talk about the end use cases. We talk about the dashboard or the AI model or the data science algorithm. But until before we can do any of that fun stuff, we have to refine raw data into business ready, usable data. And that's what Matillion is all about. And so since we last met, we've made a couple of really important announcements and possibly at the top of the list is what we call the data productivity cloud. And it's really squarely addressed this problem. It's the results of many years of work, really the apex of many years of the outsize engineering investment, Matillion loves to make. And the Data Productivity Cloud is all about helping organizations like Cisco Meraki and hundreds of others enterprise organizations around the world, get their data business ready, faster. >> Hoshang talk to us a little bit about what's going on at Cisco Meraki, how you're leveraging Matillion from a productivity standpoint. >> I've really been a Matillion fan for a while, actually even before Cisco Meraki at my previous company, LiveRamp. And you know, we brought Matillion to LiveRamp because you know, to Matthew's point, there is a stage in every data growth as I want to call it, where you have different companies at different stages. But to get data, data ready, you really need a platform like Matillion because it makes it really easy. So you have to understand Matillion, I think it's designed for someone that uses a lot of code but also someone that uses no code because the UI is so good. Someone like a marketer who doesn't really understand what's going on with that data but wants to be a data driven marketer when they look at the UI they immediately get it. They're just like, oh, I get what's happening with my data. And so that's the brilliance of Matillion and to get data to that data ready part, Matillion does a really, really good job because what we've been able to do is blend so many different data sources. So there is an abundance of data. Data is siloed though. And the connectivity between different data is getting harder and harder. And so here comes the Matillion with it's really simple solution, easy to use platform, powerful and we get to use all of that. So to really change the way we've thought about our analytics, the way we've progressed our division, yeah. >> You're always asking about superpowers and that is a superpower of Matillion 'cause you know, low-code, no-code sounds great but it only gets you a quarter of the way there, maybe 50% of the way there. You're kind of an "and" not an "or." >> That's a hundred percent right. And so I mentioned the Data Productivity Cloud earlier which is the name of this platform of technology we provide. That's all to do with making data business ready. And so I think one of the things we've seen in this industry over the past few years is a kind of extreme decomposition in terms of vendors of making data business ready. You've got vendors that just do loading, you've got vendors that just do a bit of data transformation, you've got vendors that do data ops and orchestration, you've got vendors that do reverse ETL. And so with the data productivity platform, you've got all of that. And particularly in this kind of, macroeconomic heavy weather that we're now starting to face, I think companies are looking for that. It's like, I don't want to buy five things, five sets of skills, five expensive licenses. I want one platform that can do it. But to your point David, it's the and not the or. We talk about the Data Productivity Cloud, the DPC, as being everyone ready. And what we mean by that is if you are the tech savvy marketer who wants to get a particular insight and you understand what a Rowan economy is, but you're not necessarily a hardcore super geeky data engineer then you can visual low-code, no-code, your data to a point where it's business ready. You can do that really quick. It's easy to understand, it's faster to ramp people onto those projects cause it like explains itself, faster to hand it over cause it's self-documenting. But, they'll always be individuals, teams, "and", "or" use cases that want to high-code as well. Maybe you want to code in SQL or Python, increasingly of course in DBT and you can do that on top of the Data Productivity Cloud as well. So you're not having to make a choice, but is that right? >> So one of the things that Matillion really delivers is speed to insight. I've always said that, you know, when you want to be business ready you want to make fast decisions, you want to act on data quickly, Matillion allows you to, this feed to insight is just unbelievably fast because you blend all of these different data sources, you can find the deficiencies in your process, you fix that and you can quickly turn things around and I don't think there's any other platform that I've ever used that has that ability. So the speed to insight is so tremendous with Matillion. >> The thing I always assume going on in our customers teams, like you run Hoshang is that the visual metaphor, be it around the orchestration and data ops jobs, be it around the transformation. I hope it makes it easier for teams not only to build it in the first place, but to live with it, right? To hand it over to other people and all that good stuff. Is that true? >> Let me highlight that a little bit more and better for you. So, say for example, if you don't have a platform like Matillion, you don't really have a central repository. >> Yeah. >> Where all of your codes meet, you could have a get repository, you could do all of those things. But, for example, for definitions, business definitions, any of those kind of things, you don't want it to live in just a spreadsheet. You want it to have a central platform where everybody can go in, there's detailed notes, copious notes that you can make on Matillion and people know exactly which flow to go to and be part of, and so I kind of think that that's really, really important because that's really helped us in a big, big way. 'Cause when I first got there, you know, you were pulling code from different scripts and things and you were trying to piece everything together. But when you have a platform like Matillion and you actually see it seamlessly across, it's just so phenomenal. >> So, I want to pick up on something Matthew said about, consolidating platforms and vendors because we have some data from PTR, one of our survey partners and they went out, every quarter they do surveys and they asked the customers that were going to decrease their spending in the quarter, "How are you going to do it?" And number one, by far, like, over a third said, "We're going to consolidate redundant vendors." Way ahead of cloud, we going to optimize cloud resource that was next at like 15%. So, confirms what you were saying and you're hearing that a lot. Will you wait? And I think we never get rid of stuff, we talk about it all the time. We call it GRS, get rid of stuff. Were you able to consolidate or at least minimize your expense around? >> Hoshang: Yeah, absolutely. >> What we were able to do is identify different parts of our tech stack that were just either deficient or duplicate, you know, so they're just like, we don't want any duplicate efforts, we just want to be able to have like, a single platform that does things, does things well and Matillion helped us identify all of those different and how do we choose the right tech stack. It's also about like Matillion is so easy to integrate with any tech stack, you know, it's just they have a generic API tool that you can log into anything besides all of the components that are already there. So it's a great platform to help you do that. >> And the three things we always say about the Data Productivity Cloud, everyone ready, we spoke about this is whether low-code, no-code, quasi-technical, quasi-business person using it, through to a high-end data engineer. You're going to feel at home on the DPC. The second one, which Hoshang was just alluding to there is stack ready, right? So it is built for AWS, built for Snowflake, built for Redshift, pure tight integration, push down ELT better than you could write yourself by hand. And then the final one is future ready, which is this idea that you can start now super easy. And we buy software quickly nowadays, right? We spin it up, we try it out and before we know it, the whole organization is using it. And so the future ready talks about that continuum of being able to launch in five minutes, learn it in five hours, deliver your first project in five days and yet still be happy that it's an enterprise scalable platform, five years down track including integrating with all the different things. So Matillion's job holding up the end of the bargain that Hoshang was just talking about there is to ensure we keep putting the features integrations and support into the Data Productivity Cloud to make sure that Hoshang's team can continue to live inside it and do all the things they need to do. >> Hoshang, you talked about the speed to insight being tremendously fast, but if I'm looking at Cisco Meraki from a high level business outcome perspective, what are some of those outcomes that a Matillion is helping Cisco Meraki to achieve. >> So I can just talk in general, not giving you like any specific numbers or anything, but for example, we were trying to understand how well our small and medium business campaigns were doing and we had to actually pull in data from multiple different sources. So not just, our instances of Marketo and Salesforce, we had to look at our internal databases. So Matillion helped us blend all of that together. Once I had all of that data blended, it was then ready to be analyzed. And once we had that analysis done, we were able to confirm that our SMB campaigns were doing well but these the things that we need to do to improve them. When we did that and all of that happened so quickly because they were like, well you need to get data from here, you need to get data from there. And we're like, great, we'll just plug, plug, plug. We put it all together, build transformations and you know we produced this insight and then we were able to reform, refine, and keep getting better and better at it. And you know, we had a 40X return on SMB campaigns. It's unbelievable. >> And there's the revenue tie in right there. >> Hoshang: Yeah. >> Matthew, I know you've been super busy, tons of meetings, you didn't get to see the whole keynote, but one of the themes of Adam Selipsky's keynote was, you know, the three letter word of ETL, they laid out a vision of zero ETL and then they announced zero ETL for Aurora and Redshift. And you think about ETL, I remember the days they said, "Okay, we're going to do ELT." Which is like, raising the debt ceiling, we're just going to kick the can down the road. So, what do you think about that vision? You know, how does it relate to what you guys are doing? >> So there was a, I don't know if this only works in the UK or it works globally. It was a good line many years ago. Rumors of my death are premature or so I think it was an obituary had gone out in the times by accident and that's how the guy responded to it. Something like that. It's a little bit like that. The announcement earlier within the AWS space of zero ETL between platforms like Aurora and Redshift and perhaps more over time is really about data movement, right? So it's about do I need to do a load of high cost in terms of coding and compute, movement of data between one platform, another. At Matillion, we've always seen data movement as an enabling technology, which gets you to the value add of transformation. My favorite metaphor to bring this to life is one of iron. So the world's made of iron, right? The world is literally made of iron ore but iron ore isn't useful until you turn it to steel. Loading data is digging out iron ore from the ground and moving it to the refinery. Transformation of data is turning iron ore into steel and what the announcements you saw earlier from AWS are more about the quarry to the factory bit than they are about the iron ore to the steel bit. And so, I think it's great that platforms are making it easier to move data between them, but it doesn't change the need for Hoshang's business professionals to refine that data into something useful to drive their marketing campaigns. >> Exactly, it's quarry to the factory and a very Snowflake like in a way, right? You make it easy to get in. >> It's like, don't get me wrong, I'm great to see investment going into the Redshift business and the AWS data analytics stack. We do a lot of business there. But yes, this stuff is also there on Snowflake, already. >> I mean come on, we've seen this for years. You know, I know there's a big love fest between Snowflake and AWS 'cause they're selling so much business in the field. But look that we saw it separating computing from storage, then AWS does it and now, you know, why not? It's good sense. That's what customers want. The customer obsessed data sharing is another thing. >> And if you take data sharing as an example from our friends at Snowflake, when that was announced a few people possibly, yourselves, said, "Oh, Matthew what do you think about this? You're in the data movement business." And I was like, "Ah, I'm not really actually, some of my competitors are in the data movement business. I have data movement as part of my platform. We don't charge directly for it. It's just part of the platform." And really what it's to do is to get the data into a place where you can do the fun stuff with it of refining into steel. And so if Snowflake or now AWS and the Redshift group are making that easier that's just faster to fun for me really. >> Yeah, sure. >> Last question, a question for both of you. If you had, you have a brand new shiny car, you got a bumper sticker that you want to put on that car to tell everyone about Matillion, everyone about Cisco Meraki, what does that bumper sticker say? >> So for Matillion, it says Matillion is the Data Productivity Cloud. We help you make your data business ready, faster. And then for a joke I'd write, "Which you are going to need in the face of this tsunami of data." So that's what mine would say. >> Love it. Hoshang, what would you say? >> I would say that Cisco makes some of the best products for IT professionals. And I don't think you can, really do the things you do in IT without any Cisco product. Really phenomenal products. And, we've gone so much beyond just the IT realm. So you know, it's been phenomenal. >> Awesome. Guys, it's been a pleasure having you back on the program. Congrats to you now Hoshang, an alumni of theCUBE. >> Thank you. >> But thank you for talking to us, Matthew, about what's going on with Matillion so much since we've seen you last. I can imagine how much worse going to go on until we see you again. But we appreciate, especially having the Cisco Meraki customer example that really articulates the value of data for everyone. We appreciate your insights and we appreciate your time. >> Thank you. >> Privilege to be here. Thanks for having us. >> Thank you. >> Pleasure. For our guests and Dave Vellante, I'm Lisa Martin. You're watching theCUBE, the leader in live enterprise and emerging tech coverage.

Published Date : Nov 29 2022

SUMMARY :

One of the things that you and the theme is simplifying data Guys, great to have you on the program. you just a few months ago What's gone on since we've seen you last? And the Data Productivity Cloud Hoshang talk to us a little And so that's the brilliance of Matillion but it only gets you a And so I mentioned the Data So the speed to insight is is that the visual metaphor, if you don't have a and things and you were trying So, confirms what you were saying to help you do that. and do all the things they need to do. Hoshang, you talked about the speed And you know, we had a 40X And there's the revenue to what you guys are doing? the guy responded to it. Exactly, it's quarry to the factory and the AWS data analytics stack. now, you know, why not? And if you take data you want to put on that car We help you make your data Hoshang, what would you say? really do the things you do in Congrats to you now Hoshang, until we see you again. Privilege to be here. the leader in live enterprise

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Matthew	PERSON	0.99+
Lisa Martin	PERSON	0.99+
David	PERSON	0.99+
Matthew Scullion	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Adam Selipsky	PERSON	0.99+
Vegas	LOCATION	0.99+
Cisco	ORGANIZATION	0.99+
Hoshang	PERSON	0.99+
50%	QUANTITY	0.99+
five days	QUANTITY	0.99+
UK	LOCATION	0.99+
five hours	QUANTITY	0.99+
five minutes	QUANTITY	0.99+
Selipsky	PERSON	0.99+
Matillion	ORGANIZATION	0.99+
2020s	DATE	0.99+
Hoshang Chenoy	PERSON	0.99+
40X	QUANTITY	0.99+
15%	QUANTITY	0.99+
first project	QUANTITY	0.99+
Cisco Meraki	ORGANIZATION	0.99+
Aurora	ORGANIZATION	0.99+
five sets	QUANTITY	0.99+
Python	TITLE	0.99+
one	QUANTITY	0.99+
Meraki	PERSON	0.99+
one platform	QUANTITY	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
SQL	TITLE	0.99+
second one	QUANTITY	0.98+
five years	QUANTITY	0.98+
five expensive licenses	QUANTITY	0.98+
first year	QUANTITY	0.98+
PTR	ORGANIZATION	0.98+
LiveRamp	ORGANIZATION	0.97+
Snowflake	TITLE	0.97+
three things	QUANTITY	0.97+
hundred percent	QUANTITY	0.96+
Matillion	PERSON	0.96+
zero	QUANTITY	0.95+
Redshift	TITLE	0.95+
over a third	QUANTITY	0.94+

Scott Castle, Sisense | AWS re:Invent 2022

>>Good morning fellow nerds and welcome back to AWS Reinvent. We are live from the show floor here in Las Vegas, Nevada. My name is Savannah Peterson, joined with my fabulous co-host John Furrier. Day two keynotes are rolling. >>Yeah. What do you thinking this? This is the day where everything comes, so the core gets popped off the bottle, all the announcements start flowing out tomorrow. You hear machine learning from swee lot more in depth around AI probably. And then developers with Verner Vos, the CTO who wrote the seminal paper in in early two thousands around web service that becames. So again, just another great year of next level cloud. Big discussion of data in the keynote bulk of the time was talking about data and business intelligence, business transformation easier. Is that what people want? They want the easy button and we're gonna talk a lot about that in this segment. I'm really looking forward to this interview. >>Easy button. We all want the >>Easy, we want the easy button. >>I love that you brought up champagne. It really feels like a champagne moment for the AWS community as a whole. Being here on the floor feels a bit like the before times. I don't want to jinx it. Our next guest, Scott Castle, from Si Sense. Thank you so much for joining us. How are you feeling? How's the show for you going so far? Oh, >>This is exciting. It's really great to see the changes that are coming in aws. It's great to see the, the excitement and the activity around how we can do so much more with data, with compute, with visualization, with reporting. It's fun. >>It is very fun. I just got a note. I think you have the coolest last name of anyone we've had on the show so far, castle. Oh, thank you. I'm here for it. I'm sure no one's ever said that before, but I'm so just in case our audience isn't familiar, tell us about >>Soy Sense is an embedded analytics platform. So we're used to take the queries and the analysis that you can power off of Aurora and Redshift and everything else and bring it to the end user in the applications they already know how to use. So it's all about embedding insights into tools. >>Embedded has been a, a real theme. Nobody wants to, it's I, I keep using the analogy of multiple tabs. Nobody wants to have to leave where they are. They want it all to come in there. Yep. Now this space is older than I think everyone at this table bis been around since 1958. Yep. How do you see Siente playing a role in the evolution there of we're in a different generation of analytics? >>Yeah, I mean, BI started, as you said, 58 with Peter Lu's paper that he wrote for IBM kind of get became popular in the late eighties and early nineties. And that was Gen one bi, that was Cognos and Business Objects and Lotus 1 23 think like green and black screen days. And the way things worked back then is if you ran a business and you wanted to get insights about that business, you went to it with a big check in your hand and said, Hey, can I have a report? And they'd come back and here's a report. And it wasn't quite right. You'd go back and cycle, cycle, cycle and eventually you'd get something. And it wasn't great. It wasn't all that accurate, but it's what we had. And then that whole thing changed in about two, 2004 when self-service BI became a thing. And the whole idea was instead of going to it with a big check in your hand, how about you make your own charts? >>And that was totally transformative. Everybody started doing this and it was great. And it was all built on semantic modeling and having very fast databases and data warehouses. Here's the problem, the tools to get to those insights needed to serve both business users like you and me and also power users who could do a lot more complex analysis and transformation. And as the tools got more complicated, the barrier to entry for everyday users got higher and higher and higher to the point where now you look, look at Gartner and Forester and IDC this year. They're all reporting in the same statistic. Between 10 and 20% of knowledge workers have learned business intelligence and everybody else is just waiting in line for a data analyst or a BI analyst to get a report for them. And that's why the focus on embedded is suddenly showing up so strong because little startups have been putting analytics into their products. People are seeing, oh my, this doesn't have to be hard. It can be easy, it can be intuitive, it can be native. Well why don't I have that for my whole business? So suddenly there's a lot of focus on how do we embed analytics seamlessly? How do we embed the investments people make in machine learning in data science? How do we bring those back to the users who can actually operationalize that? Yeah. And that's what Tysons does. Yeah. >>Yeah. It's interesting. Savannah, you know, data processing used to be what the IT department used to be called back in the day data processing. Now data processing is what everyone wants to do. There's a ton of data we got, we saw the keynote this morning at Adam Lesky. There was almost a standing of vision, big applause for his announcement around ML powered forecasting with Quick Site Cube. My point is people want automation. They want to have this embedded semantic layer in where they are not having all the process of ETL or all the muck that goes on with aligning the data. All this like a lot of stuff that goes on. How do you make it easier? >>Well, to be honest, I, I would argue that they don't want that. I think they, they think they want that, cuz that feels easier. But what users actually want is they want the insight, right? When they are about to make a decision. If you have a, you have an ML powered forecast, Andy Sense has had that built in for years, now you have an ML powered forecast. You don't need it two weeks before or a week after in a report somewhere. You need it when you're about to decide do I hire more salespeople or do I put a hundred grand into a marketing program? It's putting that insight at the point of decision that's important. And you don't wanna be waiting to dig through a lot of infrastructure to find it. You just want it when you need it. What's >>The alternative from a time standpoint? So real time insight, which is what you're saying. Yep. What's the alternative? If they don't have that, what's >>The alternative? Is what we are currently seeing in the market. You hire a bunch of BI analysts and data analysts to do the work for you and you hire enough that your business users can ask questions and get answers in a timely fashion. And by the way, if you're paying attention, there's not enough data analysts in the whole world to do that. Good luck. I am >>Time to get it. I really empathize with when I, I used to work for a 3D printing startup and I can, I have just, I mean, I would call it PTSD flashbacks of standing behind our BI guy with my list of queries and things that I wanted to learn more about our e-commerce platform in our, in our marketplace and community. And it would take weeks and I mean this was only in 2012. We're not talking 1958 here. We're talking, we're talking, well, a decade in, in startup years is, is a hundred years in the rest of the world life. But I think it's really interesting. So talk to us a little bit about infused and composable analytics. Sure. And how does this relate to embedded? Yeah. >>So embedded analytics for a long time was I want to take a dashboard I built in a BI environment. I wanna lift it and shift it into some other application so it's close to the user and that is the right direction to go. But going back to that statistic about how, hey, 10 to 20% of users know how to do something with that dashboard. Well how do you reach the rest of users? Yeah. When you think about breaking that up and making it more personalized so that instead of getting a dashboard embedded in a tool, you get individual insights, you get data visualizations, you get controls, maybe it's not even actually a visualization at all. Maybe it's just a query result that influences the ordering of a list. So like if you're a csm, you have a list of accounts in your book of business, you wanna rank those by who's priorities the most likely to churn. >>Yeah. You get that. How do you get that most likely to churn? You get it from your BI system. So how, but then the question is, how do I insert that back into the application that CSM is using? So that's what we talk about when we talk about Infusion. And SI started the infusion term about two years ago and now it's being used everywhere. We see it in marketing from Click and Tableau and from Looker just recently did a whole launch on infusion. The idea is you break this up into very small digestible pieces. You put those pieces into user experiences where they're relevant and when you need them. And to do that, you need a set of APIs, SDKs, to program it. But you also need a lot of very solid building blocks so that you're not building this from scratch, you're, you're assembling it from big pieces. >>And so what we do aty sense is we've got machine learning built in. We have an LQ built in. We have a whole bunch of AI powered features, including a knowledge graph that helps users find what else they need to know. And we, we provide those to our customers as building blocks so that they can put those into their own products, make them look and feel native and get that experience. In fact, one of the things that was most interesting this last couple of couple of quarters is that we built a technology demo. We integrated SI sensee with Office 365 with Google apps for business with Slack and MS teams. We literally just threw an Nlq box into Excel and now users can go in and say, Hey, which of my sales people in the northwest region are on track to meet their quota? And they just get the table back in Excel. They can build charts of it and PowerPoint. And then when they go to their q do their QBR next week or week after that, they just hit refresh to get live data. It makes it so much more digestible. And that's the whole point of infusion. It's bigger than just, yeah. The iframe based embedding or the JavaScript embedding we used to talk about four or five years >>Ago. APIs are very key. You brought that up. That's gonna be more of the integration piece. How does embedable and composable work as more people start getting on board? It's kind of like a Yeah. A flywheel. Yes. What, how do you guys see that progression? Cause everyone's copying you. We see that, but this is a, this means it's standard. People want this. Yeah. What's next? What's the, what's that next flywheel benefit that you guys coming out with >>Composability, fundamentally, if you read the Gartner analysis, right, they, when they talk about composable, they're talking about building pre-built analytics pieces in different business units for, for different purposes. And being able to plug those together. Think of like containers and services that can, that can talk to each other. You have a composition platform that can pull it into a presentation layer. Well, the presentation layer is where I focus. And so the, so for us, composable means I'm gonna have formulas and queries and widgets and charts and everything else that my, that my end users are gonna wanna say almost minority report style. If I'm not dating myself with that, I can put this card here, I can put that chart here. I can set these filters here and I get my own personalized view. But based on all the investments my organization's made in data and governance and quality so that all that infrastructure is supporting me without me worrying much about it. >>Well that's productivity on the user side. Talk about the software angle development. Yeah. Is your low code, no code? Is there coding involved? APIs are certainly the connective tissue. What's the impact to Yeah, the >>Developer. Oh. So if you were working on a traditional legacy BI platform, it's virtually impossible because this is an architectural thing that you have to be able to do. Every single tool that can make a chart has an API to embed that chart somewhere. But that's not the point. You need the life cycle automation to create models, to modify models, to create new dashboards and charts and queries on the fly. And be able to manage the whole life cycle of that. So that in your composable application, when you say, well I want chart and I want it to go here and I want it to do this and I want it to be filtered this way you can interact with the underlying platform. And most importantly, when you want to use big pieces like, Hey, I wanna forecast revenue for the next six months. You don't want it popping down into Python and writing that yourself. >>You wanna be able to say, okay, here's my forecasting algorithm. Here are the inputs, here's the dimensions, and then go and just put it somewhere for me. And so that's what you get withy sense. And there aren't any other analytics platforms that were built to do that. We were built that way because of our architecture. We're an API first product. But more importantly, most of the legacy BI tools are legacy. They're coming from that desktop single user, self-service, BI environment. And it's a small use case for them to go embedding. And so composable is kind of out of reach without a complete rebuild. Right? But with SI senses, because our bread and butter has always been embedding, it's all architected to be API first. It's integrated for software developers with gi, but it also has all those low code and no code capabilities for business users to do the minority report style thing. And it's assemble endless components into a workable digital workspace application. >>Talk about the strategy with aws. You're here at the ecosystem, you're in the ecosystem, you're leading product and they have a strategy. We know their strategy, they have some stuff, but then the ecosystem goes faster and ends up making a better product in most of the cases. If you compare, I know they'll take me to school on that, but I, that's pretty much what we report on. Mongo's doing a great job. They have databases. So you kind of see this balance. How are you guys playing in the ecosystem? What's the, what's the feedback? What's it like? What's going on? >>AWS is actually really our best partner. And the reason why is because AWS has been clear for many, many years. They build componentry, they build services, they build infrastructure, they build Redshift, they build all these different things, but they need, they need vendors to pull it all together into something usable. And fundamentally, that's what Cient does. I mean, we didn't invent sequel, right? We didn't invent jackal or dle. These are not, these are underlying analytics technologies, but we're taking the bricks out of the briefcase. We're assembling it into something that users can actually deploy for their use cases. And so for us, AWS is perfect because they focus on the hard bits. The the underlying technologies we assemble those make them usable for customers. And we get the distribution. And of course AWS loves that. Cause it drives more compute and it drives more, more consumption. >>How much do they pay you to say that >>Keynote, >>That was a wonderful pitch. That's >>Absolutely, we always say, hey, they got a lot of, they got a lot of great goodness in the cloud, but they're not always the best at the solutions and that they're trying to bring out, and you guys are making these solutions for customers. Yeah. That resonates with what they got with Amazon. For >>Example, we, last year we did a, a technology demo with Comprehend where we put comprehend inside of a semantic model and we would compile it and then send it back to Redshift. And it takes comprehend, which is a very cool service, but you kind of gotta be a coder to use it. >>I've been hear a lot of hype about the semantic layer. What is, what is going on with that >>Semantec layer is what connects the actual data, the tables in your database with how they're connected and what they mean so that a user like you or me who's saying I wanna bar chart with revenue over time can just work with revenue and time. And the semantic layer translates between what we did and what the database knows >>About. So it speaks English and then they converts it to data language. It's >>Exactly >>Right. >>Yeah. It's facilitating the exchange of information. And, and I love this. So I like that you actually talked about it in the beginning, the knowledge map and helping people figure out what they might not know. Yeah. I, I am not a bi analyst by trade and I, I don't always know what's possible to know. Yeah. And I think it's really great that you're doing that education piece. I'm sure, especially working with AWS companies, depending on their scale, that's gotta be a big part of it. How much is the community play a role in your product development? >>It's huge because I'll tell you, one of the challenges in embedding is someone who sees an amazing experience in outreach or in seismic. And to say, I want that. And I want it to be exactly the way my product is built, but I don't wanna learn a lot. And so you, what you want do is you want to have a community of people who have already built things who can help lead the way. And our community, we launched a new version of the SES community in early 2022 and we've seen a 450% growth in the c in that community. And we've gone from an average of one response, >>450%. I just wanna put a little exclamation point on that. Yeah, yeah. That's awesome. We, >>We've tripled our organic activity. So now if you post this Tysons community, it used to be, you'd get one response maybe from us, maybe from from a customer. Now it's up to three. And it's continuing to trend up. So we're, it's >>Amazing how much people are willing to help each other. If you just get in the platform, >>Do it. It's great. I mean, business is so >>Competitive. I think it's time for the, it's time. I think it's time. Instagram challenge. The reels on John. So we have a new thing. We're gonna run by you. Okay. We just call it the bumper sticker for reinvent. Instead of calling it the Instagram reels. If we're gonna do an Instagram reel for 30 seconds, what would be your take on what's going on this year at Reinvent? What you guys are doing? What's the most important story that you would share with folks on Instagram? >>You know, I think it's really what, what's been interesting to me is the, the story with Redshift composable, sorry. No, composable, Redshift Serverless. Yeah. One of the things I've been >>Seeing, we know you're thinking about composable a lot. Yes. Right? It's, it's just, it's in there, it's in your mouth. Yeah. >>So the fact that Redshift Serverless is now kind becoming the defacto standard, it changes something for, for my customers. Cuz one of the challenges with Redshift that I've seen in, in production is if as people use it more, you gotta get more boxes. You have to manage that. The fact that serverless is now available, it's, it's the default means it now people are just seeing Redshift as a very fast, very responsive repository. And that plays right into the story I'm telling cuz I'm telling them it's not that hard to put some analysis on top of things. So for me it's, it's a, maybe it's a narrow Instagram reel, but it's an >>Important one. Yeah. And that makes it better for you because you get to embed that. Yeah. And you get access to better data. Faster data. Yeah. Higher quality, relevant, updated. >>Yep. Awesome. As it goes into that 80% of knowledge workers, they have a consumer great expectation of experience. They're expecting that five ms response time. They're not waiting 2, 3, 4, 5, 10 seconds. They're not trained on theola expectations. And so it's, it matters a lot. >>Final question for you. Five years out from now, if things progress the way they're going with more innovation around data, this front end being very usable, semantic layer kicks in, you got the Lambda and you got serverless kind of coming in, helping out along the way. What's the experience gonna look like for a user? What's it in your mind's eye? What's that user look like? What's their experience? >>I, I think it shifts almost every role in a business towards being a quantitative one. Talking about, Hey, this is what I saw. This is my hypothesis and this is what came out of it. So here's what we should do next. I, I'm really excited to see that sort of scientific method move into more functions in the business. Cuz for decades it's been the domain of a few people like me doing strategy, but now I'm seeing it in CSMs, in support people and sales engineers and line engineers. That's gonna be a big shift. Awesome. >>Thank >>You Scott. Thank you so much. This has been a fantastic session. We wish you the best at si sense. John, always pleasure to share the, the stage with you. Thank you to everybody who's attuning in, tell us your thoughts. We're always eager to hear what, what features have got you most excited. And as you know, we will be live here from Las Vegas at reinvent from the show floor 10 to six all week except for Friday. We'll give you Friday off with John Furrier. My name's Savannah Peterson. We're the cube, the the, the leader in high tech coverage.

Published Date : Nov 29 2022

SUMMARY :

We are live from the show floor here in Las Vegas, Nevada. Big discussion of data in the keynote bulk of the time was We all want the How's the show for you going so far? the excitement and the activity around how we can do so much more with data, I think you have the coolest last name of anyone we've had on the show so far, queries and the analysis that you can power off of Aurora and Redshift and everything else and How do you see Siente playing a role in the evolution there of we're in a different generation And the way things worked back then is if you ran a business and you wanted to get insights about that business, the tools to get to those insights needed to serve both business users like you and me the muck that goes on with aligning the data. And you don't wanna be waiting to dig through a lot of infrastructure to find it. What's the alternative? and data analysts to do the work for you and you hire enough that your business users can ask questions And how does this relate to embedded? Maybe it's just a query result that influences the ordering of a list. And SI started the infusion term And that's the whole point of infusion. That's gonna be more of the integration piece. And being able to plug those together. What's the impact to Yeah, the And most importantly, when you want to use big pieces like, Hey, I wanna forecast revenue for And so that's what you get withy sense. How are you guys playing in the ecosystem? And the reason why is because AWS has been clear for That was a wonderful pitch. the solutions and that they're trying to bring out, and you guys are making these solutions for customers. which is a very cool service, but you kind of gotta be a coder to use it. I've been hear a lot of hype about the semantic layer. And the semantic layer translates between It's So I like that you actually talked about it in And I want it to be exactly the way my product is built, but I don't wanna I just wanna put a little exclamation point on that. And it's continuing to trend up. If you just get in the platform, I mean, business is so What's the most important story that you would share with One of the things I've been Seeing, we know you're thinking about composable a lot. right into the story I'm telling cuz I'm telling them it's not that hard to put some analysis on top And you get access to better data. And so it's, it matters a lot. What's the experience gonna look like for a user? see that sort of scientific method move into more functions in the business. And as you know, we will be live here from Las Vegas at reinvent from the show floor

ENTITIES

Entity	Category	Confidence
Scott	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Savannah Peterson	PERSON	0.99+
2012	DATE	0.99+
Peter Lu	PERSON	0.99+
Friday	DATE	0.99+
80%	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
30 seconds	QUANTITY	0.99+
John	PERSON	0.99+
450%	QUANTITY	0.99+
Excel	TITLE	0.99+
10	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
Savannah Peterson	PERSON	0.99+
John Furrier	PERSON	0.99+
Office 365	TITLE	0.99+
IDC	ORGANIZATION	0.99+
1958	DATE	0.99+
PowerPoint	TITLE	0.99+
20%	QUANTITY	0.99+
Forester	ORGANIZATION	0.99+
Python	TITLE	0.99+
Verner Vos	PERSON	0.99+
early 2022	DATE	0.99+
Gartner	ORGANIZATION	0.99+
last year	DATE	0.99+
10 seconds	QUANTITY	0.99+
five ms	QUANTITY	0.99+
Las Vegas, Nevada	LOCATION	0.99+
this year	DATE	0.99+
first product	QUANTITY	0.99+
aws	ORGANIZATION	0.98+
one response	QUANTITY	0.98+
late eighties	DATE	0.98+
Five years	QUANTITY	0.98+
2	QUANTITY	0.98+
tomorrow	DATE	0.98+
Savannah	PERSON	0.98+
Scott Castle	PERSON	0.98+
one	QUANTITY	0.98+
Sisense	PERSON	0.97+
5	QUANTITY	0.97+
English	OTHER	0.96+
Click and Tableau	ORGANIZATION	0.96+
Andy Sense	PERSON	0.96+
Looker	ORGANIZATION	0.96+
two weeks	DATE	0.96+
next week	DATE	0.96+
early nineties	DATE	0.95+
Instagram	ORGANIZATION	0.95+
serverless	TITLE	0.94+
AWS Reinvent	ORGANIZATION	0.94+
Mongo	ORGANIZATION	0.93+
single	QUANTITY	0.93+
Aurora	TITLE	0.92+
Lotus 1 23	TITLE	0.92+
One	QUANTITY	0.92+
JavaScript	TITLE	0.92+
SES	ORGANIZATION	0.92+
next six months	DATE	0.91+
MS	ORGANIZATION	0.91+
five years	QUANTITY	0.89+
six	QUANTITY	0.89+
a week	DATE	0.89+
Soy Sense	TITLE	0.89+
hundred grand	QUANTITY	0.88+
Redshift	TITLE	0.88+
Adam Lesky	PERSON	0.88+
Day two keynotes	QUANTITY	0.87+
floor 10	QUANTITY	0.86+
two thousands	QUANTITY	0.85+
Redshift Serverless	TITLE	0.85+
both business	QUANTITY	0.84+
3	QUANTITY	0.84+

Peter MacDonald & Itamar Ankorion | AWS re:Invent 2022

Published Date : Nov 23 2022

SUMMARY :

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Peter	PERSON	0.99+
Dell	ORGANIZATION	0.99+
Siemens	ORGANIZATION	0.99+
Peter MacDonald	PERSON	0.99+
John Furrier	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Peter McDonald	PERSON	0.99+
Itamar Ankorion	PERSON	0.99+
Qlik	ORGANIZATION	0.99+
28 billion	QUANTITY	0.99+
two companies	QUANTITY	0.99+
Tens	QUANTITY	0.99+
three companies	QUANTITY	0.99+
Siemens Energy	ORGANIZATION	0.99+
20 plus years	QUANTITY	0.99+
yesterday	DATE	0.99+
Snowflake	ORGANIZATION	0.99+
third element	QUANTITY	0.99+
First	QUANTITY	0.99+
three	QUANTITY	0.99+
Itamar	PERSON	0.99+
over 20,000 tables	QUANTITY	0.99+
both	QUANTITY	0.99+
90,000 employees	QUANTITY	0.99+
first	QUANTITY	0.99+
Salesforce	ORGANIZATION	0.99+
Cloud Partners	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
over 38,000 customers	QUANTITY	0.99+
under 20 minutes	QUANTITY	0.99+
10 years	QUANTITY	0.99+
five	QUANTITY	0.99+
Excel	TITLE	0.99+
one	QUANTITY	0.99+
over 11 years	QUANTITY	0.98+
Snowpark	TITLE	0.98+
Second thing	QUANTITY	0.98+

Dhabaleswar “DK” Panda, Ohio State State University | SuperComputing 22

>>Welcome back to The Cube's coverage of Supercomputing Conference 2022, otherwise known as SC 22 here in Dallas, Texas. This is day three of our coverage, the final day of coverage here on the exhibition floor. I'm Dave Nicholson, and I'm here with my co-host, tech journalist extraordinaire, Paul Gillum. How's it going, >>Paul? Hi, Dave. It's going good. >>And we have a wonderful guest with us this morning, Dr. Panda from the Ohio State University. Welcome Dr. Panda to the Cube. >>Thanks a lot. Thanks a lot to >>Paul. I know you're, you're chopping at >>The bit, you have incredible credentials, over 500 papers published. The, the impact that you've had on HPC is truly remarkable. But I wanted to talk to you specifically about a product project you've been working on for over 20 years now called mva, high Performance Computing platform that's used by more than 32 organ, 3,200 organizations across 90 countries. You've shepherded this from, its, its infancy. What is the vision for what MVA will be and and how is it a proof of concept that others can learn from? >>Yeah, Paul, that's a great question to start with. I mean, I, I started with this conference in 2001. That was the first time I came. It's very coincidental. If you remember the Finman Networking Technology, it was introduced in October of 2000. Okay. So in my group, we were working on NPI for Marinette Quadrics. Those are the old technology, if you can recollect when Finman was there, we were the very first one in the world to really jump in. Nobody knew how to use Infin van in an HPC system. So that's how the Happy Project was born. And in fact, in super computing 2002 on this exhibition floor in Baltimore, we had the first demonstration, the open source happy, actually is running on an eight node infinite van clusters, eight no zeros. And that was a big challenge. But now over the years, I means we have continuously worked with all infinite van vendors, MPI Forum. >>We are a member of the MPI Forum and also all other network interconnect. So we have steadily evolved this project over the last 21 years. I'm very proud of my team members working nonstop, continuously bringing not only performance, but scalability. If you see now INFIN event are being deployed in 8,000, 10,000 node clusters, and many of these clusters actually use our software, stack them rapid. So, so we have done a lot of, like our focuses, like we first do research because we are in academia. We come up with good designs, we publish, and in six to nine months, we actually bring it to the open source version and people can just download and then use it. And that's how currently it's been used by more than 3000 orange in 90 countries. And, but the interesting thing is happening, your second part of the question. Now, as you know, the field is moving into not just hvc, but ai, big data, and we have those support. This is where like we look at the vision for the next 20 years, we want to design this MPI library so that not only HPC but also all other workloads can take advantage of it. >>Oh, we have seen libraries that become a critical develop platform supporting ai, TensorFlow, and, and the pie torch and, and the emergence of, of, of some sort of default languages that are, that are driving the community. How, how important are these frameworks to the, the development of the progress making progress in the HPC world? >>Yeah, no, those are great. I mean, spite our stencil flow, I mean, those are the, the now the bread and butter of deep learning machine learning. Am I right? But the challenge is that people use these frameworks, but continuously models are becoming larger. You need very first turnaround time. So how do you train faster? How do you do influencing faster? So this is where HPC comes in and what exactly what we have done is actually we have linked floor fighters to our happy page because now you see the MPI library is running on a million core system. Now your fighters and tenor four clan also be scaled to to, to those number of, large number of course and gps. So we have actually done that kind of a tight coupling and that helps the research to really take advantage of hpc. >>So if, if a high school student is thinking in terms of interesting computer science, looking for a place, looking for a university, Ohio State University, bruns, world renowned, widely known, but talk about what that looks like from a day on a day to day basis in terms of the opportunity for undergrad and graduate students to participate in, in the kind of work that you do. What is, what does that look like? And is, and is that, and is that a good pitch to for, for people to consider the university? >>Yes. I mean, we continuously, from a university perspective, by the way, the Ohio State University is one of the largest single campus in, in us, one of the top three, top four. We have 65,000 students. Wow. It's one of the very largest campus. And especially within computer science where I am located, high performance computing is a very big focus. And we are one of the, again, the top schools all over the world for high performance computing. And we also have very strength in ai. So we always encourage, like the new students who like to really work on top of the art solutions, get exposed to the concepts, principles, and also practice. Okay. So, so we encourage those people that wish you can really bring you those kind of experience. And many of my past students, staff, they're all in top companies now, have become all big managers. >>How, how long, how long did you say you've been >>At 31 >>Years? 31 years. 31 years. So, so you, you've had people who weren't alive when you were already doing this stuff? That's correct. They then were born. Yes. They then grew up, yes. Went to university graduate school, and now they're on, >>Now they're in many top companies, national labs, all over the universities, all over the world. So they have been trained very well. Well, >>You've, you've touched a lot of lives, sir. >>Yes, thank you. Thank >>You. We've seen really a, a burgeoning of AI specific hardware emerge over the last five years or so. And, and architectures going beyond just CPUs and GPUs, but to Asics and f PGAs and, and accelerators, does this excite you? I mean, are there innovations that you're seeing in this area that you think have, have great promise? >>Yeah, there is a lot of promise. I think every time you see now supercomputing technology, you see there is sometime a big barrier comes barrier jump. Rather I'll say, new technology comes some disruptive technology, then you move to the next level. So that's what we are seeing now. A lot of these AI chips and AI systems are coming up, which takes you to the next level. But the bigger challenge is whether it is cost effective or not, can that be sustained longer? And this is where commodity technology comes in, which commodity technology tries to take you far longer. So we might see like all these likes, Gaudi, a lot of new chips are coming up, can they really bring down the cost? If that cost can be reduced, you will see a much more bigger push for AI solutions, which are cost effective. >>What, what about on the interconnect side of things, obvi, you, you, your, your start sort of coincided with the initial standards for Infin band, you know, Intel was very, very, was really big in that, in that architecture originally. Do you see interconnects like RDMA over converged ethernet playing a part in that sort of democratization or commoditization of things? Yes. Yes. What, what are your thoughts >>There for internet? No, this is a great thing. So, so we saw the infinite man coming. Of course, infinite Man is, commod is available. But then over the years people have been trying to see how those RDMA mechanisms can be used for ethernet. And then Rocky has been born. So Rocky has been also being deployed. But besides these, I mean now you talk about Slingshot, the gray slingshot, it is also an ethernet based systems. And a lot of those RMA principles are actually being used under the hood. Okay. So any modern networks you see, whether it is a Infin and Rocky Links art network, rock board network, you name any of these networks, they are using all the very latest principles. And of course everybody wants to make it commodity. And this is what you see on the, on the slow floor. Everybody's trying to compete against each other to give you the best performance with the lowest cost, and we'll see whoever wins over the years. >>Sort of a macroeconomic question, Japan, the US and China have been leapfrogging each other for a number of years in terms of the fastest supercomputer performance. How important do you think it is for the US to maintain leadership in this area? >>Big, big thing, significantly, right? We are saying that I think for the last five to seven years, I think we lost that lead. But now with the frontier being the number one, starting from the June ranking, I think we are getting that leadership back. And I think it is very critical not only for fundamental research, but for national security trying to really move the US to the leading edge. So I hope us will continue to lead the trend for the next few years until another new system comes out. >>And one of the gating factors, there is a shortage of people with data science skills. Obviously you're doing what you can at the university level. What do you think can change at the secondary school level to prepare students better to, for data science careers? >>Yeah, I mean that is also very important. I mean, we, we always call like a pipeline, you know, that means when PhD levels we are expecting like this even we want to students to get exposed to, to, to many of these concerts from the high school level. And, and things are actually changing. I mean, these days I see a lot of high school students, they, they know Python, how to program in Python, how to program in sea object oriented things. Even they're being exposed to AI at that level. So I think that is a very healthy sign. And in fact we, even from Ohio State side, we are always engaged with all this K to 12 in many different programs and then gradually trying to take them to the next level. And I think we need to accelerate also that in a very significant manner because we need those kind of a workforce. It is not just like a building a system number one, but how do we really utilize it? How do we utilize that science? How do we propagate that to the community? Then we need all these trained personal. So in fact in my group, we are also involved in a lot of cyber training activities for HPC professionals. So in fact, today there is a bar at 1 1 15 I, yeah, I think 1215 to one 15. We'll be talking more about that. >>About education. >>Yeah. Cyber training, how do we do for professionals? So we had a funding together with my co-pi, Dr. Karen Tom Cook from Ohio Super Center. We have a grant from NASA Science Foundation to really educate HPT professionals about cyber infrastructure and ai. Even though they work on some of these things, they don't have the complete knowledge. They don't get the time to, to learn. And the field is moving so fast. So this is how it has been. We got the initial funding, and in fact, the first time we advertised in 24 hours, we got 120 application, 24 hours. We couldn't even take all of them. So, so we are trying to offer that in multiple phases. So, so there is a big need for those kind of training sessions to take place. I also offer a lot of tutorials at all. Different conference. We had a high performance networking tutorial. Here we have a high performance deep learning tutorial, high performance, big data tutorial. So I've been offering tutorials at, even at this conference since 2001. Good. So, >>So in the last 31 years, the Ohio State University, as my friends remind me, it is properly >>Called, >>You've seen the world get a lot smaller. Yes. Because 31 years ago, Ohio, in this, you know, of roughly in the, in the middle of North America and the United States was not as connected as it was to everywhere else in the globe. So that's, that's pro that's, I i it kind of boggles the mind when you think of that progression over 31 years, but globally, and we talk about the world getting smaller, we're sort of in the thick of, of the celebratory seasons where, where many, many groups of people exchange gifts for varieties of reasons. If I were to offer you a holiday gift, that is the result of what AI can deliver the world. Yes. What would that be? What would, what would, what would the first thing be? This is, this is, this is like, it's, it's like the genie, but you only get one wish. >>I know, I know. >>So what would the first one be? >>Yeah, it's very hard to answer one way, but let me bring a little bit different context and I can answer this. I, I talked about the happy project and all, but recently last year actually we got awarded an S f I institute award. It's a 20 million award. I am the overall pi, but there are 14 universities involved. >>And who is that in that institute? >>What does that Oh, the I ici. C e. Okay. I cycle. You can just do I cycle.ai. Okay. And that lies with what exactly what you are trying to do, how to bring lot of AI for masses, democratizing ai. That's what is the overall goal of this, this institute, think of like a, we have three verticals we are working think of like one is digital agriculture. So I'll be, that will be my like the first ways. How do you take HPC and AI to agriculture the world as though we just crossed 8 billion people. Yeah, that's right. We need continuous food and food security. How do we grow food with the lowest cost and with the highest yield? >>Water >>Consumption. Water consumption. Can we minimize or minimize the water consumption or the fertilization? Don't do blindly. Technologies are out there. Like, let's say there is a weak field, A traditional farmer see that, yeah, there is some disease, they will just go and spray pesticides. It is not good for the environment. Now I can fly it drone, get images of the field in the real time, check it against the models, and then it'll tell that, okay, this part of the field has disease. One, this part of the field has disease. Two, I indicate to the, to the tractor or the sprayer saying, okay, spray only pesticide one, you have pesticide two here. That has a big impact. So this is what we are developing in that NSF A I institute I cycle ai. We also have, we have chosen two additional verticals. One is animal ecology, because that is very much related to wildlife conservation, climate change, how do you understand how the animals move? Can we learn from them? And then see how human beings need to act in future. And the third one is the food insecurity and logistics. Smart food distribution. So these are our three broad goals in that institute. How do we develop cyber infrastructure from below? Combining HP c AI security? We have, we have a large team, like as I said, there are 40 PIs there, 60 students. We are a hundred members team. We are working together. So, so that will be my wish. How do we really democratize ai? >>Fantastic. I think that's a great place to wrap the conversation here On day three at Supercomputing conference 2022 on the cube, it was an honor, Dr. Panda working tirelessly at the Ohio State University with his team for 31 years toiling in the field of computer science and the end result, improving the lives of everyone on Earth. That's not a stretch. If you're in high school thinking about a career in computer science, keep that in mind. It isn't just about the bits and the bobs and the speeds and the feeds. It's about serving humanity. Maybe, maybe a little, little, little too profound a statement, I would argue not even close. I'm Dave Nicholson with the Queue, with my cohost Paul Gillin. Thank you again, Dr. Panda. Stay tuned for more coverage from the Cube at Super Compute 2022 coming up shortly. >>Thanks a lot.

Published Date : Nov 17 2022

SUMMARY :

Welcome back to The Cube's coverage of Supercomputing Conference 2022, And we have a wonderful guest with us this morning, Dr. Thanks a lot to But I wanted to talk to you specifically about a product project you've So in my group, we were working on NPI for So we have steadily evolved this project over the last 21 years. that are driving the community. So we have actually done that kind of a tight coupling and that helps the research And is, and is that, and is that a good pitch to for, So, so we encourage those people that wish you can really bring you those kind of experience. you were already doing this stuff? all over the world. Thank this area that you think have, have great promise? I think every time you see now supercomputing technology, with the initial standards for Infin band, you know, Intel was very, very, was really big in that, And this is what you see on the, Sort of a macroeconomic question, Japan, the US and China have been leapfrogging each other for a number the number one, starting from the June ranking, I think we are getting that leadership back. And one of the gating factors, there is a shortage of people with data science skills. And I think we need to accelerate also that in a very significant and in fact, the first time we advertised in 24 hours, we got 120 application, that's pro that's, I i it kind of boggles the mind when you think of that progression over 31 years, I am the overall pi, And that lies with what exactly what you are trying to do, to the tractor or the sprayer saying, okay, spray only pesticide one, you have pesticide two here. I think that's a great place to wrap the conversation here On

ENTITIES

Entity	Category	Confidence
Dave Nicholson	PERSON	0.99+
Paul Gillum	PERSON	0.99+
Dave	PERSON	0.99+
Paul Gillin	PERSON	0.99+
October of 2000	DATE	0.99+
Paul	PERSON	0.99+
NASA Science Foundation	ORGANIZATION	0.99+
2001	DATE	0.99+
Baltimore	LOCATION	0.99+
8,000	QUANTITY	0.99+
14 universities	QUANTITY	0.99+
31 years	QUANTITY	0.99+
20 million	QUANTITY	0.99+
24 hours	QUANTITY	0.99+
last year	DATE	0.99+
Karen Tom Cook	PERSON	0.99+
60 students	QUANTITY	0.99+
Ohio State University	ORGANIZATION	0.99+
90 countries	QUANTITY	0.99+
six	QUANTITY	0.99+
Earth	LOCATION	0.99+
Panda	PERSON	0.99+
today	DATE	0.99+
65,000 students	QUANTITY	0.99+
3,200 organizations	QUANTITY	0.99+
North America	LOCATION	0.99+
Python	TITLE	0.99+
United States	LOCATION	0.99+
Dallas, Texas	LOCATION	0.99+
over 500 papers	QUANTITY	0.99+
June	DATE	0.99+
One	QUANTITY	0.99+
more than 32 organ	QUANTITY	0.99+
120 application	QUANTITY	0.99+
Ohio	LOCATION	0.99+
more than 3000 orange	QUANTITY	0.99+
first ways	QUANTITY	0.99+
one	QUANTITY	0.99+
nine months	QUANTITY	0.99+
40 PIs	QUANTITY	0.99+
Asics	ORGANIZATION	0.99+
MPI Forum	ORGANIZATION	0.98+
China	ORGANIZATION	0.98+
Two	QUANTITY	0.98+
Ohio State State University	ORGANIZATION	0.98+
8 billion people	QUANTITY	0.98+
Intel	ORGANIZATION	0.98+
HP	ORGANIZATION	0.97+
Dr.	PERSON	0.97+
over 20 years	QUANTITY	0.97+
US	ORGANIZATION	0.97+
Finman	ORGANIZATION	0.97+
Rocky	PERSON	0.97+
Japan	ORGANIZATION	0.97+
first time	QUANTITY	0.97+
first demonstration	QUANTITY	0.96+
31 years ago	DATE	0.96+
Ohio Super Center	ORGANIZATION	0.96+
three broad goals	QUANTITY	0.96+
one wish	QUANTITY	0.96+
second part	QUANTITY	0.96+
31	QUANTITY	0.96+
Cube	ORGANIZATION	0.95+
eight	QUANTITY	0.95+
over 31 years	QUANTITY	0.95+
10,000 node clusters	QUANTITY	0.95+
day three	QUANTITY	0.95+
first	QUANTITY	0.95+
INFIN	EVENT	0.94+
seven years	QUANTITY	0.94+
Dhabaleswar “DK” Panda	PERSON	0.94+
three	QUANTITY	0.93+
S f I institute	TITLE	0.93+
first thing	QUANTITY	0.93+

Breaking Analysis: Snowflake caught in the storm clouds

>> From the CUBE Studios in Palo Alto in Boston, bringing you data driven insights from the Cube and ETR. This is Breaking Analysis with Dave Vellante. >> A better than expected earnings report in late August got people excited about Snowflake again, but the negative sentiment in the market is weighed heavily on virtually all growth tech stocks and Snowflake is no exception. As we've stressed many times the company's management is on a long term mission to dramatically simplify the way organizations use data. Snowflake is tapping into a multi hundred billion dollar total available market and continues to grow at a rapid pace. In our view, Snowflake is embarking on its third major wave of innovation data apps, while its first and second waves are still bearing significant fruit. Now for short term traders focused on the next 90 or 180 days, that probably doesn't matter. But those taking a longer view are asking, "Should we still be optimistic about the future of this high flyer or is it just another over hyped tech play?" Hello and welcome to this week's Wiki Bond Cube Insights powered by ETR. Snowflake's Quarter just ended. And in this breaking analysis we take a look at the most recent survey data from ETR to see what clues and nuggets we can extract to predict the near term future in the long term outlook for Snowflake which is going to announce its earnings at the end of this month. Okay, so you know the story. If you've been investor in Snowflake this year, it's been painful. We said at IPO, "If you really want to own this stock on day one, just hold your nose and buy it." But like most IPOs we said there will be likely a better entry point in the future, and not surprisingly that's been the case. Snowflake IPOed a price of 120, which you couldn't touch on day one unless you got into a friends and family Delio. And if you did, you're still up 5% or so. So congratulations. But at one point last year you were up well over 200%. That's been the nature of this volatile stock, and I certainly can't help you with the timing of the market. But longer term Snowflake is targeting 10 billion in revenue for fiscal year 2028. A big number. Is it achievable? Is it big enough? Tell you what, let's come back to that. Now shorter term, our expert trader and breaking analysis contributor Chip Simonton said he got out of the stock a while ago after having taken a shot at what turned out to be a bear market rally. He pointed out that the stock had been bouncing around the 150 level for the last few months and broke that to the downside last Friday. So he'd expect 150 is where the stock is going to find resistance on the way back up, but there's no sign of support right now. He said maybe at 120, which was the July low and of course the IPO price that we just talked about. Now, perhaps earnings will be a catalyst, when Snowflake announces on November 30th, but until the mentality toward growth tech changes, nothing's likely to change dramatically according to Simonton. So now that we have that out of the way, let's take a look at the spending data for Snowflake in the ETR survey. Here's a chart that shows the time series breakdown of snowflake's net score going back to the October, 2021 survey. Now at that time, Snowflake's net score stood at a robust 77%. And remember, net score is a measure of spending velocity. It's a proprietary network, and ETR derives it from a quarterly survey of IT buyers and asks the respondents, "Are you adopting the platform new? Are you spending 6% or more? Is you're spending flat? Is you're spending down 6% or worse? Or are you leaving the platform decommissioning?" You subtract the percent of customers that are spending less or churning from those that are spending more and adopting or adopting and you get a net score. And that's expressed as a percentage of customers responding. In this chart we show Snowflake's in out of the total survey which ranges... The total survey ranges between 1,200 and 1,400 each quarter. And the very last column... Oh sorry, very last row, we show the number of Snowflake respondents that are coming in the survey from the Fortune 500 and the Global 2000. Those are two very important Snowflake constituencies. Now what this data tells us is that Snowflake exited 2021 with very strong momentum in a net score of 82%, which is off the charts and it was actually accelerating from the previous survey. Now by April that sentiment had flipped and Snowflake came down to earth with a 68% net score. Still highly elevated relative to its peers, but meaningfully down. Why was that? Because we saw a drop in new ads and an increase in flat spend. Then into the July and most recent October surveys, you saw a significant drop in the percentage of customers that were spending more. Now, notably, the percentage of customers who are contemplating adding the platform is actually staying pretty strong, but it is off a bit this past survey. And combined with a slight uptick in planned churn, net score is now down to 60%. That uptick from 0% and 1% and then 3%, it's still small, but that net score at 60% is still 20 percentage points higher than our highly elevated benchmark of 40% as you recall from listening to earlier breaking analysis. That 40% range is we consider a milestone. Anything above that is actually quite strong. But again, Snowflake is down and coming back to churn, while 3% churn is very low, in previous quarters we've seen Snowflake 0% or 1% decommissions. Now the last thing to note in this chart is the meaningful uptick in survey respondents that are citing, they're using the Snowflake platform. That's up to 212 in the survey. So look, it's hard to imagine that Snowflake doesn't feel the softening in the market like everyone else. Snowflake is guiding for around 60% growth in product revenue against the tough compare from a year ago with a 2% operating margin. So like every company, the reaction of the street is going to come down to how accurate or conservative the guide is from their CFO. Now, earlier this year, Snowflake acquired a company called Streamlit for around $800 million. Streamlit is an open source Python library and it makes it easier to build data apps with machine learning, obviously a huge trend. And like Snowflake, generally its focus is on simplifying the complex, in this case making data science easier to integrate into data apps that business people can use. So we were excited this summer in the July ETR survey to see that they added some nice data and pick on Streamlit, which we're showing here in comparison to Snowflake's core business on the left hand side. That's the data warehousing, the Streamlit pieces on the right hand side. And we show again net score over time from the previous survey for Snowflake's core database and data warehouse offering again on the left as compared to a Streamlit on the right. Snowflake's core product had 194 responses in the October, 22 survey, Streamlit had an end of 73, which is up from 52 in the July survey. So significant uptick of people responding that they're doing business in adopting Streamlit. That was pretty impressive to us. And it's hard to see, but the net scores stayed pretty constant for Streamlit at 51%. It was 52% I think in the previous quarter, well over that magic 40% mark. But when you blend it with Snowflake, it does sort of bring things down a little bit. Now there are two key points here. One is that the acquisition seems to have gained exposure right out of the gate as evidenced by the large number of responses. And two, the spending momentum. Again while it's lower than Snowflake overall, and when you blend it with Snowflake it does pull it down, it's very healthy and steady. Now let's do a little pure comparison with some of our favorite names in this space. This chart shows net score or spending velocity in the Y-axis, an overlap or presence, pervasiveness if you will, in the data set on the X-axis. That red dotted line again is that 40% highly elevated net score that we like to talk about. And that table inserted informs us as to how the companies are plotted, where the dots set up, the net score, the ins. And we're comparing a number of database players, although just a caution, Oracle includes all of Oracle including its apps. But we just put it in there for reference because it is the leader in database. Right off the bat, Snowflake jumps out with a net score of 64%. The 60% from the earlier chart, again included Streamlit. So you can see its core database, data warehouse business actually is higher than the total company average that we showed you before 'cause the Streamlit is blended in. So when you separate it out, Streamlit is right on top of data bricks. Isn't that ironic? Only Snowflake and Databricks in this selection of names are above the 40% level. You see Mongo and Couchbase, they know they're solid and Teradata cloud actually showing pretty well compared to some of the earlier survey results. Now let's isolate on the database data platform sector and see how that shapes up. And for this analysis, same XY dimensions, we've added the big giants, AWS and Microsoft and Google. And notice that those three plus Snowflake are just at or above the 40% line. Snowflake continues to lead by a significant margin in spending momentum and it keeps creeping to the right. That's that end that we talked about earlier. Now here's an interesting tidbit. Snowflake is often asked, and I've asked them myself many times, "How are you faring relative to AWS, Microsoft and Google, these big whales with Redshift and Synapse and Big Query?" And Snowflake has been telling folks that 80% of its business comes from AWS. And when Microsoft heard that, they said, "Whoa, wait a minute, Snowflake, let's partner up." 'Cause Microsoft is smart, and they understand that the market is enormous. And if they could do better with Snowflake, one, they may steal some business from AWS. And two, even if Snowflake is winning against some of the Microsoft database products, if it wins on Azure, Microsoft is going to sell more compute and more storage, more AI tools, more other stuff to these customers. Now AWS is really aggressive from a partnering standpoint with Snowflake. They're openly negotiating, not openly, but they're negotiating better prices. They're realizing that when it comes to data, the cheaper that you make the offering, the more people are going to consume. At scale economies and operating leverage are really powerful things at volume that kick in. Now Microsoft, they're coming along, they obviously get it, but Google is seemingly resistant to that type of go to market partnership. Rather than lean into Snowflake as a great partner Google's field force is kind of fighting fashion. Google itself at Cloud next heavily messaged what they call the open data cloud, which is a direct rip off of Snowflake. So what can we say about Google? They continue to be kind of behind the curve when it comes to go to market. Now just a brief aside on the competitive posture. I've seen Slootman, Frank Slootman, CEO of Snowflake in action with his prior companies and how he depositioned the competition. At Data Domain, he eviscerated a company called Avamar with their, what he called their expensive and slow post process architecture. I think he actually called it garbage, if I recall at one conference I heard him speak at. And that sort of destroyed BMC when he was at ServiceNow, kind of positioning them as the equivalent of the department of motor vehicles. And so it's interesting to hear how Snowflake openly talks about the data platforms of AWS, Microsoft, Google, and data bricks. I'll give you this sort of short bumper sticker. Redshift is just an on-prem database that AWS morphed to the cloud, which by the way is kind of true. They actually did a brilliant job of it, but it's basically a fact. Microsoft Excel, a collection of legacy databases, which also kind of morphed to run in the cloud. And even Big Query, which is considered cloud native by many if not most, is being positioned by Snowflake as originally an on-prem database to support Google's ad business, maybe. And data bricks is for those people smart enough to get it to Berkeley that love complexity. And now Snowflake doesn't, they don't mention Berkeley as far as I know. That's my addition. But you get the point. And the interesting thing about Databricks and Snowflake is a while ago in the cube I said that there was a new workload type emerging around data where you have AWS cloud, Snowflake obviously for the cloud database and Databricks data for the data science and EML, you bring those things together and there's this new workload emerging that's going to be very powerful in the future. And it's interesting to see now the aspirations of all three of these platforms are colliding. That's quite a dynamic, especially when you see both Snowflake and Databricks putting venture money and getting their hooks into the loyalties of the same companies like DBT labs and Calibra. Anyway, Snowflake's posture is that we are the pioneer in cloud native data warehouse, data sharing and now data apps. And our platform is designed for business people that want simplicity. The other guys, yes, they're formidable, but we Snowflake have an architectural lead and of course we run in multiple clouds. So it's pretty strong positioning or depositioning, you have to admit. Now I'm not sure I agree with the big query knockoffs completely. I think that's a bit of a stretch, but snowflake, as we see in the ETR survey data is winning. So in thinking about the longer term future, let's talk about what's different with Snowflake, where it's headed and what the opportunities are for the company. Snowflake put itself on the map by focusing on simplifying data analytics. What's interesting about that is the company's founders are as you probably know from Oracle. And rather than focusing on transactional data, which is Oracle's sweet spot, the stuff they worked on when they were at Oracle, the founder said, "We're going to go somewhere else. We're going to attack the data warehousing problem and the data analytics problem." And they completely re-imagined the database and how it could be applied to solve those challenges and reimagine what was possible if you had virtually unlimited compute and storage capacity. And of course Snowflake became famous for separating the compute from storage and being able to completely shut down compute so you didn't have to pay for it when you're not using it. And the ability to have multiple clusters hit the same data without making endless copies and a consumption/cloud pricing model. And then of course everyone on the planet realized, "Wow, that's a pretty good idea." Every venture capitalist in Silicon Valley has been funding companies to copy that move. And that today has pretty much become mainstream in table stakes. But I would argue that Snowflake not only had the lead, but when you look at how others are approaching this problem, it's not necessarily as clean and as elegant. Some of the startups, the early startups I think get it and maybe had an advantage of starting later, which can be a disadvantage too. But AWS is a good example of what I'm saying here. Is its version of separating compute from storage was an afterthought and it's good, it's... Given what they had it was actually quite clever and customers like it, but it's more of a, "Okay, we're going to tier to storage to lower cost, we're going to sort of dial down the compute not completely, we're not going to shut it off, we're going to minimize the compute required." It's really not true as separation is like for instance Snowflake has. But having said that, we're talking about competitors with lots of resources and cohort offerings. And so I don't want to make this necessarily all about the product, but all things being equal architecture matters, okay? So that's the cloud S-curve, the first one we're showing. Snowflake's still on that S-curve, and in and of itself it's got legs, but it's not what's going to power the company to 10 billion. The next S-curve we denote is the multi-cloud in the middle. And now while 80% of Snowflake's revenue is AWS, Microsoft is ramping up and Google, well, we'll see. But the interesting part of that curve is data sharing, and this idea of data clean rooms. I mean it really should be called the data sharing curve, but I have my reasons for calling it multi-cloud. And this is all about network effects and data gravity, and you're seeing this play out today, especially in industries like financial services and healthcare and government that are highly regulated verticals where folks are super paranoid about compliance. There not going to share data if they're going to get sued for it, if they're going to be in the front page of the Wall Street Journal for some kind of privacy breach. And what Snowflake has done is said, "Put all the data in our cloud." Now, of course now that triggers a lot of people because it's a walled garden, okay? It is. That's the trade off. It's not the Wild West, it's not Windows, it's Mac, it's more controlled. But the idea is that as different parts of the organization or even partners begin to share data that they need, it's got to be governed, it's got to be secure, it's got to be compliant, it's got to be trusted. So Snowflake introduced the idea of, they call these things stable edges. I think that's the term that they use. And they track a metric around stable edges. And so a stable edge, or think of it as a persistent edge is an ongoing relationship between two parties that last for some period of time, more than a month. It's not just a one shot deal, one a done type of, "Oh guys shared it for a day, done." It sent you an FTP, it's done. No, it's got to have trajectory over time. Four weeks or six weeks or some period of time that's meaningful. And that metric is growing. Now I think sort of a different metric that they track. I think around 20% of Snowflake customers are actively sharing data today and then they track the number of those edge relationships that exist. So that's something that's unique. Because again, most data sharing is all about making copies of data. That's great for storage companies, it's bad for auditors, and it's bad for compliance officers. And that trend is just starting out, that middle S-curve, it's going to kind of hit the base of that steep part of the S-curve and it's going to have legs through this decade we think. And then finally the third wave that we show here is what we call super cloud. That's why I called it multi-cloud before, so it could invoke super cloud. The idea that you've built a PAS layer that is purpose built for a specific objective, and in this case it's building data apps that are cloud native, shareable and governed. And is a long-term trend that's going to take some time to develop. I mean, application development platforms can take five to 10 years to mature and gain significant adoption, but this one's unique. This is a critical play for Snowflake. If it's going to compete with the big cloud players, it has to have an app development framework like Snowpark. It has to accommodate new data types like transactional data. That's why it announced this thing called UniStore last June, Snowflake a summit. And the pattern that's forming here is Snowflake is building layer upon layer with its architecture at the core. It's not currently anyway, it's not going out and saying, "All right, we're going to buy a company that's got to another billion dollars in revenue and that's how we're going to get to 10 billion." So it's not buying its way into new markets through revenue. It's actually buying smaller companies that can complement Snowflake and that it can turn into revenue for growth that fit in to the data cloud. Now as to the 10 billion by fiscal year 28, is that achievable? That's the question. Yeah, I think so. Would the momentum resources go to market product and management prowess that Snowflake has? Yes, it's definitely achievable. And one could argue to $10 billion is too conservative. Indeed, Snowflake CFO, Mike Scarpelli will fully admit his forecaster built on existing offerings. He's not including revenue as I understand it from all the new stuff that's in the pipeline because he doesn't know what it's going to look like. He doesn't know what the adoption is going to look like. He doesn't have data on that adoption, not just yet anyway. And now of course things can change quite dramatically. It's possible that is forecast for existing businesses don't materialize or competition picks them off or a company like Databricks actually is able in the longer term replicate the functionality of Snowflake with open source technologies, which would be a very competitive source of innovation. But in our view, there's plenty of room for growth, the market is enormous and the real key is, can and will Snowflake deliver on the promises of simplifying data? Of course we've heard this before from data warehouse, the data mars and data legs and master data management and ETLs and data movers and data copiers and Hadoop and a raft of technologies that have not lived up to expectations. And we've also, by the way, seen some tremendous successes in the software business with the likes of ServiceNow and Salesforce. So will Snowflake be the next great software name and hit that 10 billion magic mark? I think so. Let's reconnect in 2028 and see. Okay, we'll leave it there today. I want to thank Chip Simonton for his input to today's episode. Thanks to Alex Myerson who's on production and manages the podcast. Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hove is our Editor in Chief over at Silicon Angle. He does some great editing for us. Check it out for all the news. Remember all these episodes are available as podcasts. Wherever you listen, just search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com. Or you can email me to get in touch David.vallante@siliconangle.com. DM me @dvellante or comment on our LinkedIn post. And please do check out etr.ai, they've got the best survey data in the enterprise tech business. This is Dave Vellante for the CUBE Insights, powered by ETR. Thanks for watching, thanks for listening and we'll see you next time on breaking analysis. (upbeat music)

Published Date : Nov 10 2022

SUMMARY :

insights from the Cube and ETR. And the ability to have multiple

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
Mike Scarpelli	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
November 30th	DATE	0.99+
Ken Schiffman	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Chip Simonton	PERSON	0.99+
October, 2021	DATE	0.99+
Rob Hove	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Four weeks	QUANTITY	0.99+
July	DATE	0.99+
six weeks	QUANTITY	0.99+
10 billion	QUANTITY	0.99+
five	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
Slootman	PERSON	0.99+
BMC	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
6%	QUANTITY	0.99+
80%	QUANTITY	0.99+
last year	DATE	0.99+
October	DATE	0.99+
Silicon Valley	LOCATION	0.99+
40%	QUANTITY	0.99+
1,400	QUANTITY	0.99+
$10 billion	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
April	DATE	0.99+
3%	QUANTITY	0.99+
77%	QUANTITY	0.99+
64%	QUANTITY	0.99+
60%	QUANTITY	0.99+
194 responses	QUANTITY	0.99+
Kristin Martin	PERSON	0.99+
two parties	QUANTITY	0.99+
51%	QUANTITY	0.99+
2%	QUANTITY	0.99+
Silicon Angle	ORGANIZATION	0.99+
fiscal year 28	DATE	0.99+
billion dollars	QUANTITY	0.99+
0%	QUANTITY	0.99+
Avamar	ORGANIZATION	0.99+
52%	QUANTITY	0.99+
Berkeley	LOCATION	0.99+
2028	DATE	0.99+
Mongo	ORGANIZATION	0.99+
Data Domain	ORGANIZATION	0.99+
1%	QUANTITY	0.99+
late August	DATE	0.99+
two	QUANTITY	0.99+
three	QUANTITY	0.99+
fiscal year 2028	DATE	0.99+

Anais Dotis Georgiou, InfluxData | Evolving InfluxDB into the Smart Data Platform

>>Okay, we're back. I'm Dave Valante with The Cube and you're watching Evolving Influx DB into the smart data platform made possible by influx data. Anna East Otis Georgio is here. She's a developer advocate for influx data and we're gonna dig into the rationale and value contribution behind several open source technologies that Influx DB is leveraging to increase the granularity of time series analysis analysis and bring the world of data into realtime analytics. Anna is welcome to the program. Thanks for coming on. >>Hi, thank you so much. It's a pleasure to be here. >>Oh, you're very welcome. Okay, so IO X is being touted as this next gen open source core for Influx db. And my understanding is that it leverages in memory, of course for speed. It's a kilo store, so it gives you compression efficiency, it's gonna give you faster query speeds, it gonna use store files and object storages. So you got very cost effective approach. Are these the salient points on the platform? I know there are probably dozens of other features, but what are the high level value points that people should understand? >>Sure, that's a great question. So some of the main requirements that IOCs is trying to achieve and some of the most impressive ones to me, the first one is that it aims to have no limits on cardinality and also allow you to write any kind of event data that you want, whether that's lift tag or a field. It also wants to deliver the best in class performance on analytics queries. In addition to our already well served metrics queries, we also wanna have operator control over memory usage. So you should be able to define how much memory is used for buffering caching and query processing. Some other really important parts is the ability to have bulk data export and import, super useful. Also, broader ecosystem compatibility where possible we aim to use and embrace emerging standards in the data analytics ecosystem and have compatibility with things like sql, Python, and maybe even pandas in the future. >>Okay, so a lot there. Now we talked to Brian about how you're using Rust and and which is not a new programming language and of course we had some drama around Russ during the pandemic with the Mozilla layoffs, but the formation of the Russ Foundation really addressed any of those concerns. You got big guns like Amazon and Google and Microsoft throwing their collective weights behind it. It's really, adoption is really starting to get steep on the S-curve. So lots of platforms, lots of adoption with rust, but why rust as an alternative to say c plus plus for example? >>Sure, that's a great question. So Rust was chosen because of his exceptional performance and rebi reliability. So while rust is synt tactically similar to c c plus plus and it has similar performance, it also compiles to a native code like c plus plus. But unlike c plus plus, it also has much better memory safety. So memory safety is protection against bugs or security vulnerabilities that lead to excessive memory usage or memory leaks. And rust achieves this memory safety due to its like innovative type system. Additionally, it doesn't allow for dangling pointers and dangling pointers are the main classes of errors that lead to exploitable security vulnerabilities in languages like c plus plus. So Russ like helps meet that requirement of having no limits on card for example, because it's, we're also using the Russ implementation of Apache Arrow and this control over memory and also Russ, Russ Russ's packaging system called crates IO offers everything that you need out of the box to have features like AY and a weight to fixed race conditions to protect against buffering overflows and to ensure thread safe ay caching structures as well. So essentially it's just like has all the control, all the fine grain control, you need to take advantage of memory and all your resources as well as possible so that you can handle those really, really high ity use cases. >>Yeah, and the more I learned about the the new engine and the, and the platform IOCs et cetera, you know, you, you see things like, you know, the old days not even to even today you do a lot of garbage collection in these, in these systems and there's an inverse, you know, impact relative to performance. So it looks like you're really, you know, the community is modernizing the platform, but I wanna talk about Apache Arrow for a moment. It's designed to address the constraints that are associated with analyzing large data sets. We, we know that, but please explain why, what, what is Arrow and and what does it bring to Influx db? >>Sure, yeah. So Arrow is a, a framework for defining in memory calmer data and so much of the efficiency and performance of IOCs comes from taking advantage of calmer data structures. And I will, if you don't mind, take a moment to kind of illustrate why calmer data structures are so valuable. Let's pretend that we are gathering field data about the temperature in our room and also maybe the temperature of our stove. And in our table we have those two temperature values as well as maybe a measurement value, timestamp value, maybe some other tag values that describe what room and what house, et cetera we're getting this data from. And so you can picture this table where we have like two rows with the two temperature values for both our room and the stove. Well usually our room temperature is regulated so those values don't change very often. >>So when you have calm oriented st calm oriented storage, essentially you take each row, each column and group it together. And so if that's the case and you're just taking temperature values from the room and a lot of those temperature values are the same, then you'll, you might be able to imagine how equal values will then neighbor each other and when they neighbor each other in the storage format. This provides a really perfect opportunity for cheap compression. And then this cheap compression enables high cardinality use cases. It also enables for faster scan rates. So if you wanna define like the min and max value of the temperature in the room across a thousand different points, you only have to get those a thousand different points in order to answer that question and you have those immediately available to you. But let's contrast this with a row oriented storage solution instead so that we can understand better the benefits of calmer oriented storage. >>So if you had a row oriented storage, you'd first have to look at every field like the temperature in, in the room and the temperature of the stove. You'd have to go across every tag value that maybe describes where the room is located or what model the stove is. And every timestamp you'd then have to pluck out that one temperature value that you want at that one times stamp and do that for every single row. So you're scanning across a ton more data and that's why row oriented doesn't provide the same efficiency as calmer and Apache Arrow is in memory calmer data, calmer data fit framework. So that's where a lot of the advantages come >>From. Okay. So you've basically described like a traditional database, a row approach, but I've seen like a lot of traditional databases say, okay, now we've got, we can handle colo format versus what you're talking about is really, you know, kind of native it, is it not as effective as the, is the form not as effective because it's largely a, a bolt on? Can you, can you like elucidate on that front? >>Yeah, it's, it's not as effective because you have more expensive compression and because you can't scan across the values as quickly. And so those are, that's pretty much the main reasons why, why RO row oriented storage isn't as efficient as calm, calmer oriented storage. >>Yeah. Got it. So let's talk about Arrow data fusion. What is data fusion? I know it's written in rust, but what does it bring to to the table here? >>Sure. So it's an extensible query execution framework and it uses Arrow as its in memory format. So the way that it helps influx DB IOx is that okay, it's great if you can write unlimited amount of cardinality into influx cbis, but if you don't have a query engine that can successfully query that data, then I don't know how much value it is for you. So data fusion helps enable the, the query process and transformation of that data. It also has a PANDAS API so that you could take advantage of PDA's data frames as well and all of the machine learning tools associated with pandas. >>Okay. You're also leveraging par K in the platform course. We heard a lot about Par K in the middle of the last decade cuz as a storage format to improve on Hadoop column stores. What are you doing with Par K and why is it important? >>Sure. So Par K is the calm oriented durable file format. So it's important because it'll enable bulk import and bulk export. It has compatibility with Python and pandas so it supports a broader ecosystem. Parque files also take very little disc disc space and they're faster to scan because again they're column oriented in particular, I think PAR K files are like 16 times cheaper than CSV files, just as kind of a point of reference. And so that's essentially a lot of the, the benefits of par k. >>Got it. Very popular. So and these, what exactly is influx data focusing on as a committer to these projects? What is your focus? What's the value that you're bringing to the community? >>Sure. So Influx DB first has contributed a lot of different, different things to the Apache ecosystem. For example, they contribute an implementation of Apache Arrow and go and that will support clearing with flux. Also, there has been a quite a few contributions to data fusion for things like memory optimization and supportive additional SQL features like support for timestamp, arithmetic and support for exist clauses and support for memory control. So yeah, Influx has contributed a a lot to the Apache ecosystem and continues to do so. And I think kind of the idea here is that if you can improve these upstream projects and then the long term strategy here is that the more you contribute and build those up, then the more you will perpetuate that cycle of improvement and the more we will invest in our own project as well. So it's just that kind of symbiotic relationship and appreciation of the open source community. >>Yeah. Got it. You got that virtuous cycle going, the people call it the flywheel. Give us your last thoughts and kind of summarize, you know, where what, what the big takeaways are from your perspective. >>So I think the big takeaway is that influx data is doing a lot of really exciting things with Influx DB IOCs and I really encourage if you are interested in learning more about the technologies that Influx is leveraging to produce IOCs, the challenges associated with it and all of the hard work questions and I just wanna learn more, then I would encourage you to go to the monthly tech talks and community office hours and they are on every second Wednesday of the month at 8:30 AM Pacific time. There's also a community forums and a community Slack channel. Look for the influx D DB underscore IAC channel specifically to learn more about how to join those office hours and those monthly tech tech talks as well as ask any questions they have about IOCs, what to expect and what you'd like to learn more about. I as a developer advocate, I wanna answer your questions. So if there's a particular technology or stack that you wanna dive deeper into and want more explanation about how influx TB leverages it to build IOCs, I will be really excited to produce content on that topic for you. >>Yeah, that's awesome. You guys have a really rich community, collaborate with your peers, solve problems, and you guys super responsive, so really appreciate that. All right, thank you so much and East for explaining all this open source stuff to the audience and why it's important to the future of data. >>Thank you. I really appreciate it. >>All right, you're very welcome. Okay, stay right there and in a moment I'll be back with Tim Yokum. He's the director of engineering for Influx Data and we're gonna talk about how you update a SaaS engine while the plane is flying at 30,000 feet. You don't wanna miss this.

Published Date : Nov 8 2022

SUMMARY :

to increase the granularity of time series analysis analysis and bring the world of data Hi, thank you so much. So you got very cost effective approach. it aims to have no limits on cardinality and also allow you to write any kind of event data that So lots of platforms, lots of adoption with rust, but why rust as an all the fine grain control, you need to take advantage of even to even today you do a lot of garbage collection in these, in these systems and And so you can picture this table where we have like two rows with the two temperature values for order to answer that question and you have those immediately available to you. to pluck out that one temperature value that you want at that one times stamp and do that for every about is really, you know, kind of native it, is it not as effective as the, Yeah, it's, it's not as effective because you have more expensive compression and because So let's talk about Arrow data fusion. It also has a PANDAS API so that you could take advantage of What are you doing with So it's important What's the value that you're bringing to the community? here is that the more you contribute and build those up, then the kind of summarize, you know, where what, what the big takeaways are from your perspective. So if there's a particular technology or stack that you wanna dive deeper into and want and you guys super responsive, so really appreciate that. I really appreciate it. Influx Data and we're gonna talk about how you update a SaaS engine while

ENTITIES

Entity	Category	Confidence
Tim Yokum	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Brian	PERSON	0.99+
Anna	PERSON	0.99+
James Bellenger	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Dave Valante	PERSON	0.99+
James	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
three months	QUANTITY	0.99+
16 times	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Python	TITLE	0.99+
mobile.twitter.com	OTHER	0.99+
Influx Data	ORGANIZATION	0.99+
iOS	TITLE	0.99+
Twitter	ORGANIZATION	0.99+
30,000 feet	QUANTITY	0.99+
Russ Foundation	ORGANIZATION	0.99+
Scala	TITLE	0.99+
Twitter Lite	TITLE	0.99+
two rows	QUANTITY	0.99+
200 megabyte	QUANTITY	0.99+
Node	TITLE	0.99+
Three months ago	DATE	0.99+
one application	QUANTITY	0.99+
both places	QUANTITY	0.99+
each row	QUANTITY	0.99+
Par K	TITLE	0.99+
Anais Dotis Georgiou	PERSON	0.99+
one language	QUANTITY	0.98+
first one	QUANTITY	0.98+
15 engineers	QUANTITY	0.98+
Anna East Otis Georgio	PERSON	0.98+
both	QUANTITY	0.98+
one second	QUANTITY	0.98+
25 engineers	QUANTITY	0.98+
About 800 people	QUANTITY	0.98+
sql	TITLE	0.98+
Node Summit 2017	EVENT	0.98+
two temperature values	QUANTITY	0.98+
one times	QUANTITY	0.98+
c plus plus	TITLE	0.97+
Rust	TITLE	0.96+
SQL	TITLE	0.96+
today	DATE	0.96+
Influx	ORGANIZATION	0.95+
under 600 kilobytes	QUANTITY	0.95+
first	QUANTITY	0.95+
c plus plus	TITLE	0.95+
Apache	ORGANIZATION	0.95+
par K	TITLE	0.94+
React	TITLE	0.94+
Russ	ORGANIZATION	0.94+
About three months ago	DATE	0.93+
8:30 AM Pacific time	DATE	0.93+
twitter.com	OTHER	0.93+
last decade	DATE	0.93+
Node	ORGANIZATION	0.92+
Hadoop	TITLE	0.9+
InfluxData	ORGANIZATION	0.89+
c c plus plus	TITLE	0.89+
Cube	ORGANIZATION	0.89+
each column	QUANTITY	0.88+
InfluxDB	TITLE	0.86+
Influx DB	TITLE	0.86+
Mozilla	ORGANIZATION	0.86+
DB IOx	TITLE	0.85+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for python: