Ed Walsh & Thomas Hazel | A New Database Architecture for Supercloud

(bright music) >> Hi, everybody, this is Dave Vellante, welcome back to Supercloud 2. Last August, at the first Supercloud event, we invited the broader community to help further define Supercloud, we assessed its viability, and identified the critical elements and deployment models of the concept. The objectives here at Supercloud 2 are, first of all, to continue to tighten and test the concept, the second is, we want to get real world input from practitioners on the problems that they're facing and the viability of Supercloud in terms of applying it to their business. So on the program, we got companies like Walmart, Saks, Western Union, Ionis Pharmaceuticals, NASDAQ, and others. And the third thing that we want to do is we want to drill into the intersection of cloud and data to project what the future looks like in the context of Supercloud. So in this segment, we want to explore the concept of data architectures and what's going to be required for Supercloud. And I'm pleased to welcome one of our Supercloud sponsors, ChaosSearch, Ed Walsh is the CEO of the company, with Thomas Hazel, who's the Founder, CTO, and Chief Scientist. Guys, good to see you again, thanks for coming into our Marlborough studio. >> Always great. >> Great to be here. >> Okay, so there's a little debate, I'm going to put you right on the spot. (Ed chuckling) A little debate going on in the community started by Bob Muglia, a former CEO of Snowflake, and he was at Microsoft for a long time, and he looked at the Supercloud definition, said, "I think you need to tighten it up a little bit." So, here's what he came up with. He said, "A Supercloud is a platform that provides a programmatically consistent set of services hosted on heterogeneous cloud providers." So he's calling it a platform, not an architecture, which was kind of interesting. And so presumably the platform owner is going to be responsible for the architecture, but Dr.
Nelu Mihai, who's a computer scientist behind the Cloud of Clouds Project, he chimed in and responded with the following. He said, "Cloud is a programming paradigm supporting the entire lifecycle of applications with data and logic natively distributed. Supercloud is an open architecture that integrates heterogeneous clouds in an agnostic manner." So, Ed, words matter. Is this an architecture or is it a platform? >> Put us on the spot. So, I'm sure you have concepts, I would say it's an architectural or design principle. Listen, I look at Supercloud as a mega trend, just like cloud, just like data analytics. And some companies are using the principle, design principles, to literally get dramatically ahead of everyone else. I mean, things you couldn't possibly do if you didn't use cloud principles, right? So I think it's a Supercloud effect, you're able to do things you're not able to. So I think it's more a design principle, but if you do it right, you get dramatic effect as far as customer value. >> So the conversation that we were having with Muglia, and Tristan Handy of dbt Labs, was, I'll set it up as the following, and, Thomas, would love to get your thoughts, if you have a CRM, think about applications today, it's all about forms and codifying business processes, you type a bunch of stuff into Salesforce, and all the salespeople do it, and this machine generates a forecast. What if you have this new type of data app that pulls data from the transaction system, the e-commerce, the supply chain, the partner ecosystem, et cetera, and then, without humans, actually comes up with a plan. That's their vision. And Muglia was saying, in order to do that, you need to rethink data architectures and database architectures specifically, you need to get down to the level of how the data is stored on the disk. What are your thoughts on that? >> Well, first of all, I'm going to cop out, I think it's actually both.
I do think it's a design principle, I think it's not open technology, but open APIs, open access, and you can build a platform on that design principle architecture. Now, I'm a database person, I love solving the database problems. >> I've waited for you to launch into this. >> Yeah, so I mean, you know, Snowflake is a database, right? It's a distributed database. And we wanted to crack those codes, because, multi-region, multi-cloud, customers wanted access to their data, and their data is in a variety of forms, all these services that you talked about. And so what I saw as a core principle was cloud object storage, everyone streams their data to cloud object storage. From there we said, well, how about we rethink database architecture, rethink file format, so that we can take each one of these services and bring them together, whether distributively or centrally, such that customers can access and get answers, whether it's operational data, whether it's business data, AKA search, or SQL, complex distributed joins. But we had to rethink the architecture. I like to say we're not a first generation, or a second, we're a third generation distributed database on pure, pure cloud storage, no caching, no SSDs. Why? Because all that availability, the cost of time, is a struggle, and cloud object storage, we think, is the answer. >> So when you're saying no caching, so when I think about how companies are solving some, you know, pretty hairy problems, take MySQL HeatWave, everybody thought Oracle was going to just forget about MySQL, well, they come out with HeatWave. And the way they solve problems, and you see their benchmarks against Amazon, "Oh, we crush everybody," is they put it all in memory. So you said no caching? You're not getting performance through caching? How is that true, and how are you getting performance? >> Well, so five, six years ago, right?
When you realize that cloud object storage is going to be everywhere, and it's going to be a core foundational, if you will, fabric, what would you do? Well, a lot of times the second generation say, "We'll take it out of cloud storage, put in SSDs or something, and put into cache." And that adds a lot of time, adds a lot of costs. But I said, what if, what if we could actually make the first read hot, the first read distributed joins and searching? And so what we set out to do was say, we can't cache, because that adds time, that adds cost. We have to make cloud object storage high performance, like it feels like a caching SSD. That's where our patents are, that's where our technology is, and we've spent many years working towards this. So, to me, if you can crack that code, a lot of these issues we're talking about, multi-region, multicloud, different services, everybody wants to send their data to the data lake, but then they move it out, we said, "Keep it right there." >> You nailed it, the data gravity. So, Bob's right, the data's coming in, and you need to get the data from everywhere, but you need an environment that you can deal with all that different schema, all the different types of technology, but also at scale. Bob's right, you cannot use memory or SSDs to cache that, that doesn't scale, it doesn't scale cost effectively. But if you could, and what you did, is you made object storage, S3 first, but object storage, the only persistence by doing that. And then we get performance, we should talk about it, it's literally, you know, hundreds of terabytes of queries, and it's done in seconds, it's done without memory caching. We have concepts of caching, but the only caching, the only persistence, is actually when we're doing caching, we're just keeping another side-eye track of things on the S3 itself.
So we're using, actually, the object storage to be a database, which is kind of where Bob was saying, we agree, but that's what you started at, people thought you were crazy. >> And maybe make it live. Don't think of it as archival or temporary space, make it live, real time streaming, operational data. What we do is make it smart, we see the data coming in, we uniquely index it such that you can get your use cases, that are search, observability, security, or backend operational. But we don't have to have this, I dunno, static, fixed, siloed type of architecture technologies that were traditionally built prior to Supercloud thinking. >> And you don't have to move everything, essentially, you can do it wherever the data lands, whatever cloud across the globe, you're able to bring it together, you get the cost effectiveness, because the only persistence is the cheapest storage persistent layer you can buy. But the key thing is you cracked the code. >> We had to crack the code, right? That was the key thing. >> That's where the patents are. >> And then once you do that, then everything else gets easier to scale, your architecture, across regions, across cloud. >> Now, it's a general purpose database, as Bob was saying, but we use that database to solve a particular issue, which is around operational data, right? So, we agree with Bob. >> Interesting. So this brings me to this concept of data mesh, Zhamak Dehghani is one of our speakers, you know, we talk about data fabric, which is a NetApp, originally NetApp concept, Gartner's kind of co-opted it. But so, the basic concept is, data lives everywhere, whether it's an S3 bucket, or a SQL database, or a data lake, it's just a node on the data mesh. So in your view, how does this fit in with Supercloud? Ed, you've said that you've built, essentially, an enabler for that, for the data mesh, I think you're an enabler for the Supercloud-like principles. This is a big, chewy opportunity, and it requires, you know, a team approach.
There's got to be an ecosystem, there's not going to be one Supercloud to rule them all, so where does the ecosystem fit into the discussion, and where do you fit into the ecosystem? >> Right, so we agree completely, there's not one Supercloud in effect, but we use Supercloud principles to build our platform, and then, you know, the ecosystem's going to be built on leveraging what everyone else's secret powers are, right? So our power, our superpower, based upon what we built is, we deal with, if you're having any scale, or cost effective scale issues, with data, machine generated data, like business observability or security data, we are your force multiplier, we will take that in singularly, just let it, simply put it in your object storage wherever it sits, and we give you uniform access to that using open API access, SQL, or you know, Elasticsearch API. So, that's what we do, that's our superpower. So I'll play it into data mesh, that's a perfect, we are a node on a data mesh, but I'll play it into the Supercloud, about how, the ecosystem, we see it kind of playing, and we talked about it in just in the last couple days, how we see this kind of possibly. Short term, our superpowers, we deal with this data that's coming at these environments, people, customers, building out observability or security environments, or vendors that are selling their own Supercloud, I do observability, the Datadogs of the world, dot dot dot, the Splunks of the world, dot dot dot, and security. So what we do is we fit in naturally. What we do is a cost effective scale, just land it anywhere in the world, we deal with ingest, and it's a cost effective, an order of magnitude, or two or three orders of magnitude more cost effective. Allows them, their customers are asking them to do the impossible, "Give me fast monitoring alerting. I want it snappy, but I want it to keep two years of data, (laughs) and I want it cost effective." It doesn't work.
They're good at the fast monitoring alerting, we're good at the long-term retention. And yet there's some gray area between those two, but one to one is actually cheaper, so we would partner. So the first ecosystem play is, who wants to have the ability to, really, all the data's in those same environments, the security observability players, they can literally, just through API, drag our data into their point to grab. We can make it seamless for customers. Right now, we make it helpful to customers. Your Datadog, we make a button, easy go from Datadog to us for logs, save you money. Same thing with Grafana. But you can also look at ecosystem, those same vendors, it used to be a year ago it was, you know, it's all about how can you grow, like it's growth at all costs, now it's about COGS. So literally we can go into an environment, you supply what your customer wants, but we can help with COGS. And one-on-one in a partnership is better than you trying to build on your own. >> Thomas, you were saying you make the first read fast, so you think about Snowflake. Everybody wants to talk about Snowflake and Databricks. So, Snowflake, great, but you got to get the data in there. All right, so that's, can you help with that problem? >> I mean we want simple in, right? And if you have to have structure in, you're not simple. So the idea that you have a simple in, data lake, schema-on-read type philosophy, but schema-on-write type performance. And so what I wanted to do, what we have done, is have that simple lake, and stream that data real time, and those access points of Search or SQL, to go after whatever business case you need, security observability, warehouse integration. But the key thing is, how do I make that click, click, click answer, and do it quickly? And so what we want to do is, that first read has to be fast. Why? 'Cause then you're going to do all this siloing, layers, complexity. If your first read's not fast, you're at a disadvantage, particularly in cost.
And nobody says I want less data, but everyone has to, whether they say we're going to shorten the window, we're going to use AI to choose, but in a security moment, when you don't have that answer, you're in trouble. And that's why we are this service, this Supercloud service, if you will, providing access, well-known search, well-known SQL type access, that if you just have one access point, you're at a disadvantage. >> We actually talked about Snowflake and BigQuery, and a different platform, Databricks. That's kind of where we see the phase two of ecosystem. One is easy, the low-hanging fruit is observability and security firms. But the next one is, what we do, our superpower is dealing with this messy data that schema is changing like night and day. Pipelines are tough, and it's changing all the time, but you want these things fast, and it's big data around the world. That's the next point, just use us alongside, or inside, one of their platforms, and now we get the best of both worlds. Our superpower is keeping this messy data as a streaming, okay, not a batch thing, allow you to do that. So, that's the second one. And then to be honest, the third one, which plays into Supercloud, it also plays perfectly in the data mesh, is if you really go to the ultimate thing, what we have done is made object storage, S3, GCS, and blob storage, we made it a database. Put, get, complex query with big joins. You know, so back to your original thing, and Muglia teed it up perfectly, we've done that. Now imagine if that's an ecosystem, who would want that? If it's, again, it's uniform available across all the regions, across all the clouds, and it's right next to where you are building a service, or a client's trying, that's where the ecosystem, I think people are going to use Superclouds for their superpowers. We're really good at this, allows that short term. I think the Snowflakes and the Databricks are the medium term, you know?
And then I think eventually gets to, hey, listen if you can make object storage fast, you can just go after it with simple SQL queries, or elastic. Who would want that? I think that's where people are going to leverage it. It's not going to be one Supercloud, and we leverage the Superclouds. >> Our viewpoint is smart object storage can be programmable, and so we agree with Bob, but we're not saying do it here, do it here. This core, fundamental layer across regions, across clouds, that everyone has? Simple in. Right now, it's hard to get data in for access for analysis. So we said, simply, we'll automate the entire process, give you API access across regions, across clouds. And again, how do you do a distributed join that's fast? How do you do a distributed join that doesn't cost you an arm or a leg? And how do you do it at scale? And that's where we've been focused. >> So prior, the cloud object store was a niche. >> Yeah. >> S3 obviously changed that. How standard is, essentially, object store across the different cloud platforms? Is that a problem for you? Is that an easy thing to solve? >> Well, let's talk about it. I mean we've fundamentally, yeah we've abstracted it, but fundamentally, cloud object storage, put, get, and list. That's why it's so scalable, 'cause it doesn't have all these other components. That complexity is where we have moved up, and provide direct analytical API access. So because of its simplicity, and costs, and security, and reliability, it can scale naturally. I mean, really, distributed object storage is easy, it's put-get anywhere, now what we've done is we put a layer of intelligence, you know, call it smart object storage, where access is simple. So whether it's multi-region, do a query across, or multicloud, do a query across, or hunting, searching. >> We've had clients doing Amazon and Google, we have some Azure, but we see Amazon and Google more, and it's a consistent service across all of them.
Just literally put your data in the bucket of choice, or folder of choice, click a couple buttons, literally click that to say "that's hot," and after that, it's hot, you can see it. But we're not moving data, the data gravity issue, that's the other. That it's already natively flowing to these pools of object storage across different regions and clouds. We don't move it, we index it right there, we're spinning up stateless compute, back to the Supercloud concept. But now that allows us to do all these other things, right? >> And it's no longer just cheap and deep object storage. Right? >> Yeah, we make it the same, like you have an analytic platform regardless of where you're at, you don't have to worry about that. Yeah, we deal with that, we deal with a stateless compute coming up -- >> And make it programmable. Be able to say, "I want this bucket to provide these answers." Right, that's really the hope, the vision. And the complexity to build the entire stack, and then connect them together, we said, the fabric is cloud storage, we just provide the intelligence on top. >> Let's bring it back to the customers, and one of the things we're exploring in Supercloud 2 is, you know, is Supercloud a solution looking for a problem? Is multicloud really a problem? I mean, you hear, you know, a lot of the vendor marketing says, "Oh, it's a disaster, because it's all different across the clouds." And I talked to a lot of customers even as part of Supercloud 2, they're like, "Well, I solved that problem by just going mono cloud." Well, but then you're not able to take advantage of a lot of the capabilities and the primitives that, you know, like Google's data, or you like Microsoft's simplicity, their RPA, whatever it is. So what are customers telling you, what are their near term problems that they're trying to solve today, and how are they thinking about the future? >> Listen, it's a real problem. I think it started, I think this is a mega trend, just like cloud.
Just, cloud, data, and I always add, analytics, are the mega trends. If you're looking at those, if you're not considering using the Supercloud principles, in other words, leveraging what I have, abstracting it out, and getting the most out of that, and then build value on top, I think you're not going to be able to keep up. In fact, no way you're going to keep up with this data volume. It's a geometric challenge, and you're trying to do linear things. So clients aren't necessarily asking, hey, for Supercloud, but they're really saying, I need to have a better mechanism to simplify this and get value across it, and how do you abstract that out to do that? And that's where they're obviously, our conversations are more amazed what we're able to do, and what they're able to do with our platform, because if you think of what we've done, the S3, or GCS, or object storage, is they can't imagine the ingest, they can't imagine how easy, time to glass, one minute, no matter where it lands in the world, querying this in seconds for hundreds of terabytes squared. People are amazed, but that's kind of, so they're not asking for that, but they are amazed. And then when you start talking on it, if you're an enterprise person, you're building a big cloud data platform, or doing data or analytics, if you're not trying to leverage the public clouds, and somehow leverage all of them, and then build on top, then I think you're missing it. So they might not be asking for it, but they're doing it. >> And they're looking for a lens, you mentioned all these different services, how do I bring those together quickly? You know, our viewpoint, our service, is I have all these streams of data, create a lens where they want to go after it via search, go after via SQL, bring them together instantly, no ETL-ing out, no define this table, put into this database. We said, let's have a service that creates a lens across all these streams, and then make those connections.
I want to take my CRM with my Google AdWords, and maybe my Salesforce, how do I do analysis? Maybe I want to hunt first, maybe I want to join, maybe I want to add another stream to it. And so our viewpoint is, it's so natural to get into these lake platforms and then provide lenses to get that access. >> And they don't want it separate, they don't want something different here, and different there. They want it basically -- >> So this is our industry, right? If something new comes out, remember virtualization came out, "Oh my God, this is so great, it's going to solve all these problems." And all of a sudden it just got to be this big, more complex thing. Same thing with cloud, you know? It started out with S3, and then EC2, and now hundreds and hundreds of different services. So, it's a complex matter for a lot of people, and this creates problems for customers, especially when you got divisions that are using different clouds, and you're saying that the solution, or a solution for the part of the problem, is to really allow the data to stay in place on S3, use that standard, super simple, but then give it what, Ed, you've called superpower a couple of times, to make it fast, make it inexpensive, and allow you to do that across clouds. >> Yeah, yeah. >> I'll give you guys the last word on that. >> No, listen, I think, we think Supercloud allows you to do a lot more. And for us, data, everyone says more data, more problems, more budget issue, everyone knows more data is better, and we show you how to do it cost effectively at scale. And we couldn't have done it without the design principles of we're leveraging the Supercloud to get capabilities, and because we use super, just the object storage, we're able to get these capabilities of ingest, scale, cost effectiveness, and then we built on top of this. 
In the end, a database is a data platform that allows you to go after everything distributed, and to get one platform for analytics, no matter where it lands, that's where we think the Supercloud concepts are perfect, that's where our clients are seeing it, and we're kind of excited about it. >> Yeah a third generation database, Supercloud database, however we want to phrase it, and make it simple, but provide the value, and make it instant. >> Guys, thanks so much for coming into the studio today, I really thank you for your support of theCUBE, and theCUBE community, it allows us to provide events like this and free content. I really appreciate it. >> Oh, thank you. >> Thank you. >> All right, this is Dave Vellante for John Furrier in theCUBE community, thanks for being with us today. You're watching Supercloud 2, keep it right there for more thought provoking discussions around the future of cloud and data. (bright music)
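
A minimal sketch of the idea discussed in this segment: object storage that exposes only put, get, and list, with an index written back into the same bucket so a search can span regions and clouds without moving the raw data. Everything here is hypothetical and for illustration only: `ObjectStore` is an in-memory stand-in for a bucket, and `build_index`, `search`, and the `_index/` prefix are invented names. It is a toy word index, not ChaosSearch's actual format or implementation.

```python
import json
from dataclasses import dataclass, field

INDEX_PREFIX = "_index/"  # hypothetical prefix for index objects in the same bucket


@dataclass
class ObjectStore:
    """Toy in-memory stand-in for one cloud bucket: put, get, and list only."""
    name: str
    _objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))


def build_index(store: ObjectStore, prefix: str = "logs/") -> None:
    """Read each raw object once and write a word -> keys index into the same store."""
    index = {}
    for key in store.list_keys(prefix):
        words = set(store.get(key).decode().lower().split())
        for word in words:
            index.setdefault(word, []).append(key)
    # The index is persisted as ordinary objects; the raw data never leaves the bucket.
    for word, keys in index.items():
        store.put(INDEX_PREFIX + word, json.dumps(keys).encode())


def search(stores: list, word: str) -> list:
    """Look the word up in every store's index and return (store name, key) pairs."""
    hits = []
    index_key = INDEX_PREFIX + word.lower()
    for store in stores:
        if index_key in store.list_keys(INDEX_PREFIX):
            for key in json.loads(store.get(index_key)):
                hits.append((store.name, key))
    return hits


# Two buckets standing in for different clouds and regions (names are made up).
aws = ObjectStore("aws-us-east")
gcp = ObjectStore("gcp-europe")
aws.put("logs/app-1.log", b"ERROR timeout contacting payment service")
aws.put("logs/app-2.log", b"INFO request served")
gcp.put("logs/app-9.log", b"WARN timeout on upstream")

for bucket in (aws, gcp):
    build_index(bucket)

print(search([aws, gcp], "timeout"))
```

The only state in the sketch lives in the stores themselves, which mirrors the "only persistence is object storage" point made in the conversation; the compute that builds and reads the index holds nothing between calls.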

Published Date : Feb 17 2023


Evan Kaplan, InfluxData | AWS re:Invent 2022

>>Hey everyone. Welcome to Las Vegas. theCUBE is here, live at the Venetian Expo Center for AWS re:Invent 2022. Amazing attendance. This is day one of our coverage. Lisa Martin here with Dave Vellante. Dave, it's great to see so many people back. We've been having great conversations already. We have wall to wall coverage for the next three and a half days. When we talk to companies, customers, every company has to be a data company. And one of the things I think we learned in the pandemic is that access to real time data and real time analytics is no longer a nice to have, that is a differentiator and a competitive all >>About data. I mean, you know, I love the topic and it's, it's got so many dimensions and such texture, can't get enough of data. >>I know we have a great guest joining us. One of our alumni is back, Evan Kaplan, the CEO of InfluxData. Evan, thank you so much for joining us. Welcome back to theCUBE. >>Thanks for having me. It's great to be here. So here >>We are, day one. I was telling you before we went live, we're nice and fresh hosts. Talk to us about what's new at Influx since the last time we saw you at re:Invent. >>That's great. So first of all, we should acknowledge what's going on here. This is pretty exciting. Yeah, that does really feel like, I know there was a show last year, but this feels like the first post-Covid show, a lot of energy, a lot of attention despite a difficult economy. In terms of, you know, you guys were commenting in the lead-in to big data. I think, you know, if we were to talk about big data five, six years ago, what would we be talking about? We'd be talking about Hadoop, we were talking about Cloudera, we were talking about Hortonworks, we were talking about big data lakes, data stores. I think what's happened is this interesting dynamic of, let's call it if you will, the, the secularization of data in which it breaks into different fields, different, almost a taxonomy.
You've got this set of search data, you've got this observability data, you've got graph data, you've got document data, and now you have time series data. >>And what you're seeing in the market is this incredible capability by developers, as well as a mostly open source dynamic driving this, this incredible capability of developers to assemble data platforms that aren't unicellular, that aren't just built on Hadoop or Oracle or Postgres or MySQL, but in fact represent different data types. So for us, what we care about is time series, we care about anything that happens in time, where time can be the primary measurement, which if you think about it, is a huge proportion of real data. Cuz when you think about what drives AI, you think about what happened, what happened, what happened, what happened, what's going to happen. That's the functional thing. But what happened is always defined by a period, a measurement, a time. And so what's new for us is we've developed this new open source engine called IOx. And so it's basically a refresh of the whole database, a columnar database that uses Apache Arrow, Parquet, and DataFusion and turns it into a super powerful real time analytics platform. It was already pretty real time before, but it's increasingly now and it adds SQL capability and infinite cardinality. And so it handles bigger data sets, but importantly, not just bigger but faster, faster data. So that's primarily what we're talking about at the show. >>So how does that affect where you can play in the marketplace? Is it, I mean, how does it affect your total available market? >>Great question. >>Your, your customer opportunities. >>I think it's, it's really an interesting market in that you've got all of these different approaches to database. Whether you take data warehouses from Snowflake or, or arguably Databricks also. And you take these individual database companies like Mongo, Influx, Neo4j, Elastic, and people like that.
I think the commonality you see across the volume is, is many of 'em, if not all of them, are based on some sort of open source dynamic. So I think that is an intractable trend that will continue on. But in terms of the broader, the broader database market, our total expand, total available TAM, lots of these things are coming together in interesting ways. And so the, the, the wave that we'll ride, that we wanna ride, because it's all big data and it's all increasingly fast data and it's all machine learning and AI, is really around that measurement issue. That instrumentation, the idea that if you're gonna build any sophisticated system, it starts with instrumentation and the journey is defined by instrumentation. So we view ourselves as that instrumentation tooling for understanding complex systems. And how, >>I have to follow quick follow up. Why did you say arguably Databricks? I mean open source ethos? >>Well, I was saying arguably Databricks cuz Spark, I mean it's a great company and it's based on Spark, but there's quite a gap between Spark and what Databricks is today. And in some ways Databricks from the outside looking in looks a lot like Snowflake to me, looks a lot like a really sophisticated data warehouse with a lot of post-processing capabilities >>And, and with an open source less >>Than a >>Core database. Yeah. Right, right, right. Yeah, I totally agree. Okay, thank you for that >>Part that that was not arguably like they're, they're not a good company or >>No, no. They got great momentum and I'm just curious. Absolutely. You know, so, >>So talk a little bit about IOx and, and what it is enabling you guys to achieve from a competitive advantage perspective. The key differentiators give us that scoop. >>So if you think about, so our old storage engine was called TSM, also open sourced, right?
And IOx is open sourced, and the old storage engine was really built around time series measurements, particularly metrics, lots of metrics, handling those at scale and making it super easy for developers to use. But our old data engine only supported either a custom graphical UI that you'd build yourself on top of it or a dashboarding tool like Grafana or Chronograf or things like that. With IOx, two or three interventions were important. One is we now support, or will support, things like Tableau and Microsoft Power BI, and so you're taking that same data that was available for instrumentation and now you're using it for business intelligence also. So that became super important, and it kind of answers your question about the expanded market: it expands the market. The second thing is, when you're dealing with time series data, you're dealing with this concept of cardinality, which is, and I don't know if you're familiar with it, the idea that it's a multiplication of measurements in a table. And so the more measurements you want over the more series you have, you have this really expanding exponential set that can choke a database off. And we've designed IOx to handle what we call infinite cardinality, where you don't even have to think about that from a design point of view. And then lastly, query performance is just dramatically better. And so it's pretty exciting. >>So the unlimited cardinality, basically you could identify relationships between data in different databases. Is that right? >>Between the same database but different measurements, different tables, yeah. So you could say, I want to look at the way the noise levels are performing in this room according to 400 different locations on 25 different days, over seven months of the year. And each one is a measurement. Each one adds to cardinality.
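The "multiplication of measurements" described here can be made concrete with a small sketch. This is only an illustration of how series cardinality grows, not a description of how IOx stores or counts anything; the tag names and the extra `sensor_model` dimension are assumptions made up for the example, and the counts come from the noise-level scenario in the interview.

```python
from math import prod


def series_cardinality(distinct_values_per_tag):
    """Upper bound on the number of distinct series.

    Every combination of tag values can form its own series, so the bound
    is the product of the distinct-value counts of all tags.
    """
    return prod(distinct_values_per_tag.values())


# Counts taken from the noise-level example; the tag names are hypothetical.
noise_tags = {"location": 400, "day": 25, "month": 7}
print(series_cardinality(noise_tags))  # 70000

# One more dimension multiplies the total rather than adding to it.
noise_tags["sensor_model"] = 10  # assumed extra tag, for illustration only
print(series_cardinality(noise_tags))  # 700000
```

Each added dimension multiplies the number of series, which is why an engine that indexes every series individually can be overwhelmed as tags are added.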
And you can say, I want to search on Tuesdays in December, what the noise level is at 2:21 PM, and you get a very quick response. That kind of instrumentation is critical to smarter systems. >>How are you able to process that data at a performance level that doesn't bring the database to its knees? What's the secret sauce behind that? >>It's a columnar database. It's built on Parquet and Apache Arrow. But to say it nicely, without a much longer conversation, it's an architecture that's really built for pulling that kind of data. If you know the data is time series and you're looking for a time measurement, you already have the ability to optimize pretty dramatically. >>So it's that purpose-built aspect of it. >>It's the purpose-built aspect. You couldn't take Postgres and do the same thing. >>Right? Because a lot of vendors say, oh yeah, we have time series now. >>Yeah. Right. And they do. But it's not... The founding of the company came because Paul Dix was working on Wall Street building time series databases on HBase, on MySQL, on other platforms, and realized every time we do it, we have to rewrite the code. We build a bunch of application logic to handle all these. We're talking about, we have customers that are adding hundreds of millions to billions of points a second. So you're talking about an ingest level that, you know, databases just aren't designed for. Right? And so it's not just us, our competitors also build good time series databases. And so the category is really emergent. >>Yeah, sure. Talk about a favorite customer story that you think really articulates the value of what Influx is doing, especially with IOx. >>Yeah, sure.
And I love this story because, you know, Tesla may not be in favor because of the latest Elon Musk escapades, but we've had about a four-year relationship with Tesla, where they built their Powerwall technology around recording that: seeing your device, seeing the charging on your car. It's all captured in Influx databases that are reporting from Powerwalls and mega Powerpacks all over the world. And they report to a central place at Tesla's headquarters and it reports out to your phone, so you can see it. And what's really cool about this to me is I've got two Tesla cars and I've got Tesla solar roof tiles. So I watch this data all the time. So it's a great customer story. And actually if you go on our website, you can see I did an hour interview with the engineer that designed the system, because the system is super impressive and I just think it's really cool. Plus, you know, it's all the good green stuff that we really appreciate, supporting sustainability, right? >>Right, right. Talk about, from a what's-in-it-for-me-as-a-customer perspective, what you guys have done, the change to IOx. What are some of the key features of it and the key values in it for customers like Tesla, like other industry customers as well? >>Well, so it's relatively new. It just arrived in our cloud product. So Tesla's not using it today. We have a first set of customers starting to use it. It's in open source, so it's a very popular project in the open source world. But the key issues are really the stuff that we've kind of covered here, which is a broad SQL environment. So all those SQL developers, the same people who code against Snowflake's data warehouse or Databricks or Postgres, can now code against Influx and open up the BI market. It's the cardinality, it's the performance. It's really an architecture. It's the next gen.
We've been doing this for six years, it's the next generation of everything we've seen about how you make time series super performant. And that's only relevant because more and more things are becoming real time as we develop smarter and smarter systems. The journey is pretty clear. You instrument the system, you let it run, you watch for anomalies, you correct those anomalies, you re-instrument the system. You do that 4 billion times, you have a self-driving car. You do that 55 times, you have a better podcast that is handling its audio better, right? So everything is on that journey of getting smarter and smarter. >>So you guys are the big committers to IOx, right? >>Yes. >>And talk about how you support and develop the surrounding developer community, how you get that flywheel effect going. >>I mean, it's actually, let's call it more art than science. First of all, you come up with an architecture that really resonates for developers. And Paul Dix, our founder, really is a developer's developer. And so he started talking about this in the community, about an architecture that uses Apache Parquet, which is, you know, becoming the standard now for file formats, that uses Apache Arrow for directing queries and things like that, and uses DataFusion, and said what this thing needs is a columnar database that sits behind all of this stuff and integrates it. And he started talking about it two years ago, and then he started publishing IOx commits in GitHub. And slowly, over time, on Hacker News and elsewhere, people go, oh yeah, this is fundamentally right. >>It addresses the problems that people have with things like ClickHouse or plain databases or Coast, and they go, okay, this is the right architecture at the right time.
Not different than the original Influx, not different than what Elastic hit on, not different than what Confluent with Kafka hit on in their time. You build an audience of people who are committed to understanding this kind of stuff, and they become committers and they become the core. And you build out from it. And so we chose to have an MIT open source license. It's not some secondary license. Competitors can use it, and competitors can use it against us. >>One of the things I know that InfluxData talks about is the time to awesome, which I love, but what does that mean? What is the time to awesome for a developer? >>It comes from that original story where Paul would have to write six months of application logic and stuff to build a time series based application. And so Paul's notion was, and this was based on the original Mongo, which was very successful because it was very easy to use relative to most databases. So Paul developed this commitment, this idea that I quickly joined on, which was, hey, it should be relatively quick for a developer to build something of import to solve a problem; it should be able to happen very quickly. So it's got a schemaless background so you don't have to know the schema beforehand. It does some things that make it really easy to feel powerful as a developer quickly. And if you think about that journey, if you feel powerful with a tool quickly, then you'll go deeper and deeper and deeper, and pretty soon you're taking that tool with you wherever you go. It becomes the tool of choice as you go to that next job or you go to that next application. And so that's a fundamental way we think about it. To be honest with you, we haven't always delivered perfectly on that. It's generally in our DNA. So we do pretty well, but I always feel like we can do better. >>So if you were to put a bumper sticker on one of your Teslas about InfluxData, what would it say? >>
By the way, I'm not rich. It just happened to be that we have two Teslas, and we have for a while, we just committed to that. So ask the question again. Sorry. >>Bumper sticker on InfluxData. What would it say? >>It'd be time to awesome. It would be that phrase, time to awesome. Right. >>Love that. >>Yeah, I'd love it. >>Excellent, time to awesome. Evan, thank you so much for joining David and me on the program. >>It's really fun. >>Great to have you on, Evan. Well, great to have you back talking about what you guys are doing and helping organizations like Tesla and others really transform their businesses, which is all about business transformation these days. We appreciate your insights. >>That's great. Thank you. >>For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE, the leader in emerging and enterprise tech coverage. We'll be right back with our next guest.

Published Date : Nov 29 2022



Collibra Data Citizens 22

>>Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from primarily a compliance-driven issue to becoming a linchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. >>And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum to rethinking monolithic data architectures. You hear about initiatives like data mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business unit users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that, but also how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important.
>>In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is gonna use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate that data quality becomes a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cyber security. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. >>Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days, we're gonna feature some of the themes from the keynote speakers at Data Citizens and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us along with one of the other founders of Collibra, Stijn Christiaens, who's gonna join my colleague Lisa Martin. I'm gonna also sit down with Laura Sellers, she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. >>
He's an amazingly smart dude who founded Owl dq, a company that he sold to Col to Collibra last year. Now many companies, they didn't make it through the Hado era, you know, they missed the industry waves and they became Driftwood. Collibra, on the other hand, has evolved its business. They've leveraged the cloud, expanded its product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. >>Last year, the Cube Covered Data Citizens Collibra's customer event. And the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hado movement, we had data lakes, we'd spark the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of ai, low code, no code, et cetera. Businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation, meaning making it easier for domain experts to both gain insights for data, trust the data, and begin to use that data in new ways, fueling data, products, monetization and insights data citizens 2022 is back and we're pleased to have Felix Van Dema, who is the founder and CEO of Collibra. He's on the cube or excited to have you, Felix. Good to see you again. >>Likewise Dave. Thanks for having me again. >>You bet. 
All right, we're gonna get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s. That's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? >>Yeah, absolutely. And I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that the trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well, which again is really difficult in this environment of continuous innovation, continuous change, continuously growing complexity and fragmentation. >>So it's become much more acute. And to your earlier point, we do live in a different world, and the past couple of years we could probably just kind of brute force it, right? We could focus on the top line.
There was enough kind of investment to be had. I think nowadays organizations are in a very different environment where there's much more focus on cost control, productivity, efficiency. How do we truly get value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale data, not just from a technology and infrastructure perspective, but how do you actually scale data from an organizational perspective, right? You said it, the people and process, how do we do that at scale? And that's only becoming much more important. And we do believe that the economic environment that we find ourselves in today is gonna be a catalyst for organizations to really dig in more seriously, if you will, than they maybe have in the past. >>You know, I don't know, when you guys founded Collibra, if you had a sense as to how complicated it was gonna get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission and what are you doing to address these challenges? >>Yeah, absolutely. We started Collibra in 2008. So in some sense in the last kind of financial crisis, and that was really the start of Collibra, where we found product market fit, working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment, of course, almost 15 years later. But data is only becoming more important. And our mission to deliver trusted data for every user, every use case, and across every source, frankly, has only become more important.
So while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it data citizens. We truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we are still relatively early in that journey. >>Well, that's interesting because, you know, in my observation it takes seven to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >>Yeah, absolutely. Again, there's a lot of tailwind, organizations that are only maturing their data practices, and we've seen it kind of transform or influence a lot of our business growth that we've seen, broader adoption of the platform. We work at some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders and in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? As those new modern data infrastructures, modern data architectures are definitely all moving to the cloud, it's a great opportunity for us, our partners, and of course our customers to help them kind of transition to the cloud even faster.
>>And so we see a lot of excitement and momentum there. We made an acquisition about 18 months ago around data quality, data observability, which we believe is an enormous opportunity. Of course, data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI, machine learning, again to drive more automation. And the second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations. And so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >>Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team, it's interesting, you know, the whole data quality used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for chief data officers. Data mesh: you mentioned you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So let's chat a little bit about the products.
We're gonna go deeper into products later on at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's under the covers in security, or making data more accessible for people just dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >>Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards that mission of data intelligence, that really bold and compelling mission. Our customers are just starting on that journey. We wanna make it as easy as possible for organizations to actually get started, because we know it's important that they do. And for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again, to make it easier to really accomplish that mission and vision around that data citizen, that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving. >>A lot of ease of adoption, ease of use, but also then how do we make sure that Collibra becomes this kind of mission critical enterprise platform, from a security, performance, architecture, scale, and supportability perspective, so that we're truly able to deliver that kind of enterprise mission critical platform. And so that's the big theme from an innovation perspective. From a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction, how to make it easy.
How do we make available a true kind of shopping experience so that anybody in your organization can, in a very easy, search-first way, find the right data product, find the right dataset, that they can then consume? Usage analytics: how do we help organizations drive adoption, tell them where they're working really well and where they have opportunities? Homepages, again, to make things easy for anyone in your organization to kind of get started with Collibra. You mentioned workflow designer. Again, we have a very powerful enterprise platform. >>One of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new low code, no code kind of workflow designer experience, so customers can really take it to the next level. There's a lot more new product around Collibra Protect, which is in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how do we make access governance easier? How are we able to make sure that as you move to the cloud, things like access management, masking around sensitive data, PII data, are managed in a much more effective way? Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily and quickly and widely as we can? Moving that to the cloud has been a big part of our strategy. >>So we launched our data quality cloud product, as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are beta testing a capability that we call pushdown. So actually pushing down the compute of data quality, the monitoring, into the underlying platform, which again, from a scale, performance, and ease of use perspective, is gonna make a massive difference. And then more broadly, we talked a little bit about the ecosystem.
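The pushdown idea mentioned here, running the data quality computation inside the data platform rather than pulling rows out to a separate engine, can be illustrated with a small sketch. This is a minimal example under stated assumptions, not Collibra's implementation: SQLite from the Python standard library stands in for a cloud warehouse, and the `customers` table, the `email` column, and both function names are hypothetical.

```python
import sqlite3


def make_demo_connection():
    """Build an in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, "d@example.com")],
    )
    return conn


def null_rate_pulled(conn, table, column):
    """Non-pushdown style: fetch every row, then compute the metric locally."""
    # Identifiers are interpolated directly; acceptable only for trusted demo input.
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    if not rows:
        return 0.0
    return sum(1 for (value,) in rows if value is None) / len(rows)


def null_rate_pushed_down(conn, table, column):
    """Pushdown style: the database computes the aggregate and returns one row."""
    nulls, total = conn.execute(
        f"SELECT SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), COUNT(*) "
        f"FROM {table}"
    ).fetchone()
    if not total:
        return 0.0
    return nulls / total


conn = make_demo_connection()
print(null_rate_pulled(conn, "customers", "email"))       # 0.25
print(null_rate_pushed_down(conn, "customers", "email"))  # 0.25
```

Both functions return the same null rate, but the pushed-down version moves a single aggregate row across the connection instead of the whole column, which is the scale and performance argument being made.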
Again, integrations: we talk about being able to connect to every source. Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been at work really hard, and we are really, really excited about what we're bringing to market. >>Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about, you know, the marketplace. You think about data mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So how do you see sort of the future, and, you know, give us your closing thoughts please? >>Yeah, absolutely. And I think we're really at this pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not gonna fully solve the challenges and truly deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early. Making it as easy as we can, it's our mission. And so I'm really, really excited to see how the market's gonna evolve over the next few quarters and years. I think the trend is clearly there. When we talk about data mesh, this kind of federated approach focused on data products is just another signal that we believe that a lot of organizations are now at the time of understanding the need to go beyond just the technology.
We really, really need to think about how we actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in: much more focus on control, much more focus on productivity and efficiency. And now's the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >>Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much and good luck in San Diego. I know you're gonna crush it out there. >>Thank you, Dave. >>Yeah, it's a great spot for an in-person event, and of course the content post-event is gonna be available at collibra.com, and you can of course catch theCUBE coverage at thecube.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. >>Hi, I'm Jay from Collibra's Data Office. Today I want to talk to you about Collibra's Data Intelligence Cloud. We often say Collibra is a single system of engagement for all of your data. Now, when I say data, I mean data in the broadest sense of the word, including reference data and metadata. Think of metrics, reports, APIs, systems, policies, and even business processes that produce or consume data. Now, the beauty of this platform is that it ensures all of your users have an easy way to find, understand, trust, and access data. But how do you get started? Well, here are seven steps to help you get going. One, start with the data. What's data intelligence without data? Leverage the Collibra Data Catalog to automatically profile and classify your enterprise data wherever that data lives: databases, data lakes, or data warehouses, whether in the cloud or on premise. >>Two, you'll then wanna organize the data, and you'll do that with data communities.
This can be by department, line of business, or functional team, however your organization organizes work and accountability. And for that you'll establish community owners. Communities make it easy for people to navigate through the platform and find the data, and will help create a sense of belonging for users. An important and related side note here: we find it's typical in many organizations that data is thought of as just an asset, and IT and data offices are viewed as the owners of it, as the central teams performing analytics as a service provider to the enterprise. We believe data is more than an asset; it's a true product that can be converted to value. And that also means establishing business ownership of data, where that strategy and ROI come together with subject matter expertise. >>Okay, three. Next, back to those communities: there, the data owners should explain and define their data, not just the tables and columns, but also the related business terms, metrics, and KPIs. These objects, which we call assets, are typically organized into business glossaries and data dictionaries. I definitely recommend starting with the topics that are most important to the business. Four: those steps enable you and your users to have some fun with it. Linking everything together builds your knowledge graph, also known as a metadata graph. Linking or relating these assets together, for example a data set to a KPI to a report, now enables your users to see what we call the lineage diagram, which visualizes where the data in your dashboards actually came from, what the data means, and who's responsible for it. Speaking of which, here's five. Leverage the Collibra Trusted Business Reporting solution on the marketplace, which comes with workflows for those owners to certify their reports, KPIs, and data sets. >>This helps them reinforce trust in their data.
Six: easy-to-navigate dashboards or landing pages right in your platform for your company's business processes are the most effective way for everyone to better understand and take action on data. Here's a pro tip: use the dashboard design kit on the marketplace to help you build compelling dashboards. Finally, seven: promote the value of this to your users, and be sure to schedule enablement office hours and new employee onboarding sessions to get folks excited about what you've built and implemented. Better yet, invite all of those community and data owners to these sessions so that they can show off the value that they've created. Those are my seven tips to get going with Collibra. I hope these have been useful. For more information, be sure to visit collibra.com. >>Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality at Collibra. Kirk, good to see you. Welcome. >>Thanks for having me, Dave. Excited to be here. >>You bet. Okay, we're gonna discuss data quality and observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >>Yeah, absolutely. It's definitely exciting times for data quality, which, you're right, has been around for a long time. So why now, and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, and the variety has changed and the volume has grown. And while I think that remains true, there are a couple of other hidden factors at play that everyone's so interested in, as to why this is becoming so important now.
And I guess you could kind of break this down simply. Think about it: if, Dave, you and I were gonna build, you know, a new healthcare application to monitor the heartbeat of individuals, imagine if we get that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy, where the 50-day crosses the 10-day average. >>And imagine if the data underlying the inputs to that is incorrect. We will probably have major financial ramifications in that sense. So, you know, it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there are two other things at play. You know, I bought a car not too long ago, and my dad called and said, "How many cylinders does it have?" And I realized in that moment, you know, I might have failed him, because I didn't know. And I used to ask those types of questions, about anti-lock brakes and cylinders and, you know, if it's manual or automatic. And I realized I now just buy a car that I hope works. And it's so complicated with all the computer chips, I really don't know that much about it. >>And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested.
It's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct. >>You know, the other thing too about data quality: for years we did the MIT CDOIQ event. We didn't do it last year; Covid messed everything up. But the observation I would make, love your thoughts, is that data quality, it used to be called information quality, used to be this back-office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened, and people sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're gonna talk about observability. And so the whole quality issue has really become front and center, because data's so fundamental, hasn't it? >>Yeah, absolutely. I mean, let's imagine we pull up our phones right now, and I go to my favorite stock ticker app and I check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we achieved in the early days, even before Collibra, what's been so exciting is that we have these types of observation techniques, these data monitors, that can actually track past performance of every field at scale.
And why that's so interesting, and why I think the CDO is listening really intently nowadays to this topic, is that maybe we could surface all of these problems with the right solution of data observability and with the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world of "must write a condition," where, when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that, you know, with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >>So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >>Yeah, it's super interesting. It's an emerging market, so the language is changing, and a lot of the topic area is changing. The way that I like to say it, or break it down, because the lingo is constantly moving, you know, as a target in this space, is really break records versus breaking trends. I could write a condition: when this thing happens, it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. You know, everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there will not be a condition that you could write that would show you all the goods and the bads. That was kind of your traditional approach of data quality break records. But your modern-day approach is: you lost a significant portion of your data, or it did not arrive on time to make that decision accurately on time. And that's a hidden concern.
Some people call this freshness, we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >>So what's the Collibra angle on all this stuff? You made the acquisition, you got data quality and observability coming together, you guys have a lot of expertise in this area. But you hear provenance of data, you just talked about, you know, stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >>Well, I think where we're fortunate is with our background. Myself and the team, we sort of lived this problem for a long time, you know, in the Wall Street days about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there. When most people evaluate our solution, it's more advanced than some of the observation techniques that currently exist. But we've also always covered data quality, and we believe that people want to know more, they need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time: "I have so many things going wrong, just show me the big picture, help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis, business impact, and connecting it with lineage and catalog metadata.
And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company, OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >>Well, you mentioned financial services a couple of times and some examples. Remember the flash crash in 2010? Nobody had any idea what that was. You know, they just said, "Oh, it's a glitch," so they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens '22 that you're announcing, you gotta announce new products, right? It's your yearly event. What's new? Give us a sense as to what products are coming out, but specifically around data quality and observability. >>Absolutely. You know, there's always a next thing on the forefront. And the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery and Databricks' Delta Lake, and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook in to these databases. And while we've always worked with these same databases in the past, and they're supported today, we're now doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is that everyone's concerned with something called egress. Did my data, that I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? >>And with these native integrations that we're building and about to unveil, here's kind of a sneak peek for next week at Data Citizens. We're now doing all compute and data operations in databases like Snowflake.
And what that means is, with no install and no configuration, you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your go-forward, team-selected, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage, and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >>So this is interesting, because of what you just described. You know, you mentioned Snowflake, you mentioned Google, oh, actually you mentioned, yeah, Databricks. You know, Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool. But then Google's got the open data cloud, if you heard, you know, at Google Next. And now Databricks doesn't call it the data cloud, but they have, like, the open source data cloud. So you have all these different approaches, and there's really no way, up until now, I'm hearing, to really understand the relationships between all those and have confidence across them. You know, it's like Zhamak Dehghani says: it should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh, and I need tooling to be able to have confidence that my data is governed and has the proper lineage and provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >>Yeah, that's right. And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now. We can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network cost, zero egress cost, zero latency of time.
And so if you were to log into BigQuery tomorrow using our tool, or, say, Snowflake for example, you have instant data quality metrics, instant profiling, instant lineage, and access and privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >>I love this example because, you know, Barry talks about, wow, the cloud guys are gonna own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value and connect across clouds. Sometimes we call it supercloud, or interclouding. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens '22. >>Absolutely. Well, I think, you know, one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they wanna know where everything is, where their sensitive data is, if it's redundant: "Tell me everything inside of three to five seconds." And with that comes wanting to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're gonna see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >>Excellent. All right, Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens '22. Appreciate it. >>Thanks for having me, Dave. >>You're welcome. All right, and thank you for watching. Keep it right there for more coverage from theCUBE. Welcome to theCUBE's virtual coverage of Data Citizens 2022.
My name is Dave Vellante, and I'm here with Laura Sellers, who's the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >>Thank you. Nice to be here. >>Yeah, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now, when I think about it historically, fast access to the right data at the right time, in a form that's really easily consumable, has been kind of challenging, especially for business users. Can you explain to our audience why this matters so much, and what's actually different today in the data ecosystem to make this a reality? >>Yeah, definitely. So I think what we really need, and what I hear from customers every single day, is that we need a new approach to data management. And our product teams, what inspired me to come to Collibra a little bit over a year ago, was really the fact that they're very focused on bringing trusted data to more users, across more sources, for more use cases. And so as we look at what we're announcing with these innovations of ease of use and scale, it's really about making teams more productive in getting started with, and able to manage, data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers the performance, scale, and security that our users and teams need and demand. So as we look at... Oh, go ahead. >>I was gonna say, you know, when I look back at the last 10 years, it was all about getting the technology to work, and it was just so complicated. But please carry on. I'd love to hear more about this. >>Yeah, you know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. So what we're announcing from our ease-of-use side of the world is, first, our data marketplace.
This is the ability for all users to discover and access data quickly and easily, shop for it, if you will. The next thing that we're also introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then there are two more areas on the ease-of-use side of the world. One is usage analytics. One of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create, and also to help with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform: understanding what's working, who's accessing it, what's not. And then finally, we're also introducing what's called Workflow Designer. We love our workflows at Collibra; it's a big differentiator to be able to automate business processes. The designer is really about a way for more people to be able to create those workflows and collaborate on those workflows, as well as for people to be able to easily interact with them. So there are a lot of exciting things when it comes to ease of use, to make it easier for all users to find data. >>Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy? Metaphor or analogy, I always get those confused. Let's go with analogy. Why is it so important to data consumers? >>I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like your Amazon, if you will.
And so having a consumer-grade experience, where users can quickly go in and find the data, trust that data, understand where the data's coming from, and then be able to quickly access it, is the idea of being able to shop for it: just making it as simple as possible and really speeding the time to value for any of the business analysts and data analysts out there. >>Yeah, you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data, and of course that's awesome. I love that. But of course then you have to have self-service infrastructure, and you have to have governance. And those are really challenging. And I think so many organizations are facing adoption challenges, you know, when it comes to enabling teams generally, especially domain experts, to adopt new data technologies. You know, the tech comes fast and furious. You got all these open source projects, and it can get really confusing. Of course it risks security, governance, and all that good stuff. You got all this jargon. So where do you see the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >>You're dead on. There's so much technology, and there's so much to stay on top of, which is part of the friction, right? It's just being able to stay ahead of and understand all the technologies that are coming. You also see that there are so many more sources of data, and people are migrating data to the cloud, and they're migrating to new sources. Where the friction comes is really that ability to understand where the data came from and where it's moving to, and then also to be able to put the access controls on top of it, so people are only getting access to the data that they should be getting access to.
So one of the other things we're announcing, with all of the innovations that are coming, is what we're doing around performance and scale. So with all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. >>It's something that Collibra has been working on for the past year and a half, but we're launching the ability to have data quality in the cloud. So it's currently an on-premise offering, but we'll now be able to carry that over into the cloud for us to manage that way. We're also introducing the ability to push down data quality into Snowflake. So this is, again, one of those challenges: making sure that the data you have is high quality as you move forward. And so really, we're just reducing friction. You already have Snowflake stood up. It's not another machine for you to manage; it's just pushdown capabilities into Snowflake to be able to track that quality. Another thing that we're launching with that is what we call Collibra Protect. And this is that ability for users to be able to ingest metadata, understand where the PII data is, and then set policies up on top of it. So they can very quickly set policies and have them enforced at the data level, so anybody in the organization is only getting access to the data they should have access to. >>The topic of data quality is interesting. It's something that I've followed for a number of years. It used to be a back-office function, you know, and really confined only to highly regulated industries like financial services and healthcare and government. You know, you look back over a decade ago, you didn't have this worry about personal information. GDPR and, you know, the California Consumer Privacy Act all become so much more important.
The cloud has really changed things in terms of performance and scale. And of course, partnering with Snowflake, it's all about sharing data and monetization, anything but a back-office function. So it was kind of smart that you guys were early on, and of course attracting them as an investor as well was very strong validation. What can you tell us about the nature of the relationship with Snowflake? I'm specifically interested in joint engineering and product innovation efforts, you know, beyond the standard go-to-market stuff. >>Definitely. So you mentioned they were a strategic investor in Collibra about a year ago, a little less than that, I guess. We've been working with them, though, for over a year, really tightly with their product and engineering teams, to make sure that Collibra is adding real value. Pieces of our unified platform are touching all pieces of Snowflake. And when I say that, what I mean is, first, you know, we're able to ingest data with Snowflake, which has always existed. We're able to profile and classify that data. We're announcing with Collibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforced. So again, people can get more value out of their Snowflake more quickly, as far as time to value, with our policies for all business users to be able to create. >>We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage: where did the data come from, and how was it transformed within Snowflake? As well as the data quality pushdown, as I mentioned. Data quality, you brought it up. It is a big industry push, and, you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality.
So this pushdown capability for Snowflake really is, again, a big ease-of-use push for us at Collibra: that ability to push it into Snowflake, take advantage of the data source and the engine that already lives there, and make sure you have the right quality. >>I mean, the nice thing about Snowflake is, if you play in the Snowflake sandbox, you can get, you know, a high degree of confidence that the data sharing can be done in a safe way. Bringing Collibra into the story allows me to have that data quality and that governance that I need. You know, we've said many times on theCUBE that one of the notable differences in cloud this decade versus last decade, I mean, there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. I mean, it's a key factor for innovating, accelerating product delivery, and filling gaps in the hyperscale offerings, 'cause you got more mature stack capabilities, and, you know, it creates this flywheel momentum, as we often say. So my question is, how do you work with the hyperscalers, whether it's AWS or Google, whomever? And what do you see as your role, and what's the Collibra sweet spot? >>Yeah, definitely. So, you know, one of the things I mentioned early on is that the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and Collibra Protect there, but also tighter Dataplex integration. So, similar to what you've seen with our strategic moves around Snowflake, and really covering the broad ecosystem of what Collibra can do on top of that data source, we're extending that to the world of Google as well and the world of Dataplex.
We also have great partners in SIs. Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's, as they're really important to help people with their whole data strategy and driving that data-driven culture, with Collibra being the core of it. >>Laura, we're gonna end it there, but I wonder if you could kind of put a bow on, you know, this year, the event, your perspectives. So just give us your closing thoughts. >>Yeah, definitely. So I wanna say this is one of the biggest releases Collibra's ever had. Definitely the biggest one since I've been with the company, a little over a year. We have all these great new product innovations coming to really drive the ease of use, to make data more valuable for users everywhere and companies everywhere. And so it's all about everybody being able to easily find, understand, trust, and get access to that data going forward. >>Well, congratulations on all the progress. It was great to have you on theCUBE, first time I believe, and I really appreciate you taking the time with us. >>Yes, thank you for your time. >>You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. >>So data modernization oftentimes means moving some of your storage and compute to the cloud, where you get the benefit of scale and security and so on. But ultimately it doesn't take away the silos that you have. We have more locations, more tools, and more processes with which we try to get value from this data. To do that at scale in an organization, the people involved in this process have to understand each other. So you need to unite those people across those tools, processes, and systems with a shared language. When I say "customer," do you understand the same thing when you hear "customer"?
Are we counting them in the same way? That shared language unites us, and that gives the organization as a whole the opportunity to get the maximum value out of their data assets. And then they can democratize data so everyone can properly use that shared language to find, understand, and trust the data assets that are available. >>And that's where Collibra comes in. We provide a centralized system of engagement that works across all of those locations and combines all of those different user types across the whole business. At Collibra, we say United by Data, and that also means that we're united by data with our customers. So here is some data about some of our customers. There was the case of an online do-it-yourself platform that grew its revenue almost three times from a marketing campaign that put the right product in the right hands of the right people. Another case that comes to mind is from a financial services organization that saved over 800K every year because they were able to reuse the same data in different kinds of reports. Before, that data was spread out over different tools and processes and silos, and now the platform brought them together, so they realized, oh, we're actually using the same data, let's find a way to make this more efficient. And the last example that comes to mind is that of a large home mortgage loan provider with a very complex landscape, a very complex architecture, legacy, in the cloud, et cetera. And they're using our software, they're using our platform, to unite all the people and those processes and tools to get a common view of data to manage their compliance at scale. >>Hey everyone, I'm Lisa Martin covering Data Citizens '22, brought to you by Collibra. This next conversation is going to focus on the importance of data culture. One of our CUBE alumni is back: Stan Christiaens is Collibra's co-founder and its Chief Data Citizen. Stan, it's great to have you back on theCUBE. 
>>Hey Lisa, nice to be here. >>So we're going to be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, it's so much more than technology innovation. It also really requires cultural transformation, community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >>Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations, most of the employees are somehow going to be a data citizen, right? So you need to make sure that these people are aware of it. You need to make sure that people have the skills and competencies to do with data what's necessary, and that's on all levels, right? So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss that we need to make this decision, your boss is also open to and able to interpret the data presented in the dashboard to actually make that decision and take that action. Right? >>And once you have that widely across the organization, that's when you have a good data culture. Now, that's a continuous effort for most organizations, because they're always moving, they're hiring new people. And it has to be a continuous effort because we've seen that, on the one hand, organizations continue to be challenged by their data sources and where all the data is flowing, right? Which in itself creates a lot of risk. But on the other hand of the equation, you have the benefit. You know, you might look at regulatory drivers like, we have to do this, right? 
But it's much better right now to consider the competitive drivers, for example. We did an IDC study earlier this year, quite interesting, I can recommend it to anyone. And one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is about the ones who are higher in maturity. >>So the organizations that really look at data as an asset, look at data as a product, and actively try to be better at it have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this data culture for everyone, waking them up as data citizens. I'm doing this for competitive reasons, I'm doing this for regulatory reasons. You're trying to bring both of those together, and the ones that get data intelligence right are successful and competitive. And that's what we're seeing out there in the market. >>Absolutely. We know that just generally, Stan, right, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, we know that in theory, more competitive, more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally. >>Of course, it's difficult for an organization to adapt, but it's also necessary. As you just said, imagine that you're a modern-day organization with laptops, what have you, and you're not using those, right? Or you're delivering them throughout the organization but not enabling your colleagues to actually do something with that asset. The same thing is true with data today, right? If you're not properly using the data asset and competitors are, they're going to get more advantage. 
So as to how you get this done, how you establish this, there are angles to look at, Lisa. So one angle is obviously the leadership, whoever is the boss of data in the organization. You typically have multiple bosses there, like chief data officers. Sometimes there are multiple, and they may have a different title, right? So I'm just going to summarize it as a data leader for a second. >>So whoever that is, they need to make sure that there's a clear vision, a clear strategy for data. And that strategy needs to include the monetization aspect: how are you going to get value from data? Now, that's one part, because then you have leadership in the organization and also the business value. And that's important, because those people, their job in essence really is to make everyone in the organization think about data as an asset. And I think that's the second part of the equation of getting that right: it's not enough to just have that leadership out there, you also have to get the hearts and minds of the data champions across the organization. You really have to win them over. And if you have those two combined, and obviously a good technology to connect those people and have them execute on their responsibilities, such as a data intelligence platform like Collibra's, then you have the pieces in place to really start upgrading that culture inch by inch, if you will. >>Yes, I like that. The recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how, before we went live, we were talking about Collibra drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra, and what maybe some of the specific projects are that Collibra's data office is working on. >>Yes, and it is indeed Data Citizens. There are a ton of speakers here; we're very excited. 
You know, we have Barb from MIT speaking about data monetization. We have Dilla at the last minute. So really exciting agenda. Can't wait to get back out there, essentially. So over the years at Collibra, we've been doing this since 2008, so a good number of years, and I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we. And myself, you know, when you start a company, we were four people, if you will, so everybody's wearing all sorts of hats at that time. But over the years I've run presales, sales, partnerships, product, et cetera. And as our company got a little bit biggish, we're now something like a thousand-plus people in the company. >>I believe systems and processes become a lot more important. So we said, you know, Collibra isn't the size of our customers, but we're getting there in terms of organization, structure, process, systems, et cetera. So we said it's really time for us to put our money where our mouth is and to set up our own data office, which is what we were seeing at customers, organizations worldwide. Organizations have HR units, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. So we said, okay, let's try to set an example that other people can take away from it. So we set up a data strategy, we started building data products, took care of the data infrastructure, that sort of good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our product and our own practices, and from that use, learn how we can make the product better, learn how we can make the practice better, and share that learning. On Monday mornings, we sometimes refer to that as eating our own dog food; on Friday evenings, we refer to it as drinking our own champagne. >>I like it. >>So we had the driver to do this. You know, there's a clear business reason. 
So we included that in the data strategy, and that's a little bit of our origin. Now, how do we organize this? We have three pillars, and by no means is this a template that everyone should follow; this is just the organization that works at our company, but it can serve as an inspiration. So we have a pillar which is data science: the data product builders, if you will, or the people who help the business build data products. We have the data engineers, who help keep the lights on for that data platform to make sure that the data products can run, the data can flow, and the quality can be checked. >>And then we have the data intelligence or data governance builders, where we have those data governance, data intelligence stakeholders who help the business as a sort of data partner to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is, well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? And from those use cases, we then just started to build a roadmap and started executing use case by use case. And the important ones are very simple. We see them with our customers as well, people talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the people in privacy, so they have their process registry and they can see how the data flows. >>So that's a starting place, and that turns into a marketplace, so that if new analysts and data citizens join Collibra, they immediately have a place to go to and see, okay, what data is out there for me as an analyst or a data scientist or whatever to do my job, right? So they can immediately get access to data. And another one that we have is around trusted business reporting. 
We're seeing that since self-service BI allowed everyone to make beautiful dashboards, you know, pie charts. My pet peeve is the pie chart, because I love pie and you shouldn't always be using pie charts. But essentially there's been a proliferation of those reports. And now executives don't really know, okay, should I trust this report or that report? They're reporting on the same thing, but the numbers seem different, right? So that's why we have trusted business reporting. So when a dashboard, a data product essentially, is built, we know that all the right steps are being followed and that whoever is consuming it can be quite confident in the result. Right. And that silver browser, right? >>Absolutely. >>Okay. >>Exactly. Yes. >>Absolutely. Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >>KPIs and measuring is a big topic in the chief data officer profession, I would say, and again, it always varies with your organization, but there are a few that we use that might be of interest. We use those pillars, right? And we have metrics across those pillars. So for example, a pillar on the data engineering side is going to be more related to uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, and especially if you're in the cloud and if consumption's a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around the data science and products. Are people using them? Are they getting value from it? >>Can we calculate that value, right? Yeah. 
So that, to the rest of the business, we can continue to say we're tracking all those numbers, and those numbers indicate that value is generated, and how much value, estimated, in that region. And then you have some data intelligence, data governance metrics, which is, for example, you have a number of domains in a data mesh. People talk about being the owner of a data domain, for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? How well are these owners organized and executing on their responsibilities? How many tickets are open, closed? How many data products are built according to process? And so on and so forth. So these are a set of examples of KPIs. There are a lot more, but hopefully those can already inspire the audience. >>Absolutely. So we've talked about the rise of chief data offices; it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see in terms of the maturation of data offices over the next decade? >>So we've seen indeed the role sort of grow up. I think in 2010 there may have been like 10 chief data officers or something. Gartner has exact numbers on them, but then they grew across industries, and the number is estimated to be about 20,000 right now. >>Wow. >>And they evolved in a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven; offensive data strategy; support for the digital program; and now all about data products, right? So as a data leader, you now need all of those competencies and need to include them in your strategy. >>How is that going to evolve for the next couple of years? I wish I had one of those crystal balls, right? 
But essentially, I think for the next couple of years there are going to be a lot of people still moving along with those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer. So you'll see over the years that's going to evolve to more digital and more data products. So for the next years, my prediction is it's all data products, because it's an immediate link between data and the value, essentially, right? So that's going to be important, and quite likely some new things will be added on, which nobody can predict yet. But we'll see those pop up in a few years. I think there's going to be a continued challenge for the chief data officer role to become a real executive role, as opposed to somebody who claims that they're executive but then they're not, right? >>So the real reporting level into the board, into the CEO for example, will continue to be a challenging point. But the ones who do get that done will be the ones that are successful, and the ones who get that done will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that value clear to all the data citizens in the organization, right? And in that sense, they'll need to have both technical audiences and non-technical audiences aligned, of course. And they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you're waking up data citizens across the organization and that you make everyone in the organization think about data as an asset. >>Absolutely. Because there's so much value that can be extracted when organizations really strategically build that data office and democratize access across all those data citizens. Stan, this is an exciting arena. We're definitely going to keep our eyes on this. Sounds like a lot of evolution and maturation coming from the data office perspective and from the data citizen perspective. 
And as the data show, from that IDC study you mentioned, and you mentioned Gartner as well, organizations have so much more likelihood of being successful and being competitive. So we're going to watch this space. Stan, thank you so much for joining me on theCUBE at Data Citizens '22. We appreciate it. >>Thanks for having me over. >>From Data Citizens '22, I'm Lisa Martin, you're watching theCUBE, the leader in live tech coverage. >>Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra. Remember, all these videos are available on demand at thecube.net. And don't forget to check out siliconangle.com for all the news and wikibon.com for our weekly Breaking Analysis series, where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research. If you want more information on the products announced at Data Citizens, go to collibra.com. There are tons of resources there. You'll find analyst reports, product demos. It's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on theCUBE, your leader in enterprise and emerging tech coverage. We'll see you soon.

Published Date : Nov 2 2022



Kirk Haslbeck, Collibra | Data Citizens '22

(bright upbeat music) >> Welcome to theCUBE's coverage of Data Citizens 2022, Collibra's customer event. My name is Dave Vellante. With us is Kirk Haslbeck, who's the Vice President of Data Quality at Collibra. Kirk, good to see you. Welcome. >> Thanks for having me, Dave. Excited to be here. >> You bet. Okay, we're going to discuss data quality and observability. It's a hot trend right now. You founded a data quality company, OwlDQ, and it was acquired by Collibra last year. Congratulations! And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >> Yeah, absolutely. It's definitely exciting times for data quality, which, you're right, has been around for a long time. So why now, and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before, and the variety has changed and the volume has grown. And while I think that remains true, there are a couple of other hidden factors at play that explain why this is becoming so important now. And I guess you could kind of break this down simply and think about if, Dave, you and I were going to build a new healthcare application and monitor the heartbeat of individuals. Imagine if we get that wrong, what the ramifications could be, what those incidents would look like. Or maybe better yet, we try to build a new trading algorithm with a crossover strategy where the 50-day crosses the 10-day average. And imagine if the data underlying the inputs to that is incorrect. We'll probably have major financial ramifications in that sense. So it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there's kind of two other things at play. 
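To make the trading example concrete, here is a minimal Python sketch of a moving-average crossover signal and of how one bad input can flip it. Everything in it is an illustrative assumption: the function names, the 10-day and 50-day windows as parameters, and the made-up price series. It is not a trading system and not anything the speakers describe building.

```python
from statistics import mean


def moving_average(prices, window):
    """Mean of the most recent `window` prices."""
    if len(prices) < window:
        raise ValueError("not enough prices for this window")
    return mean(prices[-window:])


def crossover_signal(prices, short_window=10, long_window=50):
    """Return 'buy' when the short average is above the long average, else 'sell'."""
    short_avg = moving_average(prices, short_window)
    long_avg = moving_average(prices, long_window)
    return "buy" if short_avg > long_avg else "sell"


# Hypothetical, gently rising price series covering 60 days.
clean_prices = [100.0 + 0.1 * day for day in range(60)]

# The same series with a single corrupted input: the latest price loaded as 0.0.
corrupted_prices = clean_prices[:-1] + [0.0]

clean_signal = crossover_signal(clean_prices)
corrupted_signal = crossover_signal(corrupted_prices)
```

On this made-up series, the clean data yields a "buy" signal while the single corrupted price drags the 10-day average below the 50-day average and yields "sell", which is the kind of incorrect decision the speaker is warning about.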
I bought a car not too long ago and my dad called and said, "How many cylinders does it have?" And I realized in that moment, I might have failed him, because I didn't know. And I used to ask those types of questions about anti-lock brakes and cylinders and if it's manual or automatic, and I realized I now just buy a car that I hope works. And it's so complicated with all the computer chips. I really don't know that much about it. And that's what's happening with data. We're just loading so much of it. And it's so complex that the way companies consume it in the IT function is that they bring in a lot of data and then they syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested: it's because we're becoming less close to the intricacies of the data and we just expect it to always be there and be correct. >> You know, the other thing too about data quality: for years we did the MIT CDOIQ event. We didn't do it last year; COVID messed everything up. But the observation I would make there, and I'd love your thoughts, is that data quality, which used to be called information quality, used to be this back office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened, and people were realizing, well, they sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're going to talk about observability. And so it's really become front and center, just the whole quality issue, because data's fundamental, hasn't it? >> Yeah, absolutely. 
I mean, let's imagine we pull up our phones right now and I go to my favorite stock ticker app and I check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we've achieved in the early days, even before Collibra, what's been so exciting is that we have these types of observation techniques, these data monitors, that can actually track past performance of every field at scale. And why that's so interesting, and why I think the CDO is listening intently nowadays to this topic, is that maybe we could surface all of these problems with the right solution of data observability and with the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world where you must write a condition, and then when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >> So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >> Yeah, it's super interesting. It's an emerging market, so the language is changing a lot, and the topics and areas are changing. The way that I like to say it or break it down, because the lingo is constantly a moving target in this space, is really break records versus breaking trends. 
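The distinction between break records and breaking trends can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `amount` field, the limits, and the simple three-standard-deviation rule are all hypothetical choices made here, and the transcript does not say how Collibra's monitors actually work.

```python
from statistics import mean, stdev


def break_records(rows, max_amount=1_000_000):
    """Rule-based check: return the rows that violate a hand-written condition."""
    return [
        row for row in rows
        if row["amount"] is None or row["amount"] < 0 or row["amount"] > max_amount
    ]


def is_breaking_trend(history, latest, threshold=3.0):
    """Trend-based check: flag `latest` when it sits far outside past behavior.

    `history` holds past values of one metric (for example, daily row counts).
    The value is flagged when it is more than `threshold` standard deviations
    away from the historical mean.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Break records: someone had to know and write the condition up front.
rows = [{"amount": 120.0}, {"amount": -5.0}, {"amount": None}]
bad_rows = break_records(rows)

# Breaking trend: no condition written, only past behavior of the metric.
daily_row_counts = [10_050, 9_980, 10_120, 10_010, 9_940, 10_070, 10_030]
normal_day = is_breaking_trend(daily_row_counts, 10_000)
short_day = is_breaking_trend(daily_row_counts, 4_200)
```

Here the rule-based check catches the negative and missing amounts because a person encoded those conditions, while the trend check flags the day with 4,200 rows purely because it departs from the recorded history. A production monitor would likely use something more robust than a plain mean and standard deviation.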
And I could write a condition: when this thing happens it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. Everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived or only part of it arrived or didn't arrive on time, it's likely stale, and there will not be a condition that you could write that would show you all the goods and the bads. That was kind of your traditional approach of data quality break records. But your modern day approach is you lost a significant portion of your data, or it did not arrive on time to make that decision accurately. And that's a hidden concern. Some people call this freshness, we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >> So what's the Collibra angle on all this stuff? You made the acquisition, you've got data quality and observability coming together, you guys have a lot of expertise in this area, but you hear provenance of data, you just talked about stale data, the whole trend toward real time. How is Collibra approaching the problem and what's unique about your approach? >> Well, I think where we're fortunate is with our background. Myself and team, we sort of lived this problem for a long time in the Wall Street days about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there when most people evaluate our solution. It's more advanced than some of the observation techniques that currently exist. 
But we've also always covered data quality, and we believe that people want to know more, they need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time: "I have so many things going wrong, just show me the big picture. Help me find the thing that, if I were to fix it today, would make the most impact." So we're really focused on root cause analysis, business impact, connecting it with lineage and catalog, metadata. And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago and then my company OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >> Well, you mentioned financial services a couple of times and some examples. Remember the flash crash in 2010? Nobody had any idea what that was, they just said, "Oh, it's a glitch." So they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens '22 that you're announcing, you've got to announce new products, right? Your yearly event, what's new? Give us a sense as to what products are coming out, but specifically around data quality and observability. >> Absolutely. There's always a next thing on the forefront. And the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery and Databricks, Delta Lake and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster in a SaaS-like model. And we've started to hook into these databases. And while we've always worked with the same databases in the past, and they're supported today, we're doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is everyone's concerned with something called egress. 
Did my data that I've spent all this time and money with my security team securing ever leave my hands? Did it ever leave my secure VPC, as they call it? And with these native integrations that we're building and about to unveil here as kind of a sneak peek for next week at Data Citizens, we're now doing all compute and data operations in databases like Snowflake. And what that means is with no install and no configuration you could log into the Collibra Data Quality app and have all of your data quality running inside the database that you've probably already picked as your go-forward, team-selected, secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >> So this is interesting because what you just described, you mentioned Snowflake, you mentioned Google, oh actually you mentioned, yeah, Databricks. Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool, but then Google's got the open data cloud, if you heard Google Next. And now Databricks doesn't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches and there's really no way, up until now I'm hearing, to really understand the relationships between all those and have confidence across them. It's like (indistinct) you should just be a node on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh and I need tooling to be able to have confidence that my data is governed and has the proper lineage, provenance. And that's what you're bringing to the table. Is that right? Did I get that right? >> Yeah, that's right. 
And for us, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now, we can send them the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network cost, zero egress cost, zero latency of time. And so when you were to log into BigQuery tomorrow using our tool, or let's say Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage and access, privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >> I love this example because everybody talks about, wow, the cloud guys are going to own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value, connect across clouds. Sometimes we call it Supercloud, or inter-clouding. Alright, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens '22. >> Absolutely. Well, I think one big trend is discovery and classification. We're seeing that across the board. People used to know it was a zip code, and nowadays, with the amount of data that's out there, they want to know where everything is, where their sensitive data is, if it's redundant. Tell me everything inside of three to five seconds. And with that comes, they want to know, in all of these hyperscale databases, how fast they can get controls and insights out of their tools. So I think we're going to see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >> Excellent, all right. 
Kirk Haslbeck, thanks so much for coming on theCUBE and previewing Data Citizens '22. Appreciate it. >> Thanks for having me, Dave. >> You're welcome. All right, and thank you for watching. Keep it right there for more coverage from theCUBE.
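
(Editor's sketch) The distinction drawn in this interview between a "break record" (a row that fails a hand-written condition) and a "breaking trend" (a metric that departs from its own history, such as data that only partly arrived) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the sample row counts, and the z-score threshold are hypothetical, and it is not a description of how Collibra Data Quality is implemented.

```python
from statistics import mean, stdev

def break_records(rows, rule):
    """Return the rows that violate an explicit, hand-written condition."""
    return [row for row in rows if not rule(row)]

def breaking_trend(history, today, z_threshold=3.0):
    """Flag today's value if it deviates sharply from its own past values.

    Uses a simple z-score; the threshold of 3.0 is an illustrative assumption.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

# Hypothetical daily row counts for one pipeline over the past week.
history = [1000, 1020, 990, 1010, 1005, 995, 1015]

# Today only three rows arrived, and each one is individually valid.
today_rows = [{"price": 10.0}, {"price": 12.5}, {"price": 11.0}]

# Hand-written condition: price must be present and positive.
rule = lambda row: row["price"] is not None and row["price"] > 0

violations = break_records(today_rows, rule)             # no row breaks the rule
trend_alert = breaking_trend(history, len(today_rows))   # but volume collapsed
```

In this sketch the rule-based check finds nothing wrong, because every row that did arrive is valid, while the trend check flags the day because the row count is far below its recent history. That is the stale-data case described above: no per-row condition would have caught it.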

Published Date : Oct 24 2022


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
Kurt Hasselbeck	PERSON	0.99+
2010	DATE	0.99+
one	QUANTITY	0.99+
Kirk Hasselbeck	PERSON	0.99+
50 day	QUANTITY	0.99+
Kirk	PERSON	0.99+
10 day	QUANTITY	0.99+
OwlDQ	ORGANIZATION	0.99+
Kirk Haslbeck	PERSON	0.99+
next week	DATE	0.99+
Google	ORGANIZATION	0.99+
last year	DATE	0.99+
two sides	QUANTITY	0.99+
thousands	QUANTITY	0.99+
NASDAQ	ORGANIZATION	0.99+
Snowflake	TITLE	0.99+
Data Citizens	ORGANIZATION	0.99+
Data Bricks	ORGANIZATION	0.99+
two other things	QUANTITY	0.98+
one click	QUANTITY	0.98+
tomorrow	DATE	0.98+
today	DATE	0.98+
five seconds	QUANTITY	0.97+
two domain	QUANTITY	0.94+
Collibra Data Quality	TITLE	0.92+
MIT CDOIQ	EVENT	0.9+
Data Citizens '22	TITLE	0.9+
Egress	ORGANIZATION	0.89+
Delta Lake	TITLE	0.89+
three	QUANTITY	0.86+
zero	QUANTITY	0.85+
Big Query	TITLE	0.85+
about a decade ago	DATE	0.85+
SQL Pushdown	TITLE	0.83+
Data Citizens 2022 Collibra	EVENT	0.82+
Big BigQuery	TITLE	0.81+
more than a couple	QUANTITY	0.79+
couple	QUANTITY	0.78+
one big	QUANTITY	0.77+
Collibra Data Quality	ORGANIZATION	0.75+
Collibra	OTHER	0.75+
Google Nest	ORGANIZATION	0.75+
Data Citizens '22	ORGANIZATION	0.74+
zero latency	QUANTITY	0.72+
SAS	ORGANIZATION	0.71+
Snowflake	ORGANIZATION	0.69+
COVID	ORGANIZATION	0.69+
years ago	DATE	0.68+
Wall Street	LOCATION	0.66+
theCUBE	ORGANIZATION	0.66+
many numbers	QUANTITY	0.63+
Collibra	PERSON	0.63+
times	QUANTITY	0.61+
Data	ORGANIZATION	0.61+
too long	DATE	0.6+
Vice President	PERSON	0.57+
data	QUANTITY	0.56+
CDO	TITLE	0.52+
Bricks	TITLE	0.48+

Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022

(upbeat music) >> Welcome back everyone to theCUBE's coverage here in Seattle, Washington, for AWS's Marketplace Seller Conference. It's the big news within the Amazon Partner Network, combining with Marketplace, forming the Amazon Partner Organization. Part of a big reorg as they grow to the next level, next-gen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, your host at theCUBE. Great guests here from Databricks. Both CUBE alumni. Jack Andersen, GM and VP of the Databricks partnership team for AWS. You handle that relationship. And Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >> Thanks for having us back. >> Yeah, John, great to be here. >> So I feel like we're at re:Invent 2013. Small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer on the micro, you know, people should be buying online. Self-service, cloud scale. But Amazon's got billions being sold through their marketplace. They've reorganized their partner network. You can see kind of what's going on. They've kind of figured it out. Like, let's put everything together and simplify and make it less of a website, marketplace. Merge our partner organizations, have more synergy and frictionless experiences so everyone can make more money and customers are going to be happier. >> Yeah, that's right. >> I mean, you're running the relationship. You're in the middle of it. >> Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and marketplace allow us to do, is to work with Amazon on these really, you know, unique use cases. >> You know, I interviewed Ali many times over the years. I remember many years ago, maybe six, seven years ago, we were talking. He's like, "we're all in on AWS." 
Obviously now, with the success of Databricks, you've got multiple clouds, we see that. Customers have choice. But I remember the strategy early on. It was like, we're going to be deep. So this speaks volumes to the relationship you have. Years. Jack, take us through the relationship that Databricks has with AWS from a partner perspective. Joel, from a product perspective. Because it's not like you guys are Johnny-come-lately, new to the scene. >> Right. >> You've been there, almost present at the creation of this wave. What's the relationship and how does it relate to what's going on today? >> So most people may not know that Databricks was born on AWS. We actually did our first $100 million of revenue on Amazon. And today we're obviously available on multiple clouds. But we're very fond of our Amazon relationship. And when you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and marketplace broadens our reach. And so, we think of marketplace in three different aspects. We've got the marketplace private offer business, which we've been doing for a number of years. Matter of fact, we were driving well over a hundred percent year over year growth in private offers. And we have a nine figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS. So they get pricing power against their private pricing. So it's really important it goes on their Amazon bill. In May we launched our pay-as-you-go, on-demand offering. And in five short months, we have well over a thousand subscribers. And what this does is it really reduces the barriers to entry. It's low friction. So anybody in an enterprise or startup or public sector company can start to use Databricks on AWS, in a consumption-based model, and have it go against their monthly bill. And so we see customers, you know, doing rapid experimentation, pilots, POCs. 
They're really learning the value of that first, use case. And then we see rapid use case expansion. And the third aspect is the consulting partner, private offer, CPPO. Super important in how we involve our partner ecosystem of our consulting partners and our resellers that are able to work with Databricks on behalf of customers. >> So you got the big contracts with the private offer. You got the product market fit, kind of people iterating with data, coming in with the buyers you get. And obviously the integration piece all fitting in there. >> Exactly. >> Okay, so those are the offers, that's current, what's in marketplace today. Is that the products... What are people buying? >> Yeah. >> I mean, I guess what's the... Joel, what are people buying in the marketplace? And what does it mean for them? >> So fundamentally what they're buying is the ability to take silos out of their organization. And that is the problem that Databricks is out there to solve. Which is, when you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real time streaming data. And your teams are trying to use all of this data to solve really complicated problems. And as Databricks, as the Lakehouse Company, what we're helping customers do is, how do they get into the new world? How do they move to a place where they can use all of that data across all of their teams? And so we allow them to begin to find, through the marketplace, those rapid adoption use cases where they can get rid of these data warehousing, data lake silos they've had in the past. Get their unstructured and structured data onto one data platform, an open data platform, that is no longer adherent to any proprietary formats and standards and something they can, very much, very easily, integrate into the rest of their data environment. Apply one common data governance layer on top of that. 
So that from the time they ingest that data, to the time they use that data, to the time they share that data, inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform, with that common governance solution, they're able to bring all of those use cases together. Across their real time streaming, their data engineering, their BI, their AI. All of their teams working on one set of data. And that lets them move really, really fast. And it also lets them solve challenges they just couldn't solve before. A good example of this, you know, one of the world's now largest data streaming platforms runs on Databricks with AWS. And if you think about what it takes to set that up, well, they've got all this customer data that was historically inside of data warehouses, which they need to understand who their customers are. They have all this unstructured data that they've built their data science models around, so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth, between clickstream data from what the customers are doing with their platform and the recommendations they want to push back out. And if those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex. But by building it on Databricks, they were able to release it in record time and have grown at a record pace to now be the number one platform. >> And this product, it's impacting product development. >> Absolutely. >> I mean, this is like the difference between lagging months of product development, to like days. >> Yes. >> Pretty much what you're getting at. >> Yes. >> So total agility. >> Mm-hmm. >> I got that. 
Okay, now, I'm a customer, I want to buy in the marketplace, but you've got a direct sales force up there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and Mona, was a "Hey, this is a CRO conference, chief revenue officer" conversation. Which means someone's getting compensated. So, if I'm the sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict? Or, how do you handle it? >> Well, I'd add to what Joel just talked about with, you know, the solution, the value of the solution. Our entire offering is available on AWS Marketplace. So it's not a subset, it's the entire Databricks offering. And- >> The flagship, all the, the top stuff. >> Everything, the flagship, the complete offering. So it's not segmented. It's not a sub-segment. >> Okay. >> It's, you know, you can use all of our different offerings. Now when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, right? Versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks, our sales force wants to do the right thing for the customer, if the customer wants to use marketplace as their procurement vehicle. And that really helps customers because if you get Databricks and five other ISVs together, and let's say with each ISV you're spending a million dollars, you have $5 million of spend. You put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we view it. >> So customers are driving this, sounds like. >> Correct. For sure. >> So they're looking at this as saying, Hey, I'm going to just get purchasing power with all my relationships. 
Because it's a solution architectural market, right? >> Yeah. It makes sense. Because if most customers will have a primary and secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, you get pricing power. >> Okay, Joel, we're going to date ourselves. At least I will. So back in the old days, (group laughter) It used to be, do a Barney deal with someone, Hey, let's go to market together. You got to get paper, you do a biz dev deal. And then you got to say, okay, now let's coordinate our sales teams, a lot of moving parts. So what you're getting at here is that the alternative for Databricks, or any company is, to go find those partners and do deals, versus now Amazon is the center point for the customer. So you can still do those joint deals, but this seems to be flipping the script a little bit. >> Well, it is, but we still have vars and consulting partners that are doing implementation work. Very valuable work, advisory work, that can actually work with marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >> So it doesn't change your business structure. It just makes it more efficient. >> That's correct. >> That's a great way to say it. >> Yeah, that's great. >> Okay. So, that's it. So that's just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >> Yes. >> Absolutely. >> Economically. >> Economically, it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon. Especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution. And our teams are working backwards from those use cases, you know, to collaborate and land them. >> Yeah. I want to get that out there. Go ahead, Joel. 
>> So one of the other things I might add to that too, you know, and why this is advantageous for companies like Databricks to work through the marketplace. Is it makes it so much easier for customers to deploy a solution. It's very, literally, one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at how do I help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator to that. >> You know, it's interesting. I want to bring this up and get your reaction to it because to me, I think this is the future of procurement. So from a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all the internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all been good. You guys have played well there in the marketplace. But now as we get into more of what I call the business apps, and they brought this up on stage. A little nuanced. Most enterprises aren't yet there of integrating tech, on the business apps, into the stack. This is where I think you guys are a use case of success where you guys have been successful with data integration. It's an integrators dilemma, not an innovator's dilemma. So like, I want to integrate. So now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. It's not, you don't buy it. You build, you got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right? Or am I off because, no one's going to be buying software like they used to. They buy software to integrate it. >> Yeah, no- >> Because everything's integrated. >> I think AWS has done a great job at creating a partner ecosystem, right? To give customers the right tools for the right jobs. And those might be with third parties. 
Databricks is doing the same thing with our partner connect program, right? We've got customer partners like Five Tran and DBT that, you know, augment and enhance our platform. And so you're looking at multi ISV architectures and all of that can be procured through the AWS marketplace. >> Yeah. It's almost like, you know, bundling and un bundling. I was talking about this with, with Dave Alante about Supercloud. Which is why wouldn't a customer want the best solution in their architecture? Period. In its class. If someone's got API security or an API gateway. Well, you know, I don't want to be forced to buy something because it's part of a suite. And that's where you see things get sub optimized. Where someone dominates a category and they have, oh, you got to buy my version of this. >> Joel and I were talking, we were actually saying, what's really important about Databricks, is that customers control the data, right? You want to comment on that? >> Yeah. I was going to say, you know, what you're pushing on there, we think is extraordinarily, you know, the way the market is going to go. Is that customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so one of the, you know, philosophically, I think, really strong places, Databricks and AWS have lined up, is we both take an approach that you should be able to have maximum flexibility on the platform. And as we think about the Lakehouse, one thing we've always been extremely committed to, as a company, is building the data platform on an open foundation. And we do that primarily through Delta Lake and making sure that, to Jack's point, with Databricks, the data is always in your control. And then it's always stored in a completely open format. And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there. 
Because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >> When you see other solutions out there that aren't as open as you guys, you guys are very open by the way, we love that too. We think that's a great strategy, but what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, which I think open source software is the software industry, and integration is a big deal. Because software's going to be plentiful. >> Sure. >> Let's face it. It's a good time to be in the software business. But cloud's booming. So what's the downside, from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative. What potentially should they be nervous about, down the road, if they go with a more proprietary or locked-in approach? >> Yeah. >> Well, I think the challenge with proprietary ecosystems is you become beholden to the ability of that provider to both build relationships and convince other vendors that they should invest in that format. But you're also, then, beholden to the pace at which that provider is able to innovate. >> Mm-hmm. >> And I think we've seen lots of times over history where, you know, a proprietary format may run ahead, for a while, on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade. Whereas in the open formats- >> So extract rents versus innovation. (John laughs) >> Exactly. Yeah, exactly. >> I'll say it. >> But in the open world, you know, you have to continue to innovate. >> Yeah. >> And the open source world is always innovating. If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source. 
And so by investing in open ecosystems, that means you are always going to be at the forefront of what is the latest. >> You know, again, not to date myself again, but you look back at the eighties and nineties, the protocol stacks were proprietary. >> Yeah. >> You know, SNA at IBM, DECnet was Digital. You know the rest. And then TCP/IP was part of the open systems interconnect. >> Mm-hmm. >> Revolutionary (indistinct) a big part of that, as well as my school did. And so like, you know, that was, but it didn't standardize the whole stack. It stopped at IP and TCP. >> Yeah. >> But that helped interoperate, that created a nice de facto standard. So this is a big part of this mid-game. I call it the chessboard, you know, you've got the opening game and the mid-game, then you get the end game. You're not there at the end game yet with cloud. But cloud- >> There's always some form of lock-in, right? Andy Jassy will address it, you know, when making a decision. But if you're going to make a decision you want to reduce- You don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, it's an AI-driven business, right? And so it has to do with, can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models? >> Yeah. >> In a very consistent manner. And so the tools of tomorrow, to Joel's point, will be open and we want interoperability with those tools. >> And choice matters too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly too. Redshift competes directly with a lot of other stuff. But they can't play the bundling game because the customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all. 
And they're going to be, they're onto it. This is the- >> To Amazon's credit, by having these solutions that may compete with native services in marketplace, they are providing customers with choice, low price- >> And access to the core value. Which is the hardware- >> Exactly. >> Which is their platform. Okay. So I want to get you guys' thoughts on something else I see emerging. This is, again, kind of a CUBE rumination moment. So on stage, Chris unpacked a lot of stuff. I mean this marketplace, they're touching a lot of hot buttons here, you know, pricing, compensation, workflows, services behind the curtain. And one of those things he mentioned was, they talk about resellers or channel partners, depending upon what you talk about. We believe, Dave and I believe on theCUBE, that the entire indirect sales channel of the industry is going to be disrupted radically. Because those players were selling hardware in the old days, and software. That game is going to change. You mentioned you guys have a program, let me get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in. Which means that the old reseller channels are going to be rewritten. They're going to be refactored with these new kinds of access. Because you've got scale, you've got money and you've got product. And you've got customers coming into the marketplace. So if you're like a reseller that sold computers to data centers, or software, you know, a value-added reseller or VAR business. >> You've got to evolve. >> You've got to, you've got to be here. >> Yes. >> Yeah. >> How are you guys working with those partners? Because you say you have a product in your marketplace there. How do I make money if I'm a reseller with Databricks, with Amazon? Take me through that use case. >> Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how. 
When we're seeing customers do mass migrations to the cloud or Hadoop specific migrations or data transformation implementations. They need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well. Well, that's another aspect of their business. But I really think it is the expertise that the partners bring to help customers get outcomes. >> Joel, channel big opportunity for Amazon to reimagine this. >> For sure. Yeah. And I think, you know, to your comment about how do resellers take advantage of that, I think what Jack was pushing on is spot on. Which is, it's becoming more and more about the expertise you bring to the table. And not just transacting the software. But now actually helping customers make the right choices. And we're seeing, you know, both SIs begin to be able to resell solutions and finding a lot of opportunity in that. >> Yeah. And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be the evolution that this goes. >> At the end of the day, it's about services, right? >> For sure. Yeah. >> I mean... >> You've got a great service. You're going to have high gross profits. >> Yeah >> Managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >> I think that's going to be a really hot, hot button for you guys. I think being the way you guys are open, this channel, partner services model coming in, to the fold, really kind of makes for kind of that Supercloud like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on, within Databricks. >> For sure. >> On top of this ecosystem. How does that work? This is kind of like, hasn't been written up in business school and case studies yet. This is new. What is this? 
>> I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big, kind of, new horizons for us as we think about what drives ecosystems. It's going to be around, well, what's the data platform that I'm using? And then all the tools that have to encircle that to get my business done. And so I think there are, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data analytics and AI. And then to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well. As customers are looking at, well, if I'm standing these Lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >> I mean, you think about ecosystem theory, we're living a whole nother dream. And I'm not kidding. It hasn't yet been written up for business school case studies: we're now in a whole nother connective tissue, ecology thing happening. Where you have dependencies and value propositions. Economics, connectedness. So you have relationships in these ecosystems. >> And I think one of the great things about the relationships with these ecosystems is that there's a high degree of overlap. >> Yeah. >> So you're seeing that, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >> Joel, Jack, I love it because you know what it means? The best ecosystem will win, if you keep it open. >> Sure, sure. >> You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really kind of what we're talking about. 
>> And John, can I just add that when I was at Amazon, we had a theory that there are buyers and builders, right? There are very innovative companies that want to build things themselves. We're seeing now that builders want to buy a platform. Right? >> Yeah. >> And so there's a platform decision being made and that ecosystem is going to evolve around the platform. >> Yeah, and I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma, with a slash through it, called the integrator's dilemma. Innovation is the digital transformation. So- >> Absolutely. >> Like that becomes cliche in a way, but it really becomes more of a, are you open? Are you integrating? If APIs are the connective tissue, what's automation, what do the service messages look like? I mean, a whole nother set of, kind of thinking, goes on in these new ecosystems and these new products. >> And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks, and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >> Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base, a great growing ecosystem. And again, I think a shining example of what every enterprise is going to do. Build on top of something, operating model, get that operating model driving revenue. >> Mm-hmm. >> Yeah. >> Whether you're Goldman Sachs or Capital One or XYZ Corporation. >> S&P Global, NASDAQ. >> Yeah. >> We've got, you know, the biggest verticals in the world solving tough problems with Databricks. 
I think we'd be remiss, because if Ali were here, he would really want to thank Amazon for all of the investments across all of the different functions. Whether it's the relationship we have with our engineering and service teams. Our marketing teams, you know, product development. And we're going to be at re:Invent. A big presence at re:Invent. We're looking forward to seeing you there, again. >> Yeah. We'll see you guys there. Yeah. Again, good ecosystem. I love the ecosystem evolution that's happening. This next-gen cloud is here. We're seeing this evolve, kind of new economics, new value propositions kind of scaling up. Producing more. So you guys are doing a great job. Thanks for coming on the Cube and taking the time. Joel, Jack, great to see you. >> Thanks for having us, John. >> Okay. Cube coverage here. The world's changing as APN comes together with the marketplace for a new partner organization at Amazon Web Services. The Cube's got it covered. This should be a very big, growing ecosystem as this continues. Billions being sold through the marketplace. And of course the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of the Cube. Thanks for watching. (upbeat music)

Published Date : Oct 10 2022



Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022

>> Welcome back, everyone, to the Cube's coverage here in Seattle, Washington, at AWS's Marketplace Seller Conference. It's the big news within the Amazon Partner Network, combining with Marketplace, forming the Amazon Partner Organization, part of a big reorg as they grow to the next level, next-gen cloud, mid-game on the chessboard. The Cube's got it covered. I'm John Furrier, host of the Cube. Great guests here from Databricks, both Cube alumni: Jack Andersen, GM and VP of the Databricks partnership team for AWS, you handle that relationship, and Joel Minnick, Vice President of Product and Partner Marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >> Thanks for having us back. >> Yeah, John, great to be here. >> So I feel like we're at re:Invent 2013: small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer on the micro, you know, people should be buying online, self-service, cloud scale. But Amazon's got billions being sold through their marketplace. They've reorganized their partner network. You can see kind of what's going on. They've kind of figured it out. Like, let's put everything together and simplify, make it less of a website marketplace, merge our partner network to have more synergy and frictionless experiences, so everyone can make more money and customers are going to be happier. >> Yeah, that's right. >> I mean, you run the relationship. You're in the middle of it. >> Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and Marketplace allow us to do: work with Amazon on these really, you know, unique use cases. >> You know, I've interviewed Ali many times over the years. I remember many years ago, I think maybe six, seven years ago, we were talking. He's like, "We're all in on AWS." Obviously.
Now, with the success of Databricks, you've got multiple clouds. You see that customers have choice. But I remember the strategy early on. It was like, "We're going to be deep." So this speaks volumes to the relationship you have. Jack, take us through the relationship that Databricks has with AWS from a partner perspective, and Joel, from a product perspective. Because it's not like you're a Johnny-come-lately, new to the scene, right? We've been there, almost present at the creation of this wave. What's the relationship, and how does it relate to what's going on today? >> So most people may not know that Databricks was born on AWS. We actually did our first 100 million of revenue on Amazon. And today we're obviously available on multiple clouds, but we're very fond of our Amazon relationship. And when you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and Marketplace broadens our reach. And so we think of Marketplace in three different aspects. We've got the Marketplace private offer business, which we've been doing for a number of years. Matter of fact, we're driving well over 100% year-over-year growth in private offers, and we have a nine-figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS. So they get pricing power against their private pricing. So it's really important. It goes on their Amazon bill. In May, we launched our pay-as-you-go, on-demand offering. And in five short months, we have well over a thousand subscribers. And what this does is it really reduces the barriers to entry. It's low friction. So anybody in an enterprise or startup or public sector company can start to use Databricks on AWS on a consumption-based model and have it go against their monthly bill.
And so we see customers, you know, doing rapid experimentation, pilots, POCs. They're really learning the value of that first use case. And then we see rapid use case expansion. And the third aspect is the consulting partner private offer, CPPO. Super important in how we involve our partner ecosystem of consulting partners and resellers that are able to work with Databricks on behalf of customers. >> So you've got the big contracts with the private offer. You've got the product-market fit, kind of people iterating with data, coming in with the pay-as-you-go. And obviously the integration piece all fitting in there. >> Exactly. >> Exactly. Okay. So those are the offers, that's current, and what's in Marketplace today. Is that the products- what are people buying? I mean, Joel, what are people buying in the marketplace, and what does it mean for them? >> So fundamentally what they're buying is the ability to take silos out of their organization. And that is the problem that Databricks is out there to solve. Which is, when you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real-time streaming data, and your teams are trying to use all of this data to solve really complicated problems. And as Databricks, as the Lakehouse company, what we're helping customers do is, how do they get into the new world? How do they move to a place where they can use all of that data across all of their teams? And so we allow them to begin to find, through the marketplace, those rapid-adoption use cases where they can get rid of the data warehousing and data lake silos they've had in the past, and get their unstructured and structured data onto one data platform, an open data platform that is no longer adherent to any proprietary formats and standards.
It's something they can very easily integrate into the rest of their data environment, and apply one common data governance layer on top of that. So that from the time they ingest that data, to the time they use that data, to the time they share that data inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform, with that common governance solution, they're able to bring all of those use cases together across their real-time streaming, their data engineering, their BI, their AI, all of their teams working on one set of data. And that lets them move really, really fast. And it also lets them solve challenges they just couldn't solve before. A good example of this: you know, one of the world's now-largest data streaming platforms runs on Databricks with AWS. And if you think about what it takes to set that up, well, they've got all this customer data that was historically inside of data warehouses, that they have to understand who their customers are. They have all this unstructured data. They've built their data science models so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth, between clickstream data from what the customers are doing with their platform and the recommendations they want to push back out. And if those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex. But by building it on Databricks, they were able to release it in record time and have grown at a record pace. >> So that's a product platform that's impacting product development. >> Absolutely. >> I mean, this is like the difference between lagging months of product development to, like, days. >> Yes. >> That's pretty much what you're getting at. >> Yeah. >> So total agility.
I got that. Okay. Now, I'm a customer, I want to buy in the marketplace, but you've also got a direct sales force out there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and in the moment I was like, hey, this is a CRO conference, a chief revenue officer conversation, which means someone's getting compensated. So if I'm the sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict? Is there an uplift? >> Well, I'd add to what Joel just talked about with, you know, the value of the solution: our entire offering is available on AWS Marketplace. So it's not a subset, it's the entire Databricks offering. >> The flagship, all the top- >> Everything, the flagship, the complete offering. So it's not segmented. It's not a sub-segment. You know, you can use all of our different offerings. Now, when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, right, versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks. Our sales force wants to do the right thing for the customer, if the customer wants to use Marketplace as their procurement vehicle. And that really helps customers. Because if you get Databricks and five other ISVs together, and let's say with each ISV you're spending a million dollars, you have $5 million of spend. You put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we do it. >> So customers are driving this, it sounds like. >> Correct. >> For sure.
>> So they're looking at this as saying, "Hey, I'm going to just get purchasing power with all my relationships," because it's a solution architecture market, right? >> Yeah. It makes sense. Because most customers will have a primary and secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, you get pricing power. >> Okay, Joel, we're going to date ourselves. At least I will. So back in the old days, it used to be, do a Barney deal with someone: "Hey, let's go to market together." You've got to get paper, you do a biz dev deal. And then you've got to say, okay, now let's coordinate our sales teams. A lot of moving parts. So what you're getting at here is that the alternative for Databricks or any company is to go find those partners and do deals, versus now Amazon is the center point for the customer, so that you can still do those joint deals. But this seems to be flipping the script a little bit. >> Well, it is, but we still have VARs and consulting partners that are doing implementation work, very valuable work, advisory work, that can actually work with Marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >> So it doesn't change your business structure. It just makes it more efficient. >> That's correct. >> That's a great way to say it. >> Yeah, that's great. So that just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >> Yes. >> Absolutely. >> Economically. >> Yeah. Economically, it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon, especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution, and our teams are working backwards from those use cases, you know, to collaborate and land them. >> Yeah. I want to get that out there. Go ahead, Joel.
>> So one of the other things I might add to that too, you know, on why this is advantageous for companies like Databricks to work through the marketplace, is that it makes it so much easier for customers to deploy a solution. It's very literally one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at how do I help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator to that. >> You know, it's interesting. I want to bring this up and get your reaction to it, because to me, I think this is the future of procurement. So from a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all the internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all been good. You guys have played well there in the marketplace. But now as we get into more of what I call the business apps, and they brought this up on stage, little nuance: most enterprises aren't yet there on integrating tech on the business apps into the stack. This is where I think you guys are a use case of success, where you guys have been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. So like, I want to integrate. So now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. You don't buy it. You build. You've got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right, or am I off? Because no one's going to be buying software like they used to. They buy software to integrate it. >> Yeah. >> Because everything's integrated. >> I think AWS has done a great job at creating a partner ecosystem, right, to give customers the right tools for the right jobs.
And those might be with third parties. Databricks is doing the same thing with our Partner Connect program, right? We've got partners like Fivetran and dbt that, you know, augment and enhance our platform. And so you're looking at multi-ISV architectures, and all of that can be procured through the AWS Marketplace. >> Yeah. It's almost like, you know, bundling and unbundling. I was talking about this with Dave Vellante about Supercloud, which is, why wouldn't a customer want the best solution in their architecture, period? Best in class. If someone's got API security or an API gateway, well, you know, I don't want to be forced to buy something because it's part of a suite. And that's where you see things get suboptimized, where someone dominates a category and they have, "Oh, you've got to buy my version of this." >> Yeah. Joel and I were talking, and we were actually saying that what's really important about Databricks is that customers control the data, right? Joel, you want to comment on that? >> Yeah. I'd say, you know, what you're pushing on there, we think, is extraordinarily- you know, the way the market is going to go is that customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so one of the, you know, philosophically, I think, really strong places Databricks and AWS have lined up is that we both take an approach that you should be able to have maximum flexibility on the platform. And as we think about the Lakehouse, one thing we've always been extremely committed to as a company is building the data platform on an open foundation. And we do that primarily through Delta Lake, making sure that, to Jack's point, with Databricks the data is always in your control, and it's always stored in a completely open format.
And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there, because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >> When you see other solutions out there that aren't as open as you guys, and you guys are very open, by the way, we love that too, we think that's a great strategy, what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, which I think, open source software is the software industry, and integration is a big deal, because software's going to be plentiful. Let's face it, it's a good time to be in the software business, and cloud's booming. So what's the downside, from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative. What potentially should they be nervous about down the road if they go with a more proprietary or locked-in approach? >> Well, I think the challenge with proprietary ecosystems is you become beholden to the ability of that provider to both build relationships and convince other vendors that they should invest in that format. But you're also then beholden to the pace at which that provider is able to innovate. And I think we've seen lots of times over history where, you know, a proprietary format may run ahead for a while on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade. Whereas in the open format- >> So, extract rents versus innovation. >> Exactly. >> Yeah, exactly. >> But in the open world, you know, you have to continue to innovate. >> Yeah. >> And the open source world is always innovating.
If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source. And so by investing in open ecosystems, that means you are always going to be at the forefront of what is the latest. >> You know, again, not to date myself again, but you look back at the eighties and nineties, the protocol stacks were proprietary. >> Yeah. >> You know, SNA at IBM, DECnet was Digital, you know the rest. And then TCP/IP was part of the open systems interconnect revolution, a big part of that, as my school was as well. And so like, you know, that was- but it didn't standardize the whole stack. It stopped at IP and TCP. >> Yeah. >> But that helped interoperate. That created a nice de facto. So this is a big part of this mid-game. I call it the chessboard, you know: you've got the opening game and the mid-game, then you've got the end game, and we're not at the end game yet with the cloud. >> There's always some form of lock-in, right? Andy Jassy will address it, you know, when making a decision. But if you're going to make a decision, you don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, it's an AI-driven business, right? And so it has to do with, can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models in a very consistent manner? And so the tools of tomorrow, to Joel's point, will be open, and we want interoperability with those tools. >> And choice is a matter too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly too.
Redshift competes directly with a lot of other stuff. But they can't play the bundling game, because the customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all. And they're going to be, they're onto it. This is the- >> To Amazon's credit, by having these solutions that may compete with native services in Marketplace, they are providing customers with choice, low price- >> And access to the core value. >> Exactly. >> Which is the hardware, which is their platform. Okay. So I want to get your thoughts on something else I see emerging. This is, again, kind of a Cube rumination moment. So on stage, Chris unpacked a lot of stuff. I mean, this marketplace, they're touching a lot of hot buttons here, you know: pricing, compensation, workflows, services behind the curtain. And one of the things he mentioned was, they talk about resellers or channel partners, depending upon what you talk about. We believe, Dave and I believe on the Cube, that the entire indirect sales channel of the industry is going to be disrupted radically. Because those players were selling hardware in the old days, and software. That game is going to change. You mentioned you guys have a program. I want to get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in. Which means that the old reseller channels are going to be rewritten. They're going to be refactored with these new kinds of access. Because you've got scale, you've got money, and you've got product. And you've got customers coming into the marketplace. So if you're like a reseller that sold computers to data centers, or software, you know, a value-added reseller, a VAR business- >> You've got to evolve. >> You've got to be here. >> Yes. >> How are you guys working with those partners? Because you say you have a product in your marketplace there. How do I make money?
If I'm a reseller with Databricks, with Amazon, take me through that use case. >> Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how. When we're seeing customers do mass migrations to the cloud, or Hadoop-specific migrations, or data transformation implementations, they need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well, well, that's another aspect of their business. But I really think it is the expertise that the partners bring to help customers get outcomes. >> Joel, channel: big opportunity for Amazon to reimagine this. >> For sure. Yeah. And I think, you know, to your comment about how resellers take advantage of that, I think what Jack was pushing on is spot on. Which is, it's becoming more and more about the expertise you bring to the table, and not just transacting the software, but now actually helping customers make the right choices. And we're seeing, you know, SIs begin to be able to resell solutions and finding a lot of opportunity in that. And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be the evolution that this goes. >> At the end of the day, it's about services, right? >> For sure. >> You've got a great service, you're going to have high gross profits. >> And I think the managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >> I think that's going to be a really hot button for you guys. I think being the way you guys are open, this channel partner services model coming into the fold really kind of makes for that kind of Supercloud-like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on within Databricks. >> For sure. >> On top of this ecosystem. How does that work?
This is kind of like, it hasn't been written up in business school case studies yet. This is new. What is this? >> I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big, kind of, new horizons for us as we think about what drives ecosystems. It's going to be around, well, what's the data platform that I'm using, and then all the tools that have to encircle that to get my business done. And so I think there are, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data analytics and AI. And then, to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well, as customers are looking at, well, if I'm standing these Lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >> I mean, you think about ecosystem theory, we're living a whole nother dream. And I'm not kidding. It hasn't yet been written up for business school case studies: we're now in a whole nother connective tissue, ecology thing happening, where you have dependencies and value propositions, economics, connectedness. So you have relationships in these ecosystems. >> And I think one of the great things about relationships with these ecosystems is that there's a high degree of overlap. >> Yeah. >> So you're seeing that, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >> Joel, Jack, I love it, because you know what it means? The best ecosystem will win, if you keep it open. >> Sure. >> You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really kind of what we're talking about.
>>And John, can I just add that when I was at Amazon, we had a theory that there's buyers and builders, right? There's very innovative companies that want to build things themselves. We're seeing now that builders want to buy a platform, right? And so there's a platform decision being made, and that ecosystem's gonna evolve around the platform. >>Yeah. And I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma, with a slash through it, called the integrator's dilemma. Innovation is the digital transformation. So absolutely, that becomes cliche in a way, but it really becomes more of a: are you open? Are you integrating? If APIs are the connective tissue, what's automation, what's the service message look like? I mean, a whole nother set of thinking goes on in these new ecosystems and these new products. >>And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks, and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >>Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base, a great growing ecosystem. And again, I think it's a shining example of what every enterprise is going to do: build on top of something, get that operating model, get that operating model driving revenue. >>Yeah. >>Well, whether you're Goldman Sachs or Capital One or XYZ Corporation... >>S&P Global, NASDAQ, right? We've got, you know, the biggest verticals in the world solving tough problems with Databricks.
I think we'd be remiss, 'cause if Ali was here, he would really want to thank Amazon for all of the investments across all of the different functions, whether it's the relationship we have with our engineering and service teams, our marketing teams, you know, product development. And we're gonna be at re:Invent, a big presence at re:Invent. We're looking forward to seeing you there again. >>Yeah. We'll see you guys there. Again, good ecosystem. I love the ecosystem evolution that's happening. This next-gen cloud is here. We're seeing this evolve: new economics, new value propositions scaling up, producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, great to see you. Jack. >>Thanks for having us. >>Thanks, John. >>Okay, CUBE coverage here. The world's changing as APN comes together with the marketplace for a new partner organization at Amazon Web Services. theCUBE's got it covered. This should be a very big, growing ecosystem as this continues, with billions being sold through the marketplace. Of course, the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching.

Published Date : Sep 21 2022


ENTITIES

Entity	Category	Confidence
Chris	PERSON	0.99+
Joel Minick	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Joel	PERSON	0.99+
Ali	PERSON	0.99+
Jack Anderson	PERSON	0.99+
Dave	PERSON	0.99+
$5 million	QUANTITY	0.99+
Jack	PERSON	0.99+
two	QUANTITY	0.99+
Goldman Sachs	ORGANIZATION	0.99+
XYZ	ORGANIZATION	0.99+
Joel Minnick	PERSON	0.99+
Jack Andersen	PERSON	0.99+
Andy jazzy	PERSON	0.99+
third aspect	QUANTITY	0.99+
John fur	PERSON	0.99+
NASDAQ	ORGANIZATION	0.99+
Barney	ORGANIZATION	0.99+
both	QUANTITY	0.99+
five short months	QUANTITY	0.99+
One	QUANTITY	0.99+
APO	ORGANIZATION	0.99+
today	DATE	0.99+
IBM	ORGANIZATION	0.99+
first 100 million	QUANTITY	0.98+
tomorrow	DATE	0.98+
one	QUANTITY	0.98+
billions	QUANTITY	0.98+
Johnny	PERSON	0.97+
Davis	PERSON	0.97+
a million dollars	QUANTITY	0.96+
Salesforce	ORGANIZATION	0.96+
data bricks	ORGANIZATION	0.95+
each ISV	QUANTITY	0.95+
Seattle, Washington	LOCATION	0.95+
two different ways	QUANTITY	0.95+
one data platform	QUANTITY	0.95+
seven years ago	DATE	0.94+

Andy Thurai, Constellation Research & Larry Carvalho, RobustCloud LLC

(upbeat music) >> Okay, welcome back everyone. CUBE's coverage of re:MARS, here in Las Vegas, in person. I'm John Furrier, host of theCUBE. This is the analyst panel wrap up analysis of the keynote, the show, past one and a half days. We got two great guests here. We got Andy Thurai, Vice President, Principal Consultant, Constellation Research. Larry Carvalho, Principal Consultant at RobustCloud LLC. Congratulations going out on your own. >> Thank you. >> Andy, great to see you. >> Great to see you as well. >> Guys, thanks for coming out. So this is the session where we break down and analyze, you guys are analysts, industry analysts, you go to all the shows, we see each other. You guys are analyzing the landscape. What does this show mean to you guys? 'Cause this is not obvious to the normal tech follower. The insiders see the confluence of robotics, space, automation and machine learning. Obviously, it's IoTs, industrials, it's a bunch of things. But there's some dots to connect. Let's start with you, Larry. What do you see here happening at this show? >> So you got to see how Amazon started, right? When AWS started. When AWS started, it primarily took the compute storage, networking of Amazon.com and put it as a cloud service, as a service, and started selling the heck out of it. This is a stage later now that Amazon.com has done a lot of physical activity, and using AIML and the robotics, et cetera, it's now the second phase of innovation, which is beyond digital transformation of back office processes, to the transformation of physical processes where people are now actually delivering remotely and it's an amazing area. >> So back office's IT data center kind of vibe. >> Yeah. >> You're saying front end, industrial life. >> Yes. >> Life as we know it. >> Right, right. I mean, I just stopped at a booth here and they have something that helps anybody who's stuck in the house who cannot move around. 
But with Alexa, they can order some water to be brought to them wherever they are in the house, where they're stuck in their bed. But look at the innovation that's going on there right at the edge. So I think those are... >> John: And you got the Lunar, got the sex appeal of the space, Lunar Outpost interview, >> Yes. >> those guys. They got Rover on Mars. They're going to be colonizing the moon. >> Yes. >> I made a joke, I'm like, "Well, I left a part back on earth, I'll be right back." (Larry and Andy laugh) >> You can't drive back to the office. So a lot of challenges. Andy, what's your take on the show? Give us your analysis. What's the vibe, what's your analysis so far? >> It's a great show. So, as Larry was saying, one of the things was that when Amazon started, right? They were more about cloud computing. Which means they tried to commoditize more of the data center components or compute components. So that was working really well for what I call a compute economy, right? >> John: Mm hmm. >> And I call the newer economy more of an AIML-based data economy. So when you move from a compute economy into a data economy, there are things that come to the forefront that never existed before, were never popular before. Things like your AIML model creation, model training, model movement, model inferencing, all of the above, right? And then of course the robotics has come a long way since then. And then some of what they do at the store, or the charging, the whole nine yards. So, the whole concept of all of these components, when you put them at re:Invent, such a big show, it was getting lost. So that's why they didn't have it for a couple of years. They had it one year. And now all of a sudden they woke up and said, "You know what? We got to do this!" >> John: Yeah. >> To bring out these critical components that we have, that are ripe, mature for the world. So that's why. I think there's pretty good stuff.
And some of the robotics things I saw in there, like one of them I posted on my Twitter, it's about the robot dog, sniffing out the robot rover, which I thought was pretty hilarious. (All laugh) >> Yeah, this is the thing. You're seeing like the pandemic put everything on hold on the last re:Mars, and then the whole world was upside down. But a lot of stuff pulled forward. You saw the call center stuff booming. You saw the Zoomification of our workplace. And I think a lot of people got to the realization that this hybrid, steady-state's here. And so, okay. That settles that. But the digital transformation of actually physical work? >> Andy: Yeah. >> Location, the walk in and out store right over here we've seen that's the ghost store in Seattle. We've all been there. In fact, I was kind of challenged, try to steal something. I'm like, okay- (Larry laughs) I'm pulling all my best New Jersey moves on everyone. You know? >> Andy: You'll get charged for it. >> I couldn't get away with it. Two double packs, drop it, it's smart as hell. Can't beat the system. But, you bring that to where the AI machine learning, and the robotics meet, robots. I mean, we had robots here on theCUBE. So, I think this robotics piece is a huge IoT, 'cause we've been covering industrial IoT for how many years, guys? And you could know what's going on there. Huge cyber threats. >> Mm hmm. >> Huge challenges, old antiquated OT technology. So I see a confluence in the collision between that OT getting decimated, to your point. And so, do you guys see that? I mean, am I just kind of seeing mirage? >> I don't see it'll get decimated, it'll get replaced with a newer- >> John: Dave would call me out on that. (Larry laughs) >> Decimated- >> Microsoft's going to get killed. >> I think it's going to have to be reworked. And just right now, you want do anything in a shop floor, you have to have a physical wire connected to it. 
Now you think about 5G coming in, and without a wire, you get minute details, you get low latency, high bandwidth. And the possibilities are endless at the edge. And I think with AWS, they got Outposts, they got Snowcone. >> John: There's a threat to them at the edge. Outpost is not doing well. You talk to anyone out there, it's like, you can't find success stories. >> Larry: Yeah. >> I'm going to get hammered by Amazon people, "Oh, what're you're saying that?" You know, EKS for example, with serverless is kicking ass too. So, I mean I'm not saying Outpost was wrong answer, it was a right at the time, what, four years ago that came out? >> Yeah. >> Okay, so, but that doesn't mean it's just theirs. You got Dell Technologies want some edge action. >> Yeah. >> So does HPE. >> Yes. >> So you got a competitive edge situation. >> I agree with that and I think that's definitely not Amazon's strong point, but like everything, they try to make it easy to use. >> John: Yeah. >> You know, you look at the AIML and they got Canvas. So Canvas says, hey, anybody can do AIML. If they can do that for the physical robotic processes, or even like with Outpost and Snowcone, that'll be good. I don't think they're there yet, and they don't have the presence in the market, >> John: Yeah. >> like HPE and, >> John: Well, let me ask you guys this question, because I think this brings up the next point. Will the best technology win or will the best solution win? Because if cloud's a platform and all software's open source, which you can make those assumptions, you then say, hey, they got this killer robotics thing going on with Artemis and Moonshot, they're trying to colonize the moon, but oh, they discovered a killer way to solve a big problem. Does something fall out of this kind of re:Mars environment, that cracks the code and radically changes and disrupts the IoT game? That's my open question. I don't know the answer. 
I'd love to get your take on what might be possible, what wild cards are out there around disrupting the edge. >> So, the way I see it, when IoT came into play, when you're digitizing the physical world, it's IoT that does the digitalization part of that, actually, right? >> But then it has its own set of problems. >> John: Yeah. >> You're talking about installing sensors everywhere, right? And not only installing your own sensors, but also you're installing competitor sensors. So in a given square foot, how many sensors can you accommodate? So there are physical limitations, on availability of bandwidth and networking, all of that. >> John: And integration. >> As well. >> John: Your point. >> Right? So when that became an issue, this is where I was talking to the robotic guys here, a couple of companies, and one of the use cases they were talking about, which I thought was pretty cool, is, rather than going the sensor route, you go the robot route. So if you have a factory that you want to map out, you put as many sensors on your robot, whatever that is, and then you make it go around, map the whole thing, and then you also do surveillance and the whole nine yards. So, you can either have fixed sensors or you can have moving sensors. So you can have three or four robots. So initially, when I was asking them about the price of it, when they were saying about a hundred thousand dollars, I was like, "Who would buy that?" (John and Larry laugh) >> When they then explained that, this is the use case, oh, that makes sense, because if you had to install sensors across an entire factory floor, you're talking about millions of dollars. >> John: Yeah. >> But if you do the moveable sensors in this way, it's a lot cheaper. >> John: Yeah, yeah. >> So it's based on your use case, what are your use cases? What are you trying to achieve? >> The general purpose is over. >> Yeah.
>> Which you're getting at, and that the enablement, this is again, this is the cloud scale open question- >> Yep. >> it's, okay, the differentiations isn't going to be open source software. That's open. >> It's going to be in the, how you configure it. >> Yes. >> What workflows you might have, the data streams. >> I think, John, you're bringing up a very good point about general purpose versus special purpose. Yesterday Zoox was on the stage and when they talked about their vehicle, it's made just for self-driving. You walk around in Vegas, over here, you see a bunch of old fashioned cars, whether they're Ford or GM- >> and they put all these devices around it, but you're still driving the same car. >> John: Yeah, exactly. >> You can retrofit those, but I don't think that kind of IoT is going to work. But if you redo the whole thing, we are going to see a significant change in how IoT delivers value all the way from the industrial to home, to healthcare, mining, agriculture, it's going to have to redo. I'll go back to the OT question. There are some OT guys, I know Rockwell and Siemens, some of them are innovating faster. The ones who innovate faster to keep up with the IT side, as well as the MLAI model are going to be the winners on that one. >> John: Yeah, I agree. Andy, your thoughts on manufacturing, you brought up the sensor thing. Robotics ultimately is, end of the day, an opportunity there. Obviously machine learning, we know what that does. As we move into these more autonomous builds, what does that look like? And is Amazon positioned well there? Obviously they have big manufacturers. Some are saying that they might want to get out of that business too, that Jassy's evaluating that some are saying. So, where does this all lead for that robotics manufacturing lifestyle, walk in, grab my food? 
'Cause it's all robotics and AI at the end of the day, I got sensors, I got cameras, I got non-humans moving heavy lifting stuff, fixing the moon will be done by robots, not humans. So it's all coming. What's your analysis? >> Well, so, the point about robotics is how far it has come. It is unbelievable, right? Couple of examples. One was that I was just talking to somebody, was explaining to them, to see that robot dog over there, the Boston Dynamics one- >> John: Yeah. >> climbing up and down the stairs. >> Larry: Yeah. >> That's more like the dinosaur movie opening the doors scene. (John and Larry laugh) It's like that for me, because the coordinated things, it is able to go walk up and down, that's unbelievable. But okay, it does that, and then there was also another video which is going viral on the internet. This guy kicks the dog, robot dog, and then it falls down and it gets back up, and the sentiment that people were feeling for the dog, (Larry laughs) >> you can't, it's a robot, but people, it just comes at that level- >> John: Empathy, for a non-human. >> Yeah. >> But you see him, hey you, get off my lawn, you know? It's like, where are we? >> It has come to that level that people are able to kind of not look at that as a robot, but as more like a functioning, almost like a pet-level, human-level being. >> John: Yeah. >> And you saw the human-like walking robot there as well. But to an extent, in my view, they are all still in an experimentation, innovation phase. It hasn't made it in industrial terms yet. >> John: Yeah, not yet, it's coming. >> But, the problem- >> John: It's coming fast. What I'm trying to figure out is where you guys see Amazon and the industry relative to the fantasy becoming reality- >> Right. >> of space and Mars, which is, it's intoxicating, let's face it. People love this. The nerds are all here. The geeks are all here. It's a celebration. James Hamilton's here- >> Yep.
>> trying to get him on theCUBE. And he's here as a civilian. Jeff Barr, same thing. I'm here, not for Amazon, I bought a ticket. No, you didn't buy a ticket. (Larry laughs) >> I'm going to check on that. But, he's geeking out. >> Yeah. >> They're there because they want to be here. >> Yeah. >> Not because they have to work here. >> Well, I mean, the thing is, the innovation velocity has increased, because, in the past, remember, the smaller companies couldn't innovate because they don't have the platform. Now Compute is a platform available at the scale you want, AI is available at the scale. Every one of them is available at the scale you want. So if you have an idea, it's easy to innovate. The innovation velocity is high. But where I see most of the companies failing, whether startup or big company, is that you don't find the appropriate use case to solve, and then don't sell it to the right people to buy that. So if you don't find the right use case or don't sell the right value proposition to the actual buyer, >> John: Mm hmm. >> then why are you here? What are you doing? (John laughs) I mean, you're not just an invention, >> John: Eh, yeah. >> like a telephone kind of thing. >> Now, let's get into next talk track. I want to get your thoughts on the experience here at re:Mars. Obviously AWS and the Amazon people kind of combined effort between their teams. The event team does a great job. I thought the event, personally, was first class. The coffee didn't come in late today, I was complaining about that, (Larry laughs) >> people complaining out there, at CUBE reviews. But world class, high bar on the quality of the event. But you guys were involved in the analyst program. You've been through the walkthrough, some of the briefings. I couldn't do that 'cause I'm doing theCUBE interviews. What would you guys learn? What were some of the key walkaways, impressions? Amazon's putting all new teams together, seems on the analyst relations. >> Larry: Yeah. 
>> They got their mojo booming. They got three shows now, re:Mars, re:Inforce, re:Invent. >> Andy: Yeah. >> And theCUBE will be at all three. Now we got that coverage going, what's it like? What was the experience like? Did you feel it was good? Where do they need to improve? How would you grade the Amazon team? >> I think they did a great job over here in just bringing all the physical elements of the show. Even on the stage, where they had robots in there. It made it real and it's not just fake stuff. And every, or most of the booths out there are actually having- >> John: High quality demos. >> high quality demos. (John laughs) >> John: Not vaporware. >> Yeah, exactly. Not vaporware. >> John: I won't say the name of the company. (all laugh) >> And even the sessions were very good. They went through details. One thing that stood out, which is good, and I cover Low Code/No Code, and Low Code/No Code goes across everything. You know, you got DevOps Low Code/No Code. You got AI Low Code/No Code. You got application development Low Code/No Code. What they have done with AI with Low Code/No Code is very powerful with Canvas. And I think that has really grown the adoption of AI. Because you don't have to go and train people what to do. And then, people are just saying, Hey, let me kick the tires, let me use it. Let me try it. >> John: It's going to be very interesting to see how Amazon, on that point, handles this, AWS handles this data tsunami. It's 'cause of Snowflake. Snowflake especially running the table >> Larry: Yeah. >> on the old Hadoop world. I think Dave had a great analysis with other colleagues last week at Snowflake Summit. But still, just scratching the surface. >> Larry: Yeah. >> The question is, how shared is that ecosystem, how will that morph? 'Cause right now you've got Databricks, you've got Snowflake and a handful of others. Teradata's got some new chops going on there and a bunch of other folks.
Some are going to win and lose in this downturn, but still, the scale that's needed is massive. >> So you got data growing so much, you were talking earlier about the growth of data and they were talking about the growth. That is a big pie and the pie can be shared by a lot of folks. I don't think- >> John: And Snowflake pays AWS, remember that? >> Right, I get it. (John laughs) >> I get it. But they got very unique capabilities, just like Netflix has very unique capabilities. >> John: Yeah. >> They also pay AWS. >> John: Yeah. >> Right? But they're competing with Prime. So I really think the cooperation is going to be there. >> John: Yeah. >> The pie is so big >> John: Yeah. >> that there's not going to be losers, but everybody could be winners. >> John: I'd be interested to follow up with you guys after next time we have an event together, we'll get you back on and figure out how do you measure these transitions? You went to IDC, so they had all kinds of ways to measure shipments. >> Larry: Yep. >> Even Gartner had fumbled for years, the Magic Quadrant on IaaS and PaaS when they had the market share. (Larry laughs) And then they finally bundled PaaS and IaaS together after years of my suggesting, thank you very much Gartner. (Larry laughs) But that just performs as the landscape changes so does the scoreboard. >> Yep. >> Right so, how do you measure who's winning and who's losing? How can we be critical of Amazon so they can get better? I mean, Andy Jassy always said to me, and Adam Selipsky same way, we want to hear how bad we're doing so we can get better. >> Yeah. >> So they're open-minded to feedback. I mean, not (beep) posting on them, but they're open to critical feedback. What do you guys, what feedback would you give Amazon? Are they winning? I see them number one clearly over Azure, by miles. And even though Azure's kicking ass and taking names, getting back in the game, Microsoft's still behind, by a long ways, in some areas. >> Andy: Yes. In some ways.
>> So, the scoreboard's changing. What are your thoughts on that? >> So, look, I mean, at the end of the day, when it comes to compute, right, Amazon is a clear winner. I mean, there are others who are catching up to it, but still, they are the established leader. And it comes with its own advantages because when you're trying to do innovation, when you're trying to do anything else, whether it's data collection, we were talking about the data sensors, the amount of data they are collecting, whether it's the store, that self-service store or other innovation projects, what they have going on. The storage, compute, and processing of that requires a ton of compute. And they have that advantage with them. And, as I mentioned in my last article, one of my articles, when it comes to AIML and data programs, there is a rich and there is a poor. And the rich always gets richer because they have a leg up already. >> John: Yeah. >> I mean the amount of model training they have done, the billion- or trillion-parameter scale, fine-tuning of the model training and everything. They could do it faster. >> John: Yeah. >> Which means they have a leg up to begin with. So unless you are given an opportunity as a smaller, mid-size company to compete with them at the same level, you're going to start at the negative level to begin with. You have a lot of catch up to do. So, the other thing about Amazon is that they, when it comes to a lot of areas, they admit that they have to improve in certain areas and they're open and willing to listen to the people. >> Where are you, let's get critical. Let's do some critical analysis. Where does Amazon Web Services need to get better? In your opinion, what criticism would you, in a constructive way, share? >> I think on the open source side, they need to be more proactive in, they are already, but they got to get even better than what they are. They got to engage with the community.
They got to be able to talk on the open source side, hey, what are we doing? Maybe on the hardware side, can they do some open-sourcing of that? They got graviton. They got a lot of stuff. Will they be able to share the wealth with other folks, other than just being on an Amazon site, on the edge with their partners. >> John: Got it. >> If they can now take that, like you said, compute with what they have with a very end-to-end solution, the full stack. And if they can extend it, that's going to be really beneficial for them. >> Awesome. Andy, final word here. >> So one area where I think they could improve, which would be a game changer would be, right now, if you look at all of their solutions, if you look at the way they suggest implementation, the innovations, everything that comes out, comes out across very techy-oriented. The persona is very techy-oriented. Very rarely their solutions are built to the business audience or to the decision makers. So if I'm, say, an analyst, if I want to build, a business analyst rather, if I want to build a model, and then I want to deploy that or do some sort of application, mobile application, or what have you, it's a little bit hard. It's more techy-oriented. >> John: Yeah, yeah. >> So, if they could appeal or build a higher level abstraction of how to build and deploy applications for business users, or even build something industry specific, that's where a lot of the legacy companies succeeded. >> John: Yeah. >> Go after manufacturing specific or education. >> Well, we coined the term 'Supercloud' last re:Invent, and that's what we see. And Jerry Chen at Greylock calls it Castles in the Cloud, you can create these moats >> Yep. >> on top of the CapEx >> Yep. >> of Amazon. >> Exactly. >> And ride their back. >> Yep. >> And the difference in what you're paying and what you're charging, if you're good, like a Snowflake or a Mongo. I mean, Mongo's, they're just as big as Snow, if not bigger on Amazon than Snowflake is. 
'Cause they use a lot of compute. No one turns off their database. (John laughs) >> Snowflake's a little bit different, a little nuanced point, but, this is the new thing. You see Goldman Sachs, you got Capital One. They're building their own kind of, I call them sub clouds, but Dave Vellante says it's a Supercloud. And that essentially is the model. And then once you have a Supercloud, you say, great, I'm going to make sure it works on Azure and Google. >> Andy: Yep. >> And Alibaba if I have to. So, we're kind of seeing a playbook. >> Andy: Mm hmm. >> But you can't get it wrong 'cause it scales. >> Larry: Yeah, yeah. >> You can't scale the wrong answer. >> Andy: Yeah. >> So that seems to be what I'm watching is, who gets it right? Product market fit. Then if they roll it out to the cloud, then it becomes a Supercloud, and that's pure product market fit. So I think that's something that I've seen some people trying to figure out. And then, are you a supplier to the Superclouds? Like a Dell? Or you become an enabler? >> Andy: Yeah. >> You know, what does Dell Technologies do? >> Larry: Yeah. >> I mean, how do the box movers compete? >> Larry: I, the whole thing is now hybrid and you're going to have to see, just like you said. (Larry laughs) >> John: Hybrid's a steady-state. I don't need to. >> Andy: I mean, >> By the way we're (indistinct), we can't get the chips, 'cause Broadcom and Apple bought 'em all. (Larry laughs) I mean there's a huge chip problem going on. >> Yes. I agree. >> Right now. >> I agree. >> I mean all these problems, when you abstract to a much higher level, a lot of those problems go away because you don't care about what they're using underneath as long as you deliver my solution. >> Larry: Yes. >> Yeah, it could be significantly, a little bit faster than what it used to be. But at the end of the day, are you solving my specific use case? >> John: Yeah. >> Then I'm willing to wait a little bit longer. >> John: Yeah.
Time's on our side and now they're getting the right answers. Larry, Andy, thanks for coming on. This great analyst session turned into more of a podcast vibe, but you know what? (Larry laughs) To chill here at re:Mars, thanks for coming on, and we unpacked a lot. Thanks for sharing. >> Both: Thank you. >> Appreciate it. We'll get you back on. We'll get you in the rotation. We'll take it virtual. Do a panel. Do a panel, do some panels around this. >> Larry: Absolutely. >> Andy: Oh this not virtual, this physical. >> No we're live right now! (all laugh) We get back to Palo Alto. You guys are influencers. Thanks for coming on. You guys are moving the market, congratulations. Take a minute, quick minute each to plug any work you're doing for the people watching. Larry, what are you working on? Andy? You go after Larry, what you're working on. >> Yeah. So since I started my company, RobustCloud, since I left IDC about a year ago, I'm focused on edge computing, cloud-native technologies, and Low Code/No Code. And basically I help companies put their business value together. >> All right, Andy, what are you working on? >> I do a lot of work on the AIML areas. Particularly, last few of my reports are in the AI Ops incident management and ML Ops areas of how to generally improve your operations. >> John: Got it, yeah. >> In other words, how do you use the AIML to improve your IT operations? How do you use IT Ops to improve your AIML efficiency? So those are the- >> John: The real hardcore business transformation. >> Yep. >> All right. Guys, thanks so much for coming on the analyst session. We do keynote review, breaking down re:Mars after day two. We got a full day tomorrow. I'm John Furrier with theCUBE. See you next time. (pleasant music)

Published Date : Jun 24 2022


ENTITIES

Entity	Category	Confidence
Larry	PERSON	0.99+
Andy Thurai	PERSON	0.99+
Jeff Barr	PERSON	0.99+
John	PERSON	0.99+
Dave	PERSON	0.99+
Larry Carvalho	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Andy	PERSON	0.99+
Andy Thurai	PERSON	0.99+
Adam Salassi	PERSON	0.99+
Ford	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
John Furrier	PERSON	0.99+
AWS	ORGANIZATION	0.99+
James Hamilton	PERSON	0.99+
Boston Dynamics	ORGANIZATION	0.99+
Jerry Chen	PERSON	0.99+
GM	ORGANIZATION	0.99+
Rockwell	ORGANIZATION	0.99+
three	QUANTITY	0.99+
Seattle	LOCATION	0.99+
Apple	ORGANIZATION	0.99+
Andy Jassy	PERSON	0.99+
Vegas	LOCATION	0.99+
Dell	ORGANIZATION	0.99+

Howard Levenson

>> AWS Public Sector Summit, here in person in Washington, D.C., for two days, live. Finally a real event. I'm John Furrier, your host of theCUBE. Got a great guest, Howard Levenson from Databricks, regional vice president and general manager of the federal team for Databricks. Super unicorn. Is it a decacorn yet? It's not yet public, but welcome to theCUBE. >> I don't know what the next stage after unicorn is, but we're growing rapidly. >> Thank you. Our audience knows Databricks extremely well. You've been on theCUBE many times. We were covering them back when big data was big data. Now it's all data, everything. So we watched your success. Congratulations. Thank you. So it's not a big bridge for us to cross to see you here at AWS Public Sector Summit. Tell us what's going on inside the Databricks-Amazon relationship. >> Yeah, it's been a great relationship. When the company got started some number of years ago, we got a contract with the government to deliver the Databricks capability in their classified cloud, in Amazon's classified cloud. So that was the start of a great federal relationship. Today, virtually all of our business is in AWS, and we run in every single AWS environment, from commercial cloud to GovCloud to Secret and Top Secret environments, and we've got customers doing great things and experiencing great results from Databricks and Amazon. >> The federal government's the classic, what I call, migration opportunity. Right? Because, let's face it, before the pandemic, even five years ago, even 10 years ago, it moved at glacier speed, slow, slow, and they had to get modernized, and the pandemic really forced them to do it. But you guys have already cleared the runway with your value proposition. You've got the lakehouse now, and you guys are really optimized for the cloud. >> Okay, hardcore. Yeah, we are. We only run in the cloud, and we take advantage of every single go-fast feature that Amazon gives us. 
But you know, John, the Office of Management and Budget did a study a couple of years ago. I think there were 28,000 federal data centers. 28,000 federal data centers. Think about that for a minute, and let's say in each one of those data centers you've got a handful of operational data stores, of databases. The federal government is trying to take all of that data and make sense out of it. The first step to making sense out of it is bringing it all together, normalizing it, federating it, and that's exactly what we do. And that's been a real win for our federal clients, and it's been a real exciting opportunity to watch people succeed in that endeavor. >> We had another guest on, and she talked about those data center huggers. Tree huggers, data center huggers, (indistinct) people won't let go. Yeah. But they're slowly dying away and moving on to the cloud. So migration's huge. How are you guys migrating with your customers? Give us an example of how it's working. What are some of the use cases? >> So before I do that, I want to tell you a quick story. I had the luxury of working with the Air Force Chief Data Officer, Eileen Vidrine, and she is commonly quoted as saying, just remember, as airmen, it's not your data, it's the Air Force's data. So people were data center huggers, now they're data huggers, but all of that data belongs to the government at the end of the day. So how do we help in that? Well, think about all this data sitting in all these operational data stores. It's getting updated all the time. But you want to be able to federate this data together and make some sense out of it. So an organization like U.S. Citizenship and Immigration Services, they had I think 28 different data sources, and they wanted to be able to pull that data basically in real time and bring it into a data lake. 
Well, that means doing a change data capture off of those operational data stores, transforming that data and normalizing it so that you can then join it. And we've done that. I think they're now up to 70 data sources that are continually ingested into their data lake. And from there they support thousands of users doing analysis and reports for the whole visa processing system for the United States, the whole naturalization environment. And their efficiency has gone up, I think by their metrics, by 24x. >> Yeah. I mean, Sandy Carter was just on theCUBE earlier. She's the vice president of partner ecosystem here at public sector. And I was commenting to her that the federal game has changed. It used to be hard to get in: you had to know everybody, and you navigated the trip wires and all the subtle hints and the people who are friends, and it was like cloak and dagger. And so people were locked in on certain things, databases, and data now has to be freely available. I know one of the things that you guys are passionate about, and this is kind of a hardcore architectural thing, is that you need horizontally scalable data to really make AI work, right? Machine learning works when you have data. How far along are these guys in their thinking when you have a customer? Because we're seeing progress. How far along are we? >> Yeah, we still have a long way to go in the federal government. I mean, I tell everybody, I think the federal government's probably four or five years behind what Databricks' top clients are doing. But there are clearly people in the federal government that have really ramped it up and are on a par with, or even exceeding, some of the commercial clients. USCIS, CBP, FBI are some of the clients that we work with that are pretty far ahead. And I mentioned a lot about the operational data stores, but there's all kinds of data that's coming in. At USCIS, they do these naturalization interviews, and those are captured in real text. 
So now you want to do natural language processing against them, to make sure these interviews are of the highest quality. We want to be able to predict which people are going to show up for interviews based on their geospatial location and the day of the week and other factors, the weather perhaps. So they're using all of these data types, imagery, text and structured data, all in the lakehouse concept, to make predictions about how they should run their business. >> So that's a really good point. I was talking with Keith Brooks earlier, director of business development and go-to-market strategy for AWS Public Sector. He's been there from the beginning. This is the 10th year of GovCloud. Right, so we were kind of riffing, but JPL, NASA JPL, they did production workloads out of the gate. Yeah. Full mission. So now fast forward to today. Cloud native really is available. So how do you see the agencies in the government handling it? Okay, replatform, I get that. But now to do the refactoring, where you guys have the lakehouse and new things can happen with cloud native technologies, what's the crossover point for that? >> Yeah, I think our lakehouse architecture is really a big breakthrough architecture. It used to be that people would take all of this data, they'd put it in a Hadoop data lake, and they'd end up with a data swamp with really not good control or good data quality. And then they would take the data from the data swamp, or the data lake, and they'd curate it and go through an ETL process and put a second copy into their data warehouse. So now you have two copies of the data, two governance models, maybe two versions of the data. A lot to manage. A lot to control. With our lakehouse architecture, you can put all of that data in the data lake with our Delta format. It comes in a curated way. There's a catalog associated with the data, so you know what you've got. 
And now you can literally build an ephemeral data warehouse directly on top of that data, and it exists only for the period of time that people need it. And so it's cloud native. It's elastically scalable. It terminates when nobody's using it. We run the whole Centers for Medicare & Medicaid Services. The whole Medicaid repository for the United States runs in an ephemeral data warehouse built on Amazon S3. >> You know, that is a huge call out. I want to just unpack that for a second. What you just said, to me, puts the exclamation point on cloud value, because it's not your grandfather's data warehouse. It's like, okay, we do data warehouse capability, but we're using higher level cloud services, whether it's governance stuff or AI, to actually make it work at scale for those environments. I mean, that to me is refactoring. That's not replatforming. It's not just replatforming; that's replatforming in the cloud and then refactoring capability for new advantages. >> It's really true. And now, you know, at CMS, they have one copy of the data, so they do all of their reporting. They've got a lot of congressional reports that they need to do. But now they're leveraging that same data, not making a copy of it, for the Center for Program Integrity, for fraud. And we know how many billions of dollars worth of fraud exist in the Medicaid system. And now we're applying artificial intelligence and machine learning on entity analytics to really get to the root of those problems. It's a game changer. >> And this is where the efficiency comes in at scale. Because you start to see, I mean, we always talk on theCUBE about how software has changed. In the old days you put it on the shelf; shelfware, they called it. That's our generation. And you didn't know if something was hot or not until the inventory came back, like, we didn't sell through. Now you've got the cloud. In the cloud, if you're not performing, you suck, basically. So it's not working, >> it's an instant >> report card. 
So now when you go to the cloud, you think of the data lake and the lakehouse, what you guys do, and others like Snowflake. You're optimized in the cloud, you can't deny it. And then when you compare it, it's like, okay, I'm saving you millions and millions if you're just on one thing, never mind the top line opportunities. >> So John, you know, years ago people didn't believe the cloud was going to be what it is. Pretty much today, the cloud's inevitable. It's everywhere. I'm going to make you another prediction, and you can say you heard it here first: the data warehouse is going away. The lakehouse is clearly going to replace it. There's no need anymore for two separate copies. There's no need for a proprietary storage copy of your data, and people want to be able to apply more than SQL to the data. Data warehouses just restrict. >> What about an ocean house? Yeah, lake is kind of small. When you think about it, Lake Michigan is pretty big. >> I think it's going to go bigger than that. I think we're talking about sky computing. We've been in cloud computing, and we're going to do that because people aren't going to put all of their data in one place. They're going to have it spread across different Amazon regions or Amazon availability zones, and you're going to want to share data. And you know, we just introduced this Delta Sharing capability. I don't know if you're familiar with it, but it allows you to share data without a sharing server, directly, by picking up basically the Amazon URLs and sharing them with different organizations. So you're sharing in place. The data actually isn't moving. You've got great governance and great granularity over the data that you choose to share, and data sharing is going to be the next break. >> You know, I really love the lakehouse (indistinct). I totally see that. So I totally would align with that and say I'd bet with you on that one. 
Skynet. Skynet, the sky computing. >> See, you're taking it away, man. >> I know, Skynet. Anything that was computing in the sky is Skynet, that's terminated. But that's real. I mean, I think that's a concept where it's like what serverless and functions do for servers, you don't have a data, >> You've got to be able to connect data. Nobody lives on an island. You've got to be able to connect data and more data. We all know more data produces better results. So how do you get more data? You connect to more data sources. >> Howard, great to have you on. Talk about the relationship with Amazon real quick as we end up here. What are you guys doing together? How's the partnership? >> Yeah, I mean the partnership with Amazon is amazing. I think probably 95% of our federal business is running in Amazon's cloud today. As I mentioned, John, we run across AWS commercial, AWS GovCloud, the Secret environment, C2S. And you know, we have better integration with Amazon services than, I'll say, some of the Amazon services. If people want to integrate with Glue or Kinesis or SageMaker or Redshift, we have complete integration with all of those. And it's not just a partnership at the sales level. It's a partnership and integration at the engineering level. >> Well, I'm really impressed with you guys as a company. I think you're an example of the kind of business model that people might have been afraid of, which is, being in the cloud, you can have a moat, you can have competitive advantage, you can build intellectual property. >> And John, don't forget, it's all based on open source and open data. Almost everything that we've done, we've made available to people. We get 30 million downloads of the Databricks technology just from people that want to use it for free. So no vendor lock-in. I think that's really important to most of our federal clients and to everybody. >> I've always said competitive advantage, scale and choice. Right. 
That's Databricks. Howard, thanks for coming on theCUBE, appreciate it. Thanks again. All right. CUBE coverage here in Washington, from a face-to-face physical event. We're on the ground. Of course, we're also streaming digitally for the hybrid event. This is theCUBE's coverage of AWS Public Sector Summit. We'll be right back after this short break.

Published Date : Sep 28 2021


ENTITIES

Entity	Category	Confidence
amazon	ORGANIZATION	0.99+
Howard Levinson	PERSON	0.99+
Washington	LOCATION	0.99+
Skynet	ORGANIZATION	0.99+
Howard	PERSON	0.99+
AWS	ORGANIZATION	0.99+
two copies	QUANTITY	0.99+
Washington, D. C.	LOCATION	0.99+
two days	QUANTITY	0.99+
30 million	QUANTITY	0.99+
two versions	QUANTITY	0.99+
keith brooks	PERSON	0.99+
95%	QUANTITY	0.99+
two separate copies	QUANTITY	0.99+
Howard Levenson	PERSON	0.99+
millions	QUANTITY	0.99+
Ailene vedrine	PERSON	0.99+
one copy	QUANTITY	0.99+
four	QUANTITY	0.99+
Sky	ORGANIZATION	0.99+
10 years ago	DATE	0.99+
five years	QUANTITY	0.99+
first step	QUANTITY	0.99+
28 different data sources	QUANTITY	0.99+
Michigan	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Sky Computer	ORGANIZATION	0.98+
United States	LOCATION	0.98+
28,000 federal data centers	QUANTITY	0.98+
billions of dollars	QUANTITY	0.98+
28,000 federal data centers	QUANTITY	0.98+
five years ago	DATE	0.98+
second copy	QUANTITY	0.98+
thousands of users	QUANTITY	0.98+
pandemic	EVENT	0.98+
AWS	EVENT	0.97+
today	DATE	0.97+
10th year	QUANTITY	0.97+
W. S. Public sector Summit	EVENT	0.97+
Lake House	LOCATION	0.97+
john	PERSON	0.96+
Air Force	ORGANIZATION	0.96+
one	QUANTITY	0.96+
Nasa	ORGANIZATION	0.96+
Sky net	ORGANIZATION	0.96+
each one	QUANTITY	0.96+
Medicaid Medicare	ORGANIZATION	0.95+
one thing	QUANTITY	0.94+
24	QUANTITY	0.94+
data bricks	ORGANIZATION	0.94+
U S. C. I. S.	LOCATION	0.92+
up to 70 data sources	QUANTITY	0.91+
Chief data officer	PERSON	0.9+
first	QUANTITY	0.89+
Govcloud	ORGANIZATION	0.88+
Cloud Native	TITLE	0.88+
one place	QUANTITY	0.87+
GovCloud	TITLE	0.87+
couple of years ago	DATE	0.86+
Office of Management and Budget	ORGANIZATION	0.85+
Sandy carter	PERSON	0.84+
years ago	DATE	0.83+
AWS public sector summit	EVENT	0.83+
U. S. C. I. S	ORGANIZATION	0.81+
Medicaid	ORGANIZATION	0.79+
a minute	QUANTITY	0.77+
number of years ago	DATE	0.77+
a second	QUANTITY	0.75+
center huggers	ORGANIZATION	0.72+
Ng	TITLE	0.71+

The New Data Equation: Leveraging Cloud-Scale Data to Innovate in AI, CyberSecurity, & Life Sciences

>> Hi, I'm Natalie Ehrlich and welcome to the AWS startup showcase presented by The Cube. We have an amazing lineup of great guests who will share their insights on the latest innovations and solutions in leveraging cloud scale data in AI, security and life sciences. And now we're joined by the co-founders and co-CEOs of The Cube, Dave Vellante and John Furrier. Thank you gentlemen for joining me. >> Hey Natalie. >> Hey Natalie. >> How are you doing? Hey John. >> Well, I'd love to get your insights here, let's kick it off. What are you looking forward to? >> Dave, I think one of the things that we've been doing on The Cube for 11 years is looking at the signal in the marketplace. I wanted to focus on this because AI is cutting across all industries. So we're seeing that with cybersecurity and life sciences. It's the first time we've had a life sciences track in the showcase, which is amazing because it shows that growth of the cloud scale. So I'm super excited by that. And I think that's going to showcase some new business models. And of course the keynote's Ali Ghodsi, who's the CEO of Databricks, pushing a billion dollars in revenue, clear validation that startups can go from zero to a billion dollars in revenues. So that should be really interesting. And of course the top venture capitalists are coming in to talk about what the enterprise dynamics are all about. And what about you, Dave? >> You know, I thought it was an interesting mix and choice of startups. When you think about, you know, AI, security and healthcare, and I've been thinking about that. Healthcare is the perfect industry, it is ripe for disruption. If you think about healthcare, you know, we all complain how expensive it is, and it's not transparent. There's a lot of discussion about, you know, can everybody have equal access? And certainly with COVID the staff is burned out. 
There's a real divergence and diversity of the quality of healthcare and you know, it all results in patients not being happy, and I mean, if you had to do an NPS score on the patients and healthcare will be pretty low, John, you know. So when I think about, you know, AI and security in the context of healthcare in cloud, I ask questions like when are machines going to be able to better meet or make better diagnoses than doctors? And that's starting. I mean, it's really in assistance putting into play today. But I think when you think about cheaper and more accurate image analysis, when you think about the overall patient experience and trust and personalized medicine, self-service, you know, remote medicine that we've seen during the COVID pandemic, disease tracking, language translation, I mean, there are so many things where the cloud and data, and then it can help. And then at the end of it, it's all about, okay, how do I authenticate? How do I deal with privacy and personal information and tamper resistance? And that's where the security play comes in. So it's a very interesting mix of startups. I think that I'm really looking forward to hearing from... >> You know Natalie one of the things we talked about, some of these companies, Dave, we've talked a lot of these companies and to me the business model innovations that are coming out of two factors, the pandemic is kind of coming to an end so that accelerated and really showed who had the right stuff in my opinion. So you were either on the wrong side or right side of history when it comes to the pandemic and as we look back, as we come out of it with clear growth in certain companies and certain companies that adopted let's say cloud. And the other one is cloud scale. So the focus of these startup showcases is really to focus on how startups can align with the enterprise buyers and create the new kind of refactoring business models to go from, you know, a re-pivot or refactoring to more value. 
And the other thing that's interesting is that the business model isn't just for the good guys. If you look at, say, ransomware, for instance, the business model of hackers has gone completely amazing too. They're kicking butt in terms of revenue. They have their own well-funded machines on how to extort cash from companies. So there's a lot of security issues around the business model as well. So to me, the business model innovation with cloud-scale tech, with the pandemic forcing function, you've seen a lot of new kinds of decision-making in enterprises. You're seeing how enterprise buyers are changing their decision criteria, and frankly their existing suppliers. So if you're an old guard supplier, you're going to be potentially out, because if you didn't deliver during the pandemic, this is the issue that everyone's talking about. And it's kind of not publicized in the press very much, but this is actually happening. >> Well, thank you both very much for joining me to kick off our AWS startup showcase. Now we're going to go to our very special guest Ali Ghodsi, and John Furrier will sit with him for a fireside chat, and Dave and I will see you on the other side. >> Okay, Ali, great to see you. Thanks for coming on our AWS startup showcase, our second edition, second batch, season two, whatever we want to call it. It's our second version of this new series where we feature, you know, the hottest startups coming out of the AWS ecosystem. And you're one of them, I've been there, but you're not a startup anymore, you're here pushing serious success on the revenue side and company. Congratulations and great to see you. >> Likewise. Thank you so much, good to see you again. >> You know, I remember the first time we chatted on The Cube, you weren't really doing much software revenue, you were really talking about the new revolution in data. And you were all in on cloud. 
And I will say that from day one, you were always adamant that it was cloud, cloud scale, before anyone was really talking about it. And at that time it was on premises with Hadoop and those kinds of things. You saw that early. I remember that conversation. Boy, that bet paid out great. So congratulations. >> Thank you so much. >> So I've got to ask you to jump right in. Enterprises are making decisions differently now, and you are an example of that company that has gone from literally zero software sales to pushing a billion dollars, as it's being reported. Certainly the success of Databricks has been written about, but what's not written about is the success of how you guys align with the changing criteria for the enterprise customer. Take us through that. These companies here are aligning the same way, and enterprises want to change. They want to be on the right side of history. What's the success formula? >> Yeah. I mean, basically what we always did was look a few years out: how can we help these enterprises future-proof what they're trying to achieve, right? They have, you know, 30 years of legacy software and, you know, baggage, and they have compliance and regulations. How do we help them move to the future? So we try to identify those kinds of secular trends. Maybe you see them a little bit right now, cloud was one of them, but it gets more and more and more. So we identified those, and there were sort of three or four of those that we kind of latched onto. And then every year that passes, we're a little bit more right, 'cause it's a secular trend in the market. And then eventually, it becomes a force that you can't kind of fight anymore. >> Yeah. And I just want to put a plug in for your Clubhouse talks with Andreessen Horowitz. You're always on Clubhouse talking about, you know, I won't say the killer instinct, but being a CEO in a time where there's so much change going on, you're constantly under pressure. 
It's a lonely job at the top, I know that, but you've made some good calls. What were some of the key moments that you can point to, where you were like, okay, the wave is coming in now, we'd better get on it? What were some of those key decisions? 'Cause a lot of these startups want to be in your position, and a lot of buyers want to take advantage of the technology that's coming. They got to figure it out. What were some of those key inflection points for you? >> So if you're just listening to what everybody's saying, you're going to miss those trends. So then you're just going with the stream. So, one, you mentioned that, cloud. Cloud was a thing at the time, we thought it's going to be the thing that takes over everything. Today it's actually multi-cloud. So multi-cloud is a thing. More and more people are thinking, wow, I'm paying a lot to the cloud vendors, do I want to buy more from them or do I want to have some optionality? So that's one. Two, open. They're worried about lock-in, you know, lock-in has happened for many, many decades. So they want open architectures, open source, open standards. So that's the second one that we bet on. The third one, which you know, initially wasn't sort of super obvious, was AI and machine learning. Now it's super obvious, everybody's talking about it. But when we started, artificial intelligence kind of referred to robotics, and machine learning wasn't a term that people really knew about. Today, it's sort of, everybody's doing machine learning and AI. So betting on those future trends, those secular trends as we call them, is super critical. >> And one of the things that I want to get your thoughts on is this idea of re-platforming versus refactoring. You see a lot being talked about in some of these, what does that even mean? It's people trying to figure that out. Re-platforming, I get the cloud scale. 
But as you look at the cloud benefits, what do you say to customers out there and enterprises that are trying to use the benefits of the cloud? Say data, for instance, in the middle of it, how could they be thinking about refactoring? And how can they make a better selection on suppliers? I mean, you know, it used to be the RFP: you deliver these speeds and feeds and you get selected. Now I think there's a little bit different science and methodology behind it. What are your thoughts on this refactoring? As a buyer, what have I got to do? >> Well, I mean, let's start with, you said RFP and so on. Times have changed. Back in the day, you had to kind of sign up for something and then much later you were going to get it. So then you had to go through this arduous process. In the cloud, with the pay-as-you-go model, elasticity, and so on, you can kind of try your way to it. You can try before you buy. And you can use more and more. You can go gradually. You don't need to go all in and, you know, say we commit to 50 million and six months later find out that, wow, this stuff is shelfware, it doesn't work. So that's one thing that has changed, and it's beneficial. But the second thing is, don't just mimic what you had on prem in the cloud. So that's what this refactoring is about. If you had, you know, a Hadoop data lake, now you're just going to have an S3 data lake. If you had an on-prem data warehouse, now you're just going to have a cloud data warehouse. You're just repeating what you did on prem in the cloud. Architect for the future. And you know, for us, the most important thing that we say is that this lakehouse paradigm is a cloud-native way of organizing your data. That's different from how you would do things on premises. So think through what's the right way of doing it in the cloud. Don't just try to copy-paste what you had on premises in the cloud. >> It's interesting. One of the things that we're observing, and I'd love to get your reaction to this.
Dave Vellante and I have been reporting on it: two personas in the enterprise are changing their organization. One is what I call IT ops, or there's an SRE role developing. And the data teams are being dismantled and being kind of sprinkled through into other teams. It's this notion of data pipelining being part of workflows, not just the department. Are you seeing organizational shifts in how people are organizing their resources, their human resources, to take advantage of, say, the data problems that need to be solved with machine learning and whatnot and cloud scale? >> Yeah, absolutely. So you're right. SRE became a thing, lots of DevOps people. It was because when the cloud vendors launched their infrastructure as a service, to stitch all these things together and get it all working you needed a lot of DevOps people. But now things are maturing. So, you know, with vendors like Databricks and other multi-cloud vendors, you can actually get much higher level services where you don't need to necessarily have lots and lots of DevOps people that are themselves trying to stitch together lots of services to make this work. So that's one trend. But secondly, you're seeing more data teams being sort of completely ubiquitous in these organizations. Before, it used to be you have one data team and then we'll have data and AI and we'll be done. It's a one and done. But that's not how it works. That's not how Google, Facebook, Twitter did it. They had data throughout the organization. Every BU was empowered. It's sales, it's marketing, it's finance, it's engineering. So how do you embed all those data teams and make them actually run fast? And you know, there's this concept of a data mesh, which is super important, where you can actually decentralize and enable all these teams to focus on their domains and run super fast. And that's really enabled by this lakehouse paradigm in the cloud that we're talking about. Where you're open, you're basing it on open standards.
You have flexibility in the data types and how they're going to store their data. So you kind of provide a lot of that flexibility, but at the same time, you have sort of centralized governance for it. So absolutely things are changing in the market. >> Well, you're just the professor, the masterclass right here is amazing. Thanks for sharing that insight. You're always got to go out of date and that's why we have you on here. You're amazing, a great resource for the community. Ransomware is a huge problem, it's now the government's focus. We're being attacked and we don't know where it's coming from. There are business models around cyber that are expanding rapidly. There's real revenue behind it. There's a data problem. It's not just a security problem. So one of the themes in all of these startup showcases is data is ubiquitous in the value propositions. One of them is ransomware. What are your thoughts on ransomware? Is it a data problem? Does cloud help? Some are saying that cloud's got better security with ransomware than, say, on premises. What's your vision of how you see this ransomware problem being addressed, besides the government taking over? >> Yeah, that's a great question. Let me start by saying, you know, we're a data company, right? And if you say you're a data company, you might as well have just said, we're a privacy company, right? It's like some people say, well, what do you think about privacy? Do you guys even do privacy? We're a data company. So yeah, we're a privacy company as well. Like, you can't talk about data without talking about privacy. With every customer, with every enterprise. So that's obviously top of mind for us.
I do think that in the cloud, security is much better because, you know, vendors like us, we're investing so many resources into security and making sure that we harden the infrastructure and, you know, by actually having all of this infrastructure, we can monitor it, detect if something is, you know, an attack is happening, and we can immediately sort of stop it. So that's different from when it's on prem, where you have kind of like the separated duties, where the software vendor, which would have been us, doesn't really see what's happening in the data center. So, you know, there's an IT team that didn't develop the software that is responsible for the security. So I think things are much better now. I think we're much better set up, but of course, things like cryptocurrencies and so on are making it easier for people to sort of hide. There are decentralized networks. So, you know, the attackers are getting more and more sophisticated as well. So that's definitely something that's super important. It's super top of mind. We're all investing heavily into security and privacy because, you know, that's going to be super critical going forward. >> Yeah, we've got to move that red line, and figure that out and get more intelligence. Decentralized trends are not going away; it's going to be more of that, less of the centralized. But centralized does come into play with data. It's a mix, it's not mutually exclusive. And I'll get your thoughts on this. Architectural question with, you know, 5G and the edge coming. Amazon's got Outposts, Wavelength, you're seeing Mobile World Congress coming up this month. The focus on processing data at the edge is a huge issue. And enterprises are now going to be a commercial part of that. So architecture decisions are being made in enterprises right now. And this is a big issue. So you mentioned multi-cloud, so tools versus platforms. Now I'm an enterprise buyer and there's no more RFPs.
I've got all these new choices for startups and growing companies to choose from that are cloud native. I've got all kinds of new challenges and opportunities. How do I build my architecture so I don't foreclose a future opportunity? >> Yeah, as I said, look, you're actually right. Cloud is becoming even more and more something that everybody's adopting, but at the same time, there is this thing that the edge is also more and more important. And the connectivity between those two, and making sure that you can really do that efficiently. My ask from enterprises, and I think this is top of mind for all the enterprise architects, is choose open, because that way you can avoid locking yourself in. So that's one thing that's really, really important. In the past, you know, all these vendors that locked you in, and then you try to move off of them, they were highly innovative back in the day. In the '80s and the '90s, they were the best companies. You gave them all your data and it was fantastic. But then because you were locked in, they didn't need to innovate anymore. And you know, they focused on margins instead. And then over time, the innovation stopped and now you were kind of locked in. So I think openness is really important. I think preserving optionality with multi-cloud, because we see the different clouds have different strengths and weaknesses and it changes over time. All right. Early on AWS was the only game in town, then Azure showed up with much better security, Active Directory, and so on. Now Google with AI capabilities. Which one's going to win, which one's going to be better? Actually, probably all three are going to be around. So having that optionality that you can pick between the three. And then artificial intelligence. I think that's going to be the key to the future. You know, you asked about security earlier. That's how people detect zero-day attacks, right? You asked about the edge, same thing there, that's where the predictions are going to happen.
So make sure that you invest in AI and artificial intelligence very early on, because it's not something you can just bolt on later on and have a little data team somewhere, and then now you have AI and it's one and done. >> All right. Great insight. I've got to ask you, the folks may or may not know, but you're a professor at Berkeley as well, done a lot of great work. That's where you kind of came out of when Databricks was formed. And Berkeley basically invented distributed computing back in the '80s. I remember I was breaking in when Unix was proprietary, when software wasn't open. You actually had to deal under the table to get code. Now it's all open. It's the internet now, with distributed computing and how interconnects are happening. I mean, the internet didn't break during the pandemic, which proves the benefit of the internet. And that's a positive. But as you start seeing edge, it's essentially distributed computing. So I've got to ask you, from a computer science standpoint, what do you see as the key learnings, or connect the dots for how this distributed model will work? I see hybrid clearly, hybrid cloud is clearly the operating model, but if you take it to the next level of distributed computing, what are some of the key things that you look for in the next five years as this starts to be completely interoperable? Obviously software is going to drive a lot of it. What's your vision on that? >> Yeah, I mean, you know, so Berkeley, you're right, for the geeks, you know, there was the NOW project 20, 30 years ago that basically is how we do things. There was a project on how you search, very early on, with Inktomi, that became how Google and everybody else do search today. So Berkeley was super, super early, sometimes way too early. And that was actually the mistake. It was that they were so early that people said that that stuff doesn't work. And then 20 years later it was reinvented.
So I think in 2009, Berkeley published "Above the Clouds," saying the cloud is the future. At that time, most industry leaders said, that's just, you know, that doesn't work. Today, recently they published a research paper called "Sky Computing." So sky computing is what you get above the clouds, right? So we have the cloud as the future, the next level after that is the sky. That's one on top of them. That's what multi-cloud is. So that's a lot of the research at Berkeley, you know, in the distributed systems labs is about this. And we're excited about that. And we're one of the sky computing vendors out there. So I think you're going to see much more innovation happening at the sky level than at the compute level, where you needed all those DevOps and SRE people to like, you know, build everything manually themselves. >> I can just see the memes now coming, Ali: Skynet, Star Trek. You've got space too, by the way. Space is another frontier that is seeing a lot of action going on, because now the surface area of data with satellites is huge. So again, I know you guys are doing a lot of business with folks in that vertical, where you're starting to see real-time data acquisition coming from these satellites. What's your take on the whole space as the, not the final frontier, but certainly as a new congested and contested space for data? >> Well, I mean, as a data vendor, we see a lot of, you know, alternative data sources coming in, and people are using machine learning and AI to eke out signal out of the, you know, massive amounts of imagery that's coming out of these satellites. So that's actually pretty common in FinTech, which is a vertical for us. And also sort of in the public sector, lots of, lots of, lots of satellite imagery data that's coming. And these are massive volumes. I mean, it's like huge data sets, and it's super, super exciting what they can do.
Like, you know, extracting signal from the satellite imagery, and, you know, being able to handle that amount of data, it's a challenge for all the companies that we work with. So we're excited about that too. I mean, definitely that's a trend that's going to continue. >> All right. I'm super excited for you. And thanks for coming on theCUBE here for our keynote. I've got to ask you a final question. As you think about the future, I see your company has achieved great success in a very short time, and again, you guys have done the work, I've been following your company, as you know. We've been breaking that Databricks story for a long time. I've been excited by it, but now what's changed? You've got to start thinking about the next 20-mile stare when you look at, you know, the sky computing, you're thinking about these new architectures. As the CEO, your job is to, one, not run out of money, which you don't have to worry about anymore, so, hiring. And then, you've got to figure out that next 20-mile stare as a company. What's going on in your mind? Take us through your mindset of what's next. And what do you see out in that landscape? >> Yeah, so what I mentioned around sky computing, optionality around multi-cloud, you're going to see a lot of capabilities around that. Like, how do you get multi-cloud disaster recovery? How do you leverage the best of all the clouds while at the same time not having to just pick one? So there's a lot of innovation there that, you know, we haven't announced yet, but you're going to see a lot of it over the next many years. Things that you can do when you have the optionality across the different parts. And the second thing that's really exciting for us is bringing AI to the masses. Democratizing data and AI. So how can you actually apply machine learning to machine learning? How can you automate machine learning? Today machine learning is still quite complicated and it's pretty advanced.
It's not going to be that way 10 years from now. It's going to be very simple. Everybody's going to have it at their fingertips. So how do we apply machine learning to machine learning? It's called AutoML, automatic, you know, machine learning. So that's an area, and that's not something that we're done with, right? But the goal is to eventually be able to automate away the whole machine learning engineer and the machine learning data scientist altogether. >> You know, what's really fun in talking with you is that, you know, for years we've been talking about this inside the ropes, inside the industry, around the future. Now people are starting to get some visibility, the pandemic's forced that. You're seeing the bad projects being exposed. It's like the tide pulled out and you see all the scabs and bad projects that were justified by old guard technologies. If you get it right, you're on a good wave. And this is clearly what we're seeing. And you guys are an example of that. So as enterprises realize this, that they're going to have to double down on the right projects and probably trash the bad projects, new criteria, how should people be thinking about buying? Because again, we talked about the RFP before. I want to kind of circle back because this is something that people are trying to figure out. You're seeing, you know, organic, you come in, freemium models, as cloud scale becomes the advantage, and the lock-in frankly seems to be the value proposition. The more value you provide, the more lock-in you get. Which sounds like that's the way it should be, versus proprietary, you know, protocols. The protocol is value. How should enterprises organize their teams? Is it end-to-end workflows? And how should they evaluate the criteria for these technologies that they want to buy? >> Yeah, that's a great question. So I, you know, it's very simple: try to future-proof your decision-making. Make sure that whatever you're doing is not locking you in.
So whatever decision you're making, what if the world changes in five years? Make sure that if you're making a mistake now, that's not going to bite you five years later. So how do you do that? Well, open source is great. If you're leveraging open source, you can try it out already. You don't even need to talk to any vendor. Your teams can already download it and try it out and get some value out of it. If you're in the cloud, with these pay-as-you-go models, you don't have to do a big RFP and commit big. You can try it, pay the vendor, pay as you go, $10, $15. It doesn't need to be a million dollar contract, and you slowly grow as you're providing value. And then make sure that you're not just locking yourself in to one cloud or, you know, one particular vendor. As much as possible, preserve your optionality, because then that's not a one-way door. If it turns out later you want to do something else, you can, you know, pick other things as well. You're not locked in. So that's what I would say. Keep that top of mind, that you're not locking yourself into a particular decision that you made today that you might regret in five years. >> I really appreciate you coming on and sharing with our community and theCUBE. And as always, great to see you. I really enjoy your Clubhouse talks, and I really appreciate how you give back to the community. And I want to thank you for coming on and taking the time with us today. >> Thanks John, always appreciate talking to you. >> Okay, Ali Ghodsi, CEO of Databricks, a success story that proves the validation of cloud scale, open, and create value. Value's the new lock-in. So Natalie, back to you for continuing coverage. >> That was a terrific interview, John, but I'd love to get Dave's insights first. What were your takeaways, Dave?
>> Well, if we have more time I'll tell you how Databricks got to where they are today, but I'll say this, the most important thing to me that Ali said was he conveyed a very clear understanding of what data companies are getting right. Talked about four things. There's not one data team, there's many data teams. And he talked about data is decentralized, and data has to have context, and that context lives in the business. He said, look, think about it. The way that the data companies would get it right, they get data in teams in sales and marketing and finance and engineering. They all have their own data and data teams. And he referred to that as a data mesh. That's a term that Zhamak Dehghani coined, and the warehouse or the data lake is merely a node in that global mesh. The mesh is discoverable, he talked about federated governance, and Databricks, they're breaking the model of shoving everything into a single repository and trying to make that the so-called single version of the truth. Rather, what they're doing, which is right on, is putting data in the hands of the business owners. And that's what true data companies do. And the last thing he talked about was sky computing, which I loved. It's that future layer, we talked about multi-cloud a lot, that abstracts the underlying complexity of the technical details of the cloud and creates additional value on top. I always say that the cloud players like Amazon have given the gift to the world of 100 billion dollars a year they spend in CapEx. Thank you. Now we're going to innovate on top of it. Yeah. And I think the refactoring... >> Hop in, John. >> That was great insight and I totally agree. The refactoring piece too was key, he brought that home. But to me, I think Databricks, what Ali shared there, and why he's been open and sharing a lot of his insights in the community. But what he's not saying, 'cause he's humble and polite, is they cracked the code on the enterprise, Dave.
And to Dave's point, the exact reason why they did it: they saw an opportunity to make it easier. At that time Hadoop was the rage, and they just made it easier. They were smart, they made good bets, they had a good formula and they cracked the code with the enterprise. They brought it in and they brought value. And see, that's the key to the cloud, as Dave pointed out. You replatform with the cloud, then you refactor. And I think he pointed out the multi-cloud, and that really kind of teases out the whole future and landscape, which is essentially distributed computing. And I think, you know, companies are starting to figure that out with hybrid and this on premises and now super edge, I call it, with 5G coming. So it's just pretty incredible. >> Yeah. Databricks' IPO is coming and people should know. I mean, they created Spark, as you know, John, and what everybody thought they were going to do is mimic Red Hat and sell subscriptions and support. They didn't. They developed a managed service and they embedded AI tools to simplify data science. So to your point, enterprises could buy instead of build, we know this. Enterprises will spend money to make things simpler. They don't have the resources, and so what they got right was really embedding that, building a managed service, not mimicking the kind of Red Hat model, but actually creating a new value layer there. And that's a big part of their success. >> If I could just add one thing, Natalie, to that, what Dave's saying is really right on. And as an enterprise buyer, if we go to the other side of the equation, it used to be that you had to be a known company, get PR, you fill out RFPs, you had to meet all the speeds. It's like going to the airport and getting a swab test, and getting a COVID test, and all kinds of mechanisms to like block you and filter you. Most of the biggest success stories that have created the most value for enterprises have been the companies that nobody's understood.
And Andy Jassy's famous quote of, you know, being misunderstood is actually a good thing. Databricks was very misunderstood at the beginning and no one kind of knew who they were, but they did it right. And so the enterprise buyers out there, don't be afraid to test the startups, because you know the next Databricks is out there. And I think that's where I see the psychology changing from the old IT buyers, Dave. It's like, okay, let's test this company. And there's plenty of ways to do that. He illuminated those: freemium, small pilots, you don't need to go on these big things. So I think that is going to be a shift in how companies are going to evaluate startups. >> Yeah. Think about it this way. Why should the large banks and insurance companies and big manufacturers and pharma companies, governments, why should they burn resources managing containers and figuring out data science tools if they can just tap into solutions like Databricks, which is an AI platform in the cloud, and let the experts manage all that stuff? Think about how much money and time that saves enterprises. >> Yeah, I mean, we've got 15 companies here we're showcasing, this batch, this season, if you call it that. Episode, what are we going to call it? They're awesome. Right? And the next 15 will be the same. And these companies could be the next billion dollar revenue generator, because the cloud enables that, Dave. I think that's the exciting part. >> Well, thank you both so much for these insights. Really appreciate it. AWS Startup Showcase highlights the innovation that helps startups succeed. And no one knows that better than our very next guest, Jeff Barr. Welcome to the show, and I will send this interview now to Dave and John and see you in just a bit. >> Okay, hey Jeff, great to see you. Thanks for coming on again. >> Great to be back. >> So this is a regular community segment with Jeff Barr, who's a legend in the industry. Everyone knows your name. Everyone knows that.
Congratulations on your recent blog posts we've been reading. Tons of news. I want to get your update, because 5G has been all over the news, Mobile World Congress is right around the corner. I know Bill Vass was a keynote out there, virtual keynote. There's a lot of Amazon discussion around the edge with Wavelength. Specifically, this is the Outposts piece. And I know there is news I want to get to, but the top of mind is there's massive Amazon expansion and the cloud is going to the edge, it's here. What's up with Wavelength? Take us through the, I call it the power edge, the super edge. >> Well, I'm really excited about this, mostly because it gives a lot more choice and flexibility and options to our customers. This idea that with Wavelength, we announced quite some time ago, at least quite some time ago if we think in cloud years. We announced that we would be working with 5G providers all over the world to basically put AWS in the telecom providers' data centers or telecom centers, so that as their customers build apps, those apps would take advantage of the low latency, the high bandwidth, the reliability of 5G, and be able to get to some compute and storage services that are incredibly close, geographically and latency-wise, to the compute and storage. That is just going to give customers this new power and say, well, what are the cool things we can build? >> Do you see any correlation between Wavelength and some of the early Amazon services? Because to me, my gut feels like there's so much headroom there. I mean, I was just riffing on the notion of low latency packets. I mean, just think about the applications, gaming and VR, and metaverse kind of cool stuff like that, where having the edge be that, how much power is there. It just feels like a new, it feels like a new AWS. I mean, what's your take? You've seen the evolutions and the growth of a lot of the key services. Like EC2 and S3. >> So welcome to my life.
And so to me, the way I always think about this is it's like when I go to a home improvement store and I wander through the aisles, and I often wander through with no particular thing that I actually need, but I just go there and say, wow, they've got this and they've got this, they've got this other interesting thing. And I just let my creativity run wild. And instead of trying to solve a problem, I'm saying, well, if I had these different parts, well, what could I actually build with them? And I really think that with this breadth of different services and locations and options and communication technologies, I suspect a lot of our customers and customers-to-be are in this same mode, where they're saying, I've got all this awesomeness at my fingertips, what might I be able to do with it? >> It reminds me of when Fry's was around in Palo Alto. That store is no longer here, but it used to be, back in the day when it was good. It was, you go in and just kind of spend hours, and then next thing you know, you built a computer. Like, what, I just came in here to get some cables. Now I've got a motherboard. >> I clearly remember Fry's, and before that there was the Weird Stuff Warehouse, which was another really cool place to hang out, if you remember that. >> Yeah, I do. >> I wonder if I could jump in. You guys are talking about the edge, and Jeff, I wanted to ask you about something that, I think people are starting to really understand and appreciate what you did with the Annapurna acquisition, what you do with Nitro and Graviton, and really driving costs down, driving performance up. I mean, there's like a compute renaissance. And I wonder if you could talk about the importance of that at the edge, because it's got to be low power, it has to be low cost. You've got to be doing processing at the edge. What's your take on how that's evolving?
>> Certainly. So you're totally right that we started working with and then ultimately acquired Annapurna Labs in Israel a couple of years ago. I've worked directly with those folks and it's really awesome to see what they've been able to do. Just really saying, let's look at all of these different aspects of building the cloud that were once effectively kind of somewhat software intensive and say, where does it make sense to actually design, build, fabricate, and deploy custom silicon? So from booting up the system, to doing all kinds of additional kinds of security checks, to running local IO devices, running the NVMe as fast as possible to support EBS. Each of those things has been a contributing factor to not just the power of the hardware itself, but what I'm seeing and have seen for the last probably two or three years at this point is the pace of innovation on instance types just continues to get faster and faster. And it's not just cranking out new instance types because we can, it's because our awesomely diverse base of customers keeps coming to us and saying, well, we're happy with what we have so far, but here's this really interesting new use case. And we need a different ratio of memory to CPU, or we need more cores based on the amount of memory, or we need a lot of IO bandwidth. And having that Nitro as the base lets us really, I don't want to say plug and play, 'cause I haven't actually built this myself, but it seems like they can actually put the different elements together very, very quickly and then come up with new instance types where our customers just say, yeah, that's exactly what I asked for, and be able to just do this entire range from like micro and nano sized all the way up to incredibly large, with incredible, just to me, like, when we talk about terabytes of memory that are just, like, actually just RAM. It's like, that's just an inconceivably large number by the standards of where I started out in my career.
So it's all putting this power in customer hands. >> You used the term plug and play, but Nitro gives you that optionality. And the other thing that to me is really exciting is the way in which ISVs are writing to whatever's underneath. So you're making that, you know, transparent to the users, so I can choose, as a customer, the best price performance for my workload, and that's just going to grow that ISV portfolio. >> I think it's really important to be accurate and detailed and as thorough as possible as we launch each one of these new instance types, with, like, what kind of processor is in there and what clock speed does it run at? What kind of, you know, how much memory do we have? What are the, just the ins and outs, and is it Intel or Arm or AMD based? It's such an interesting contrast to me. I can still remember back in the very, very early days, you know, going back almost 15 years at this point, and effectively everybody said, well, not everybody. A few people looked and said, yeah, we kind of get the value here. Some people said, this just sounds like a bunch of generic hardware, just kind of generic hardware in a rack. And even back then it was something that we were very careful to design and optimize for use cases. But this idea that it is generic is so, so, so incredibly inaccurate that I think people are now getting this. And it's, okay, it's fine-tuned, not just for the cloud, but for very specific kinds of workloads and use cases. >> And you guys have announced obviously the performance improvements on Lambda, it's getting faster, you've got the per-second billing on Windows and SQL Server on EC2. So I mean, obviously everyone kind of gets that, that's been your DNA, keep making it faster, cheaper, better, easier to use. But the other area I want to get your thoughts on, because this is also more on the footprint side, is the regions and local regions.
So you've got more region news, take us through the update on the expansion of the footprint of AWS because, you know, a startup can come in and these 15 companies that are here, they're global with AWS, right? So this is a major benefit for customers around the world. And you know, Ali from Databricks mentioned privacy. Everyone's a privacy company now. So it's a huge issue, take us through the news on the regions. >> Sure, so the two most recent regions that we announced are in the UAE and in Israel. And we generally like to pre-announce these anywhere from six months to two years ahead of time because we do know that customers want to start making longer term plans, so they can start thinking about where they can do their computing, where they can store their data. I think at this point we now have seven regions under construction. And again, it's all about customer choice. Sometimes it's because they have very specific reasons where, based on local laws, based on national laws, they must compute and store within a particular geographic area. Other times they say, well, a lot of our customers are in this part of the world. Why don't we pick a region that is as close to that part of the world as possible? And one really important thing that I always like to remind our customers and my audience of is, anything that you choose to put in a region stays in that region unless you very explicitly take an action that says I'd like to replicate it somewhere else. So if someone says, I want to store data in the US, or I want to store it in Frankfurt, or I want to store it in Sao Paulo, or I want to store it in Tokyo or Osaka, they get to make that very specific choice. We give them a lot of tools to help copy and replicate and do cross-region operations of various sorts. But at the heart, the customer gets to choose those locations. 
And in the early days I think there was this weird sense that you'd put things in the cloud and they would just mysteriously kind of propagate all over the world. That's never been true, and we're very, very clear on that. And I just always like to reinforce that point. >> That's great stuff, Jeff. Great to have you on again as a regular update here. Just for the folks watching who don't know Jeff, he's been blogging and sharing. He's been the one-man media band for Amazon since its early days. Now he's got departments, he's got people doing videos. It's a media franchise in and of itself, but without your rough days we wouldn't have gotten all the great news we subscribe to. We watch all the blog posts. It's essentially the flow coming out of AWS, which is just a tsunami of new announcements. Always great to read, a must read. Jeff, thanks for coming on, really appreciate it. That's great. >> Thank you John, great to catch up as always. >> Jeff Barr with AWS again, and follow his stuff. He's got a great audience and community. They talk back, they collaborate and they're highly engaged. So check out Jeff's blog and his social presence. All right, Natalie, back to you for more coverage. >> Terrific. Well, did you guys know that Jeff took a three-week AWS road trip across 15 cities in America to meet with cloud computing enthusiasts? 5,500 miles he drove, really incredible, I didn't realize that. Let's unpack that interview though. What stood out to you, John? >> I think Jeff Barr's an example of what I call a direct-to-audience business model. He's been doing it from the beginning and I've been following his career. I remember back in the day when Amazon was started, he was always building stuff. He's a builder, he's classic. And he's been there from the beginning. At the beginning he was just the blog and it became a huge audience. It's now morphed into, he was power blogging so hard. He now has support and he still does it now. 
It's basically the conduit for information coming out of Amazon. I think Jeff has single-handedly made Amazon so successful at the community developer level, and that's where the startup action happened and that got them going. And I think he deserves a lot of the credit for the success of AWS. >> And Dave, how about you? What is your reaction? >> Well I think, you know, everybody knows about the cloud and CapEx to OpEx and agility, and you know, eliminating the undifferentiated heavy lifting and all that stuff. And one of the things that's often overlooked, which is why I'm excited to be part of this program, is the innovation. And the innovation comes from startups, and startups start in the cloud. And so I think that that's part of the flywheel effect. You just don't see a lot of startups these days saying, okay, I'm going to do something that's outside of the cloud. There are some, but for the most part, you know, if you're in software, you're starting in the cloud, it's so capital efficient. I think that's one thing. Throughout my career, I've been obsessed with every part of the stack, whether it's, you know, close to the business process with the applications. And right now I'm really obsessed with the plumbing, which is why I was excited to talk about, you know, the Annapurna acquisition. Amazon bought Annapurna for $350 million, it's reported, you know, maybe a little bit more, but that is an amazing acquisition. And the reason why that's so important is because Amazon is continuing to drive costs down, drive performance up. And in my opinion, leaving a lot of the traditional players in their dust, especially when it comes to the power and cooling. Those are often overlooked things. And the other piece of the interview was that Amazon is actually getting ISVs to write to these new platforms so that you don't have to worry about, does the software run on this chip or that chip, or x86 or Arm or whatever it is. It runs. 
And so I can choose the best price performance. And that's where people, they misunderstand. You always say it, John, you just said that people are misunderstood. I think they misunderstand, they confuse, you know, the price of the cloud with the cost of the cloud. They ignore all the labor costs that are associated with that. And so, you know, there's a lot of discussion now about the cloud tax. I just think the pace is accelerating. The gap is not closing, it's widening. >> If you look at the one question I asked him about Wavelength, and I had a follow up there when I said, you know, we riff on it, and you see, he lit up, like he was beaming, because he said something interesting. It's not that there's a problem to solve, it's that there's an opportunity. And he conveyed it, like he said, walking through Fry's. But like, you go into a store and he's a builder. So he sees opportunity. And this comes back down to the Martin Casado paradox post he wrote about, do you optimize for CapEx or future revenue? And I think the tell sign is that the Wavelength edge piece is going to be so creative and that's going to open up massive opportunities. I think that's the place to watch. That's the place I'm watching. And I think startups are going to come out of the woodwork because that's where the action will be. And that's just Amazon at the edge, I mean, that's just cloud at the edge. I think that is going to be very effective. And that's a little tell sign, he kind of revealed a little bit there, a lot there with that comment. >> Well that's a to-be-continued conversation. >> Indeed, I would love to introduce our next guest. We actually have Soma on the line. He's the managing director at Madrona Venture Group. Thank you Soma very much for coming for our keynote program. >> Thank you Natalie, it's great to be here and to have the opportunity to spend some time with you all. >> Well, you have a long and storied history in the enterprise. 
How would you define the modern enterprise, also known as cloud scale? >> Yeah, so I would say, first of all, like, you know, we've all heard this now for the last, you know, say 10 years or so. Like, software is eating the world. Okay. Put another way, we think about it like, hey, every enterprise is a software company first and foremost. Okay. And companies that truly internalize that, that truly think about that, and truly act that way are going to sort of continue running well, and the ones that don't internalize that and don't do that are going to be left behind sooner rather than later. Right. And in the last few years you sort of think and now take it to the next level and talk about, like, now every enterprise is going through a digital transformation. Okay. So when you sort of think about the world from that lens. Okay. A modern enterprise has to think, like, hey, I am first and foremost a technology company. I may be in the business of making a car or, you know, manufacturing paper, or like, you know, manufacturing some healthcare products or what have you out there. But technology and software is what is going to give me a unique, differentiated advantage that's going to let me do what I need to do for my customers in the best possible way [Indistinct]. So that sort of level of focus, level of execution, has to be there in a modern enterprise. The other thing is, like, now every modern enterprise needs to think about [Indistinct]. I'm competing for talent, not anymore with my peers in my industry. I'm competing for technology talent and software talent with the top five technology companies in the world. Whether it is Amazon or Facebook or Microsoft or Google, or what have you kind of thing, right? So you really have to have that mindset, and then everything flows from that. >> So I got to ask you on the enterprise side again, you've seen many waves of innovation. You've, you know, been in the industry for many, many years. 
The old way was enterprises want the best proven product and the startups want that lucrative contract. Right? Yeah. And get that beachhead. And it used to be, and we addressed this in our earlier keynote with Ali and how it's changing, the buyers are changing because the cloud has enabled this new kind of execution. I call it agile, call it what you want. Developers are driving modern applications, so enterprises are still, there's no, the playbook's evolving. Right? So we see that with the pandemic, people had needs, urgent needs, and they tried new stuff and it worked. The parachute opened, as they say. So how do you look at this as you look at startups you're investing in and you're coaching them? What's the playbook? What's the secret sauce of how to crack the enterprise code today? And if you're an enterprise buyer, what do I need to do? I want to be more agile. Is there a clear path? Is there a TSA to let stuff go through faster? I mean, what is the modern playbook for buying and being a supplier? >> That's a fantastic question, John, because I think that sort of playbook is changing, even as we speak here currently. A couple of key things to understand first of all is, like, you know, decision-making inside an enterprise is getting more and more de-centralized. Particularly decisions around what technology to use and what solutions to use to be able to do what people need to do. That decision making is no longer sort of, you know, all done in like the CEO's office or the CTO's office kind of thing. Developers are more and more, like you rightly said, sort of the center of the workflow and the decision making process. So it behooves both the enterprises, as well as the startups, to really understand that. So what does it mean now from a startup perspective? From a startup perspective, it means, like, right, in addition to thinking about, like, hey, now do I go create an enterprise sales force, do I sell to the enterprise like I might have done in the past? 
Is that the best way of moving forward, or should I be thinking about a product-led growth go-to-market initiative? You know, build a product that is easy to use, that's self-serve and really works, you know, get the developers to start using it, to see the value, to fall in love with the product, and then you think about, like, hey, how do I go translate that into a contract with the enterprise. Right? And more and more, particularly, you know, startups and technology companies that are focused on the developer audience are thinking about, like, you know, how do I have a bottom-up go-to-market motion? And sometimes I may sort of, you know, overlap that with the top-down enterprise sales motion that we know has been going on for many, many years or decades kind of thing. But really this product-led growth, bottom-up go-to-market motion is something that we are seeing on the rise. I would say more than half the startups that we come across today have that in some way, shape or form. And so the enterprise also needs to understand this, the CIO or the CTO needs to know that, like, hey, now decision-making is getting de-centralized. I need to empower my engineers and my engineering managers and my engineering leaders to be able to make the right decision and trust them. I'm going to give them some guard rails so that I don't find myself in a soup, you know, sometime down the road. But once I give them the guard rails, I'm going to enable people to make the decisions. People who are closer to the problem, to make the right decision. >> Well Soma, what are some of the ways that startups can accelerate their enterprise penetration? >> I think that's another good question. First of all, you need to think about, like, hey, what are enterprises wanting to do? Okay. If you sort of take like two steps back and think about what the enterprise is really thinking about, it's going: I'm a software company, but I'm really manufacturing paper. What do I do? 
Right? The core thing that most enterprises care about is, like, hey, how do I better engage with my customers? How do I better serve my customers? And how do I do it in the most optimal way? At the end of the day that's what most enterprises really care about. So startups need to understand, what are the problems that the enterprise is trying to solve? What kind of tools and platform technologies and infrastructure support, and, you know, everything else do they need to be able to do what they need to do and what only they can do in the most optimal way? Right? So to the extent you are providing either a tool or platform or some technology that is going to enable your enterprise to make progress on what they want to do, you're going to get more traction within the enterprise. In other words, stop thinking about technology, and start thinking about the customer problem that they want to solve. And the more you anchor your company, and the more you anchor your conversation with the customer around that, the more the enterprise is going to get excited about wanting to work with you. >> So I got to ask you on the enterprise and developer equation because CSOs and CXOs, depending on who you talk to, have that same answer. Oh yeah. In the 90's and 2000's, we kind of didn't, we throttled down, we were using the legacy developer tools and cloud came and then we had to rebuild and we didn't really know what to do. So you're seeing a shift, and this has kind of been going on for at least the past five to eight years, a lot more developers being hired. I mean, FinTech is clearly a vertical, they always had developers and everyone had developers, but there's a fast ramp up of developers now and the role of open source has changed. Just looking at the participation. They're not just consuming open source, open source is part of the business model for mainstream enterprises. How is this, first of all, do you agree? 
And if so, how has this changed the course of an enterprise's human resource selection? How they're organized? What's your vision on that? >> Yeah. So as I mentioned earlier, John, in my mind the first thing is, and this sort of, you know, like you said, financial services has always been sort of hiring people [Indistinct]. And this is like a five-year-old story. So bear with me, I'll tell you the five-year-old story and then come to it. I was talking to the CIO of Goldman Sachs. Okay. And this is five years ago when people were still like, hey, is this cloud thing real and now is cloud going to take over the world? You know, am I really ready to put my data in the cloud? So there were a lot of questions and conversations kind of thing. The CIO of Goldman Sachs told me two things that I remember to this day. One is, hey, we've got an internal edict. We made a decision that in the next five years, everything in Goldman Sachs is going to be on the public cloud. And I literally jumped out of the chair and I said, like, now, are you going to get there? And then he laughed and said, like, now it really doesn't matter whether we get there or not. We want to set the tone, set the direction for the organization that, hey, public cloud is here. Public cloud is there. And we need to, like, you know, move as fast as we realistically can and think about all the financial regulations and security and privacy. And all these things that we care about deeply. But given all of that, the world is going towards public cloud and we better be on the leading edge as opposed to the lagging edge. And the second thing he said, like, we're talking about, like, hey, how are you hiring, you know, engineers at Goldman Sachs kind of thing? And he said, like, hey, my team goes out to the top 20 schools in the US. And the people we really compete with are, and he was saying this, hey, we don't compete with JP Morgan or Morgan Stanley, or pick any of your favorite financial institutions. 
We really think about, like, hey, we want to get the best talent into Goldman Sachs out of these schools. And we really compete head to head with Google. We compete head to head with Microsoft. We compete head to head with Facebook. And we know that the caliber of people that we want to get is no different than what these companies want, if we want to continue being a successful, leading, you know, financial services player. That sort of tells you what's going on. You also talked a little bit about, like, hey, open source is here to stay. What does that really mean kind of thing. In my mind, like now, given my pedigree at Microsoft, I can tell you that we were not the first embracers of open source in this world. So I'll say that right off the bat. But having said that, we did turn around and say, like, hey, this open source is real, this open source is going to be great. How can we embrace and how can we participate? And you fast forward to today, like now, Microsoft is probably as good at open source as probably any other large company I would say. Right? Including, like, the work that the company has done in terms of acquiring GitHub and letting it stay true to its original promise of open source and community kind of thing, right? I think Microsoft has come a long way kind of thing. But the thing that, like now, all these enterprises need to think about is you want your developers to have access to the latest and greatest tools. To the latest and greatest that the software can provide. And you really don't want your engineers to be reinventing the wheel all the time. So if there is something available in the open source world, go ahead, please, sort of think about whether that makes sense for you to use it. And likewise, if you think that there is something you can contribute to the open source world, go ahead and do that. 
So it's really a two-way symbiotic relationship that enterprises need to have, and they need to enable their developers to want to have that symbiotic relationship. >> Soma, fantastic insights. Thank you so much for joining our keynote program. >> Thank you Natalie and thank you John. It was always fun to chat with you guys. Thank you. >> Thank you. >> John, we would love to get your quick insight on that. >> Well I think first of all, he's a prolific investor, a great one from Madrona venture partners, which is well known in the tech circles. They're in Seattle, which is in the hub of what I call cloud city. You've got Amazon and Microsoft there. He'd been at Microsoft and he knows the developer ecosystem. And the reason why I like his perspective is that he understands the value of having developers as a core competency in Microsoft. That's their DNA. You look at Microsoft, their number one thing from day one besides software was developers. That was their army, the thousand centurions that won everything for them. That has shifted. And he brought up open source, and .NET and how they've embraced Linux, but Satya Nadella, before he became CEO, we interviewed him on theCUBE at an Accel Partners event at Stanford. He was open before he was CEO. He was talking about opening up. They opened up a lot of their open source infrastructure projects to the open compute foundation early. So they had already had that going, and since that time, the stock price of Microsoft has skyrocketed because, as Ali said, open always wins. And I think that is what you see here, and as an investor now he's picking startups and investing in them. He's got to read the tea leaves. He's got to be on the right side of history. So he brings a great perspective because he sees the old way and he understands the new way. That is the key for success we've seen in the enterprise and with the startups. The people who get the future, and can create the value are going to win. 
>> Yeah, really excellent point. And just really quickly. What do you think were some of our greatest hits on this hour of programming? >> Well first of all I'm really impressed that Ali took the time to come join us because I know he's super busy. I think they're at a $28 billion valuation now, they're pushing a billion dollars in revenue, GAAP revenue. And again, just a few short years ago, they had zero software revenue. So of these 15 companies we're showcasing today, you know, there's a next Databricks in there. They're all going to be successful. They already are successful. And they're all on this rocket ship trajectory. Ali is smart, he's also got the advantage of being part of that Berkeley community, where they're early on a lot of things. Now, being early means you're wrong a lot, but you're also right, and you're right big. So Berkeley and Stanford are obviously big areas here in the Bay Area for research. He is smart, he's got a great team and he's really open. So having him share his best practices, I thought that was a great highlight. Of course, Jeff Barr highlighting some of the insights that he brings, and honestly having the perspective of a VC. And we're going to have Peter Wagner from Wing VC, who's a classic enterprise investor, super smart. So he'll add some insight. Of course, there's the community session, where our influencers are coming on at the end, as well as Katie Drucker. Another Madrona person is going to talk about growth hacking, growth strategies, but yeah, [Indistinct] coming on. >> Terrific, well thank you so much for those insights and thank you to everyone who is watching the first hour of our live coverage of the AWS Startup Showcase. For myself, Natalie Ehrlich, John Furrier, and Dave Vellante, we want to thank you very much for watching, and do stay tuned for more amazing content, as well as a special live segment that John Furrier is going to be hosting. 
It takes place at 12:30 PM Pacific time, and it's called cracking the code, lessons learned on how enterprise buyers evaluate new startups. Don't go anywhere.

Published Date : Jun 24 2021



Ariel Kelman, AWS | Informatica World 2019

>> Live from Las Vegas, it's theCUBE. Covering Informatica World 2019. Brought to you by Informatica. >> Welcome back everyone to theCUBE's live coverage of Informatica World 2019 here in Las Vegas. I'm your host, Rebecca Knight, along with my co-host, John Furrier. We are joined by Ariel Kelman. He is the VP, Worldwide Marketing at AWS. Thank you so much for coming on theCUBE. >> Thanks so much for having me on today. >> So let's start out just at ten thousand feet and talk a little bit about what you're seeing as the major cloud and AI trends and what your customers are telling you. >> Yeah, so I mean, clearly, machine learning and AI is really at the forefront of a lot of discussions in enterprise IT and there's massive interest but it's still really early. And one of the things that we're seeing companies really focused on now is just getting all their data ready to do the machine learning training. And also, in addition I mean, training up all their people to be able to use these new skills. But we're seeing tons of interest, it's still very early, but you know one of the reasons we're here at Informatica World is that getting all the data imported and ready is, you know, it's almost doubled or tripled in importance from when people were just trying to do analytics. Now they're doing machine learning as well. You know, we're seeing huge interest in that. >> I want to get into some of the cloud trends with your business, but first, what's the relationship with Informatica, and you know we see them certainly at re:Invent. Why are you here? Was there an announcement? What's the big story? >> I mean, we've been working together for a long time and it's very complementary products and number varies. 
I think the relationship really started deepening when we released Redshift in 2013, and so many customers that wanted to get data into the cloud to do data warehousing were already using Informatica to help get the data loaded and cleansed, and so really they're one of the great partners that's fueling moving data into the cloud and helping our customers be more successful with Redshift. >> Yeah, one of the things I really admire about you guys is that you're very customer centric. We've been following Amazon as you know since their, actually second re:Invent, theCUBE's been there every time, and just watching the growth, you know, cloud certainly has been a power source for innovation. SaaS companies that are born in the cloud have exponentially scaled faster than most enterprises because they use data. And so data's been at the heart of all the successful SaaS businesses, that's why startups gravitated to the cloud right away. But now that you guys got enterprise adoption, you guys have been customer centric and as you listen to customers, what are you guys hearing from that? Because the data on premises, you've got more compliance, you've got more regulation, you've got-- news today-- more privacy and now you've got regions, countries with different laws. So the complexity around even just regulatory, never mind tech complexity, how are you guys helping customers when they say, you know what, I want to get to the cloud, love Amazon, love the cloud, but I've got my, I've got to clean up my on-prem house. >> Yeah, I would say, if you look at a lot of the professional services work that we do, a lot of it is around getting the company prepared and organized with all their data before they move to the cloud: segmenting it, understanding the different security regulatory requirements, coming up with a plan of what they need, what data they're going to maybe abstract up, before they load it, and there's a lot of work there. 
And, you know, we've been focused on trying to help customers.. >> And is there a part in you're helping migrate to the cloud, is that.. >> Yeah, there's technology pieces, companies like Informatica helping to extract and transform and load the data and on data governance policies. But then also, for a lot of our systems integrator partners, Cognizant, Accenture, Deloitte-- they're very involved in these projects. There's a lot of work that goes on; a lot of people don't talk about just before you can even start doing the machine learning, and a lot of that's getting your data ready. >> So how, what are some of the best practices that have emerged in working with companies that, as you said, there's a lot of pre-work that needs to be done and they need to be very thoughtful about sort of getting their data sorted. >> Well I think the number one thing that I see and I recommend is to actually first take a step back from the data and to focus on what are the business requirements, what questions are you trying to answer, let's say with machine learning, or with data science advanced analytics, and then back out the data from that. What we see a lot of, you know, companies sometimes will have it be a data science driven project. Okay, here's all the data that we have, let's put it in one place, when you may not be spending time proportionate to the value of the data. And so that's one of the key things that we see, and to come up-- just come up with a strong plan around what answers you're, what business questions you're trying to answer. >> On the growth of Amazon, you guys certainly have had great record numbers, growth, even in the double digit kind of growth you're seeing on top of your baseline has been phenomenal. Clearly number one on the cloud. Enterprise has been a big focus. I noticed that on the NHL, your logo's on the ice during the playoffs; you've got the Statcast. 
You guys are creating a lot of aware-- I see a lot of billboards everywhere, a lot of TV ads. Is that part of the strategy is to get you guys more brand awareness? What's the.. >> We're trying, you know, it's part of our overall brand awareness strategy. What we're trying to do is to help, we're trying to communicate to the world how our customers are being successful using our technology, specifically machine learning and AI. It's one of these things where so many companies want to do it but they say, well, what am I supposed to use it for? And so, you know, one of, if you dumb down what marketing is at AWS, it's inspiring people about what they can run in the cloud with AWS, what use cases they should consider us for, and then we spend a lot of energy giving them the technical education and enablement so they can be successful using our products. At the end of the day, we make money when our customers are successful using our products. >> One of the hot products was SageMaker, we see in that group, AI's gone mainstream. That's a great tailwind for you guys because it kind of encapsulates or kind of doesn't have to get all nerdy about cloud, you know, infrastructure and SaaS. AI kind of speaks to many people. It's one of the hottest curriculums and topics in the world. >> Yeah, and with SageMaker, we're trying to address a problem that we see in most of our customers where the everyday developer is not, does not have expertise in machine learning. They want to learn it, so we think that anything we can do to make it easier for every developer to ramp up on machine learning the better. So that's why we came up with SageMaker as a platform to really make all three stages of machine learning easier: getting your data prepared for training, training and optimizing models, and then running inference to make the predictions and incorporate that into people's applications. 
>> One of the themes that's really emerging in this conversation is the need to make sure developers are ready and that your people are skilled up and know what they need to know. How are, how is AWS thinking about the skills gap, and what are you doing to remedy it? >> Yeah, a couple things. I mean, we're really, like a lot of things we do, we'll say what are all the ways we can attack the problem and let's try and help. So, we have free training that we've been creating online. We've been partnering with large online training firms like Udacity and Coursera. We have an ML solutions lab that helps companies prototype, we have a pretty significant professional services team, and then we're working with all of our systems integrator partners to build up their machine learning practices. It's a new area for a lot of them and we've been pushing them to add more people so they can help their customers. >> Talk about the conferences, you have re:Invent, the core conference, theCUBE has been there. We've just also covered London, the Amazon Web Services summit, and 22,000 registered, 14,000 showed up. Got huge global reach now. How do you keep up with this? I mean it's a... >> Well we're trying to help our customers keep up with all the technology. I mean, really, we have about, maybe 25 or so of these summits around the world-- usually around two days, several thousand people, free conferences. And what we're trying to do is >> They're free? >> The summits are free and it's like, we introduce so much new technology, new services, deeper functionality within our existing services, and our customers are very hungry to learn the latest best practices and how they can use these, and so we're trying to be in all the major areas to come in and provide deep educational content to help our customers be more successful. >> And re:Invent's coming around the corner. Any themes there early on, numbers wise? Last year you had, again, record numbers. 
I mean at some point, is Vegas too small? >> Yeah, we had over 50,000 people. We're going to have even more, and we've been expanding to more and more locations around Las Vegas and you know we're going to keep growing. There's a lot of demand. I mean, we want to be able to provide the re:Invent experience for as many people as want to attend. >> What's the biggest skill set, you know the folks graduating this month, my daughter's graduating from Cal Berkeley, and a lot of others are graduating >> Congratulations >> high school. Everyone wants to either jump into some sort of data related field, doesn't have to be computer science, those numbers are up. What's your view of skill sets that are needed right now that weren't in curriculum, or what pieces of curriculum should people be learning to be successful if machine learning continues to grow from helping videos surface to collecting customer data. Machine learning's going to be feeding the AI applications and SaaS businesses. >> Yeah, I mean look, you just forget about machine learning, you go to a higher level. There's not enough good developers. I mean, we're in a world now where any enterprise that is going to be successful is going to have their own software developers. They're going to be writing their own software. That's not how the world was 15 years ago. But if you're a large corporation and you're outsourcing your technology, you're going to get disrupted by someone else who does believe in custom software and developers. So the demand for really good software engineers, I mean we deal with it all the time, we're hiring. It is always going to outstrip supply. And so, for young people, I would encourage them to start coding and to not be over reliant on the university curriculums, which don't always keep pace with, you know, with the latest trends. >> And you guys got a ton of material online too, you can always go to your site. 
Okay, on the next question around, as someone figures out, okay, enterprise versus pure SaaS, you guys have proven with the Cloud that startups can grow very fast and then the list goes on: AirBnB, Pinterest, Zoom Communications, disrupting existing big, mature markets by having access to the data. So how do you talk about customers when you say, hey, you know, I want to be like a SaaS company, like a consumer company, leverage data, but I've got a lot of stuff on premises. So how do I not make that data constrained? How do you guys feel about that conversation because that seems to be the top conversation here, is you know, it's not to say be consumer, it's consumer-like. Leveraging data, 'cause if data's not into AI, there's no, AI doesn't work, right? So >> Right >> It can't be constrained by anything. >> Well, you know, you talk to all these companies and at first they don't even know what they don't know in terms of what is that data? And where is it? And what are the pieces that are important? And so, you know, we encourage people to do a good amount of strategy work before they even start to move bits up to the cloud. And of course, then we have a lot of ways we can help them, from our Snowball machines that they can plug in, all the way to our Snowmobile, which is the semi truck that you can drive up to your data center and offload very large amounts of data and drive it over to our data centers. >> One of the things that is trending-- we had Ali from Databricks talk about, he absolutely believes a lot of the same philosophies you guys do-- data in the cloud. 
And one of his arguments was that there's a lot of data sets in these marketplaces now where you can really leverage other people's data, and we see that on cybersecurity where people are starting to share data, and Cloud is a better model for that than trying to ship drives around, and there's a time for Snowball, I get that, and Snowmobile, the big trucks for large ingestion into the cloud, but the enterprise, this is a new phenomenon. No one really shared a lot in the old days. This is a new dynamic. Talk about that, is it-- >> I mean, sharing, selling, monetizing data. If there's something that is important, there will be a market for it. And I think we're seeing that just the hunger, everything from enterprises to startups, that want more data, whether it's for machine learning to train their models, or it's just to run analytics and compare against their data sets. So I think the commercial opportunity is pretty large. >> I think you're right on that. I think that's a great insight. I mean, no one ever thought about data as a service from our data set standpoint, 'cause data sets feed machine learning. All right, so let's do, give the plug on what's going on with AWS. What's new, what's on your plate, what's notable. I mean I love the NHL, I couldn't resist that plug for you being a hockey fan. But what's new in your world? >> Um, you know, we're, we're in early planning stages on our re:Invent conference, our engineers are hard at work on a lot of new technology that we're going to have ready between now and our re:Invent show. You know, also we're, my team's been doing a lot of work with the sports organizations. We've had some interesting machine learning work with Major League Baseball. They rolled out this year a new machine learning model to do stolen base predictions. 
So, you can see on some of the broadcasts, as a runner goes past first base, we'll have a ticker that will show what the probability is that they'll be successful stealing second base if they choose to run. Trying to make a little more entertaining all those scenes we've seen in the past of the pitcher throwing the ball back to first, trying to use AI and machine learning to give a little bit more insight into what's going on. >> And that's the Statcast. Part of that's the Statcast >> That's Statcast, yeah >> And you got anything new coming around that besides that new.. >> Yeah, I think that yeah, Major League Baseball is hard at work on some new models that I think will be announced fairly soon. >> All right, to wrap up Informatica real quick, an announcement here, news coming I hear. How are you guys working with Informatica in the field? Is there any, can you share more about the relationship? >> Yeah I mean I think we're going to have an announcement a little bit later today, I mean it's around the subject we've been talking about: making it easier for customers to, you know, be successful moving their data to the Cloud so that they can start to benefit from the agility, the speed and the cost savings of data analytics and machine learning in the Cloud. >> And so when you're working with customers, I mean, because this is the thing about Amazon. It is a famously innovative, cutting edge company, and when you talk about the hunger that you describe, that these customers, isn't it just that they want to be around Amazon and kind of rub shoulders with this really creative, thinking four steps ahead kind of company. I mean how do you let your innovation rub off on these customers? >> I mean there's a couple ways we do. One of the things we've done recently is these innovation workshops. 
We have this thing we talk about a lot, this working backwards process, where we force the engineers to write a press release before we'll green light the product because we feel like if you can't clearly articulate the customer benefit, then we probably shouldn't start investing, right? And so we, that's one of the processes that we use to help us innovate better, more effectively and so we've been walk-- we walk customers through this. We have them come, you know there's an international company that I was, part of one of the efforts we did in Palo Alto last year where we had a bunch of their leadership team out for two days of workshops where we worked a bunch of ideas through, through our process. And so we do some of that but the other area is we try and capture areas where we think that we've innovated in some interesting way into a service that then customers can use. Like Amazon Connect I think is a good example of it. This is our contact center call routing technology and you know, one of the things Amazon's consumer business is known for is having great customer support, customer service, and they spent a lot of time and energy making sure that calls get routed intelligently to the right people, that you don't sit on hold forever, and so we figure we're probably not the only company that could benefit from that. Kind of like with AWS, when we figured out how to run infrastructure securely and with high performance and availability, and so we turned that into a service and it's become a very successful service for us. A lot of companies have similar contact center problems. >> As a customer, I can attest to being on hold a lot. Ariel, thank you so much for coming on theCUBE. It's been great talking to you. >> I appreciate it. Thank you. >> Thanks for coming out, appreciate it. >> I'm Rebecca Knight, for John Furrier. You are watching theCUBE. Stay tuned. (upbeat music)

Published Date : May 21 2019

SUMMARY :

Brought to you by Informatica. He is the VP, Worldwide and AI trends and what your customers are telling you. the data imported and ready is, you know, it's almost Informatica, and you know we see them certainly to get data into the cloud to do data we're housing, we're Yeah, one of the things I really admire about you guys their data before they move to the cloud: segmenting it, the cloud, is that.. of people don't talk about just before you can even start a lot of pre-work that needs to be done and they need to be the data that we have, let's put it in one place, when you of the strategy is to get you guys more brand awareness? And so, you know, one of, if you dumb down what marketing is doesn't have to get all nerdy about cloud, you know, optimized models, and then running inference to make conversation is the need to make sure developers are all of out systems integrators partners to build up their Talk about the conferences, you have re:Invent, the CORE summits around the world-- usually around two days, the major areas to come in and provide deep educational And re:Invent's coming around the corner. and you know we're going to keep growing. going to be feeding the AI applications and SAS businesses. any enterprise that is going to be successful is going to have that conversation because that seems to be the top It can't be constrained And so, you know, we the same philosophies you guys do-- data in the cloud. that just the hunger, everything from enterprises to I mean I love the NHL, I couldn't of the pitcher throwing the ball back to first, trying Part of that's the Statcast And you got anything new coming around that that I think will be announced fairly soon. How are you guys I mean it's around the subject we've been talking about: I mean how do you let your innovation rub off on the product because we feel like if you can't clearly It's been great talking to you. I appreciate it. You are watching

ENTITIES

Entity	Category	Confidence
Rebecca Knight	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Deloitte	ORGANIZATION	0.99+
Udacity	ORGANIZATION	0.99+
Ariel	PERSON	0.99+
John Furrier	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
Ariel Kelman	PERSON	0.99+
two days	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
Pinterest	ORGANIZATION	0.99+
2013	DATE	0.99+
Last year	DATE	0.99+
Vegas	LOCATION	0.99+
Accenture	ORGANIZATION	0.99+
AirBnB	ORGANIZATION	0.99+
Coursera	ORGANIZATION	0.99+
second base	QUANTITY	0.99+
Informatica World	ORGANIZATION	0.99+
14,000	QUANTITY	0.99+
22,000	QUANTITY	0.99+
first base	QUANTITY	0.99+
Cognizant	ORGANIZATION	0.99+
last year	DATE	0.99+
Zoom Communications	ORGANIZATION	0.99+
Data Bricks	ORGANIZATION	0.99+
One	QUANTITY	0.99+
over 50,000 people	QUANTITY	0.99+
SAS	ORGANIZATION	0.99+
Redshift	TITLE	0.99+
Ali	PERSON	0.98+
first	QUANTITY	0.98+
Snowmobile	ORGANIZATION	0.98+
one	QUANTITY	0.98+
today	DATE	0.98+
15 years ago	DATE	0.98+
ten thousand feet	QUANTITY	0.98+
Statcast	ORGANIZATION	0.97+
this year	DATE	0.96+
theCUBE	ORGANIZATION	0.96+
Cal Berkeley	ORGANIZATION	0.96+
second	QUANTITY	0.96+
Informatica World 2019	EVENT	0.95+
around two days	QUANTITY	0.95+
one place	QUANTITY	0.95+
2019	DATE	0.95+
Cube	ORGANIZATION	0.94+
25	QUANTITY	0.93+
CORE	EVENT	0.9+
Invent	EVENT	0.9+
about	QUANTITY	0.86+
themes	QUANTITY	0.86+
this month	DATE	0.85+
SageMaker	TITLE	0.83+
Web Services summit	EVENT	0.8+

Mike Flasko, Microsoft | Microsoft Ignite 2018

>> Live from Orlando, Florida it's theCUBE, covering Microsoft Ignite. Brought to you by Cohesity and theCUBE's eco-system partners. >> Welcome back everyone to theCUBE's live coverage of Microsoft Ignite. I'm your host Rebecca Knight along with my co-host Stu Miniman. We are joined by Mike Flasko. He is the Principal Group Product Manager here at Microsoft. Thanks so much for returning to theCUBE, you are a CUBE alumni. >> I am, yeah thanks for having me back. I appreciate it. >> So you oversee a portfolio of products. Can you let our viewers know what are you workin' on right now? >> Sure, yeah. I work in the area of data integration and governance at Microsoft, so everything around data integration, data acquisition, transformation and then pushing into the governance angles of, you know, once you acquire data and analyze it are you handling it properly as per industry guidelines or enterprise initiatives as you might have? >> You mentioned the magic word, transformation. I would love to have you define. It's become a real buzz word in this industry. How do you define digital transformation? >> Sure, I think it's a great discussion because we're talking about this all the time, but what does that really mean? And for us, the way I see it is starting to make more and more data driven decisions all the time. And so it's not like a light switch, where you weren't and then you were. Typically what happens is as we start working with customers they find new and interesting ways to use more data to help them make a more informed decision. And it starts from a big project or a small project and then just kind of takes off throughout the company. And so really, I think it boils down to using more data and having that guide a lot of the decisions you're making and typically that starts with tapping into a bunch of data that you may already have that just hasn't been part of your kind of traditional data warehousing or BI loop and thinking about how you can do that. 
>> Mike bring us inside the portfolio a little bit, you know, everybody knows Microsoft. We think about our daily usage of all the Microsoft products that my business data runs through, but when you talk about your products they're specific around the data. Help us walk through that a little bit. >> Sure, yeah. So we have a few kind of flagship products in the space, if you will. The first is something called Azure Data Factory and the purpose of that product is fairly simple. It's really for data professionals. They might be integrators or warehousing professionals et cetera, and it's to facilitate making it really easy to acquire data from wherever it is. Your business data on-prem, from other clouds, SaaS applications, and allow a really easy experience to kind of bring data into your cloud, into our cloud for analytics and then build data processing pipelines that take that raw data and transform it into something useful, whatever your business domain requires. Whether that's training a machine learning model or populating your warehouse based on more data than you've had before. So first one, Data Factory, all about data integration, kind of a modern take on it. Built for the cloud, but fundamentally supports hybrid scenarios. And then other products we've got are things like Azure Data Catalog, which are more in the realm of aiding the discovery and governance of data. So once you start acquiring all this data and using it more productively, you start to have a lot and how do you connect those who want to consume data with the data professionals or data scientists that are producing these rich data sets. So how do you connect your information workers with your data scientists or your data engineers that are producing data sets? Data Catalog's kind of the glue between the two. >> Mike wondering if you can help connect the dots to some of the waves we've been seeing. 
There was a traditional kind of BI and data warehousing then we went through a kind of big data, the volumes of data and how can I, even if I'm not some multi-national or global company, take advantage of the data? Now there's machine intelligence. Machine learning, AI and all these pieces. What's the same and what's different about the trend and the products today? >> Sure, I think the first thing I've learnt through this process and being in our data space for a while and then working our big data projects is that, for a while we used to talk about them as different things. Like you do data warehousing and now that kind of has an old kind of connotation feeling to it. It's got an old feel to it, right? And then we talk about big data and you have a big data project and I think the realization that we've got is it's really those two things starting to come together and if you think about it, like everybody has been doing some form of analytics and warehousing for a while. And if we start to think about what the big data technologies have brought is a couple of things, in my opinion that kind of bring these two things together is with big data we started to be able to acquire data of significantly larger size and varying shape, right? But at the end of the day, the task is often acquire that data, shape that data into something useful and then connect it up to our business decision makers that need to leverage that data from a day to day basis. We've been doing that process in warehousing forever. It's really about how easily can we marry big data processing with the traditional data warehousing processes so that our warehouses, our decision making can kind of scale to large data and different shapes of data. And so probably what you'll see actually, at the Ignite conference in a lot of our sessions, you'll hear our speakers talking about something called modern data warehousing and like, it really doesn't matter what the label is associated with it. 
But it's really about how do you use big data technologies like Spark and Databricks naturally alongside warehousing technologies and integration technologies so they really form the modern data warehouse that does naturally handle big data, that does naturally bring in data of all shapes and sizes and provides kind of an experimentation ground as well, for data science. I think that's the last one that kind of comes in is once you've got big data and warehousing kind of working together to expand your analytics beyond kind of traditional approaches the next is opening up some of that data earlier in its life cycle for experimentation by data science. It's kind of the new angle and we think about this notion of kind of modern data warehousing as almost one thing supporting them all going forward. I think the challenge we've had is when we try to separate these into kind of net new deliverables, net new projects where we're starting to kind of bifurcate, if you will, the data platform to some degree. And things were getting a little too complex and so I think what we're seeing is that people are learning what these tools are good at and what they're not good at and now how to bring them together to really get back some of the productivity that we've had in the past. >> I want to ask you about those business decision makers that you referenced. I mean there's an assumption that every organization wants to become more data driven. And I think that most companies would probably say yes, but then there's another set of managers who really want to go by their gut. I mean have you found that being a conflict in terms of how you are positioning the products and services? >> Yeah absolutely. In a number of customer engagements we've had where you start to bring in more data, you start to evolve kind of the analytics practice. There is a lot of resistance at times that, you know, we've done it this way for 20 years, business is pretty good. What are we really fixing here? 
And so what we've found is the best path through this and in a lot of cases the required path has been show people the art of the possible, run experiments, show them side by side examples and typically with that comes a comfort level in what's possible sometimes it exposes new capabilities and options, sometimes it also shows that there's some other ways to arrive at decisions, but we've certainly seen that and almost like anything, you kind of have to start small, create a proving ground and be able to do it in a kind of side by side manner to show comparison as we go, but it's a conversation that I think is going to carry forward for the next little while especially as some of the work in AI and machine learning is starting to make its way into business critical settings, right? Pricing your products. Product placement. All of this stuff that directly affects bottom lines you're starting to see these models do a really good job. And I think what we've found is it's all about experimentation. >> Mike when we listen to (mumbles) and to Dell and we talk about, you know, how things are developed inside of Microsoft, we usually hear things like open and extensible, you got to have APIs in any of these modern pieces. It was highlighted in the keynote on Monday, talking about the Open Data Initiative, got companies like Adobe and SAP out there, they have a lot of data, so the question is, of course, Microsoft has a lot of data that customers flow through, but there's also this very large ecosystem we see at this show. What's the philosophy? Is it just, you know, oh, I've got some APIs and people plug into it? How does all the data get so that the customers can use it? >> Yeah it's a great question. That one I work a lot on and I think there's a couple of angles to it. One is, I think as big data's taken off, a lot of the integration technology that we've used in the past really wasn't made for this era. Where you've got data coming from everywhere. 
It's different shapes and it's different sizes and so at least within some of our products, we've been investing a lot into how do we make it really easy to acquire all the data you need because, you know, like you hear in all these cases, you can have the best model in the world if you don't have the best data sets it doesn't matter. Digital transformation starts with getting access to more data than you had before and so I think we've been really focused on this, we call it the ingestion of data. Being able to really easily connect and acquire all of the data and that's the starting point. The next thing that we've seen from companies that have kind of gone down that journey with us is once you've acquired it all, you quickly have to understand it and you have to be able to kind of search over it and understand it through the lens of potentially business terms if you're a business user trying to understand what are all these data sets? What do they mean? And so I think this is where you're starting to see the rise of data cataloging initiatives not necessarily master data, et cetera, of the past, but this idea of, wow, I'm acquiring all of this data, how do I make sense of it? How do I catalog it? How do all of my workers or my employees easily find what they need and search for the data through the lens that makes sense to them. Data scientists are going to search through a very technical lens. Your business users through business glossary, business domain terms in that way and, so for me it all starts with the acquisition. I think it's still far too hard and then becomes kind of a cataloging initiative and then the last step is how do we start to get some form of standards or agreement around the semantics of the data itself? Like this is a customer, this is a place. 
This is what, you know, a rating and I think with that you're going to start to see a whole eco-system of apps start to develop and one of the things that we're pretty excited about with the open data partnerships is how can we bring in data and to some degree auto-classify it into a set of terms that allow you to just get on with the business logic as opposed to spend all the time in the acquisition phase that most companies do today. >> You mentioned that AI is becoming increasingly important and mission critical or at least, bottom line critical in business models. What are some of the most exciting new uses of AI that you're seeing and that you hope expands into the larger industry? >> Sure. It really does cross a number of domains. We work with a retailer, ASOS. Every time we get to chat with them it's a very interesting use on how they have completely customized the shopping experience from how they layout the page based on your interest and preference through to how the search terms come back based on seasonality of what you're looking at based on what they've learnt about your purchase patterns over time, your sex, et cetera. And so I think this notion of like, intensely customized customer experiences is playing out everywhere. We've seen it on the other side in engine design and preventative maintenance. Where we've got certain customers now that are selling engine hours as opposed to engines themselves. And so if there's an engine hour that they can't provide that's a big deal and so they want to get ahead of any maintenance issue they can and they're using models to predict when a particular maintenance event is going to be required and getting ahead of that through to athletes and injury prevention. 
We're now seeing all the way down to connected clothing and athletic gear where all the way down, not just at the professional level, but it's starting to come down to the club level on athletes as they're playing, starting to realize that, oh, something's not quite right, I want to get ahead of this before I have a more serious injury. And so we've seen it in a number of domains almost every new customer I'm talking with. I'm excited by what they're doing in this area. >> Well, you bring up an interesting challenge. I've heard Microsoft is really I guess verticalizing around certain industries to put solutions together. One of the challenges we saw, you know, we saw surveys of big data. The number one use case that came back was always custom and it was like, oh, okay, well how do I templatize and allow hundreds of customers to do this, not every single project is a massive engagement. What are you seeing that we're learning from the past and it feels like we're getting over that hump a little bit faster now than we were a few years ago. >> Yeah, so if I heard you correctly, it's a little bit loud so you're saying everything started at custom? And how do we get past that? And I think it actually goes back to what we're talking about earlier with this notion of a common understanding of data because what was happening is everybody felt they had bespoke data or we had data that was speaking about the same domains and terms, but we didn't agree on anything, so we spent a ton of time in the bespoke or custom arena of integrating, cleaning, transforming, before we could even get to model building or before we could get to any kind of innovation on the data itself and so I think one of the things is realizing that a lot of these domains we're trying to solve similar problems, we all have similar data. 
The more we can get to a common understanding of the data that we have, the more you can see higher level reusable components being built, saying, "Ah, I know how to work on customer data" "I know how to work on sales data" "I know how to work on, you know, oil and gas data" whatever it might be, you'll probably start to see things come up in industry verticals as well. And I think it's that motion, like we had the same problem years ago when we talked about log files. Before there were logging standards, everything was a custom solution, right? Now we have very rich solutions for understanding IT infrastructure et cetera that usually came about because we had a better baseline for the understanding of the data we had. >> Great. Mike, thank you so much for coming on theCUBE. It was a pleasure having you. >> Thank you for having me. >> I'm Rebecca Knight for Stu Miniman, we will have more of theCUBE's live coverage of Microsoft Ignite coming up just after this. (techno music)

Published Date : Sep 26 2018


ENTITIES

Entity	Category	Confidence
Mike Flasko	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Stu Miniman	PERSON	0.99+
Mike	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
20 years	QUANTITY	0.99+
Dell	ORGANIZATION	0.99+
ASOS	ORGANIZATION	0.99+
Adobe	ORGANIZATION	0.99+
Orlando, Florida	LOCATION	0.99+
two	QUANTITY	0.99+
CUBE	ORGANIZATION	0.99+
two things	QUANTITY	0.99+
Brick Data Technologies	ORGANIZATION	0.99+
theCUBE	ORGANIZATION	0.98+
Monday	DATE	0.98+
first	QUANTITY	0.98+
Cohesity	ORGANIZATION	0.98+
One	QUANTITY	0.97+
hundreds of customers	QUANTITY	0.97+
today	DATE	0.96+
Ignite	EVENT	0.96+
Data Bricks	ORGANIZATION	0.96+
one	QUANTITY	0.96+
first one	QUANTITY	0.95+
Azure Data Factory	ORGANIZATION	0.95+
Azure Data Catalog	TITLE	0.92+
few years ago	DATE	0.91+
Spark	ORGANIZATION	0.9+
first thing	QUANTITY	0.85+
SAP	ORGANIZATION	0.8+
single project	QUANTITY	0.77+
Microsoft Ignite	ORGANIZATION	0.72+
years	DATE	0.72+
SAS	TITLE	0.7+
Ignite 2018	EVENT	0.52+
ton	QUANTITY	0.41+

Rob Lantz, Novetta - Spark Summit 2017 - #SparkSummit - #theCUBE

>> Announcer: Live from San Francisco it's the CUBE covering Spark Summit 2017 brought to you by Databricks. >> Welcome back to the CUBE, we're continuing to talk to people who are not just talking about things but doing things. We're happy to have, from Novetta, the Director of Predictive Analytics, Mr. Rob Lantz. Rob, welcome to the show. >> Thank you. >> And off to my right, George, how are you? >> Good. >> We've introduced you before. >> Yes. >> Well let's talk to the guest. Let's get right to it. I want to talk to you a little bit about what Novetta does and then maybe what apps you're building using Spark. >> Sure, so Novetta is an advanced analytics company, we're medium sized and we develop custom hardware and software solutions for our customers who are looking to get insights out of their big data. Our primary offering is a hard entity resolution engine. We scale up to billions of records and we've done that for about 15 years. >> So you're in the business end of analytics, right? >> Yeah, I think so. >> Alright, so talk to us a little bit more about entity resolution, and that's all Spark right? This is your main priority? >> Yes, yes, indeed. Entity resolution is the science of taking multiple disparate data sets, traditional big data, and taking records from those and determining which of those are actually the same individual or company or address or location and which of those should be kept separate. We can aggregate those things together and build profiles and that enables a more robust picture of what's going on for an organization. >> Okay, and George? >> So what did you do... What was the solution looking like before Spark and how did it change once you adopted Spark? >> Sure, so with Spark, it enabled us to get a lot faster. Obviously those computations scaled a lot better. Before, we were having to write a lot of custom code to get those computations out across a grid.
When we moved to Hadoop and then Spark, that made us, let's say, able to scale those things and get it done overnight or in hours and not weeks. >> So when you say you had to do a lot of custom code to distribute across the cluster, does that include when you were working with MapReduce, or was this even before the Hadoop era? >> Oh it was before the Hadoop era and that predates my time so I won't be able to speak expertly about it, but to my understanding, it was a challenge for sure. >> Okay so this sounds like a service that your customers would then themselves build on. Maybe an ETL customer would figure out master data from a repository that is not as carefully curated as the data warehouse or similar applications. So who is your end customer and how do they build on your solution? >> Sure, so the end customer typically is an enterprise that has large volumes of data that deal in particular things. They collect, it could be customers, it could be passengers, it could be lots of different things. They want to be able to build profiles about those people or companies, like I said, or locations, any number of things can be considered an entity. The way they build upon it then is how they go about quantifying those profiles. We can help them do that, in fact, some of the work that I manage does that, but oftentimes they do it themselves. They take the resolved data and that gets resolved nightly or even hourly. They build those profiles themselves for their own purpose. >> Then, to help us think about the application or the use case holistically, once they've built those profiles and essentially harmonized the data, what does that typically feed into? >> Oh gosh, any number of things really. Oh, shoot. We've got deployments in AWS in the cloud, we've got deployments, lots of deployments on premises obviously. That can go anywhere from relational databases to graph query language databases. Lots of different places from there for sure.
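To make the idea of entity resolution concrete, here is a minimal single-machine sketch in Python. It is not Novetta's engine and says nothing about how their Spark implementation works; the record fields, the `same_entity` rule, and the 0.85 threshold are all assumptions chosen for illustration. It shows the two halves Rob describes: deciding which records refer to the same entity, and keeping look-alike records separate.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_entity(r1: dict, r2: dict, name_threshold: float = 0.85) -> bool:
    """Hypothetical match rule: names must be close AND the email must agree."""
    return (
        similarity(r1["name"], r2["name"]) >= name_threshold
        and normalize(r1["email"]) == normalize(r2["email"])
    )


def resolve(records: list[dict]) -> list[list[int]]:
    """Group record indices into entities using a small union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; a real system would block records first.
    for i, j in combinations(range(len(records)), 2):
        if same_entity(records[i], records[j]):
            parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up records: two spellings of one person, plus a similar-looking other person.
records = [
    {"name": "Rob Lantz", "email": "rob@example.com"},
    {"name": "rob  lantz", "email": "ROB@example.com"},
    {"name": "Robert Lantz", "email": "robert.l@example.org"},
]
entities = resolve(records)
print(entities)
```

The first two records merge into one entity, while the third stays separate even though its name is similar, because the second attribute disagrees. Requiring more than a name match is one simple way to avoid collapsing two different people into a single profile.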
>> Okay so, this actually sounds like everyone talks now about machine learning informing every category of software. This sounds like you take the old style ETL, where master data was a value add layer on top, and that was, it took a fair amount of human judgment to do. Now, you're putting that service on top of ETL and you're largely automating it, probably with, I assume, some supervised guidance, supervised training. >> Yes, so we're getting into the machine learning space as far as entity extraction and resolution and recognition because more and more data is unstructured. But machine learning isn't necessarily a baked in part of that. Actually entity resolution is a prerequisite, I think, for quality machine learning. So if Rob Lantz is a customer, I want to be able to know what has Rob Lantz bought in the past from me. And maybe what is Rob Lantz talking about in social media? Well, I need to know how to figure out who those people are, who's Rob Lantz, and that Robert Lantz is a completely different person; I don't want to collapse those two things together. Then I would build machine learning on top of that to say, right, now what's his behavior going to be in the future. But once I have that robust profile built up, I can derive a lot more interesting features with which to apply the machine learning. >> Okay, so you are a Databricks customer and there's also a burgeoning partnership. >> Rob: Yeah, I think that's true. >> So talk to us a little bit about what are some of the frustrations you had before adopting Databricks and maybe why you chose it. >> Yeah, sure. So the frustrations primarily with a traditional Hadoop environment involved having to go from one customer site to another customer site with an incredibly complex technology stack and then do a lot of the cluster management for those customers even after they'd already set it up because of all the inner workings of Hadoop and that ecosystem.
Getting our Spark application installed there, we had to penetrate layers and layers of configuration in order to tune it appropriately to get the performance we needed. >> David: Okay, and were you at the keynote this morning? >> I was not, actually. >> Okay, I'm not going to ask you about that then. >> Ah. >> But I am going to ask you a little bit about your wishlist. You've been talking to people maybe in the hallway here, you just got here today but, what do you wish the community would do or develop, what would you like to learn while you're here? >> Learning while I'm here, I've already picked up a lot. So much going on and it's such a fast paced environment, it's really exciting. I think if I had a wishlist, I would want a more robust MLlib, machine learning library. Having all the things that you can get in traditional scientific computing stacks moved onto Spark MLlib for easier access on a cluster would be great. >> I thought several years ago MLlib took over from Mahout as the most active open source community for adding, really, I thought, scale out machine learning algorithms. If it doesn't have it all now, or maybe all is something you never reach, kind of like the Red Queen effect, you know? >> Rob: For sure, for sure. >> What else is attracting these scale out implementations of the machine learning algorithms? >> Um? >> In other words, what are the platforms? If it's not Spark then... >> I don't think it exists frankly, unless you write your own. I think that would be the way to go. That's the way to go about it now. I think what organizations are having to do with machine learning in a distributed environment is just go with good enough, right. Whereas maybe some of the ensemble methods, which actually aren't even really cutting edge necessarily, but you can really do a lot of tuning on those things, doing that tuning distributed at scale would be really powerful.
I read somewhere, and I'm not going to be able to quote exactly where it was but, actually throwing more data at a problem is more valuable than tuning a perfect algorithm frankly. If we could combine the two, I think that would be really powerful. That is, finding the right algorithm and throwing all the data at it would get you a really solid model that would pick up on that signal that underlies any of these phenomena. >> David: Okay well, go ahead George. >> I was going to ask, I think that goes back to, I don't know if it was a Google paper, or one of the Google search quality guys who's a luminary in the machine learning space says, "data always trumps algorithms." >> I believe that's true and that's true in my experience certainly. >> Once you had this machine learning and once you've perhaps simplified the multi-vendor stack, then what does your solution start looking like in terms of broadening its appeal, because of the lower TCO? And then, perhaps embracing more use cases. >> I don't know that it necessarily embraces more use cases because entity resolution applies so broadly already, but what I would say is it will give us more time to focus on improving the ER itself. That's I think going to be a really, really powerful improvement we can make to Novetta entity analytics as it stands right now. That's going to go into, we alluded to before, the machine learning as part of the entity resolution. Entity extraction, automated entity extraction from unstructured information and not just unstructured text but unstructured images and video. Could be a really powerful thing. Taking in stuff that isn't tagged and pulling the entities out of that automatically without actually having to have a human in the loop. Pulling every name out, every phone number out, every address out. Go ahead, sorry.
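As a rough illustration of pulling entities out of untagged text, the sketch below extracts phone numbers with a regular expression. This is a rule-based stand-in, not the machine-learned extraction discussed in the interview, and the pattern is an assumption that only covers a couple of North American formats.

```python
import re

# Hypothetical pattern: optional parentheses around a 3-digit area code,
# then 3 digits and 4 digits separated by a dash or dot.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b")


def extract_phones(text: str) -> list[str]:
    """Return every substring of text that matches the phone pattern, in order."""
    return PHONE_RE.findall(text)


# Made-up sample sentence using reserved 555 numbers.
sample = "Call Rob at (415) 555-0134 or 415-555-0199 before noon."
phones = extract_phones(sample)
print(phones)
```

Hand-written rules like this tend to break on names, addresses, and images, which is where the tagged training data and learned models discussed in the interview come in.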
>> This goes back to a couple conversations we've had today where people say data trumps algorithms, even if they don't say it explicitly, so the cloud vendors who are sitting on billions of photos, many of which might have house street addresses and things like that, or faces, how do you make better... How do you extract better tuning for your algorithms from data sets that I assume are smaller than the cloud vendors? >> They're pretty big. We employ data engineers that are very experienced at tagging that stuff manually. What I would envision would happen is we would apply somebody for a week or two weeks, to go in and tag the data as appropriate. In fact, we have products that go in and do concept tagging already across multiple languages. That's going to be the subject of my talk tomorrow as a matter of fact. But we can tag things manually or with machine assistance and then use that as a training set to go apply to the much larger data set. I'm not so worried about the scale of the data, we already have a lot, a lot of data. I think it's going to be getting that proof set that's already tagged. >> So what you're saying is, it actually sounds kind of important. That actually almost ties into what we hear about Facebook training their Messenger bot where we can't do it purely just on training data so we're going to take some data that needs semi-supervision, and that becomes our new labeled set, our new training data. Then we can run it against this broad, unwashed mass of training data. Is that the strategy? >> Certainly we would get there. We would want to get there and that's the beauty of what Databricks promises, is that ability to save a lot of the time that we would spend doing the nug work on cluster management to innovate in that way and we're really excited about that. >> Alright, we've got just a minute to go here before the break, so I wanted to ask you maybe, the wish list question, I've been asking everybody today, what do you wish you had?
Whether it's in entity resolution or some other area in the next couple of years for Novetta, what's on your list? >> Well I think that would be the more robust machine learning library, all in Spark, kind of native, so we wouldn't have to deploy that ourselves. Then, I think everything else is there, frankly. We are very excited about the platform and the stack that comes with it. >> Well that's a great ending right there, George do you have any other questions you want to ask? Alright, we're just wrapping up here. Thank you so much, we appreciate you being on the show Rob, and we'll see you out there in the Expo. >> I appreciate it, thank you. >> Alright, thanks so much. >> George: It's good to meet you. >> Thanks. >> Alright, you are watching the CUBE here at Spark Summit 2017, stay tuned, we'll be back with our next guest.

Published Date : Jun 6 2017


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
George	PERSON	0.99+
Rob Lantz	PERSON	0.99+
Robert Lantz	PERSON	0.99+
San Francisco	LOCATION	0.99+
Data Bricks	ORGANIZATION	0.99+
a week	QUANTITY	0.99+
Rob	PERSON	0.99+
two	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Spark	TITLE	0.99+
Novetta	ORGANIZATION	0.99+
two weeks	QUANTITY	0.99+
tomorrow	DATE	0.99+
two things	QUANTITY	0.98+
today	DATE	0.98+
Spark Summit 2017	EVENT	0.98+
several years ago	DATE	0.97+
Hadoop	TITLE	0.97+
Google	ORGANIZATION	0.97+
about 15 years	QUANTITY	0.96+
#SparkSummit	EVENT	0.95+
billions of photos	QUANTITY	0.95+
this morning	DATE	0.91+
ML Lib	TITLE	0.91+
billions	QUANTITY	0.9+
one	QUANTITY	0.87+
Mahoot	ORGANIZATION	0.85+
one customer site	QUANTITY	0.85+
Hadoop	DATE	0.84+
two people	QUANTITY	0.74+
CUBE	ORGANIZATION	0.72+
Predictive Analytics	ORGANIZATION	0.68+
next couple	DATE	0.66+
Director	PERSON	0.66+
years	DATE	0.62+
Spark ML Lib	TITLE	0.61+
Queen	TITLE	0.59+
ML	TITLE	0.57+
couple	QUANTITY	0.54+
Red	OTHER	0.53+
MapReduce	ORGANIZATION	0.52+
Google Paper	ORGANIZATION	0.47+

Scott Raney, Redpoint Ventures - Google Next 2017 - #GoogleNext17 - #theCUBE

(light music) >> Narrator: Congratulations, Reggie Jackson. You are Cube alumni. (light music) >> Today as a country, as a universe. (light music) >> Narrator: Live from the Silicon Valley, It's the Cube. Covering Google Cloud Next 17. >> Hello and welcome to the Cube special coverage of Google Next 2017. This is the Cube's two days of live coverage here in Palo Alto studio. We have reporters and analysts on the ground. We have all the Wikibon analysts in San Francisco. Have been up there since Monday for the Google analyst summit. As well as reporters at the keynote. We're going to be going live to folks on the ground for a reaction and commentary from the keynotes. As well as all the big break outs and news coverage.
Again, two days of live coverage and we want to put a shout out to Intel for their sponsorship and allowing us to do the two days of in depth coverage. Really breaking down the Cloud. And really talking about this new mega trend around Cloud service providers where it's a multi-cloud game, which is pretty clear that's happening. And then the SaaSification of the world with AI machine learning. Really changing the game on infrastructure, software development. This is the digital transformation. This is the mega trend. And here to help kick off our two days of coverage is venture capitalist, Scott Raney, who's a partner at Redpoint Ventures, who has a lot of history in network software SaaS. Scott, thanks for joining us on the kickoff here. >> My pleasure. >> For our coverage. Yeah, the big story in Google news is obviously Diane Greene, great executive. She gets a lot of criticism for her presentation. Some people were saying it's a little bit sleepy, but she's got a folksy kind of, I call it the Berkeley kind of vibe, but she's super smart. She's a very cool person. But she came in from VMware, which has a lot of chops in the enterprise so it's no surprise that Google Cloud is now marching heavily towards the enterprise. They have all the window dressing. You're seeing all the check boxes next to the sales and marketing, some of the things that they're doing. But at the end of the day, it's AI and machine learning at the center of all this. Where data and a new cloud developer or new developer market has been emerging very fast. They call it cloud native. You're investing in this space. Give me your thoughts on this because you guys have to look at the 20 mile stare down the road. Look at kind of that five year horizon or plus for investments whether it's early stage or what not, but you guys have done a lot with startups that have been successful. Twilio, which you're on the board of, went public. You have a lot of investments in there that are doing very, very well.
The developers, the opportunities, what's your take as an investor writing big checks? >> Yeah, well I think Google is a really interesting way to start this conversation. Not just the Google Cloud platform, but Google as an entity. I think Google has frankly been defining about 10 years ahead of where enterprises are in terms of how they're thinking about building and deploying applications. And so, if you look at Google, the work they've done to actually support their internal efforts, these guys then create white papers, the white papers are then disseminated, and then a whole set of industries get kicked off around those. So obviously one of the great examples of that is what happened around Hadoop and that wave. I think what we're in the process of seeing right now is a whole series of innovations that are being developed around more kind of cloud native technologies. I think Kubernetes is a great example, which is really the outgrowth of work that Google had done around Borg. And so we spend a lot of time thinking about the things that Google is working on now. Recognizing that's the future of enterprise computing. Obviously, it takes a while to get there. But, there have been massive industries you can create from that. >> And it's transformative too. Again, I mentioned Twilio. They went public. Great service. We saw Snap go public. They're now running on Google Cloud and some on AWS. There's game changing opportunities out there that are going to come out of these unique perspectives that developers and entrepreneurs might have. And say hey I'm going to innovate on camera technology. That becomes Snap, which becomes kind of a unique, weird app and then goes mainstream. This is not a one off. I mean there's a lot happening around creative, young entrepreneurs and old, some guys our age. But either way, it's not just apps. It's transformation at the network level. All the way up to the top of the stack. >> Yeah.
>> What are the trends around that? I mean because machine learning is obviously hot. What are you hearing for pitches? What's coming through your door? What are you looking at? You guys see a lot of deals. What's the trends that are coming out of there? >> Well, every pitch we see has machine learning in it. Every company has become an AI company at some level. So that's clearly a big trend. I think for us the way that we look at it in terms of investments is we're recognizing that the algorithms are really becoming commoditized at some level. And Google, with TensorFlow, is actually helping make that happen. As we just talked about, they're democratizing machine learning at some level. The key there is data. And so, when we look at these companies, we're looking for companies that have a unique, proprietary access to data that they can apply those algorithms to, to deliver insight. I think one of the more interesting areas or applications around that we're seeing is in the SaaS space. Kind of upper level at the cloud space, how it's really not enough now to build a SaaS application that just automates a business process. What you have to do is deliver insights. You have to help make the people that are using these applications better at their job at some level and the way to do that is through things like machine learning. >> What's interesting, Peter Burris, who's one of our heads of research for Wikibon pointed out, last week when we were covering Mobile World Congress, he goes it's interesting, you know years ago, when I was breaking into the business in the late 80s, early 90s, it was known processes, unknown technology, and those were automated. Now you have known technology and unknown processes. So getting those insights to get that discovery could really disrupt existing incumbents, big players. So someone can innovate, say hey, I'm going to innovate on a new process that's emerging.
This seems to be the big trend that's going on and again the software model is changing. So how do you guys see entrepreneurs looking at the AI and are they that focused on that? Or do they see that? I mean what are the key areas? Do they actually say hey, I'm going to disrupt this marketplace with this one feature? We always hear the MVP or pick something and do it great. What are some of the things that you've seen? >> We're really seeing two things in the AI and ML space. We're seeing one is the general kind of platform play. People that are trying to actually offer machine learning to developers in some way, shape or form. And the reality is I think those are very difficult businesses to build. I think Google Cloud is actually extremely well positioned to be able to actually kind of drive that forward for developers based on all the work they've done internally and the way that cloud is built and architected. The second is applications of AI and ML. And that's where we're spending the vast majority of our time because we think that's where the most value will be created for folks that don't own a cloud like Google. >> The thing that's interesting about entrepreneurs is it's been a nice thing, the cloud, you can get into the game with open source and build a business. You don't have to provision the data center. That's kind of been talked about, it's not new news. Yeah, you can get up and running, but it's interesting. It was easy to get into the enterprise and then all of a sudden now, as it gets more complicated, we're almost going back to the old days when it was really hard to crack the code in the enterprise. It seems to be a lot of new table stakes are emerging. It used to be cloud native, oh we're going to go to the enterprise. And you saw box.net, now being Box and Dropbox, they're getting in the enterprise very easily.
But now, as we go I'd say post-2012, all these new requirements start to rear their ugly head around it's hard to get into the enterprise. So this is something that Google is certainly challenged with right now is that they have a lot of tech, they're serious about the enterprise, that's clear. But to be an enterprise contender and winner and winning deals, how hard is it to win the enterprise? And is that something that you see where the enterprise landscape has changed where it's harder or is it easier? What's your thoughts on the complexities in the enterprise? >> Yeah, I maybe have a different point of view than you do. Which is, I actually think it's easier now to penetrate the enterprise at some level than it ever has been before. But it has to start with product. And open source is an incredible phenomenon that we're seeing that's kind of overtaking the way that enterprises think about building infrastructure today. I don't think you can build an infrastructure company unless you're offering it as open source software. And so, what we look for in terms of investments and I think what entrepreneurs need to do is think about how do I build products that enterprises will love and release that as open source and hope to see some level of adoption. When you see that then that's the best path to be able to go in and sell to them and build revenue around it. Kind of transitioning back to Google and what they're doing with the cloud effort, I think that their approach is actually, it's intriguing. You know, Diane is a world class executive in this way and, you know, I think brought in the last big transition that we've seen through the work she did with virtualization. And I wouldn't bet against her here. I think the things that those guys are doing is offering a pretty compelling set of higher level services now that are getting traction with things like BigQuery. I think TensorFlow is obviously very interesting.
And then what they now announced recently with Spanner as a service. These are all technologies that Google understands and has mastered and are very compelling technologies that I think the average developer will want. And they are highly differentiated from the services that are available from the Amazons and Microsofts of the world. >> Yeah, Spanner certainly got that horizontally-scalable mojo going on. They still got some work to do outside of MySQL and there on the relational database side, which we're watching. But they know that. I mean Google is clearly not saying they're, you know, fully-baked. They're actually candid in the analyst meeting. They were very candid on the security side and very candid on some of these things that they know they've got to do. But they are pedaling as fast as they can. So I got to ask you the venture capital question. Developers are out there. Because there was a line, literally a blockbuster as they called it. People around the block to get in. Google IO had similar attraction. Those events are awesome. Google runs great events. They have, I would call them the technology store. People love to go in there and see what they have. But as an entrepreneur coming in, I'm going to build on a stack, whether it's Amazon or Google or somewhere else, you got to worry about the viability when you have the big gorillas out there. You got Amazon, now Google. What's the formula for and what do you worry about as an investor because the things you must think about is okay, what's the approach, where's the viability, is there a marketplace, is there monetization, can they get traction, can they go beyond the first three million in sales, because SaaS you can get there pretty quickly, as it's been discussed. What are the fears that you worry about and what advice would you give entrepreneurs as they start really innovating and saying hey I'm going to take the democratization of AI and I'm going to do some damage. 
I want to enter a market. These are considerations that you got to think about and you, as an investor, where's the risk? And what's the opportunity? >> Oh man, well there are lots of risks starting a company. We could talk for an hour about the challenges associated with being an entrepreneur. It's probably the hardest job you can imagine having. You know I think that the first and foremost is you got to build products that people love. And you got to solve a real problem. And so, I think for us as investors, we look for that. It's different now in enterprise investing in infrastructure than before where there used to be 10, 20 million dollar efforts required to build the technology and then you take it to the enterprise. And you would hope that it would sell. Now, with a couple million dollars, you have the ability to go out and write some compelling software, release it in the open source and see whether or not it gets traction. And then, really the challenge is figuring out whether you can monetize that or not, right. And in today's model, that's really where we struggle. It's ultimately in how you package this and sell it. I think that the primary models that we're seeing are either some form of upsell on open source, so either service support, open core, or an enterprise grade application built on top of the open source. The other alternative is to deliver it as a service. And we see lots of folks that are taking that open source and saying we're going to run this as a service. We have a company, Platform9, that does that for Kubernetes, but there are companies like Databricks that are doing that for Spark and the whole data pipeline. And that is potentially a very compelling model too. >> Do you have a formula or an algorithm for investment? 
I remember talking to Jeremy Liew way back in the day and I just saw him in an interview on Snapchat, was an investor and he actually jumped into the stats with Evan Spiegel and saw the traction cause he was skeptical. A lot of people had passed on it, but you know that story. Is there an algorithm that you look for besides the team and being an exceptional team of people, you know technical chops and product chops. Is there a way that you look at to identify traction in this marketplace because it could be, there's a lot of turbulence, microservices, you got Kubernetes, another Google innovation that's kind of becoming a glue layer if you will across services. Is there a way to say oh that's got traction, I like that? Or here's some benchmarks that I look for for hurdles in ventures. >> Yeah, within this infrastructure space primarily around models that are going to be delivered as open source, there's a couple things that we can look at. We'll track GitHub stars and so we'll get a sense from that how the community views this. Whether this is something that they are particularly interested in and the level of traction they're getting within that community. It's almost like a stamp of approval from the technology community that says this is a really cool project, right? And then, beyond that you start to look at download volumes. And to understand just how widespread the adoption of this technology is. Those are imperfect metrics, you know. And so, a lot of times it comes back to >> Market forces or whatever. >> Switching gears and looking at the customers and asking them the kinds of problems they are experiencing and whether or not these technologies have a chance to actually address real long standing challenges that they've had in either building or deploying or running applications. And so, it's different than consumer. Yeah, consumer is a little bit easier to measure. And you have a lot of data. 
Consumer has its own challenges and it's very difficult to kind of predict a priori what's going to be successful. But the good news for us is that with high-quality teams, these guys typically know where to focus and where to spend time and ultimately will be able to create it. >> And customer traction is always a great one to look at. I mean sell the data points. Scott Raney, what's new with Redpoint Ventures? Give a quick plug for what you guys are doing, what you're investing in, size of the fund, how much dry powder you have as they say. Are you still writing checks? What kind of checks? >> We are in business and we're looking for great entrepreneurs. So we have two funds. One is a 400 million dollar early stage fund that focuses primarily on Series A and an occasional Series B. And then we have a 400 million dollar early growth fund that is really more an occasional Series B and Series C. You know our attitude to the entrepreneurs is they should be indifferent to which fund they're in. We treat every investment the same. Really, we just want to be a part of great companies and get a chance to work with great entrepreneurs. >> And you guys also sponsored the party last night with the CNCF, the Cloud Native Computing Foundation. >> Yeah. >> How'd that go? What were some of the conversations in the hallway there? Or in the hallway, in the event, it was a social event, but you know great community, the CNCF After Development. A couple new projects emerging. >> They've done some great work. And the projects that are coming in represent a lot of the foundation work that's going to be required to build cloud native applications. The first thing we did at this event last night is try to define what cloud native actually is. (laughs) And I think everybody has a different definition for that. >> What's the most common one? Is there a trend pattern in there? 
>> Yeah, I think people were saying these are applications that are built, traditionally built, using containers. They're built leveraging microservices. And they are built with the assumption that the underlying infrastructure is going to be ephemeral in some way. So you know built... >> And you have a pony in that game with HashiCorp so update on those guys? >> It's a company that is doing extremely well and solving a broad set of problems around helping developers build and run applications on top of the cloud and I think what we're seeing there and we're seeing kind of across the board is a general desire to start to think about multi-cloud. To start to understand what it takes to actually deploy applications and run applications across multiple clouds. And also to be more agnostic about what the underlying substrate looks like. And those are trends that bode well for Google and Microsoft. >> Yeah, we're excited, we're going to be watching. Scott, thanks for coming on. We're going to be watching that. Kubernetes, that orchestration layer that's going on around microservices that's a hot I'd say battleground around innovation, a lot of good things happening there. Great opportunities when there's a lot of turbulence. Great opportunities to invest. Good luck with your investments. Scott Raney, partner at Redpoint Ventures. Very active in the community. A great VC, check him out. It's the Cube two days of live coverage all day. Going to 4:30, 5:00 pm today. And then tomorrow, Thursday. And then we're off to South by Southwest again. More coverage, we wrap with more coverage after the short break.

Published Date : Mar 8 2017

ENTITIES

Entity	Category	Confidence
Diane Green	PERSON	0.99+
Peter Burris	PERSON	0.99+
Redpoint Ventures	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Jeremy Lu	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Evan Spiegel	PERSON	0.99+
Scott Raney	PERSON	0.99+
Diane	PERSON	0.99+
Google	ORGANIZATION	0.99+
Scott	PERSON	0.99+
two days	QUANTITY	0.99+
San Francisco	LOCATION	0.99+
Microsofts'	ORGANIZATION	0.99+
Reggie Jackson	PERSON	0.99+
20 mile	QUANTITY	0.99+
Cube	ORGANIZATION	0.99+
Twilio	ORGANIZATION	0.99+
Azicorp	ORGANIZATION	0.99+
Data Bricks	ORGANIZATION	0.99+
Dropbox	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
last week	DATE	0.99+
AWS	ORGANIZATION	0.99+
Series A	OTHER	0.99+
Series B	OTHER	0.99+

Nick Pentreath, IBM STC - Spark Summit East 2017 - #sparksummit - #theCUBE

>> Narrator: Live from Boston, Massachusetts, this is The Cube, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Boston, everybody. Nick Pentreath is here, he's a principal engineer at the IBM Spark Technology Center in South Africa. Welcome to The Cube. >> Thank you. >> Great to see you. >> Great to see you. >> So let's see, it's a different time of year, here than you're used to. >> I've flown from, I don't know the Fahrenheit equivalent, but 30 degrees Celsius heat and sunshine to snow and sleet, so. >> Yeah, yeah. So it's a lot chillier here. Wait until tomorrow. But, so we were joking. You probably get the T-shirt for the longest flight here, so welcome. >> Yeah, I actually need the parka, or like a beanie. (all laugh) >> Little better. Long sleeve. So Nick, tell us about the Spark Technology Center, STC is its acronym and your role, there. >> Sure, yeah, thank you. So Spark Technology Center was formed by IBM a little over a year ago, and its mission is to focus on the Open Source world, particularly Apache Spark and the ecosystem around that, and to really drive forward the community and to make contributions to both the core project and the ecosystem. The overarching goal is to help drive adoption, yeah, and particularly enterprise customers, the kind of customers that IBM typically serves. And to harden Spark and to make it really enterprise ready. >> So why Spark? I mean, we've watched IBM do this now for several years. The famous example that I like to use is Linux. When IBM put $1 billion into Linux, it really went all in on Open Source, and it drove a lot of IBM value, both internally and externally for customers. So what was it about Spark? I mean, you could have made a similar bet on Hadoop. You decided not to, you sort of waited to see that market evolve. What was the catalyst for having you guys all go in on Spark? >> Yeah, good question. 
I don't know all the details, certainly, of what were the internal drivers because I joined STC a little under a year ago, so I'm fairly new. >> Translate the hallway talk, maybe. (Nick laughs) >> Essentially, I think you raise very good parallels to Linux and also Java. >> Absolutely. >> So Spark, sorry, IBM, made these investments in Open Source technologies that it had seen to be transformational and kind of game-changing. And I think, you know, most people will probably admit within IBM that they maybe missed the boat, actually, on Hadoop and saw Spark as the successor and actually saw a chance to really dive into that and kind of almost leapfrog and say, "We're going to back this as the next generation analytics platform and operating system for analytics and big data in the enterprise." >> Well, I don't know if you happened to watch the Super Bowl, but there's a saying that it's sometimes better to be lucky than good. (Nick laughs) And that sort of applies, and so, in some respects, maybe missing the window on Hadoop was not a bad thing for IBM. >> Yeah, exactly because not a lot of people made a ton of dough on Hadoop and they're still sort of struggling to figure it out. And now along comes Spark, and you've got this more real time nature. IBM talks a lot about bringing analytics and transactions together. They've made some announcements about that and affecting business outcomes in near real time. I mean, that's really what it's all about and one of your areas of expertise is machine learning. And so, talk about that relationship and what it means for organizations, your mission. >> Yeah, machine learning is a key part of the mission. And you've seen the kind of big data in the enterprise story, starting with the kind of Hadoop and data lakes. And that's evolved into, now we've, before we just dumped all of this data into these data lakes and these silos and maybe we had some Hadoop jobs and so on. 
But now we've got all this data we can store, what are we actually going to do with it? So part of that is the traditional data warehousing and business intelligence and analytics, but more and more, we're seeing there's a rich value in this data, and to unlock it, you really need intelligent systems. You need machine learning, you need AI, you need real time decision making that starts transcending the boundaries of all the rule-based systems and human-based systems. So we see machine learning as one of the key tools and one of the key unlockers of value in these enterprise data stores. >> So Nick, perhaps paint us a picture of someone who's advanced enough to be working with machine learning with IBM and we know that the tool chain's kind of immature. Although, IBM with DataWorks or DataFirst has a fairly broad end-to-end sort of suite of tools, but what are the early-use cases? And what needs to mature to go into higher volume production apps or higher-value production apps? >> I think the early-use cases for machine learning in general and certainly at scale are numerous and they're growing, but classic examples are, let's say, recommendation engines. That's an area that's close to my heart. In my previous life before IBM, I built a startup that had a recommendation engine service targeting online stores and e-commerce players and social networks and so on. So this is a great kind of example use case. We've got all this data about, let's say, customer behavior in your retail store or your video-sharing site, and in order to serve those customers better and make more money, if you can make good recommendations about what they should buy, what they should watch, or what they should listen to, that's a classic use case for machine learning and unlocking the data that is there, so that is one of the drivers of some of these systems, players like Amazon, they're sort of good examples of the recommendation use case. 
Another is fraud detection, and that is a classic example in financial services, enterprise, which is a kind of staple of IBM's customer base. So these are a couple of examples of the use cases, but the tool sets, traditionally, have been kind of cumbersome. So Amazon built everything from scratch themselves using customized systems, and they've got teams and teams of people. Nowadays, you've got this built into Apache Spark, you've got it in Spark's machine learning library, you've got good models to do that kind of thing. So I think from an algorithmic perspective, there's been a lot of advancement and there's a lot of standardization and almost commoditization of the model side. So what is missing? >> George: Yeah, what else? >> And what are the shortfalls currently? So there's a big difference between the current view, I guess the hype of the machine learning as you've got data, you apply some machine learning, and then you get profit, right? But really, there's a hugely complex workflow that involves this end-to-end story. You've got data coming from various data sources, you have to feed it into one centralized system, transform and process it, extract your features and do your sort of hardcore data science, which is the core piece that everyone sort of thinks about as the only piece, but that's kind of in the middle and it makes up a relatively small proportion of the overall chain. And once you've got that, you do model training and selection testing, and you now have to take that model, that machine-learning algorithm and you need to deploy it into a real system to make real decisions. And that's not even the end of it because once you've got that, you need to close the loop, what we call the feedback loop, and you need to monitor the performance of that model in the real world. You need to make sure that it's not deteriorating, that it's adding business value. All of these kind of things. 
So I think that is the real, the piece of the puzzle that's missing at the moment is this end-to-end, delivering this end-to-end story and doing it at scale, securely, enterprise-grade. >> And the business impact of that presumably will be a better-quality experience. I mean, recommendation engines and fraud detection have been around for a while, they're just not that good. Retargeting systems are too little too late, and kind of cumbersome fraud detection. Still a lot of false positives. Getting much better, certainly compressing the time. It used to be six months, >> Yes, yes. Now it's minutes or seconds, but a lot of false positives still, so, but are you suggesting that by closing that gap, that we'll start to see from a consumer standpoint much better experiences? >> Well, I think that's imperative because if you don't see that from a consumer standpoint, then the mission is failing because ultimately, it's not magic that you just simply throw machine learning at something and you unlock business value and everyone's happy. You have to, you know, there's a human in the loop, there. You have to fulfill the customer's need, you have to fulfill consumer needs, and the better you do that, the more successful your business is. You mentioned the time scale, and I think that's a key piece, here. >> Yeah. >> What makes better decisions? What makes a machine-learning system better? Well, it's better data and more data, and faster decisions. So I think all of those three are coming into play with Apache Spark, the end-to-end story, streaming systems, and the models are getting better and better because they're getting more data and better data. >> So I think we've, the industry, has pretty much attacked the time problem. Certainly for fraud detection and recommendation systems the quality issue. Are we close? 
I mean, are we talking about 6-12 months before we really sort of start to see a major impact to the consumer and ultimately, to the company who's providing those services? >> Nick: Well, >> Or is it further away than that, you think? >> You know, it's always difficult to make predictions about timeframes, but I think there's a long way to go to go from, yeah, as you mentioned where we are, the algorithms and the models are quite commoditized. The time gap to make predictions is kind of down to this real-time nature. >> Yeah. >> So what is missing? I think it's actually less about the traditional machine-learning algorithms and more about making the systems better and getting better feedback, better monitoring, so improving the end user's experience of these systems. >> Yeah. >> And that's actually, I don't think it's, I think there's a lot of work to be done. I don't think it's a 6-12 month thing, necessarily. I don't think that in 12 months, certainly, you know, everything's going to be perfectly recommended. I think there's areas of active research in the kind of academic fields of how to improve these things, but I think there's a big engineering challenge to bring in more disparate data sources, to better, to improve data quality, to improve these feedback loops, to try and get systems that are serving customer needs better. So improving recommendations, improving the quality of fraud detection systems. Everything from that to medical imaging and cancer detection. I think we've got a long way to go. >> Would it be fair to say that we've done a pretty good job with traditional application lifecycle in terms of DevOps, but we now need the DevOps for the data scientists and their collaborators? >> Nick: Yeah, I think that's >> And where is IBM along that? 
>> Yeah, that's a good question, and I think you kind of hit the nail on the head, that the enterprise applied machine learning problem has moved from the kind of academic to the software engineering and actually, DevOps. Internally, someone mentioned the word train ops, so it's almost like, you know, the machine learning workflow and actually professionalizing and operationalizing that. So recently, IBM, for one, has announced Watson Data Platform and now, Watson Machine Learning. And that really tries to address that problem. So really, the aim is to simplify and productionize these end-to-end machine-learning workflows. So that is the product push that IBM has at the moment. >> George: Okay, that's helpful. >> Yeah, and right. I was at the Watson Data Platform announcement, you called it DataWorks. I think they changed the branding. >> Nick: Yeah. >> It looked like there were numerous components that IBM had in its portfolio that are now strung together. And to create that end-to-end system that you're describing. Is that a fair characterization, or is it underplaying? I'm sure it is. The work that went into it, but help us maybe understand that better. >> Yeah, I should caveat it by saying we're fairly focused, very focused at STC on the Open Source side of things, So my work is predominately within the Apache Spark project and I'm less involved in the data bank. >> Dave: So you didn't contribute specifically to Watson Data Platform? >> Not to the product line, so, you know, >> Yeah, so it's really not an appropriate question for you? >> I wouldn't want to kind of, >> Yeah. >> To talk too deeply about it >> Yeah, yeah, so that, >> Simply because I haven't been involved. >> Yeah, that's, I don't want to push you on that because it's not your wheelhouse, but then, help me understand how you will commercialize the activities that you do, or is that not necessarily the intent? 
>> So the intent with STC particularly is that we focus on Open Source and a core part of that is that we, being within IBM, we have the opportunity to interface with other product groups and customer groups. >> George: Right. >> So while we're not directly focused on, let's say, the commercial aspect, we want to effectively leverage the ability to talk to real-world customers and find the use cases, talk to other product groups that are building this Watson Data Platform and all the product lines and the features, Data Science Experience, it's all built on top of Apache Spark and platform. >> Dave: So your role is really to innovate? >> Exactly, yeah. >> Leverage Open Source and innovate. >> Both innovate and kind of improve, so improve performance, improve efficiency. When you are operating at the scale of a company such as IBM and other large players, your customers and you as product teams and builders of products will come into contact with all the kind of little issues and bugs >> Right. >> And performance >> Make it better. Problems, yeah. And that is the feedback that we take on board and we try and make it better, not just for IBM and their customers. Because it's an Apache product and everyone benefits. So that's really the idea. Take all the feedback and learnings from enterprise customers and product groups and centralize that in the Open Source contributions that we make. >> Great. Would it be, so would it be fair to say you're focusing on making the core Spark, Spark ML and Spark MLlib capabilities sort of machine learning libraries and in the pipeline, more robust? >> Yes. >> And if that's the case, we know there needs to be improvements in its ability to serve predictions in real time, like high speed. We know there's a need to take the pipeline and sort of share it with other tools, perhaps. Or collaborate with other tool chains. >> Nick: Yeah. >> What are some of the things that the Enterprise customers are looking for along the lines? 
>> Yeah, that's a great question and very topical at the moment. So both from an Open Source community perspective and Enterprise customer perspective, this is one of the, if not the key, I think, kind of missing pieces within the Spark machine-learning kind of community at the moment, and it's one of the things that comes up most often. So it is a missing piece, and we as a community need to work together and decide, is this something that we build within Spark and provide that functionality? Is it something where we try and adopt open standards that will benefit everybody and that provide a kind of one standardized format, or way of serving models? Or is it something where there's a few Open Source projects out there that might serve for this purpose, and do we get behind those? So I don't have the answer because this is ongoing work, but it's definitely one of the most critical kind of blockers, or, let's say, areas that need work at the moment. >> One quick question, then, along those lines. IBM, the first thing IBM contributed to the Spark community was Spark ML, which is, as I understand it, it was an ability to, I think, create an ensemble sort of set of models to do a better job or create a more, >> So are you referring to SystemML, I think it is? >> SystemML. >> SystemML, yeah, yeah. >> What are they, I forgot. >> Yeah, so, so. >> Yeah, where does that fit? >> SystemML started out as an IBM research project and perhaps the simplest way to describe it is, as a kind of SQL optimizer is to take SQL queries and decide how to execute them in the most efficient way, SystemML takes a kind of high-level mathematical language and compiles it down to an execution plan that runs in a distributed system. So in much the same way as your SQL operators allow this very flexible and high-level language, you don't have to worry about how things are done, you just tell the system what you want done. 
SystemML aims to do that for mathematical and machine learning problems, so it's now an Apache project. It's been donated to Open Source and it's an incubating project under very active development. And that is really, there's a couple of different aspects to it, but that's the high-level goal. The underlying execution engine is Spark. It can run on Hadoop and it can run locally, but really, the main focus is to execute on Spark and then expose these kind of higher level APIs that are familiar to users of languages like R and Python, for example, to be able to write their algorithms and not necessarily worry about how do I do large scale matrix operations on a cluster? SystemML will compile that down and execute that for them. >> So really quickly, follow up, what that means is it's a higher level way for people who aren't cluster aware to write machine-learning algorithms that are cluster aware? >> Nick: Precisely, yeah. >> That's very, very valuable. When it works. >> When it works, yeah. So it does, again, with the caveat that I'm mostly focused on Spark and not so much the SystemML side of things, so I'm definitely not an expert. I don't claim to be an expert in it. But it does, you know, it works at the moment. It works for a large class of machine-learning problems. It's very powerful, but again, it's a young project and there's always work to be done, so exactly the areas that I know that they're focusing on are these areas of usability, hardening up the APIs and making them easier to use and easier to access for users coming from the R and Python communities who, again are, as you said, they're not necessarily experts on distributed systems and cluster awareness, but they know how to write a very complex machine-learning model in R, for example. And it's really trying to enable them with a set of API tools. 
So in terms of the underlying engine, there are, I don't know how many hundreds of thousands, millions of lines of code and years and years of research that's gone into that, so it's an extremely powerful set of tools. But yes, a lot of work still to be done there and ongoing to make it, in a way to make it user ready and Enterprise ready in a sense of making it easier for people to use it and adopt it and to put it into their systems and production. >> So I wonder if we can close, Nick, just a few questions on STC, so the Spark Technology Center's in Cape Town, is that a global expertise center? Is STC a virtual sort of IBM community, or? >> I'm the only member based in Cape Town, >> David: Okay. >> So I'm kind of fairly lucky from that perspective, to be able to kind of live at home. The rest of the team is mostly in San Francisco, so there's an office there that's co-located with the Watson West office >> Yeah. >> And Watson teams >> Sure. >> That are based there in Howard Street, I think it is. >> Dave: How often do you get there? >> I'll be there next week. >> Okay. >> So I typically, sort of two or three times a year, I try and get across there >> Right. And interface with the team, >> So, >> But we are a fairly, I mean, IBM is obviously a global company, and I've been surprised actually, pleasantly surprised there are team members pretty much everywhere. Our team has a few scattered around including me, but in general, when we interface with various teams, they pop up in all kinds of geographical locations, and I think it's great, you know, a huge diversity of people and locations, so. >> Anything, I mean, these early days here, early day one, but anything you saw in the morning keynotes or things you hope to learn here? Anything that's excited you so far? 
>> A couple of the morning keynotes, but had to dash out to kind of prepare for, I'm doing a talk later, actually on feature hashing for scalable machine learning, so that's at 12:20, please come and see it. >> Dave: A breakout session, it's at what, 12:20? >> 20 past 12:00, yeah. >> Okay. >> So in room 302, I think, >> Okay. >> I'll be talking about that, so I needed to prepare, but I think some of the key exciting things that I have seen that I would like to go and take a look at are kind of related to the deep learning on Spark. I think that's been a hot topic recently and one of the areas where, again, Spark perhaps hasn't been the strongest contender, let's say, but there's some really interesting work coming out of Intel, it looks like. >> They're talking here on The Cube in a couple hours. >> Yeah. >> Yeah. >> I'd really like to see their work. >> Yeah. >> And that sounds very exciting, so yeah. I think every time I come to a Spark Summit, there are always new projects from the community, various companies, some of them big, some of them startups that are pushing the envelope, whether it's research projects in machine learning, whether it's adding deep learning libraries, whether it's improving performance for kind of commodity clusters or for single, very powerful single nodes, there's always people pushing the envelope, and that's what's great about being involved in an Open Source community project and being part of those communities, so yeah. That's one of the talks that I would like to go and see. And I think I, unfortunately, had to miss some of the Netflix talks on their recommendation pipeline. That's always interesting to see. >> Dave: Right. >> But I'll have to check them on the video (laughs). >> Well, there's always another project in Open Source land. Nick, thanks very much for coming on The Cube and good luck. >> Cool, thanks very much. Thanks for having me. >> Have a good trip, stay warm, hang in there. (Nick laughs) Alright, keep it right there. 
My buddy George and I will be back with our next guest. We're live. This is The Cube from Spark Summit East, #sparksummit. We'll be right back. (upbeat music) (gentle music)

Published Date : Feb 8 2017
