Ed Walsh & Thomas Hazel | A New Database Architecture for Supercloud
(bright music) >> Hi, everybody, this is Dave Vellante, welcome back to Supercloud 2. Last August, at the first Supercloud event, we invited the broader community to help further define Supercloud, we assessed its viability, and identified the critical elements and deployment models of the concept. The objectives here at Supercloud too are, first of all, to continue to tighten and test the concept, the second is, we want to get real world input from practitioners on the problems that they're facing and the viability of Supercloud in terms of applying it to their business. So on the program, we got companies like Walmart, Sachs, Western Union, Ionis Pharmaceuticals, NASDAQ, and others. And the third thing that we want to do is we want to drill into the intersection of cloud and data to project what the future looks like in the context of Supercloud. So in this segment, we want to explore the concept of data architectures and what's going to be required for Supercloud. And I'm pleased to welcome one of our Supercloud sponsors, ChaosSearch, Ed Walsh is the CEO of the company, with Thomas Hazel, who's the Founder, CTO, and Chief Scientist. Guys, good to see you again, thanks for coming into our Marlborough studio. >> Always great. >> Great to be here. >> Okay, so there's a little debate, I'm going to put you right in the spot. (Ed chuckling) A little debate going on in the community started by Bob Muglia, a former CEO of Snowflake, and he was at Microsoft for a long time, and he looked at the Supercloud definition, said, "I think you need to tighten it up a little bit." So, here's what he came up with. He said, "A Supercloud is a platform that provides a programmatically consistent set of services hosted on heterogeneous cloud providers." So he's calling it a platform, not an architecture, which was kind of interesting. And so presumably the platform owner is going to be responsible for the architecture, but Dr. Nelu Mihai, who's a computer scientist behind the Cloud of Clouds Project, he chimed in and responded with the following. He said, "Cloud is a programming paradigm supporting the entire lifecycle of applications with data and logic natively distributed. Supercloud is an open architecture that integrates heterogeneous clouds in an agnostic manner." So, Ed, words matter. Is this an architecture or is it a platform? >> Put us on the spot. So, I'm sure you have concepts, I would say it's an architectural or design principle. Listen, I look at Supercloud as a mega trend, just like cloud, just like data analytics. And some companies are using the principle, design principles, to literally get dramatically ahead of everyone else. I mean, things you couldn't possibly do if you didn't use cloud principles, right? So I think it's a Supercloud effect, you're able to do things you're not able to. So I think it's more a design principle, but if you do it right, you get dramatic effect as far as customer value. >> So the conversation that we were having with Muglia, and Tristan Handy of dbt Labs, was, I'll set it up as the following, and, Thomas, would love to get your thoughts, if you have a CRM, think about applications today, it's all about forms and codifying business processes, you type a bunch of stuff into Salesforce, and all the salespeople do it, and this machine generates a forecast. What if you have this new type of data app that pulls data from the transaction system, the e-commerce, the supply chain, the partner ecosystem, et cetera, and then, without humans, actually comes up with a plan. That's their vision. And Muglia was saying, in order to do that, you need to rethink data architectures and database architectures specifically, you need to get down to the level of how the data is stored on the disc. What are your thoughts on that? Well, first of all, I'm going to cop out, I think it's actually both. I do think it's a design principle, I think it's not open technology, but open APIs, open access, and you can build a platform on that design principle architecture. Now, I'm a database person, I love solving the database problems. >> I'm waited for you to launch into this. >> Yeah, so I mean, you know, Snowflake is a database, right? It's a distributed database. And we wanted to crack those codes, because, multi-region, multi-cloud, customers wanted access to their data, and their data is in a variety of forms, all these services that you're talked about. And so what I saw as a core principle was cloud object storage, everyone streams their data to cloud object storage. From there we said, well, how about we rethink database architecture, rethink file format, so that we can take each one of these services and bring them together, whether distributively or centrally, such that customers can access and get answers, whether it's operational data, whether it's business data, AKA search, or SQL, complex distributed joins. But we had to rethink the architecture. I like to say we're not a first generation, or a second, we're a third generation distributed database on pure, pure cloud storage, no caching, no SSDs. Why? Because all that availability, the cost of time, is a struggle, and cloud object storage, we think, is the answer. >> So when you're saying no caching, so when I think about how companies are solving some, you know, pretty hairy problems, take MySQL Heatwave, everybody thought Oracle was going to just forget about MySQL, well, they come out with Heatwave. And the way they solve problems, and you see their benchmarks against Amazon, "Oh, we crush everybody," is they put it all in memory. So you said no caching? You're not getting performance through caching? How is that true, and how are you getting performance? >> Well, so five, six years ago, right? When you realize that cloud object storage is going to be everywhere, and it's going to be a core foundational, if you will, fabric, what would you do? Well, a lot of times the second generation say, "We'll take it out of cloud storage, put in SSDs or something, and put into cache." And that adds a lot of time, adds a lot of costs. But I said, what if, what if we could actually make the first read hot, the first read distributed joins and searching? And so what we went out to do was said, we can't cache, because that's adds time, that adds cost. We have to make cloud object storage high performance, like it feels like a caching SSD. That's where our patents are, that's where our technology is, and we've spent many years working towards this. So, to me, if you can crack that code, a lot of these issues we're talking about, multi-region, multicloud, different services, everybody wants to send their data to the data lake, but then they move it out, we said, "Keep it right there." >> You nailed it, the data gravity. So, Bob's right, the data's coming in, and you need to get the data from everywhere, but you need an environment that you can deal with all that different schema, all the different type of technology, but also at scale. Bob's right, you cannot use memory or SSDs to cache that, that doesn't scale, it doesn't scale cost effectively. But if you could, and what you did, is you made object storage, S3 first, but object storage, the only persistence by doing that. And then we get performance, we should talk about it, it's literally, you know, hundreds of terabytes of queries, and it's done in seconds, it's done without memory caching. We have concepts of caching, but the only caching, the only persistence, is actually when we're doing caching, we're just keeping another side-eye track of things on the S3 itself. So we're using, actually, the object storage to be a database, which is kind of where Bob was saying, we agree, but that's what you started at, people thought you were crazy. >> And maybe make it live. Don't think of it as archival or temporary space, make it live, real time streaming, operational data. What we do is make it smart, we see the data coming in, we uniquely index it such that you can get your use cases, that are search, observability, security, or backend operational. But we don't have to have this, I dunno, static, fixed, siloed type of architecture technologies that were traditionally built prior to Supercloud thinking. >> And you don't have to move everything, essentially, you can do it wherever the data lands, whatever cloud across the globe, you're able to bring it together, you get the cost effectiveness, because the only persistence is the cheapest storage persistent layer you can buy. But the key thing is you cracked the code. >> We had to crack the code, right? That was the key thing. >> That's where the plans are. >> And then once you do that, then everything else gets easier to scale, your architecture, across regions, across cloud. >> Now, it's a general purpose database, as Bob was saying, but we use that database to solve a particular issue, which is around operational data, right? So, we agree with Bob's. >> Interesting. So this brings me to this concept of data, Jimata Gan is one of our speakers, you know, we talk about data fabric, which is a NetApp, originally NetApp concept, Gartner's kind of co-opted it. But so, the basic concept is, data lives everywhere, whether it's an S3 bucket, or a SQL database, or a data lake, it's just a node on the data mesh. So in your view, how does this fit in with Supercloud? Ed, you've said that you've built, essentially, an enabler for that, for the data mesh, I think you're an enabler for the Supercloud-like principles. This is a big, chewy opportunity, and it requires, you know, a team approach. There's got to be an ecosystem, there's not going to be one Supercloud to rule them all, so where does the ecosystem fit into the discussion, and where do you fit into the ecosystem? >> Right, so we agree completely, there's not one Supercloud in effect, but we use Supercloud principles to build our platform, and then, you know, the ecosystem's going to be built on leveraging what everyone else's secret powers are, right? So our power, our superpower, based upon what we built is, we deal with, if you're having any scale, or cost effective scale issues, with data, machine generated data, like business observability or security data, we are your force multiplier, we will take that in singularly, just let it, simply put it in your object storage wherever it sits, and we give you uniformity access to that using OpenAPI access, SQL, or you know, Elasticsearch API. So, that's what we do, that's our superpower. So I'll play it into data mesh, that's a perfect, we are a node on a data mesh, but I'll play it in the soup about how, the ecosystem, we see it kind of playing, and we talked about it in just in the last couple days, how we see this kind of possibly. Short term, our superpowers, we deal with this data that's coming at these environments, people, customers, building out observability or security environments, or vendors that are selling their own Supercloud, I do observability, the Datadogs of the world, dot dot dot, the Splunks of the world, dot dot dot, and security. So what we do is we fit in naturally. What we do is a cost effective scale, just land it anywhere in the world, we deal with ingest, and it's a cost effective, an order of magnitude, or two or three order magnitudes more cost effective. Allows them, their customers are asking them to do the impossible, "Give me fast monitoring alerting. I want it snappy, but I want it to keep two years of data, (laughs) and I want it cost effective." It doesn't work. They're good at the fast monitoring alerting, we're good at the long-term retention. And yet there's some gray area between those two, but one to one is actually cheaper, so we would partner. So the first ecosystem plays, who wants to have the ability to, really, all the data's in those same environments, the security observability players, they can literally, just through API, drag our data into their point to grab. We can make it seamless for customers. Right now, we make it helpful to customers. Your Datadog, we make a button, easy go from Datadog to us for logs, save you money. Same thing with Grafana. But you can also look at ecosystem, those same vendors, it used to be a year ago it was, you know, its all about how can you grow, like it's growth at all costs, now it's about cogs. So literally we can go an environment, you supply what your customer wants, but we can help with cogs. And one-on one in a partnership is better than you trying to build on your own. >> Thomas, you were saying you make the first read fast, so you think about Snowflake. Everybody wants to talk about Snowflake and Databricks. So, Snowflake, great, but you got to get the data in there. All right, so that's, can you help with that problem? >> I mean we want simple in, right? And if you have to have structure in, you're not simple. So the idea that you have a simple in, data lake, schema read type philosophy, but schema right type performance. And so what I wanted to do, what we have done, is have that simple lake, and stream that data real time, and those access points of Search or SQL, to go after whatever business case you need, security observability, warehouse integration. But the key thing is, how do I make that click, click, click answer, and do it quickly? And so what we want to do is, that first read has to be fast. Why? 'Cause then you're going to do all this siloing, layers, complexity. If your first read's not fast, you're at a disadvantage, particularly in cost. And nobody says I want less data, but everyone has to, whether they say we're going to shorten the window, we're going to use AI to choose, but in a security moment, when you don't have that answer, you're in trouble. And that's why we are this service, this Supercloud service, if you will, providing access, well-known search, well-known SQL type access, that if you just have one access point, you're at a disadvantage. >> We actually talked about Snowflake and BigQuery, and a different platform, Data Bricks. That's kind of where we see the phase two of ecosystem. One is easy, the low-hanging fruit is observability and security firms. But the next one is, what we do, our super power is dealing with this messy data that schema is changing like night and day. Pipelines are tough, and it's changing all the time, but you want these things fast, and it's big data around the world. That's the next point, just use us alongside, or inside, one of their platforms, and now we get the best of both worlds. Our superpower is keeping this messy data as a streaming, okay, not a batch thing, allow you to do that. So, that's the second one. And then to be honest, the third one, which plays you to Supercloud, it also plays perfectly in the data mesh, is if you really go to the ultimate thing, what we have done is made object storage, S3, GCS, and blob storage, we made it a database. Put, get, complex query with big joins. You know, so back to your original thing, and Muglia teed it up perfectly, we've done that. Now imagine if that's an ecosystem, who would want that? If it's, again, it's uniform available across all the regions, across all the clouds, and it's right next to where you are building a service, or a client's trying, that's where the ecosystem, I think people are going to use Superclouds for their superpowers. We're really good at this, allows that short term. I think the Snowflakes and the Data Bricks are the medium term, you know? And then I think eventually gets to, hey, listen if you can make object storage fast, you can just go after it with simple SQL queries, or elastic. Who would want that? I think that's where people are going to leverage it. It's not going to be one Supercloud, and we leverage the super clouds. >> Our viewpoint is smart object storage can be programmable, and so we agree with Bob, but we're not saying do it here, do it here. This core, fundamental layer across regions, across clouds, that everyone has? Simple in. Right now, it's hard to get data in for access for analysis. So we said, simply, we'll automate the entire process, give you API access across regions, across clouds. And again, how do you do a distributed join that's fast? How do you do a distributed join that doesn't cost you an arm or a leg? And how do you do it at scale? And that's where we've been focused. >> So prior, the cloud object store was a niche. >> Yeah. >> S3 obviously changed that. How standard is, essentially, object store across the different cloud platforms? Is that a problem for you? Is that an easy thing to solve? >> Well, let's talk about it. I mean we've fundamentally, yeah we've extracted it, but fundamentally, cloud object storage, put, get, and list. That's why it's so scalable, 'cause it doesn't have all these other components. That complexity is where we have moved up, and provide direct analytical API access. So because of its simplicity, and costs, and security, and reliability, it can scale naturally. I mean, really, distributed object storage is easy, it's put-get anywhere, now what we've done is we put a layer of intelligence, you know, call it smart object storage, where access is simple. So whether it's multi-region, do a query across, or multicloud, do a query across, or hunting, searching. >> We've had clients doing Amazon and Google, we have some Azure, but we see Amazon and Google more, and it's a consistent service across all of them. Just literally put your data in the bucket of choice, or folder of choice, click a couple buttons, literally click that to say "that's hot," and after that, it's hot, you can see it. But we're not moving data, the data gravity issue, that's the other. That it's already natively flowing to these pools of object storage across different regions and clouds. We don't move it, we index it right there, we're spinning up stateless compute, back to the Supercloud concept. But now that allows us to do all these other things, right? >> And it's no longer just cheap and deep object storage. Right? >> Yeah, we make it the same, like you have an analytic platform regardless of where you're at, you don't have to worry about that. Yeah, we deal with that, we deal with a stateless compute coming up -- >> And make it programmable. Be able to say, "I want this bucket to provide these answers." Right, that's really the hope, the vision. And the complexity to build the entire stack, and then connect them together, we said, the fabric is cloud storage, we just provide the intelligence on top. >> Let's bring it back to the customers, and one of the things we're exploring in Supercloud too is, you know, is Supercloud a solution looking for a problem? Is a multicloud really a problem? I mean, you hear, you know, a lot of the vendor marketing says, "Oh, it's a disaster, because it's all different across the clouds." And I talked to a lot of customers even as part of Supercloud too, they're like, "Well, I solved that problem by just going mono cloud." Well, but then you're not able to take advantage of a lot of the capabilities and the primitives that, you know, like Google's data, or you like Microsoft's simplicity, their RPA, whatever it is. So what are customers telling you, what are their near term problems that they're trying to solve today, and how are they thinking about the future? >> Listen, it's a real problem. I think it started, I think this is a a mega trend, just like cloud. Just, cloud data, and I always add, analytics, are the mega trends. If you're looking at those, if you're not considering using the Supercloud principles, in other words, leveraging what I have, abstracting it out, and getting the most out of that, and then build value on top, I think you're not going to be able to keep up, In fact, no way you're going to keep up with this data volume. It's a geometric challenge, and you're trying to do linear things. So clients aren't necessarily asking, hey, for Supercloud, but they're really saying, I need to have a better mechanism to simplify this and get value across it, and how do you abstract that out to do that? And that's where they're obviously, our conversations are more amazed what we're able to do, and what they're able to do with our platform, because if you think of what we've done, the S3, or GCS, or object storage, is they can't imagine the ingest, they can't imagine how easy, time to glass, one minute, no matter where it lands in the world, querying this in seconds for hundreds of terabytes squared. People are amazed, but that's kind of, so they're not asking for that, but they are amazed. And then when you start talking on it, if you're an enterprise person, you're building a big cloud data platform, or doing data or analytics, if you're not trying to leverage the public clouds, and somehow leverage all of them, and then build on top, then I think you're missing it. So they might not be asking for it, but they're doing it. >> And they're looking for a lens, you mentioned all these different services, how do I bring those together quickly? You know, our viewpoint, our service, is I have all these streams of data, create a lens where they want to go after it via search, go after via SQL, bring them together instantly, no e-tailing out, no define this table, put into this database. We said, let's have a service that creates a lens across all these streams, and then make those connections. I want to take my CRM with my Google AdWords, and maybe my Salesforce, how do I do analysis? Maybe I want to hunt first, maybe I want to join, maybe I want to add another stream to it. And so our viewpoint is, it's so natural to get into these lake platforms and then provide lenses to get that access. >> And they don't want it separate, they don't want something different here, and different there. They want it basically -- >> So this is our industry, right? If something new comes out, remember virtualization came out, "Oh my God, this is so great, it's going to solve all these problems." And all of a sudden it just got to be this big, more complex thing. Same thing with cloud, you know? It started out with S3, and then EC2, and now hundreds and hundreds of different services. So, it's a complex matter for a lot of people, and this creates problems for customers, especially when you got divisions that are using different clouds, and you're saying that the solution, or a solution for the part of the problem, is to really allow the data to stay in place on S3, use that standard, super simple, but then give it what, Ed, you've called superpower a couple of times, to make it fast, make it inexpensive, and allow you to do that across clouds. >> Yeah, yeah. >> I'll give you guys the last word on that. >> No, listen, I think, we think Supercloud allows you to do a lot more. And for us, data, everyone says more data, more problems, more budget issue, everyone knows more data is better, and we show you how to do it cost effectively at scale. And we couldn't have done it without the design principles of we're leveraging the Supercloud to get capabilities, and because we use super, just the object storage, we're able to get these capabilities of ingest, scale, cost effectiveness, and then we built on top of this. In the end, a database is a data platform that allows you to go after everything distributed, and to get one platform for analytics, no matter where it lands, that's where we think the Supercloud concepts are perfect, that's where our clients are seeing it, and we're kind of excited about it. >> Yeah a third generation database, Supercloud database, however we want to phrase it, and make it simple, but provide the value, and make it instant. >> Guys, thanks so much for coming into the studio today, I really thank you for your support of theCUBE, and theCUBE community, it allows us to provide events like this and free content. I really appreciate it. >> Oh, thank you. >> Thank you. >> All right, this is Dave Vellante for John Furrier in theCUBE community, thanks for being with us today. You're watching Supercloud 2, keep it right there for more thought provoking discussions around the future of cloud and data. (bright music)
SUMMARY :
And the third thing that we want to do I'm going to put you right but if you do it right, So the conversation that we were having I like to say we're not a and you see their So, to me, if you can crack that code, and you need to get the you can get your use cases, But the key thing is you cracked the code. We had to crack the code, right? And then once you do that, So, we agree with Bob's. and where do you fit into the ecosystem? and we give you uniformity access to that so you think about Snowflake. So the idea that you have are the medium term, you know? and so we agree with Bob, So prior, the cloud that an easy thing to solve? you know, call it smart object storage, and after that, it's hot, you can see it. And it's no longer just you don't have to worry about And the complexity to and one of the things we're and how do you abstract it's so natural to get and different there. and allow you to do that across clouds. I'll give you guys and we show you how to do it but provide the value, I really thank you for around the future of cloud and data.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Walmart | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
NASDAQ | ORGANIZATION | 0.99+ |
Bob Muglia | PERSON | 0.99+ |
Thomas | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
Ionis Pharmaceuticals | ORGANIZATION | 0.99+ |
Western Union | ORGANIZATION | 0.99+ |
Ed Walsh | PERSON | 0.99+ |
Bob | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Nelu Mihai | PERSON | 0.99+ |
Sachs | ORGANIZATION | 0.99+ |
Tristan Handy | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
two years | QUANTITY | 0.99+ |
Supercloud 2 | TITLE | 0.99+ |
first | QUANTITY | 0.99+ |
Last August | DATE | 0.99+ |
three | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
dbt Labs | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Ed | PERSON | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
Jimata Gan | PERSON | 0.99+ |
third one | QUANTITY | 0.99+ |
one minute | QUANTITY | 0.99+ |
second | QUANTITY | 0.99+ |
first generation | QUANTITY | 0.99+ |
third generation | QUANTITY | 0.99+ |
Grafana | ORGANIZATION | 0.99+ |
second generation | QUANTITY | 0.99+ |
second one | QUANTITY | 0.99+ |
hundreds of terabytes | QUANTITY | 0.98+ |
SQL | TITLE | 0.98+ |
five | DATE | 0.98+ |
one | QUANTITY | 0.98+ |
Databricks | ORGANIZATION | 0.98+ |
a year ago | DATE | 0.98+ |
ChaosSearch | ORGANIZATION | 0.98+ |
Muglia | PERSON | 0.98+ |
MySQL | TITLE | 0.98+ |
both worlds | QUANTITY | 0.98+ |
third thing | QUANTITY | 0.97+ |
Marlborough | LOCATION | 0.97+ |
theCUBE | ORGANIZATION | 0.97+ |
today | DATE | 0.97+ |
Supercloud | ORGANIZATION | 0.97+ |
Elasticsearch | TITLE | 0.96+ |
NetApp | TITLE | 0.96+ |
Datadog | ORGANIZATION | 0.96+ |
One | QUANTITY | 0.96+ |
EC2 | TITLE | 0.96+ |
each one | QUANTITY | 0.96+ |
S3 | TITLE | 0.96+ |
one platform | QUANTITY | 0.95+ |
Supercloud 2 | EVENT | 0.95+ |
first read | QUANTITY | 0.95+ |
six years ago | DATE | 0.95+ |
Ed Walsh, Courtney Pallotta & Thomas Hazel, ChaosSearch | AWS 2021 CUBE Testimonial
(upbeat music) >> My name's Courtney Pallota, I'm the Vice President of Marketing at ChaosSearch. We've partnered with theCUBE team to take every one of those assets, tailor them to meet whatever our needs were, and get them out and shared far and wide. And theCUBE team has been tremendously helpful in partnering with us to make that a success. >> theCUBE has been fantastic with us. They are thought leaders in this space. And we have a unique product, a unique vision, and they have an insight into where the market's going. They've had conference with us with data mesh, and how do we fit into that new realm of data access. And with our unique vision, with our unique platform, and with theCUBE, we've uniquely come out into the market. >> What's my overall experience with theCUBE? Would I do it again, would I recommended it to others? I said, I recommend theCUBE to everyone. In fact, I was at IBM, and some of the IBM executives didn't want to go on theCUBE because it's a live interview. Live interviews can be traumatic. But the fact of the matter is, one, yeah, they're tough questions, but they're in line, they're what clients are looking for. So yes, you have to be on ball. I mean, you're always on your toes, but you get your message out so crisply. So I recommend it to everyone. I've gotten a lot of other executives to participate, and they've all had a great example. You have to be ready. I mean, you can't go on theCUBE and not be ready, but now you can get your message out. And it has such a good distribution. I can't think of a better platform. So I recommended it to everyone. If I say ChaosSearch in one word, I'd say digital transformation, with a hyphen.
SUMMARY :
tailor them to meet And with our unique vision, I said, I recommend theCUBE to everyone.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Courtney Pallota | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Ed Walsh | PERSON | 0.99+ |
ChaosSearch | ORGANIZATION | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
theCUBE | ORGANIZATION | 0.99+ |
one word | QUANTITY | 0.97+ |
Courtney Pallotta | PERSON | 0.9+ |
theCUBE | TITLE | 0.71+ |
one | QUANTITY | 0.56+ |
AWS 2021 | ORGANIZATION | 0.55+ |
Thomas Hazel, ChaosSearchJSON Flex on ChaosSearch
[Thomas Hazel] - Hello, this is Thomas Hazel, founder CTO here at ChaosSearch. And tonight I'm going to demonstrate a new feature we are offering this quarter called JSON Flex. If you're familiar with JSON datasets, they're wonderful ways to represent information. You know, they're multidimensional, they have ability to set up arrays as attributes but those arrays are really problematic when you need to expand them or flatten them to do any type of elastic search or relational access, particularly when you're trying to do aggregations. And so the common process is to exclude those arrays or pick and choose that information. But with this new Chaos Flex capability, our system uniquely can index that data horizontally in a very small and efficient representation. And then with our Chaos Refinery, expand each attribute as you wish vertically, so you can do all the basic and natural constructs you would have done if you had, you know, a more straightforward, two dimensional, three dimensional type representation. So without further ado, I'mma get into this presentation of JSON Flex. Now, in this case, I've already set up the service to point to a particular S3 account that has CloudTrail data, one that is pretty problematic when it comes down to flattening data. And again, if you know CloudTrail, one row can become 10,000 as data gets flattened. So without further ado, let me jump right in. When you first log into the ChaosSearch service, you'll see a tab called 'Storage'. This is the S3 account, and I have variety of buckets. I have the refinery, it's a data refinery. This is where we create views or lenses into these index streams that you can do analysis that publishes it in elastic API as an index pattern or relational table in SQL Now a particular bucket I have here is a whole bunch of demonstration datasets that we have to show off our capabilities and our offering. In this bucket, I have CloudTrail data and I'm going to create what we call a 'object group'. An object group is a entry point, a filter of which files I want to index that data. Now, it can be statically there or a live streaming. These object groups had the ability to say, what type of data do you want to index on? Now through our wizard, you can type in, you know, prefix in this case, I want to type in CloudTrail, and you see here, I have a whole bunch of CloudTrail. I'mma choose one file to make it quick and easy. But this particular CloudTrail data will expand and we can show the capability of this horizontal to vertical expansion. So I walked through the wizard, as you can see here, we discovered JSON, it's a gzip file. Leave flattening unlimited 'cause we want to be able to expand infinitely. But this case, instead of doing default virtual, I'm going to horizontally represent this information. And this uniquely compresses the data in a way that can be stored efficiently on disc but then expanded in our data refinery on Pond Query or search requests. So I'mma create this object group. Now I'm going to call this, you know, 'JSON Flex test' and I could set up live indexing, SQS pops up but I'mma skip that and skip Retention and just create it. Once this object group is created, you kind of think of it as a virtual bucket, 'cause it does filter the data as you can see here. When I look at the view, I just see CloudTrail, but within the console, I can say start indexing. Now this is static data there could be a live stream and we set up workers to index this data. Whether it's one file, a million files or one terabyte, or one petabyte, we index the data. We discover all the schema, and as you see here, we discovered 104 columns. Now what's interesting is that we represent this expansion in a horizontal way. You know, if you know CloudTrail records zero, record one, record two. This can expand pretty dramatically if you fully flatten it but this case we horizontally representing it as the index. So when I go into the data refinery, I can create a view. Now, if you know the data refinery of ChaosSearch, you can bring multiple data streams together. You can do transformations virtually, you can do correlations, but in this case, I'm just going to take this one particular index stream, we call 'JSON Flex' and walk through a wizard, we try to simplify everything and select a particular attribute to expand. Now, again, we represent this in one row but if you had arrays and do all the permutations, it could go one to 100 to 10,000. We had one JSON audit that went from one row to 1 million rows. Now, clearly you don't want to create all those permutations, when you're tryna put into a database. With our unique index technology, you can do it virtually and sort horizontally. So let me just select 'Virtual' and walk through the wizard. Now, as I mentioned, we do all these different transformations changed schema, we're going to skip all that and select the order time, records event and say, 'create this'. I'm going to say, you know, 'JSON Flex View', I can set up caching, do a variety of things, I'm going to skip that. And once I create this, it's now available in the elastic API as an index pattern, as well as SQL via our Presto API dialect. And you can use Looker, Tableau, et cetera. But in this case, we go to this 'Analytics tab' and we built in the Kibana, open search tooling that is Apache Tonetto. And I click on discovery here and I'm going to select that particular view. Again, it looks like, oops, it looks like an index pattern, and I'mma choose, let's see here, let's choose 15 years from past and present and make sure I find where actually was timed. And what you'll see here is, you know, sure. It's just one particular data set has a variety of columns, but you see here is unlike that record zero, records one, now it's expanded. And so it has been expanded like a vertical flattening that you would traditionally do if you wanted to do anything that was an elastic or a relational construct, you know, that fit into a table format. Now the 'vantage of JSON Flex, you don't have that stored as a blob and use these proprietary JSON API's. You can use your native elastic API or your native SQL tooling to get access to it naturally without that expense of that explosion or without the complexity of ETLing it, and picking and choosing before you actually put into the database. That completes the demonstration of ChaosSearch new JSON Flex capability. If you're interested, come to ChaosSearch.io and set up a free trial. Thank you.
SUMMARY :
and as you see here, we
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Thomas Hazel | PERSON | 0.99+ |
10,000 | QUANTITY | 0.99+ |
one terabyte | QUANTITY | 0.99+ |
one file | QUANTITY | 0.99+ |
104 columns | QUANTITY | 0.99+ |
one petabyte | QUANTITY | 0.99+ |
1 million rows | QUANTITY | 0.99+ |
JSON Flex | TITLE | 0.99+ |
ChaosSearch | ORGANIZATION | 0.99+ |
one row | QUANTITY | 0.99+ |
a million files | QUANTITY | 0.99+ |
tonight | DATE | 0.98+ |
Tableau | TITLE | 0.98+ |
each attribute | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
SQL | TITLE | 0.98+ |
S3 | TITLE | 0.98+ |
100 | QUANTITY | 0.98+ |
JSON | TITLE | 0.98+ |
15 years | QUANTITY | 0.98+ |
Presto | TITLE | 0.97+ |
one | QUANTITY | 0.96+ |
Looker | TITLE | 0.95+ |
two | QUANTITY | 0.93+ |
JSON Flex View | TITLE | 0.92+ |
JSON API | TITLE | 0.91+ |
Flex | TITLE | 0.87+ |
zero | QUANTITY | 0.87+ |
SQS | TITLE | 0.86+ |
ChaosSearchJSON | ORGANIZATION | 0.8+ |
this quarter | DATE | 0.8+ |
CloudTrail | COMMERCIAL_ITEM | 0.79+ |
Apache Tonetto | ORGANIZATION | 0.72+ |
JSON | ORGANIZATION | 0.69+ |
Chaos Flex | TITLE | 0.69+ |
CloudTrail | TITLE | 0.6+ |
ChaosSearch | TITLE | 0.58+ |
ChaosSearch.io | TITLE | 0.57+ |
data set | QUANTITY | 0.56+ |
Kibana | ORGANIZATION | 0.45+ |
Ed Walsh and Thomas Hazel, ChaosSearch
>> Welcome to theCUBE, I am Dave Vellante. And today we're going to explore the ebb and flow of data as it travels into the cloud and the data lake. The concept of data lakes was alluring when it was first coined last decade by CTO James Dixon. Rather than be limited to highly structured and curated data that lives in a relational database in the form of an expensive and rigid data warehouse or a data mart. A data lake is formed by flowing data from a variety of sources into a scalable repository, like, say an S3 bucket that anyone can access, dive into, they can extract water, A.K.A data, from that lake and analyze data that's much more fine-grained and less expensive to store at scale. The problem became that organizations started to dump everything into their data lakes with no schema on our right, no metadata, no context, just shoving it into the data lake and figure out what's valuable at some point down the road. Kind of reminds you of your attic, right? Except this is an attic in the cloud. So it's too big to clean out over a weekend. Well look, it's 2021 and we should be solving this problem by now. A lot of folks are working on this, but often the solutions add other complexities for technology pros. So to understand this better, we're going to enlist the help of ChaosSearch CEO Ed Walsh, and Thomas Hazel, the CTO and Founder of ChaosSearch. We're also going to speak with Kevin Miller who's the Vice President and General Manager of S3 at Amazon web services. And of course they manage the largest and deepest data lakes on the planet. And we'll hear from a customer to get their perspective on this problem and how to go about solving it, but let's get started. Ed, Thomas, great to see you. Thanks for coming on theCUBE. >> Likewise. >> Face to face, it's really good to be here. >> It is nice face to face. >> It's great. >> So, Ed, let me start with you. We've been talking about data lakes in the cloud forever. Why is it still so difficult to extract value from those data lakes? >> Good question. I mean, data analytics at scale has always been a challenge, right? So, we're making some incremental changes. As you mentioned that we need to see some step function changes. But in fact, it's the reason ChaosSearch was really founded. But if you look at it, the same challenge around data warehouse or a data lake. Really it's not just to flowing the data in, it's how to get insights out. So it kind of falls into a couple of areas, but the business side will always complain and it's kind of uniform across everything in data lakes, everything in data warehousing. They'll say, "Hey, listen, I typically have to deal with a centralized team to do that data prep because it's data scientists and DBAs". Most of the time, they're a centralized group. Sometimes they're are business units, but most of the time, because they're scarce resources together. And then it takes a lot of time. It's arduous, it's complicated, it's a rigid process of the deal of the team, hard to add new data, but also it's hard to, it's very hard to share data and there's no way to governance without locking it down. And of course they would be more self-serve. So there's, you hear from the business side constantly now underneath is like, there's some real technology issues that we haven't really changed the way we're doing data prep since the two thousands, right? So if you look at it, it's, it falls two big areas. It's one, how to do data prep. How do you take, a request comes in from a business unit. I want to do X, Y, Z with this data. I want to use this type of tool sets to do the following. Someone has to be smart, how to put that data in the right schema, you mentioned. You have to put it in the right format, that the tool sets can analyze that data before you do anything. And then second thing, I'll come back to that 'cause that's the biggest challenge. But the second challenge is how these different data lakes and data warehouses are now persisting data and the complexity of managing that data and also the cost of computing it. And I'll go through that. But basically the biggest thing is actually getting it from raw data so the rigidness and complexity that the business sides are using it is literally someone has to do this ETL process, extract, transform, load. They're actually taking data, a request comes in, I need so much data in this type of way to put together. They're literally physically duplicating data and putting it together on a schema. They're stitching together almost a data puddle for all these different requests. And what happens is anytime they have to do that, someone has to do it. And it's, very skilled resources are scanned in the enterprise, right? So it's a DBS and data scientists. And then when they want new data, you give them a set of data set. They're always saying, what can I add to this data? Now that I've seen the reports. I want to add this data more fresh. And the same process has to happen. This takes about 60% to 80% of the data scientists in DPA's to do this work. It's kind of well-documented. And this is what actually stops the process. That's what is rigid. They have to be rigid because there's a process around that. That's the biggest challenge of doing this. And it takes an enterprise, weeks or months. I always say three weeks or three months. And no one challenges beyond that. It also takes the same skill set of people that you want to drive digital transformation, data warehousing initiatives, motorization, being data driven or all these data scientists and DBS they don't have enough of. So this is not only hurting you getting insights out of your day like in the warehouses. It's also, this resource constraint is hurting you actually getting. >> So that smallest atomic unit is that team, that's super specialized team, right? >> Right. >> Yeah. Okay. So you guys talk about activating the data lake. >> Yep. >> For analytics. What's unique about that? What problems are you all solving? You know, when you guys crew created this magic sauce. >> No, and basically, there's a lot of things. I highlighted the biggest one is how to do the data prep, but also you're persisting and using the data. But in the end, it's like, there's a lot of challenges at how to get analytics at scale. And this is really where Thomas and I founded the team to go after this, but I'll try to say it simply. What we're doing, I'll try to compare and contrast what we do compared to what you do with maybe an elastic cluster or a BI cluster. And if you look at it, what we do is we simply put your data in S3, don't move it, don't transform it. In fact, we're against data movement. What we do is we literally point and set that data and we index that data and make it available in a data representation that you can give virtual views to end-users. And those virtual views are available immediately over petabytes of data. And it actually gets presented to the end-user as an open API. So if you're elastic search user, you can use all your elastic search tools on this view. If you're a SQL user, Tableau, Looker, all the different tools, same thing with machine learning next year. So what we do is we take it, make it very simple. Simply put it there. It's already there already. Point us at it. We do the hard of indexing and making available. And then you publish in the open API as your users can use exactly what they do today. So that's, dramatically I'll give you a before and after. So let's say you're doing elastic search. You're doing logging analytics at scale, they're lending their data in S3. And then they're ETL physically duplicating and moving data. And typically deleting a lot of data to get in a format that elastic search can use. They're persisting it up in a data layer called leucine. It's physically sitting in memories, CPU, SSDs, and it's not one of them, it's a bunch of those. They in the cloud, you have to set them up because they're persisting ECC. They stand up same by 24, not a very cost-effective way to the cloud computing. What we do in comparison to that is literally pointing it at the same S3. In fact, you can run a complete parallel, the data necessary it's being ETL out. When just one more use case read only, or allow you to get that data and make this virtual views. So we run a complete parallel, but what happens is we just give a virtual view to the end users. We don't need this persistence layer, this extra cost layer, this extra time, cost and complexity of doing that. So what happens is when you look at what happens in elastic, they have a constraint, a trade-off of how much you can keep and how much you can afford to keep. And also it becomes unstable at time because you have to build out a schema. It's on a server, the more the schema scales out, guess what? you have to add more servers, very expensive. They're up seven by 24. And also they become brutalized. You lose one node, the whole thing has to be put together. We have none of that cost and complexity. We literally go from to keep whatever you want, whatever you want to keep an S3 is single persistence, very cost effective. And what we are able to do is, costs, we save 50 to 80%. Why? We don't go with the old paradigm of sit it up on servers, spin them up for persistence and keep them up 7 by 24. We're literally asking their cluster, what do you want to cut? We bring up the right compute resources. And then we release those sources after the query done. So we can do some queries that they can't imagine at scale, but we're able to do the exact same query at 50 to 80% savings. And they don't have to do any tutorial of moving that data or managing that layer of persistence, which is not only expensive, it becomes brittle. And then it becomes, I'll be quick. Once you go to BI, it's the same challenge, but the BI systems, the requests are constant coming at from a business unit down to the centralized data team. Give me this flavor of data. I want to use this piece of, you know, this analytic tool in that desk set. So they have to do all this pipeline. They're constantly saying, okay, I'll give you this data, this data, I'm duplicating that data, I'm moving it and stitching it together. And then the minute you want more data, they do the same process all over. We completely eliminate that. >> And those requests are queue up. Thomas, it had me, you don't have to move the data. That's kind of the exciting piece here, isn't it? >> Absolutely no. I think, you know, the data lake philosophy has always been solid, right? The problem is we had that Hadoop hang over, right? Where let's say we were using that platform, little too many variety of ways. And so, I always believed in data lake philosophy when James came and coined that I'm like, that's it. However, HTFS, that wasn't really a service. Cloud object storage is a service that the elasticity, the security, the durability, all that benefits are really why we founded on-cloud storage as a first move. >> So it was talking Thomas about, you know, being able to shut off essentially the compute so you don't have to keep paying for it, but there's other vendors out there and stuff like that. Something similar as separating, compute from storage that they're famous for that. And you have Databricks out there doing their lake house thing. Do you compete with those? How do you participate and how do you differentiate? >> Well, you know you've heard this term data lakes, warehouse, now lake house. And so what everybody wants is simple in, easy in, however, the problem with data lakes was complexity of out. Driving value. And I said, what if, what if you have the easy in and the value out? So if you look at, say snowflake as a warehousing solution, you have to all that prep and data movement to get into that system. And that it's rigid static. Now, Databricks, now that lake house has exact same thing. Now, should they have a data lake philosophy, but their data ingestion is not data lake philosophy. So I said, what if we had that simple in with a unique architecture and indexed technology, make it virtually accessible, publishable dynamically at petabyte scale. And so our service connects to the customer's cloud storage. Data stream the data in, set up what we call a live indexing stream, and then go to our data refinery and publish views that can be consumed the elastic API, use cabana Grafana, or say SQL tables look or say Tableau. And so we're getting the benefits of both sides, use scheme on read-write performance with scheme write-read performance. And if you can do that, that's the true promise of a data lake, you know, again, nothing against Hadoop, but scheme on read with all that complexity of software was a little data swamping. >> Well, you've got to start it, okay. So we got to give them a good prompt, but everybody I talked to has got this big bunch of spark clusters, now saying, all right, this doesn't scale, we're stuck. And so, you know, I'm a big fan of Jamag Dagani and our concept of the data lake and it's early days. But if you fast forward to the end of the decade, you know, what do you see as being the sort of critical components of this notion of, people call it data mesh, but to get the analytics stack, you're a visionary Thomas, how do you see this thing playing out over the next decade? >> I love her thought leadership, to be honest, our core principles were her core principles now, 5, 6, 7 years ago. And so this idea of, decentralize that data as a product, self-serve and, and federated computer governance, I mean, all that was our core principle. The trick is how do you enable that mesh philosophy? I can say we're a mesh ready, meaning that, we can participate in a way that very few products can participate. If there's gates data into your system, the CTL, the schema management, my argument with the data meshes like producers and consumers have the same rights. I want the consumer, people that choose how they want to consume that data. As well as the producer, publishing it. I can say our data refinery is that answer. You know, shoot, I'd love to open up a standard, right? Where we can really talk about the producers and consumers and the rights each others have. But I think she's right on the philosophy. I think as products mature in this cloud, in this data lake capabilities, the trick is those gates. If you have to structure up front, if you set those pipelines, the chance of you getting your data into a mesh is the weeks and months that Ed was mentioning. >> Well, I think you're right. I think the problem with data mesh today is the lack of standards you've got. You know, when you draw the conceptual diagrams, you've got a lot of lollipops, which are APIs, but they're all unique primitives. So there aren't standards, by which to your point, the consumer can take the data the way he or she wants it and build their own data products without having to tap people on the shoulder to say, how can I use this?, where does the data live? And being able to add their own data. >> You're exactly right. So I'm an organization, I'm generating data, when the courageously stream it into a lake. And then the service, a ChaosSearch service, is the data is discoverable and configurable by the consumer. Let's say you want to go to the corner store. I want to make a certain meal tonight. I want to pick and choose what I want, how I want it. Imagine if the data mesh truly can have that producer of information, you know, all the things you can buy a grocery store and what you want to make for dinner. And if you'd static, if you call up your producer to do the change, was it really a data mesh enabled service? I would argue not. >> Ed, bring us home. >> Well, maybe one more thing with this. >> Please, yeah. 'Cause some of this is we're talking 2031, but largely these principles are what we have in production today, right? So even the self service where you can actually have a business context on top of a data lake, we do that today, we talked about, we get rid of the physical ETL, which is 80% of the work, but the last 20% it's done by this refinery where you can do virtual views, the right or back and do all the transformation need and make it available. But also that's available to, you can actually give that as a role-based access service to your end-users, actually analysts. And you don't want to be a data scientist or DBA. In the hands of a data scientist the DBA is powerful, but the fact of matter, you don't have to affect all of our employees, regardless of seniority, if they're in finance or in sales, they actually go through and learn how to do this. So you don't have to be it. So part of that, and they can come up with their own view, which that's one of the things about data lakes. The business unit wants to do themselves, but more importantly, because they have that context of what they're trying to do instead of queuing up the very specific request that takes weeks, they're able to do it themselves. >> And if I have to put it on different data stores and ETL that I can do things in real time or near real time. And that's game changing and something we haven't been able to do ever. >> And then maybe just to wrap it up, listen, you know 8 years ago, Thomas and his group of founders, came up with the concept. How do you actually get after analytics at scale and solve the real problems? And it's not one thing, it's not just getting S3. It's all these different things. And what we have in market today is the ability to literally just simply stream it to S3, by the way, simply do, what we do is automate the process of getting the data in a representation that you can now share an augment. And then we publish open API. So can actually use a tool as you want, first use case log analytics, hey, it's easy to just stream your logs in. And we give you elastic search type of services. Same thing that with CQL, you'll see mainstream machine learning next year. So listen, I think we have the data lake, you know, 3.0 now, and we're just stretching our legs right now to have fun. >> Well, and you have to say it log analytics. But if I really do believe in this concept of building data products and data services, because I want to sell them, I want to monetize them and being able to do that quickly and easily, so I can consume them as the future. So guys, thanks so much for coming on the program. Really appreciate it.
SUMMARY :
and Thomas Hazel, the CTO really good to be here. lakes in the cloud forever. And the same process has to happen. So you guys talk about You know, when you guys crew founded the team to go after this, That's kind of the exciting service that the elasticity, And you have Databricks out there And if you can do that, end of the decade, you know, the chance of you getting your on the shoulder to say, all the things you can buy a grocery store So even the self service where you can actually have And if I have to put it is the ability to literally Well, and you have
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Kevin Miller | PERSON | 0.99+ |
Thomas | PERSON | 0.99+ |
Ed | PERSON | 0.99+ |
80% | QUANTITY | 0.99+ |
Ed Walsh | PERSON | 0.99+ |
50 | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
James | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
ChaosSearch | ORGANIZATION | 0.99+ |
three months | QUANTITY | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
next year | DATE | 0.99+ |
2021 | DATE | 0.99+ |
two thousands | QUANTITY | 0.99+ |
three weeks | QUANTITY | 0.99+ |
24 | QUANTITY | 0.99+ |
James Dixon | PERSON | 0.99+ |
last decade | DATE | 0.99+ |
7 | QUANTITY | 0.99+ |
second challenge | QUANTITY | 0.99+ |
2031 | DATE | 0.99+ |
Jamag Dagani | PERSON | 0.98+ |
S3 | ORGANIZATION | 0.98+ |
both sides | QUANTITY | 0.98+ |
S3 | TITLE | 0.98+ |
8 years ago | DATE | 0.98+ |
second thing | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
about 60% | QUANTITY | 0.98+ |
tonight | DATE | 0.97+ |
first | QUANTITY | 0.97+ |
Tableau | TITLE | 0.97+ |
two big areas | QUANTITY | 0.96+ |
one | QUANTITY | 0.95+ |
SQL | TITLE | 0.94+ |
seven | QUANTITY | 0.94+ |
6 | DATE | 0.94+ |
CTO | PERSON | 0.93+ |
CQL | TITLE | 0.93+ |
7 years | DATE | 0.93+ |
first move | QUANTITY | 0.93+ |
next decade | DATE | 0.92+ |
single | QUANTITY | 0.91+ |
DBS | ORGANIZATION | 0.9+ |
20% | QUANTITY | 0.9+ |
one thing | QUANTITY | 0.87+ |
5 | DATE | 0.87+ |
Hadoop | TITLE | 0.87+ |
Looker | TITLE | 0.8+ |
Grafana | TITLE | 0.73+ |
DPA | ORGANIZATION | 0.71+ |
one more thing | QUANTITY | 0.71+ |
end of the | DATE | 0.69+ |
Vice President | PERSON | 0.65+ |
petabytes | QUANTITY | 0.64+ |
cabana | TITLE | 0.62+ |
CEO | PERSON | 0.57+ |
HTFS | ORGANIZATION | 0.54+ |
house | ORGANIZATION | 0.49+ |
theCUBE | ORGANIZATION | 0.48+ |
Ed Walsh and Thomas Hazel, ChaosSearch | JSON
>>Hi, Brian, this is Dave Volante. Welcome to this cube conversation with Thomas Hazel was the founder and CTO of chaos surgeon. I'm also joined by ed Walsh. Who's the CEO Thomas. Good to see you. >>Great to be here. >>Explain Jason. First of all, what >>Jason, Jason has a powerful data representation, a data source. Uh, but let's just say that we try to drive value out of it. It gets complicated. Uh, I can search. We activate customers, data lakes. So, you know, customers stream their Jason data to this, uh, cloud stores that we activate. Now, the trick is the complexity of a Jason data structure. You can do all these complexity of representation. Now here's the problem putting that representation into a elastic search database or relational databases, very problematic. So what people choose to do is they pick and choose what they want and or they just stored as a blob. And so I said, what if, what if we create a new index technology that could store it as a full representation, but dynamically in a, we call our data refinery published access to all the permutations that you may want, where if you do a full on flatten, your flattening of its Jason, one row theoretically could be put into a million rows and relational data sort of explode, >>But then it gets really expensive. But so, but everybody says they have Jason support, every database vendor that I talked to, it's a big announcement. We now support Jason. What's the deal. >>Exactly. So you take your relational database with all those relational constructs and you have a proprietary Jason API to pick and choose. So instead of picking, choosing upfront, now you're picking, choosing in the backend where you really want us the power of the relational analysis of that Jaison data. And that's where chaos comes in, where we expand those data streams we do in a relational way. So all that tooling you've been built to know and love. Now you can access to it. So if you're doing proprietary APIs or Jason data, you're not using Looker, you're not using Tableau. You're doing some type of proprietary, probably emailing now on the backend. >>Okay. So you say all the tools that you've trained, everybody on you can't really use them. You got to build some custom stuff and okay, so, so, so maybe bring that home then in terms of what what's the money, why do the suits care about this stuff? >>The reason this is so important is think about anything, cloud native Kubernetes, your different applications. What you're doing in Mongo is all Jason is it's very powerful but painful, but if you're not keeping the data, what people are doing a data scientist is, or they're just doing leveling, they're saying I'm going to only keep the first four things. So think about it's Kubernetes, it's your app logs. They're trying to figure out for black Friday, what happens? It's Lilly saying, Hey, every minute they'll cut a new log. You're able to say, listen, these are the users that were in that system for an hour. And here's a different things. They do. The fact of the matter is if you cut it off, you lose all that fidelity, all that data. So it's really important that to have. So if you're trying to figure out either what happened for security, what happened for on a performance, or if you're trying to figure out, Hey, I'm VP of product or growth, how do I cross sell things? >>You need to know what everyone's doing. If you're not handling Jason natively, like we're doing either your, it keeps on expanding on black Friday. All of a sudden the logs get huge. And the next day it's not, but it's really powerful data that you need to harness for business values. It's, what's going to drive growth. It's what's going to do the digital transformation. So without the technology, you're kind of blind. And to be honest, you don't know. Cause a data scientist is kind of deleted the data on you. So this is big for the business and digital transformation, but also it was such a pain. The data scientists in DBS were forced to just basically make it simple. So it didn't blow up their system. We allow them to keep it simple, but yes, >>Both power. It reminds me if you like, go on vacation, you got your video camera. Somebody breaks into your house. You go back to Lucas and see who and that the data's gone. The video's gone because it didn't, you didn't, you weren't able to save it cause it's too >>Expensive. Well, it's funny. This is the first day source. That's driving the design of the database because of all the value we should be designed the database around the information. It stores not the structure and how it's been organized. And so our viewpoint is you get to choose your structure yet contain all that content. So if a vendor >>It says to kind of, I'm a customer then says, Hey, we got Jason support. What questions should I ask to really peel the onion? >>Well, particularly relational. Is it a relational access to that data? Now you could say, oh, I've ETL does Jason into it. But chances are the explosion of Jason permutations of one row to a million. They're probably not doing the full representation. So from our viewpoint is either you're doing a blob type access to proprietary Jason APIs or you're picking and choosing those, the choices say that is the market thought. However, what if you could take all the vegetation and design your schema based on how you want to consume it versus how you could store it. And that's a big difference with, >>So I should be asking how, how do I consume this data? Are you ETL? Bring it in how much data explosion is going to occur. Once I do this, and you're saying for chaos, search the answer to those questions. >>The answer is, again, our philosophy simply stream your data into your cloud object, storage, your data lake and with our index technology and our data refinery. You get to create views, dynamic the incident, whether it's a terabyte or petabyte, and describe how you want your data because consumed in a relational way or an elastic search way, both are consumable through our data refinery, which is >>For us. The refinery gives you the view. So what happens if someone wants a different view, I want to actually unpack different columns or different matrices. You able to do that in a virtual view, it's available immediately over petabytes of data. You don't have that episode where you come back, look at the video camera. There's no data there left. So that's, >>We do appreciate the time and the explanation on really understanding Jason. Thank you. All right. And thank you for watching this cube conversation. This is Dave Volante. We'll see you next time.
SUMMARY :
Good to see you. First of all, what where if you do a full on flatten, your flattening of its Jason, one row theoretically What's the deal. So you take your relational database with all those relational constructs and you have a proprietary You got to build some custom The fact of the matter is if you cut it off, you lose all that And to be honest, you don't know. It reminds me if you like, go on vacation, you got your video camera. And so our viewpoint is you It says to kind of, I'm a customer then says, Hey, we got Jason support. However, what if you could take all the vegetation and design your schema based on how you want to Bring it in how much data explosion is going to occur. whether it's a terabyte or petabyte, and describe how you want your data because consumed in a relational way You don't have that episode where you come back, look at the video camera. And thank you for watching this cube conversation.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Volante | PERSON | 0.99+ |
Brian | PERSON | 0.99+ |
Jason | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
Lilly | PERSON | 0.99+ |
Ed Walsh | PERSON | 0.99+ |
JSON | PERSON | 0.99+ |
Thomas | PERSON | 0.99+ |
first day | QUANTITY | 0.99+ |
black Friday | EVENT | 0.99+ |
an hour | QUANTITY | 0.98+ |
both | QUANTITY | 0.97+ |
Both | QUANTITY | 0.97+ |
ed Walsh | PERSON | 0.97+ |
Tableau | TITLE | 0.95+ |
first four things | QUANTITY | 0.94+ |
Kubernetes | TITLE | 0.93+ |
one row | QUANTITY | 0.92+ |
Mongo | ORGANIZATION | 0.9+ |
Jason | ORGANIZATION | 0.89+ |
ChaosSearch | ORGANIZATION | 0.89+ |
a million | QUANTITY | 0.88+ |
next day | DATE | 0.86+ |
Jason | TITLE | 0.81+ |
First | QUANTITY | 0.74+ |
million rows | QUANTITY | 0.73+ |
ETL | ORGANIZATION | 0.7+ |
petabytes | QUANTITY | 0.69+ |
Looker | ORGANIZATION | 0.66+ |
DBS | ORGANIZATION | 0.58+ |
Jaison | PERSON | 0.52+ |
Lucas | PERSON | 0.49+ |
Ed Walsh and Thomas Hazel V1
>>Welcome to the cube. I'm Dave Volante. Today, we're going to explore the ebb and flow of data as it travels into the cloud. In the data lake, the concept of data lakes was a Loring when it was first coined last decade by CTO James Dickson, rather than be limited to highly structured and curated data that lives in a relational database in the form of an expensive and rigid data warehouse or a data Mart, a data lake is formed by flowing data from a variety of sources into a scalable repository, like say an S3 bucket that anyone can access, dive into. They can extract water. It can a data from that lake and analyze data. That's much more fine-grained and less expensive to store at scale. The problem became that organizations started to dump everything into their data lakes with no schema on it, right? No metadata, no context to shove it into the data lake and figure out what's valuable. >>At some point down the road kind of reminds you of your attic, right? Except this is an attic in the cloud. So it's too big to clean out over a weekend. We'll look it's 2021 and we should be solving this problem by now, a lot of folks are working on this, but often the solutions at other complexities for technology pros. So to understand this better, we're going to enlist the help of chaos search CEO and Walsh and Thomas Hazel, the CTO and founder of chaos search. We're also going to speak with Kevin Miller. Who's the vice president and general manager of S3 at Amazon web services. And of course they manage the largest and deepest data lakes on the planet. And we'll hear from a customer to get their perspective on this problem and how to go about solving it, but let's get started. Ed Thomas. Great to see you. Thanks for coming on the cube. Likewise face. It's really good to be in this nice face. Great. So let me start with you. We've been talking about data lakes in the cloud forever. Why is it still so difficult to extract value from those data? >>Good question. I mean, a data analytics at scale is always been a challenge, right? So, and it's, uh, we're making some incremental changes. As you mentioned that we need to seem some step function changes, but, uh, in fact, it's the reason, uh, search was really founded. But if you look at it the same challenge around data warehouse or a data lake, really, it's not just a flowing the data in is how to get insights out. So it kind of falls into a couple of areas, but the business side will always complain and it's kind of uniform across everything in data lakes, everything that we're offering, they'll say, Hey, listen, I typically have to deal with a centralized team to do that data prep because it's data scientist and DBS. Most of the time they're a centralized group, sometimes are business units, but most of the time, because they're scarce resources together. >>And then it takes a lot of time. It's arduous, it's complicated. It's a rigid process of the deal of the team, hard to add new data. But also it's hard to, you know, it's very hard to share data and there's no way to governance without locking it down. And of course they would be more self-service. So there's you hear from the business side constantly now underneath is like, there's some real technology issues that we haven't really changed the way we're doing data prep since the two thousands. Right? So if you look at it, it's, it falls, uh, two big areas. It's one. How do data prep, how do you take a request comes in from a business unit. I want to do X, Y, Z with this data. I want to use this type of tool sets to do the following. Someone has to be smart, how to put that data in the right schema. >>You mentioned you have to put it in the right format, that the tool sets can analyze that data before you do anything. And then secondly, I'll come back to that because that's a biggest challenge. But the second challenge is how these different data lakes and data we're also going to persisting data and the complexity of managing that data and also the cost of computing. And I'll go through that. But basically the biggest thing is actually getting it from raw data so that the rigidness and complexity that the business sides are using it is literally someone has to do this ETL process extract, transform load. They're actually taking data request comes in. I need so much data in this type of way to put together their Lilly, physically duplicating data and putting it together and schema they're stitching together almost a data puddle for all these different requests. >>And what happens is anytime they have to do that, someone has to do it. And it's very skilled. Resources are scant in the enterprise, right? So it's a DBS and data scientists. And then when they want new data, you give them a set of data set. They're always saying, what can I add this data? Now that I've seen the reports, I want to add this data more fresh. And the same process has to happen. This takes about 60 to 80% of the data scientists in DPA's to do this work. It's kind of well-documented. Uh, and this is what actually stops the process. That's what is rigid. They have to be rigid because there's a process around that. Uh, that's the biggest challenge to doing this. And it takes in the enterprise, uh, weeks or months. I always say three weeks to three months. And no one challenges beyond that. It also takes the same skill set of people that you want to drive. Digital transformation, data, warehousing initiatives, uh, monitorization being, data driven, or all these data scientists and DBS. They don't have enough of, so this is not only hurting you getting insights out of your dead like that, or else it's also this resource constraints hurting you actually getting smaller. >>The Tomic unit is that team that's super specialized team. Right. Right. Yeah. Okay. So you guys talk about activating the data lake. Yep, sure. Analytics, what what's unique about that? What problems are you all solving? You know, when you guys crew created this, this, this magic sauce. >>No, and it basically, there's a lot of things I highlighted the biggest one is how to do the data prep, but also you're persisting and using the data. But in the end, it's like, there's a lot of challenges that how to get analytics at scale. And this is really where Thomas founded the team to go after this. But, um, I'll try to say it simply, what are we doing? I'll try to compare and stress what we do compared to what you do with maybe an elastic cluster or a BI cluster. Um, and if you look at it, what we do is we simply put your data in S3, don't move it, don't transform it. In fact, we're not we're against data movement. What we do is we literally pointed at that data and we index that data and make it available in a data representation that you can give virtual views to end users. >>And those virtual views are available immediately over petabytes of data. And it re it actually gets presented to the end user as an open API. So if you're elastic search user, you can use all your lesser search tools on this view. If you're a SQL user, Tableau, Looker, all the different tools, same thing with machine learning next year. So what we do is we take it, make it very simple. Simply put it there. It's already there already. Point is at it. We do the hard of indexing and making available. And then you publish in the open API as your users can use exactly what they do today. So that's dramatically. I'll give you a before and after. So let's say you're doing elastic search. You're doing logging analytics at scale, they're lending their data in S3. And then they're,, they're physically duplicating a moving data and typically deleting a lot of data to get in a format that elastic search can use. >>They're persisting it up in a data layer called leucine. It's physically sitting in memories, CPU, uh, uh, SSDs. And it's not one of them. It's a bunch of those. They in the cloud, you have to set them up because they're persisting ECC. They stand up semi by 24, not a very cost-effective way to the cloud, uh, cloud computing. What we do in comparison to that is literally pointing it at the same S3. In fact, you can run a complete parallel, the data necessary. It's being ETL. That we're just one more use case read only, or allow you to get that data and make this virtual views. So we run a complete parallel, but what happens is we just give a virtual view to the end users. We don't need this persistence layer, this extra cost layer, this extra, um, uh, time cost and complexity of doing that. >>So what happens is when you look at what happens in elastic, they have a constraint, a trade-off of how much you can keep and how much you can afford to keep. And also it becomes unstable at time because you have to build out a schema. It's on a server, the more the schema scales out, guess what you have to add more servers, very expensive. They're up seven by 24. And also they become brittle. As you lose one node. The whole thing has to be put together. We have none of that cost and complexity. We literally go from to keep whatever you want, whatever you want to keep an S3, a single persistence, very cost effective. And what we do is, um, costs. We save 50 to 80% why we don't go with the old paradigm of sit it up on servers, spin them up for persistence and keep them up. >>Somebody 24, we're literally asking her cluster, what do you want to cut? We bring up the right compute resources. And then we release those sources after the query done. So we can do some queries that they can't imagine at scale, but we're able to do the exact same query at 50 to 80% savings. And they don't have to do any of the toil of moving that data or managing that layer of persistence, which is not only expensive. It becomes brittle. And then it becomes an I'll be quick. Once you go to BI, it's the same challenge, but the BI systems, the requests are constant coming at from a business unit down to the centralized data team. Give me this flavor of debt. I want to use this piece of, you know, this analytic tool in that desk set. So they have to do all this pipeline. They're constantly saying, okay, I'll give you this data, this data I'm duplicating that data. I'm moving in stitching together. And then the minute you want more data, they do the same process all over. We completely eliminate that. >>The questions queue up, Thomas, it had me, you don't have to move the data. That's, that's kind of the >>Writing piece here. Isn't it? I absolutely, no. I think, you know, the daylight philosophy has always been solid, right? The problem is we had that who do hang over, right? Where let's say we were using that platform, little, too many variety of ways. And so I always believed in daily philosophy when James came and coined that I'm like, that's it. However, HTFS that wasn't really a service cloud. Oddish storage is a service that the, the last society, the security and the durability, all that benefits are really why we founded, uh, Oncotype storage as a first move. >>So it was talking Thomas about, you know, being able to shut off essentially the compute and you have to keep paying for it, but there's other vendors out there and stuff like that. Something similar as separating, compute from storage that they're famous for that. And, and, and yet Databricks out there doing their lake house thing. Do you compete with those? How do you participate and how do you differentiate? >>I know you've heard this term data lakes, warehouse now, lake house. And so what everybody wants is simple in easy N however, the problem with data lakes was complexity of out driving value. And I said, what if, what if you have the easy end and the value out? So if you look at, uh, say snowflake as a, as a warehousing solution, you have to all that prep and data movement to get into that system. And that it's rigid static. Now, Databricks, now that lake house has exact same thing. Now, should they have a data lake philosophy, but their data ingestion is not daily philosophy. So I said, what if we had that simple in with a unique architecture, indexed technology, make it virtually accessible publishable dynamically at petabyte scale. And so our service connects to the customer's cloud storage data, stream the data in set up what we call a live indexing stream, and then go to our data refinery and publish views that can be consumed the lasted API, use cabana Grafana, or say SQL tables look or say Tableau. And so we're getting the benefits of both sides, you know, schema on read, write performance with scheme on, right. Reperformance. And if you can do that, that's the true promise of a data lake, you know, again, nothing against Hadoop, but a schema on read with all that complexity of, uh, software was, uh, what was a little data, swamp >>Got to start it. Okay. So we got to give a good prompt, but everybody I talked to has got this big bunch of spark clusters now saying, all right, this, this doesn't scale we're stuck. And so, you know, I'm a big fan of and our concept of the data lake and it's it's early days. But if you fast forward to the end of the decade, you know, what do you see as being the sort of critical components of this notion of, you know, people call it data mesh, but you've got the analytics stack. Uh, you, you, you're a visionary Thomas, how do you see this thing playing out over the next? >>I love for thought leadership, to be honest, our core principles were her core principles now, you know, 5, 6, 7 years ago. And so this idea of, you know, de centralize that data as a product, you know, self-serve and, and federated, computer, uh, governance, I mean, all that, it was our core principle. The trick is how do you enable that mesh philosophy? We, I could say we're a mesh ready, meaning that, you know, we can participate in a way that very few products can participate. If there's gates data into your system, the CTLA, the schema management, my argument with the data meshes like producers and consumers have the same rights. I want the consumer people that choose how they want to consume that data, as well as the producer publishing it. I can say our data refinery is that answer. You know, shoot, I love to open up a standard, right, where we can really talk about the producers and consumers and the rights each others have. But I think she's right on the philosophy. I think as products mature in this cloud, in this data lake capabilities, the trick is those gates. If you have the structure up front, it gets at those pipelines. You know, the chance of you getting your data into a mesh is the weeks and months that it was mentioning. >>Well, I think you're right. I think the problem with, with data mesh today is the lack of standards you've got. You know, when you draw the conceptual diagrams, you've got a lot of lollipops, which are API APIs, but they're all, you know, unique primitives. So there aren't standards by which to your point, the consumer can take the data the way he or she wants it and build their own data products without having to tap people on the shoulder to say, how can I use this? Where's the data live and, and, and, and, and being able to add their own >>You're exactly right. So I'm an organization I'm generally data will be courageous, a stream it to a lake. And then the service, uh, Ks search service is the data's con uh, discoverable and configurable by the consumer. Let's say you want to go to the corner store? You know, I want to make a certain meal tonight. I want to pick and choose what I want, how I want it. Imagine if the data mesh truly can have that producer of information, you, all the things you can buy a grocery store and what you want to make for dinner. And if you'd static, if you call up your producer to do the change, was it really a data mesh enabled service? I would argue not that >>Bring us home >>Well. Uh, and, um, maybe one more thing with this, cause some of this is we talking 20, 31, but largely these principles are what we have in production today, right? So even the self service where you can actually have business context on top of a debt, like we do that today, we talked about, we get rid of the physical ETL, which is 80% of the work, but the last 20% it's done by this refinery where you can do virtual views, the right our back and do all the transformation need and make it available. But also that's available to, you can actually give that as a role-based access service to your end users actually analysts, and you don't want to be a data scientist or DBA in the hands of a data science. The DBA is powerful, but the fact of matter, you don't have to affect all of our employees, regardless of seniority. If they're in finance or in sales, they actually go through and learn how to do this. So you don't have to be it. So part of that, and they can come up with their own view, which that's one of the things about debt lakes, the business unit wants to do themselves, but more importantly, because they have that context of what they're trying to do instead of queuing up the very specific request that takes weeks, they're able to do it themselves and to find out that >>Different data stores and ETL that I can do things in real time or near real time. And that's that's game changing and something we haven't been able to do, um, ever. Hmm. >>And then maybe just to wrap it up, listen, um, you know, eight years ago is a group of founders came up with the concept. How do you actually get after analytics at scale and solve the real problems? And it's not one thing it's not just getting S3, it's all these different things. And what we have in market today is the ability to literally just simply stream it to S3 by the way, simply do what we do is automate the process of getting the data in a representation that you can now share an augment. And then we publish open API. So can actually use a tool as you want first use case log analytics, Hey, it's easy to just stream your logs in and we give you elastic search puppet services, same thing that with CQL, you'll see mainstream machine learning next year. So listen, I think we have the data lake, you know, 3.0 now, and we're just stretching our legs run off >>Well, and you have to say it log analytics. But if I really do believe in this concept of building data products and data services, because I want to sell them, I want to monetize them and being able to do that quickly and easily, so that can consume them as the future. So guys, thanks so much for coming on the program. Really appreciate it. All right. In a moment, Kevin Miller of Amazon web services joins me. You're watching the cube, your leader in high tech coverage.
SUMMARY :
that organizations started to dump everything into their data lakes with no schema on it, At some point down the road kind of reminds you of your attic, right? But if you look at it the same challenge around data warehouse So if you look at it, it's, it falls, uh, two big areas. You mentioned you have to put it in the right format, that the tool sets can analyze that data before you do anything. It also takes the same skill set of people that you want So you guys talk about activating the data lake. Um, and if you look at it, what we do is we simply put your data in S3, don't move it, And then you publish in the open API as your users can use exactly what they you have to set them up because they're persisting ECC. It's on a server, the more the schema scales out, guess what you have to add more servers, And then the minute you want more data, they do the same process all over. The questions queue up, Thomas, it had me, you don't have to move the data. I absolutely, no. I think, you know, the daylight philosophy has always been So it was talking Thomas about, you know, being able to shut off essentially the And I said, what if, what if you have the easy end and the value out? the sort of critical components of this notion of, you know, people call it data mesh, And so this idea of, you know, de centralize that You know, when you draw the conceptual diagrams, you've got a lot of lollipops, which are API APIs, but they're all, if you call up your producer to do the change, was it really a data mesh enabled service? but the fact of matter, you don't have to affect all of our employees, regardless of seniority. And that's that's game changing And then maybe just to wrap it up, listen, um, you know, eight years ago is a group of founders Well, and you have to say it log analytics.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Kevin Miller | PERSON | 0.99+ |
Thomas | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
Ed Thomas | PERSON | 0.99+ |
50 | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
James | PERSON | 0.99+ |
80% | QUANTITY | 0.99+ |
three months | QUANTITY | 0.99+ |
three weeks | QUANTITY | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
2021 | DATE | 0.99+ |
Ed Walsh | PERSON | 0.99+ |
next year | DATE | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
S3 | ORGANIZATION | 0.99+ |
second challenge | QUANTITY | 0.99+ |
24 | QUANTITY | 0.99+ |
both sides | QUANTITY | 0.99+ |
eight years ago | DATE | 0.98+ |
Today | DATE | 0.98+ |
two thousands | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
20% | QUANTITY | 0.98+ |
tonight | DATE | 0.97+ |
last decade | DATE | 0.97+ |
S3 | TITLE | 0.97+ |
first | QUANTITY | 0.96+ |
one | QUANTITY | 0.96+ |
Tableau | TITLE | 0.95+ |
single | QUANTITY | 0.95+ |
James Dickson | PERSON | 0.94+ |
Hadoop | TITLE | 0.94+ |
two big areas | QUANTITY | 0.94+ |
20 | QUANTITY | 0.94+ |
SQL | TITLE | 0.93+ |
seven | QUANTITY | 0.93+ |
CTO | PERSON | 0.93+ |
about 60 | QUANTITY | 0.93+ |
Oncotype | ORGANIZATION | 0.92+ |
first move | QUANTITY | 0.92+ |
secondly | QUANTITY | 0.91+ |
one more thing | QUANTITY | 0.89+ |
DBS | ORGANIZATION | 0.89+ |
one node | QUANTITY | 0.85+ |
Walsh | PERSON | 0.83+ |
petabytes | QUANTITY | 0.77+ |
Tomic | ORGANIZATION | 0.77+ |
31 | QUANTITY | 0.77+ |
end of the | DATE | 0.76+ |
cabana | TITLE | 0.73+ |
HTFS | ORGANIZATION | 0.7+ |
Mart | ORGANIZATION | 0.68+ |
Grafana | TITLE | 0.63+ |
data | ORGANIZATION | 0.58+ |
Looker | TITLE | 0.55+ |
CQL | TITLE | 0.55+ |
DPA | ORGANIZATION | 0.54+ |
Thomas Hazel, ChaosSearch & Jeremy Foran, BAI Communications | AWS Startup Showcase
(upbeat music) >> Hey everyone, I'm John Furrier with The Cube, we're here in Palo Alto, California for a remote interview and session for The Cube presents AWS startup showcase, the next big thing in AI security in life sciences. I'm John Furrier. We're here with a great segment on cloud. Next big thing in Cloud with Chaos Search, Thomas Hazel, Chief Technology and Science Officer of Chaos Search joined by Jeremy Foran, the head of data analytics, the bad boy of data analyst as they say, but BAI communications, Jeremy Thomas, great to have you on. >> Great to be here. >> Pleasure to be here. >> So we're going to be talking about applying large scale log analytics to building the future of the transit industry. Obviously Telco's a big part of that, smart cities, you name the use case self-driving trucks, cars, you name it, everything's now edge. That the edge is super valuable, it's a new kind of last mile if you will, it's moving fast, it's mobile. This is a huge deal. Let's get into it, Thomas. What's this big story around this, this session? >> Well, we provide unique ability to take all that edge data and drive it into a data lake offering that we provide data analytics, both in logs, BI and coming out with ML there this year into next. So our unique play is transforming customers' cloud outer storage into an analytical platform. And really, I think with BIA is a log analytics specifically where, you know there's a lot of data streams from all those devices going into a lake that we transform their lake into analytics for driving, I guess, operational analysis. >> You know, Jeremy, I remember back in the day, I'm old enough to remember when the edge was the remote switch or campus hub or something. And then even on the Telco side, there was no wifi back in 2000 and you know, someone was driving in a car and you got any signal, you're lucky. Now you got, you know, no perimeter you have unlimited connectivity everywhere. This has opened up more of an Omni channel data problem. How do you see that world? Because you still got more devices pushing out at this edge and it's getting super local, right? Even on the body, even on people in the car. So certainly a lot of change on the infrastructure side. What does that pose for data challenge? >> Yeah, I, I would say that, you know users always want more, more bandwidth, more performance and that requires us to create more systems that require more complexity to deliver that user experience that we're, we're very proud of. And with that complexity means, you know exponentially more data. And so one of the wifi networks we offer in the Toronto subway system, T-connect, you know we see a 100-200,000 unique users a day and you can imagine just the amount of infrastructure to support that so that everyone has a seamless experience and can get their news and emails and even stream media while they're waiting for the subway. >> So you guys provide state of the art infrastructure for cell, wifi, broadcast, radio, IP networks, basically I mean, I call it the smart city kind of go-to. But that's basically anything involving kind of that edge piece. This is a huge thing. So as smart cities are on the table, which and you seeing 5G being called more of an enterprise app where there's feeding large dense areas of people this is now a new modern version of what I would call the, the smart city blueprint. What's changed in your mind on this whole modernization of this smart city infrastructure concept? What's new? What's cutting edge? >> Yeah. I would say that, you know there was an explosion of data and a lot of our insights aren't coming from one system anymore. It's coming from collecting data from all of the different pieces, the different infrastructure whether that's your fiber infrastructure or your wireless infrastructure, and then to solve problems you need to correlate data across those systems. So we're seeing more and more technologies that allow you to do that correlation. And that's really where we're finding tons of value, right? >> Thomas, take us through what you guys do as a, as a, as a product, a value proposition, the secret sauce, and and why I'm here with Jeremy? Why is this conversation important for the folks watching? What's the connection between Chaos Search and BAI communication? >> Well, it's data, right? And lots of it. So our unique platform allows people like Jeremy to stream all this data, right? In you know, today's world terabytes go to petabytes really easily, billions go to trillion really easily, and so providing the analysis of that data for their operations is challenging particularly based on technology and architectures that have been around for a long time. So what we do here at Chaos Search is the ability for BIA to stream all these devices, all these services into one centralized data lake on their cloud outer storage, where we connect to that cloud outer storage and transform it into an analytical database to do, in this case log analytics and do it seamlessly, easily where a new workload a new stream just streams into that lake. And we, as a service take over, we discover we index it and publish well-known open API and visualization so that they can focus on their business, not all the operational data pipeline, database and data engineering type work that again, at these types of scales is is frankly a nightmare. >> You know, one of the things that we've always observed on The Cube when you see new things come out that are really cool groundbreaking products like you guys are doing it's always a challenge to manage the cost and complexity of bringing in the new. So Jeremy, take us through this tech stack here because you know, it's, sometimes it might be unwieldy just in from a tech stack perspective, nevermind the business logic or the business processes that got to be either unwound or changed. Can you take us through the IT stack that's critical to support your, your area? >> Yeah, absolutely. So with all the various different equipment you know, to provide our public wifi and and our desks, carrier agnostic, LT and 5G networks, you know, we need to be able to adhere to PCI compliance and ISO 27,000, so that, you know, requires us to keep a tremendous amount of our data. And the challenge we were facing is how do we do that cost effectively, and not have to make any sort of compromises on how we do that? A lot of times you'll find you don't know the value of your data today until tomorrow. An example would be COVID. You know, we, when we were storing data two years ago we weren't planning for a pandemic, but now that we were able to retain that data and look back we can see a tremendous amount of value with trying to forecast how our systems will recover when things get back to normal. And so when I met Thomas and we were sort of talking about how we were going to solve some of these data retention problems, he started explaining to me their compression in some of the performance metrics of their profession. And, you know, I said, oh, middle out compression. And it was a bit, it's been a bit of a running joke between me and him and I'm sure others, but it's incredibly impressive the amount of data we're able to store at the kind of cost, right? >> What, what problem does, did he solve for you? Because I mean, these guys, honestly, you know the startups have a lot and the Cloud's enabling more value now, we're seeing this, but when you look at this what was your, what was your core problem that you had? >> Yeah, so we, when you we want to be able to, I mean, primarily this is for our CIS log server. And CIS long servers today aren't what they were 10, 15 years ago where you just sort of had a machine and if something broke you went and looked, right? Now, they're very complex, that data is feeding to various systems and third-party software. So, you know, we're actively looking for changes in patterns and we have our, you know security teams auditing these from, for penetration testing and such. And then the getting that data to S3 so that we could have it in case, you know, for two, three years of storage. Well, the problem we were facing is all of that all of these different systems we needed to feed and retain data, we couldn't do that on site. We wanted to do use S3 but when we were doing some projections, it's like, we, we don't really have the budget for all of these places. Meeting Thomas and, and working with Chaos Search, you know, using their compression brought those costs down drastically. And then as we've been working with them the really exciting thing is they we're bringing more and more features to that surface or offering. So, you know, first it was just storing that data away. And now we're starting to build solutions off of that sitting in storage. So that's where it gets really exciting because you know, there, it's nothing to start getting anomaly detection off those logs, which, you know originally it was just, we need to store them in case somebody needs them two, three years from now. >> So Thomas Thomas, if I get this right then what I'm hearing is obviously I've put aside the complexity and the governing side the regulations for a minute just generally. Data retention as, as a key value proposition and having data available when you need it and then to do that and doing it in a very cost-effective simple way. It sounds like what you guys are offering. Is that right? >> Yeah, I mean, one key aspect of our solution is retention, right? Those are a lot of the challenges, but at the same time we provide real time notification like a classic log analytic type platform, alerting, monitoring. The key thing is to bringing both those worlds together and solving that problem. And so this, you know, middle in middle out, well, to be frank, we created a new technology called what we call Chaos Index that is a database index that is wonderfully small as as we're indicating, but also provides all the features that makes Cloud object storage, high performance. And so the idea is that use this lake offering to store all your data in a cost effective way but our service allows you to analyze it both in a long retention perspective as well as real-time perspective and bringing those two worlds together is so key because typically you have Silo Solutions and whether it's real-time at scale or retention scale the cost complexity and time to build out those solutions I know Jeremy knows also, well, a lot of folks come to us to solve those problems because you know when you're dealing with, you know terabytes and up, you know these things get complicated and to be frank, fall over quite often. >> Yeah. Let me, let me just ask you the question that's probably on everyone's mind who's watching and you guys probably have both heard this many times, because a lot of people just throw the data lake solution around like it's, you know why they whitewash their kind of old legacy solutions with data lake, store it on data lake. It's been called a data swamp. So people are fearful that, okay. I love this idea of a data lake, who doesn't like throwing data into a repository, having it available at will with notifications, all this secret magic beans that just magically create value. But I doubt that, I don't want to turn into a data swamp. So Thomas and Jeremy, talk about that, that concern. How do you mitigate that? How do you talk to that? Because if done properly, there's huge value in having a control plane or some sort of data system that is going to be tied in with signals and just storage retention. So I see the value. How do you manage the concern that people might say, Hey, I don't want to date a swamp? >> Yeah, I'll jump into that. So, you know, let's just be frank, Hadoop was a great tool for a very narrow scenario. I think that data swamp came out because people were using the tooling in an incorrect way. I've always had the belief that data lakes are the future. You just have the right to have the right service the right philosophy to leverage it. So what we do here at Chaos Search is we allow you to organize it, discover it, automatically index that data so that swamp doesn't get swampy. You know, when you stream data into your lake how do you organize it, such that it's has a nice stream? How do you transform that data into a value? So with our service we actually start where the storage begins, not a end point, not an archive. So we have tooling and services that keep your lake from being swampy to be, to be clear. And, but the key value is the benefits of the lake, the cost effectiveness, the reliability, security, the scale, those are all the benefits. The problem was that no one really made cloud offer storage a first-class citizen and we've done that. We've dressed the swamp nature but provided all the value of analysis. And that cost metrics, that scale. No one can touch cloud outer storage, it just, you can't. But what we've done is cracked the code of how you make it analytical. >> Jeremy, I want to get your thoughts on this too, on your side I mean, as a practitioner and customer of, of of these solutions, you know, the concern is am I missing anything? And I've been a big proponent of data retention for many, many years. You know, Dave Alondra in our Cube knows all know that I bang on the table all the time, store your data, be a data hoarder, because it's going to come back and be valuable. Costs are going down so I'm a big fan of data retention. But the fear might be on, what am I missing? Because machine learning starts to come in down the road you got AI, the more data you have that's accessible in real time, the more machine learning is effective. Do you, do you worry about missing anything or do you just store everything? >> We, we store everything. Sometimes it's, it's interesting where the value and insights come from your data. Something that see, might seem trivial today down the road offers tremendous, tremendous value. So one of the things we do is provide because we have wifi in the subway infrastructure, you know taking that wifi data, we can start to understand the flow of people in and out of the subway network. And we can take that and provide insights to the rail operators, which get them from A to B quicker. You know, when we built the wifi it wasn't with the intention of getting Torontonians across the city faster. But that was one of the values that we were able to get from the data in terms of, you know, Thomas's solution, I think one of the reasons we we engaged him in the first place is because I didn't believe his compression. It sounded a little too good to be true. And so when it was time to try them out, you know all we had to do was ship data to an S3 bucket. You know, there's tons of, of solutions to do that. And, and data shippers right out of the box. It took a few, you know, a few minutes and then to start exploring the data was in Cabana, which is or their dashboard, which is, you know, an interface that's easy to use. So we were, you know, within a two days getting the value out of that data that we were looking for which is, you know, phenomenal. We've been very happy. >> Thomas, sounds like you've got a great, great testimonial here and it's not like an easy problem that he's living in there. I mean, I think, you know, I was mentioning this earlier and we're going to get into it now. There's regulations and there's certain compliance issues. First of all, everyone has this now problem now, it's not just within that space. But just the technical complexities of packets moving around I got on my wifi and the stop here, I'm jumping over here, and there's a ton of data it's all over the place, it's totally unstructured. So it's a tough, tough test for you guys, Chaos Search. So yeah, it's almost like the Mount Everest of customer testimonials. You've got to, it's a big, it's a big use case here. How does this translate to other clients? And talk about this governance and security controls because I know this highly regulated and you got there's penalties involved on his side of the world and Telco, the providers that have these edge devices there's actually penalties and, and whatnot so, not just commercial, it's maybe a, you know risk management, but here there's actually penalties. >> Absolutely. So, you know centralizing your data has a real benefit of of not getting in trouble, right? So you have one place, you store one place that's a good thing, but what we've done and this was a key aspect to our offering is we as Chaos, Chaos Search folks, we don't own the customer's data. We don't own BIA's data. They own the data. They give us access rights, very standard way with Cloud App storage roll on policies from Amazon, read only access rights to their data. And so not owning a customer's data is a big selling point not only for them, but for us for compliance regulatory perspective. So, you know, unlike a lot of solutions where you move the data into them and now they are responsible, actually BIA owns everything. We, they provide access so that we could provide an analysis that they could turn off at any point in time. We're also SOC 2 type 1 and type 2 compliant you got to do it, you know, in this, this world, you know when we were young we ran at this because of all of these compliance scenarios that we will be in, but, you know, the long as short of it is, we're transient service. The storage, cloud storage is the source of truth where all data resides and, you know, think about it, it's architecturally smart, it's cost effective, it's secure, it's reliable, it's durable. But from a security perspective, having the customer own their own data is a big differentiation in the market, a big differentiation. >> Jeremy, talk about on your end the security controls surrounding the log management environments that span across countries with different regulations. Now you've got all kinds of policy dimensions and technical dimensions and topology dimensions. >> Yeah, absolutely. So how we approach it is we look at where we have offerings across the globe and we figure out what the sort of highest watermark level of adherence we need to hit. And then we standardize across that. And by shipping to S3, it allows us to enforce that governance really easily and right to Tom's point you know, we manage the data, which is very important to us and we don't have to be worried about a third party or if we want to change providers years down the road. Although I don't think anyone's coming out with 81% compression anytime soon (laughs). But yeah, so that's, for us, it's about meeting those high standards and having the technologies that enable us to do it. And Chaos Search is a very big part of that right now. >> All right let me ask you a question, for the folks watching that are like really interested in this topic, what would you say to them when evaluating Chaos Search obviously, your use case is complex, but so are others as enterprises start to have an edge, obviously the security posture shifts, everything shifts. There's no more perimeter and the data problem becomes acute to them. So the enterprises are going to start seeing what you've been living for in your world. What's your advice to people watching? >> My advice would be to give them a try. You know, it's it's has been really quite impressive. The customer service has been hands-on and we've been getting, you know, they've been under-promising and over-delivering, which when you have the kind of requirements to manage solutions in these very complex environment, cloud local, you know various data centers and such, you know that kind of customer service is very important, right? It enables us to continue to deliver those high quality solutions. >> So Thomas give us the, the overview of the secret sauce. You've got a great testimonial here. You got people watching, what's different now in the world that you're going after, what wave are you on? Talk to the people who are watching this and saying, okay why Chaos Search? Why are you relevant? Obviously there's some cool things you're doing. I love that. What's cool, and what's relevant and why what's in it for them if they work with you? >> Yeah. So you know, that that whole Silicon Valley reference actually got that from my patent attorney when we were talking. But yeah, no, we, we, you know, focus on if we can crack this code of making data, one a face small, store small, moves small, process small. But then make it multimodal access make it virtual transformation. If we could do that, and we could transform cloud outer storage into a high-performance medical database all these heavy, heavy problems, all that complexity that scaffolding that you build to do these type of scales would be solved. Now what we had to focus on and this has been my, I guess you say life passion is working on a new data representation. And that's our secret sauce that enables a new architecture a new service that where the customer folks on their tooling, their APIs, their visualizations that they know and love, what we focus is on taking that data lake, and again, to transform it into an analytical database, both for log analytics think of like elastic search replacement, as well as a BI replacement for your SQL warehousing database. And coming out later this year into 2022, ML support on one representation. You don't have the silo your information you don't have to re index your data, both. So elastic search CQL and actually ML TensorFlow actions on the exact same representation. So think about the data retention, doing some post analysis on all those logs of data, months, years, and then maybe set up some triggers if you see some anomaly that's happening within your service. So you think about it, the hunt with BI reporting, with predictive analysis on one platform. Again, it sounds a little unicorn, I agree with Jeremy, maybe it didn't sound true but it's been a life's work. So it didn't happen overnight. And you know, it's eight years, at least in the in the making, but I guess the life journey in the end. >> Well, you know, the timing is great. You know, all the database geeks out there who have been following the data industry know that, you know there's a good point for structured data but when you start getting into mechanisms and they become a bottleneck or a blocker to innovation, you know you starting to see this idea of a data lake being let the data kind of form, let it be. You know, I hate the word control plane but more of a, a connective tissue between systems is become an interesting thing. So now you can store everything so you know, no worries there, no blind spots and then let the magic of machine learning in the future, come around. So Jeremy, with that, I got to ask you since you're the bad boy of data analytics at BAI communications head of data analytics, what does that, what do you look for in the future as you start to set this up because I can almost imagine and connecting the dots here in the interview, you got the data lake you're storing everything, which is good. Now you have to create more insights and get ahead of the curve and provide some prescriptive and automated ways to do things better. What's your vision? >> First I would just like to say that, you know when astrophysicists talk about, you know, dark dark energy, dark matter, I'm convinced that's where Thomas is hiding the ones and zeros to get that compression, right? I don't don't know that to be fact but I know it to be true. And then in terms of machine learning and these sort of future technologies, which are becoming available you know, starting from scratch and trying to build out you know, models that have value, you know that takes a fair amount of work. And that landscape keeps changing, right? Being able to push our data into an S3 bucket and then you know, retain that data and then get anomaly detection on top of it. That's, I mean, that's something special and that unlocks a lot of ability for you know, our teams to very easily deliver anomaly detection, machine learning to our customers, without having to take on a lot of work to understand the latest and greatest in machine learning. So, I mean, it's really empowering to our team, right? And, and a tool that we're going to. >> Yeah, I love and I love the name, Chaos Search, Thomas. I got to say, you know it brings up the inside baseball around chaos monkey which everyone knows was a DevOps tool to create kind of day two simulate day two operations and disruptions in DevOps. But what you're really getting at is your whole new architecture that's beyond DevOps movement, it's like next gen architecture. Talk about that to the people watching who have a lot of legacy and want to transform over to a more enabling platform that's going to give them some headroom for their data. What, what do you say to them? How do they get started? What, how should they, how what's their mindset? What they, what are some first principles you can share? >> Well, you know, I always start with first principles but you know, I like to say we're the next next gen. The key thing with the Chaos Search offering is you can start today with B, without even Chaos Search. Stream your data to S3. We're going to make hip and cool data lakes again. And actually it's a, Google it now, data lakes are hip and cool. So start streaming now, start managing your data in a well-formed centralized viewpoint with security governance and cost effectiveness. Then call Chaos Search shop, and we'll make access to it easily, simply to ultimately solve your problems. The bug whether your security issue, the bug, whether it's more performance issues at scale, right? And so when workloads can be added instantaneously in your data lake it's, it's game changing it's mind changing. So from the DevOps folks where, you know, you're up all night trying to say, how am I going to scale from terabyte, you know one today to 50 terabytes, don't. Stream it to S3. We'll take over, we'll worry about that scale pain. You worry about your job of security, performance, operations, integrity. >> That really highlights the cloud scale the value proposition as, as apps start to be using data as an input, not just as a a part of a repo repo, so great stuff. Thomas, thanks for sharing your life's work and your technology magic. Jeremy, thanks for coming on and sharing your use cases with us and how you are making it all work. Appreciate it. >> Thank you. >> My pleasure. >> Okay. This is The Cubes, coverage and presenting AWS this time showcase the next big thing here with Chaos Search. I'm John Furrier, your host. Thanks for watching. (upbeat music)
SUMMARY :
great to have you on. it's a new kind of last mile if you will, specifically where, you know and you know, someone was driving and you can imagine just the amount and you seeing 5G being called that allow you to do that correlation. and so providing the analysis and complexity of bringing in the new. And the challenge we were and we have our, you know and having data available when you need it And so this, you know, of data system that is going to be tied in is we allow you to organize it, of these solutions, you So we were, you know, within and you got there's penalties of solutions where you the security controls surrounding the log and having the technologies and the data problem you know, they've been after, what wave are you on? that scaffolding that you in the interview, you got the data lake like to say that, you know I got to say, you know but you know, I like to say with us and how you the next big thing here with Chaos Search.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jeremy | PERSON | 0.99+ |
Thomas | PERSON | 0.99+ |
Dave Alondra | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Jeremy Thomas | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
Telco | ORGANIZATION | 0.99+ |
Jeremy Foran | PERSON | 0.99+ |
BIA | ORGANIZATION | 0.99+ |
Tom | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
81% | QUANTITY | 0.99+ |
Chaos Search | ORGANIZATION | 0.99+ |
eight years | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
Palo Alto, California | LOCATION | 0.99+ |
2000 | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
50 terabytes | QUANTITY | 0.99+ |
two days | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
billions | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Toronto | LOCATION | 0.99+ |
ORGANIZATION | 0.98+ | |
First | QUANTITY | 0.98+ |
S3 | TITLE | 0.98+ |
one platform | QUANTITY | 0.98+ |
ChaosSearch | ORGANIZATION | 0.98+ |
first principles | QUANTITY | 0.98+ |
two worlds | QUANTITY | 0.98+ |
first principles | QUANTITY | 0.98+ |
2022 | DATE | 0.98+ |
one place | QUANTITY | 0.98+ |
one system | QUANTITY | 0.98+ |
three years | QUANTITY | 0.98+ |
DevOps | TITLE | 0.98+ |
two years ago | DATE | 0.97+ |
Thomas Thomas | PERSON | 0.96+ |
Chaos | ORGANIZATION | 0.96+ |
SQL | TITLE | 0.96+ |
BAI | ORGANIZATION | 0.96+ |
trillion | QUANTITY | 0.95+ |
BAI Communications | ORGANIZATION | 0.95+ |
Mount Everest | LOCATION | 0.95+ |
The Cube | ORGANIZATION | 0.95+ |
this year | DATE | 0.95+ |
first | QUANTITY | 0.95+ |
Cloud App | TITLE | 0.94+ |
Hadoop | TITLE | 0.94+ |
pandemic | EVENT | 0.94+ |
first place | QUANTITY | 0.94+ |
Discussion about Walmart's Approach | Supercloud2
(upbeat electronic music) >> Okay, welcome back to Supercloud 2, live here in Palo Alto. I'm John Furrier, with Dave Vellante. Again, all day wall-to-wall coverage, just had a great interview with Walmart, we've got a Next interview coming up, you're going to hear from Bob Muglia and Tristan Handy, two experts, both experienced entrepreneurs, executives in technology. We're here to break down what just happened with Walmart, and what's coming up with George Gilbert, former colleague, Wikibon analyst, Gartner Analyst, and now independent investor and expert. George, great to see you, I know you're following this space. Like you read about it, remember the first days when Dataverse came out, we were talking about them coming out of Berkeley? >> Dave: Snowflake. >> John: Snowflake. >> Dave: Snowflake In the early days. >> We, collectively, have been chronicling the data movement since 2010, you were part of our team, now you've got your nose to the grindstone, you're seeing the next wave. What's this all about? Walmart building their own super cloud, we got Bob Muglia talking about how these next wave of apps are coming. What are the super apps? What's the super cloud to you? >> Well, this key's off Dave's really interesting questions to Walmart, which was like, how are they building their supercloud? 'Cause it makes a concrete example. But what was most interesting about his description of the Walmart WCMP, I forgot what it stood for. >> Dave: Walmart Cloud Native Platform. >> Walmart, okay. He was describing where the logic could run in these stateless containers, and maybe eventually serverless functions. But that's just it, and that's the paradigm of microservices, where the logic is in this stateless thing, where you can shoot it, or it fails, and you can spin up another one, and you've lost nothing. >> That was their triplet model. >> Yeah, in fact, and that was what they were trying to move to, where these things move fluidly between data centers. >> But there's a but, right? Which is they're all stateless apps in the cloud. >> George: Yeah. >> And all their stateful apps are on-prem and VMs. >> Or the stateful part of the apps are in VMs. >> Okay. >> And so if they really want to lift their super cloud layer off of this different provider's infrastructure, they're going to need a much more advanced software platform that manages data. And that goes to the -- >> Muglia and Handy, that you and I did, that's coming up next. So the big takeaway there, George, was, I'll set it up and you can chime in, a new breed of data apps is emerging, and this highly decentralized infrastructure. And Tristan Handy of DBT Labs has a sort of a solution to begin the journey today, Muglia is working on something that's way out there, describe what you learned from it. >> Okay. So to talk about what the new data apps are, and then the platform to run them, I go back to the using what will probably be seen as one of the first data app examples, was Uber, where you're describing entities in the real world, riders, drivers, routes, city, like a city plan, these are all defined by data. And the data is described in a structure called a knowledge graph, for lack of a, no one's come up with a better term. But that means the tough, the stuff that Jack built, which was all stateless and sits above cloud vendors' infrastructure, it needs an entirely different type of software that's much, much harder to build. And the way Bob described it is, you're going to need an entirely new data management infrastructure to handle this. But where, you know, we had this really colorful interview where it was like Rock 'Em Sock 'Em, but they weren't really that much in opposition to each other, because Tristan is going to define this layer, starting with like business intelligence metrics, where you're defining things like bookings, billings, and revenue, in business terms, not in SQL terms -- >> Well, business terms, if I can interrupt, he said the one thing we haven't figured out how to APIify is KPIs that sit inside of a data warehouse, and that's essentially what he's doing. >> George: That's what he's doing, yes. >> Right. And so then you can now expose those APIs, those KPIs, that sit inside of a data warehouse, or a data lake, a data store, whatever, through APIs. >> George: And the difference -- >> So what does that do for you? >> Okay, so all of a sudden, instead of working at technical data terms, where you're dealing with tables and columns and rows, you're dealing instead with business entities, using the Uber example of drivers, riders, routes, you know, ETA prices. But you can define, DBT will be able to define those progressively in richer terms, today they're just doing things like bookings, billings, and revenue. But Bob's point was, today, the data warehouse that actually runs that stuff, whereas DBT defines it, the data warehouse that runs it, you can't do it with relational technology >> Dave: Relational totality, cashing architecture. >> SQL, you can't -- >> SQL caching architectures in memory, you can't do it, you've got to rethink down to the way the data lake is laid out on the disk or cache. Which by the way, Thomas Hazel, who's speaking later, he's the chief scientist and founder at Chaos Search, he says, "I've actually done this," basically leave it in an S3 bucket, and I'm going to query it, you know, with no caching. >> All right, so what I hear you saying then, tell me if I got this right, there are some some things that are inadequate in today's world, that's not compatible with the Supercloud wave. >> Yeah. >> Specifically how you're using storage, and data, and stateful. >> Yes. >> And then the software that makes it run, is that what you're saying? >> George: Yeah. >> There's one other thing you mentioned to me, it's like, when you're using a CRM system, a human is inputting data. >> George: Nothing happens till the human does something. >> Right, nothing happens until that data entry occurs. What you're talking about is a world that self forms, polling data from the transaction system, or the ERP system, and then builds a plan without human intervention. >> Yeah. Something in the real world happens, where the user says, "I want a ride." And then the software goes out and says, "Okay, we got to match a driver to the rider, we got to calculate how long it takes to get there, how long to deliver 'em." That's not driven by a form, other than the first person hitting a button and saying, "I want a ride." All the other stuff happens autonomously, driven by data and analytics. >> But my question was different, Dave, so I want to get specific, because this is where the startups are going to come in, this is the disruption. Snowflake is a data warehouse that's in the cloud, they call it a data cloud, they refactored it, they did it differently, the success, we all know it looks like. These areas where it's inadequate for the future are areas that'll probably be either disrupted, or refactored. What is that? >> That's what Muglia's contention is, that the DBT can start adding that layer where you define these business entities, they're like mini digital twins, you can define them, but the data warehouse isn't strong enough to actually manage and run them. And Muglia is behind a company that is rethinking the database, really in a fundamental way that hasn't been done in 40 or 50 years. It's the first, in his contention, the first real rethink of database technology in a fundamental way since the rise of the relational database 50 years ago. >> And I think you admit it's a real Hail Mary, I mean it's quite a long shot right? >> George: Yes. >> Huge potential. >> But they're pretty far along. >> Well, we've been talking on theCUBE for 12 years, and what, 10 years going to AWS Reinvent, Dave, that no one database will rule the world, Amazon kind of showed that with them. What's different, is it databases are changing, or you can have multiple databases, or? >> It's a good question. And the reason we've had multiple different types of databases, each one specialized for a different type of workload, but actually what Muglia is behind is a new engine that would essentially, you'll never get rid of the data warehouse, or the equivalent engine in like a Databricks datalake house, but it's a new engine that manages the thing that describes all the data and holds it together, and that's the new application platform. >> George, we have one minute left, I want to get real quick thought, you're an investor, and we know your history, and the folks watching, George's got a deep pedigree in investment data, and we can testify against that. If you're going to invest in a company right now, if you're a customer, I got to make a bet, what does success look like for me, what do I want walking through my door, and what do I want to send out? What companies do I want to look at? What's the kind of of vendor do I want to evaluate? Which ones do I want to send home? >> Well, the first thing a customer really has to do when they're thinking about next gen applications, all the people have told you guys, "we got to get our data in order," getting that data in order means building an integrated view of all your data landscape, which is data coming out of all your applications. It starts with the data model, so, today, you basically extract data from all your operational systems, put it in this one giant, central place, like a warehouse or lake house, but eventually you want this, whether you call it a fabric or a mesh, it's all the data that describes how everything hangs together as in one big knowledge graph. There's different ways to implement that. And that's the most critical thing, 'cause that describes your Uber landscape, your Uber platform. >> That's going to power the digital transformation, which will power the business transformation, which powers the business model, which allows the builders to build -- >> Yes. >> Coders to code. That's Supercloud application. >> Yeah. >> George, great stuff. Next interview you're going to see right here is Bob Muglia and Tristan Handy, they're going to unpack this new wave. Great segment, really worth unpacking and reading between the lines with George, and Dave Vellante, and those two great guests. And then we'll come back here for the studio for more of the live coverage of Supercloud 2. Thanks for watching. (upbeat electronic music)
SUMMARY :
remember the first days What's the super cloud to you? of the Walmart WCMP, I and that's the paradigm of microservices, and that was what they stateless apps in the cloud. And all their stateful of the apps are in VMs. And that goes to the -- Muglia and Handy, that you and I did, But that means the tough, he said the one thing we haven't And so then you can now the data warehouse that runs it, Dave: Relational totality, Which by the way, Thomas I hear you saying then, and data, and stateful. thing you mentioned to me, George: Nothing happens polling data from the transaction Something in the real world happens, that's in the cloud, that the DBT can start adding that layer Amazon kind of showed that with them. and that's the new application platform. and the folks watching, all the people have told you guys, Coders to code. for more of the live
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Bob Muglia | PERSON | 0.99+ |
Tristan Handy | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Bob | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Chaos Search | ORGANIZATION | 0.99+ |
Jack | PERSON | 0.99+ |
Tristan | PERSON | 0.99+ |
12 years | QUANTITY | 0.99+ |
Berkeley | LOCATION | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
DBT Labs | ORGANIZATION | 0.99+ |
10 years | QUANTITY | 0.99+ |
two experts | QUANTITY | 0.99+ |
Supercloud 2 | TITLE | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
Muglia | ORGANIZATION | 0.99+ |
one minute | QUANTITY | 0.99+ |
40 | QUANTITY | 0.99+ |
two great guests | QUANTITY | 0.98+ |
Wikibon | ORGANIZATION | 0.98+ |
50 years | QUANTITY | 0.98+ |
John | PERSON | 0.98+ |
Rock 'Em Sock 'Em | TITLE | 0.98+ |
today | DATE | 0.98+ |
first person | QUANTITY | 0.98+ |
Databricks | ORGANIZATION | 0.98+ |
S3 | COMMERCIAL_ITEM | 0.97+ |
50 years ago | DATE | 0.97+ |
2010 | DATE | 0.97+ |
Mary | PERSON | 0.96+ |
first days | QUANTITY | 0.96+ |
SQL | TITLE | 0.96+ |
one | QUANTITY | 0.95+ |
Supercloud wave | EVENT | 0.95+ |
each one | QUANTITY | 0.93+ |
DBT | ORGANIZATION | 0.91+ |
Supercloud | TITLE | 0.91+ |
Supercloud2 | TITLE | 0.91+ |
Supercloud 2 | ORGANIZATION | 0.89+ |
Snowflake | TITLE | 0.86+ |
Dataverse | ORGANIZATION | 0.83+ |
triplet | QUANTITY | 0.78+ |
Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps
>> Narrator: From theCUBE Studios in Palo Alto and Boston bringing you data-driven insights from theCUBE and ETR. This is breaking analysis with Dave Vellante >> Enterprise tech practitioners, like most of us they want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best of breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top line revenue and bottom line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks that's right Supercloud 2 is here. As of this recording, it's just about four days away 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto Studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests whereas Supercloud 22 in August, was all about refining the definition of Supercloud testing its technical feasibility and understanding various deployment models. Supercloud 2 features practitioners, technologists and analysts discussing what customers need with real-world examples of Supercloud and will expose thinking around a new breed of cross-cloud apps, data apps, if you will that change the way machines and humans interact with each other. Now the example we'd use if you think about applications today, say a CRM system, sales reps, what are they doing? They're entering data into opportunities they're choosing products they're importing contacts, et cetera. And sure the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging where data resides on different clouds, in different data storages, databases, Lakehouse, et cetera. And the machine uses AI to inspect the e-commerce system the inventory data, supply chain information and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places and things like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of Data Mesh is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Sachs will be featured and talk about data sharing across clouds and you know what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union Ionis Pharmaceuticals, Warner Media. We've got deep, deep technology dives with folks like Bob Muglia, David Flynn Tristan Handy of DBT Labs, Nir Zuk, the founder of Palo Alto Networks focused on security. Thomas Hazel, who's going to talk about a new type of database for Supercloud. It's several analysts including Keith Townsend Maribel Lopez, George Gilbert, Sanjeev Mohan and so many more guests, we don't have time to list them all. They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail starting with the Walmart Cloud native platform, they call it WCNP. We definitely see this as a Supercloud and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack. "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do, they took a do-it-yourself approach to build a Supercloud for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure and GCP. No surprise, there's no Amazon at Walmart for obvious reasons. And what they do is they create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronica Durgin of SAKS thinks about data sharing across clouds. Data sharing we think is a potential killer use case for Supercloud. In fact, let's hear it in Veronica's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data share? You know, I work with data, you know this is what I do. So if you know I want to get data from a company that's using, say Google, how do we share it in a smooth way where it doesn't have to be this crazy I don't know, SFTP file moving? So that's where I think Supercloud comes to me in my mind, is like practical applications. How do we create that mesh, that network that we can easily share data with each other? >> Now data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani live in studio talking about what standards are missing to make this vision a reality across the Supercloud. Now one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft Exec was kind enough to spend some time looking at the community's supercloud definition and he felt that it needed to be simplified. So in near real time he came up with the following definition that we're showing here. I'll read it. "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition he's stressed that the Supercloud is a platform versus an architecture implying that the platform provider eg Snowflake, VMware, Databricks, Cohesity, et cetera is responsible for determining the architecture. Now interestingly in the shared Google doc that the working group uses to collaborate on the supercloud de definition, Dr. Nelu Mihai who is actually building a Supercloud responded as follows to Bob's assertion "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model and how users will interact with Supercloud?" What does this seemingly nuanced point tell us and why does it matter? Well, history suggests that de facto standards will emerge more quickly to resolve real world practitioner problems and catch on more quickly than consensus-based architectures and standards-based architectures. But in the long run, the ladder may serve customers better. So we'll be exploring this topic in more detail in Supercloud 2, and of course we'd love to hear what you think platform, architecture, both? Now one of the real technical gurus that we'll have in studio at Supercloud two is David Flynn. He's one of the people behind the the movement that enabled enterprise flash adoption, that craze. And he did that with Fusion IO and he is now working on a system to enable read write data access to any user in any application in any data center or on any cloud anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Flore and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that was, we thought relevant. "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are one the process scheduler and two, the file system. The strongest argument for supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So a couple of implications here that will be exploring with David Flynn in studio. First we're inferring from his comment that he's in the platform camp where the platform owner is responsible for the architecture and there are obviously trade-offs there and benefits but we'll have to clarify that with him. And second, he's basically saying, you kill the concept the further you move up the stack. So the weak, the further you move the stack the weaker the supercloud argument becomes because it's just becoming SaaS. Now this is something we're going to explore to better understand is thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I RIFed with ahead of Supercloud two. Tristan Handy, he's the founder and CEO of DBT Labs and he has a highly opinionated and technical mind. Here's what he said, "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case because it involves a lot of data engineering pipeline and other work to make these available. So if you really want to make it easy to create these data experiences for users you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes and they don't need to." A lot of implications to this statement that will explore at Supercloud two versus Jamma Dani's data mesh comes into play here with her critique of hyper specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure which Kit Colbert is likely going to touch upon. Veronica Durgin of SAKS and her ideal state for data shearing along with Harveer Singh of Western Union. They got to deal with 200 locations around the world in data privacy issues, data sovereignty how do you share data safely? Same with Nick Taylor of Ionis Pharmaceutical. And not to blow your mind but Thomas Hazel and Bob Muglia deposit that to make data apps a reality across the Supercloud you have to rethink everything. You can't just let in memory databases and caching architectures take care of everything in a brute force manner. Rather you have to get down to really detailed levels even things like how data is laid out on disk, ie flash and think about rewriting applications for the Supercloud and the MLAI era. All of this and more at Supercloud two which wouldn't be complete without some data. So we pinged our friends from ETR Eric Bradley and Darren Bramberm to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud two. Now, many of you are familiar with this graphic here we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum and on the horizontal accesses market presence or pervasiveness in the data. So netscore versus what they call overlap or end in the data. And the table insert shows how the dots are plotted now not to steal ETR's thunder but the first point is you really can't have supercloud without the hyperscale cloud platforms which is shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity as those Databricks, Hashi, Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately due to a scheduling conflict we weren't able to get Red Hat on the program but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well because backup is a likely use case across clouds and on-premises. And now one other call out that we drill down on at Supercloud two is CloudFlare, which actually uses the term supercloud maybe in a different way. They look at Supercloud really as you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud two along with many others. Okay, so why should you attend Supercloud two? What's in it for me kind of thing? So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services for monetizing data how your peers are doing data sharing, how some of your peers are actually building out a Supercloud you're going to get real world input from practitioners. If you're a technologist, you're trying to figure out various ways to solve problems around data, data sharing, cross-cloud service deployment there's going to be a number of deep technology experts that are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, are kind of thrown up their hands and saying, Hey, we're going mono cloud. And we'll talk about the potential implications and dangers and risks of doing that. And also some of the benefits. You know, there's a question, right? Is Supercloud the same wine new bottle or is it truly something different that can drive substantive business value? So look, go to Supercloud.world it's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor Chaos Search Proximo, and Alura as well. For contributing to the effort I want to thank Alex Myerson who's on production and manages the podcast. Ken Schiffman is his supporting cast as well. Kristen Martin and Cheryl Knight to help get the word out on social media and at our newsletters. And Rob Ho is our editor-in-chief over at Silicon Angle. Thank you all. Remember, these episodes are all available as podcast. Wherever you listen we really appreciate the support that you've given. We just saw some stats from from Buzz Sprout, we hit the top 25% we're almost at 400,000 downloads last year. So really appreciate your participation. All you got to do is search Breaking Analysis podcast and you'll find those I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me you can email me directly at David.Vellante@siliconangle.com or dm me DVellante or comment on our LinkedIn post. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud two or next time on breaking analysis. (light music)
SUMMARY :
with Dave Vellante of the things that we're So if you know I want to get data and on the horizontal
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Bob Muglia | PERSON | 0.99+ |
Alex Myerson | PERSON | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
David Flynn | PERSON | 0.99+ |
Veronica | PERSON | 0.99+ |
Jack | PERSON | 0.99+ |
Nelu Mihai | PERSON | 0.99+ |
Zhamak Dehghani | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
Nick Taylor | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Jack Greenfield | PERSON | 0.99+ |
Kristen Martin | PERSON | 0.99+ |
Ken Schiffman | PERSON | 0.99+ |
Veronica Durgin | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Rob Ho | PERSON | 0.99+ |
Warner Media | ORGANIZATION | 0.99+ |
Tristan Handy | PERSON | 0.99+ |
Veronika Durgin | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Ionis Pharmaceutical | ORGANIZATION | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Bob Muglia | PERSON | 0.99+ |
David Flore | PERSON | 0.99+ |
DBT Labs | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Bob | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
21 sessions | QUANTITY | 0.99+ |
Darren Bramberm | PERSON | 0.99+ |
33 guests | QUANTITY | 0.99+ |
Nir Zuk | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Harveer Singh | PERSON | 0.99+ |
Kit Colbert | PERSON | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Supercloud 2 | TITLE | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
Western Union | ORGANIZATION | 0.99+ |
Cohesity | ORGANIZATION | 0.99+ |
Supercloud | ORGANIZATION | 0.99+ |
200 locations | QUANTITY | 0.99+ |
August | DATE | 0.99+ |
Keith Townsend | PERSON | 0.99+ |
Data Mesh | ORGANIZATION | 0.99+ |
Palo Alto Networks | ORGANIZATION | 0.99+ |
David.Vellante@siliconangle.com | OTHER | 0.99+ |
next week | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
second | QUANTITY | 0.99+ |
first point | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
VMware | ORGANIZATION | 0.98+ |
Silicon Angle | ORGANIZATION | 0.98+ |
ETR | ORGANIZATION | 0.98+ |
Eric Bradley | PERSON | 0.98+ |
two | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Sachs | ORGANIZATION | 0.98+ |
SAKS | ORGANIZATION | 0.98+ |
Supercloud | EVENT | 0.98+ |
last August | DATE | 0.98+ |
each week | QUANTITY | 0.98+ |
Kevin Miller, Amazon Web Services | ChaosSearch: Make Your Data Lake Deliver
>>Welcome back. I really liked the drill down a data lakes with ed Walsh and Thomas Hazel. They building some cool stuff over there. The data lake we see it's evolving and chaos search has built some pretty cool tech to enable customers to get more value out of data that's in lakes so that it doesn't become stagnant. Time to dig, dig deeper, dive deeper into the water. We're here with Kevin Miller. Who's the vice president and general manager of S3 at Amazon web services. We're going to talk about activating S3 for analytics. Kevin, welcome. Good to see you again. >>Yeah, thanks Dan. It's great to be here again. So >>S3 was the very first service offered by AWS 15 years ago. We covered that out in Seattle. It was a great event you guys had, it has become the most prominent and popular example of object storage in the marketplace. And for years, customers use S3 is simple, cheap data storage, but because there's so much data now stored in S3 customers are looking to do more with the platform. So Kevin, as we look ahead to reinvent this year, we're super excited about that. What's new. What's got you excited when it comes to the AWS flagship storage offering. >>Yeah. Dan, well, that's right. And we're definitely looking forward to reinvent. We have some fun things that we're planning to announce there. So stay tuned on those, but I'd say that one of the things that's most exciting for me as customers do more with their data and look to store more, to capture more of the data that they're generating every day is our storage class that we had an announced a few years ago, but we, we actually just announced some improvements to the S3 intelligent tiering storage class. And this is really our storage class. The only one in the cloud at this point that delivers automatic storage cost savings for customers where the data access patterns change. And that can happen. For example, as customers have some data that they're collecting and then a team spins up and decides to try to do something more with that data and that data that was very cool and sitting sort of idle is now being actively used. And so with intelligent tiering, we're automatically monitoring data. And then there's for customers. There's no retrieval costs and no tiering charges. We're automatically moving the data into an access tier that reduces their costs though. And that data is not being accessed. So we've announced some improvements to that just a few months ago. And I'll just say, I look forward to some more announcements at reinvent that will extend, continue to extend what we have in our intelligent tiering storage class. >>That's cool, Kevin. I mean, you've seen, you know, that technology, that tiering concept had been around, you know, but since back in the mainframe days, the problem was, it was always inside a box. So you, you didn't have the scale of the cloud and you didn't have that automation. So I want to ask you as the leader of that business, when you meet with customers, Kevin, what do they tell you that they're there they're facing as challenges when they want to do more, get better insights out of all that data that they've moved into S3? >>Well, I think that's just it, Dave. I think that most customers I speak with they, of course they have the things that they want to do with their storage costs and reducing storage costs and just making sure they have capacity available. But increasingly I think the real emphasis is around business transformation. What can I do with this data? That's very unique and different than either that unlike, you know, prior optimizations where it would just reduce the bottom line, they're saying, what can I do that will actually drive my top line more by either, you know, generating new product ideas, um, allowing for faster, you know, close, closed loop process for acquiring customers. And so it's really that business transformation and all, everything around it that I think is really exciting. And for a lot of customers, that's a pretty long journey and, and helping them get started on that, including transforming their workforce and up-skilling, you know, parts of their workforce to be more agile and more oriented around software development, developing new products using software. >>So w when I first met the folks at, at chaos search, you know, Thomas took me through sort of the architecture w with ed as well. They had me at, you don't have to move your data. That was saying that was the grabber for me. And there are a number of public customers that digital river, uh, Blackboard or Klarna, we're going to get the customer perspective little later on and others that use both AWS S3 and chaos search. And they're trying to get more out of their, their S3 data and execute analytics at scale. So wonder if you could share with us Kevin, what types of activities and opportunities do you see for customers like these that are making the move to put their enterprise data in S3 in terms of capabilities and outcomes that they are trying to achieve and are able to achieve beyond using S3 is just a Bitbucket, >>Right? Well, Dan, I think you hit the nail on the head when you talk about outcomes. Cause that I think is, is key here. Customers want to reduce the time it takes to get to a tangible result that it affects the business that improves their business. And so that's one of the things that I excites me about what CAS search is doing here specifically is that automatic indexing, being able to take the data as it is in their bucket, index it and keep that index fresh and then allow for the customers to innovate on top of that and to try to experiment with a new capability, see, see what works and then double down on the things that really do work to drive that business. And so I just think that that capability reduces the amount of what I might call undifferentiated, heavy, lifting the work to just sort of index and organize and catalog data. And instead allow customers to really focus on here's the idea. Let's try to get this into production or into a test environment as quickly as possible to see if this can really drive some value for our business. >>Yeah. So you're seeing that sort of value that you've mentioned the non-differentiated heavy lifting, moving up the stack, right. It used to just be provisioning and managing the, now it's all the layers above that and it would go and beyond that. So my question to you, Kevin, is how do you see the evolution of this, all this data at scale I'm especially interested in, as it pertains to data that's of course, an S3, which is your swim lane. When you talk to customers who want to do more with their data and analytics, and by the way, even beyond analytics, you know, where it's having conversations now in the community about, about building data products and creating new value, but how do you respond and how do you see chaos search fitting in to those outcomes? >>Well, I think that's, that's it Dave, it's about kind of going up the stack and instead of spending time organizing and cataloging data, particularly as the data volumes give much larger when the modern customers and modern data lakes that we're seeing quickly go from a few petabytes to tens, to hundreds of petabytes or more. And when you reaching that kind of scale of data, it's a single person can reasonably kind of wrap their head around all that data. You need tools as three provides a number of first party tools and, you know, we're investing in things like our S3 batch operations to really help give the end users of that data, the business owners that leverage to manage their data at scale and apply their new ideas to the data and generate, you know, pilots and production work that really drives their business forward. And so I think that, you know, cast search again, I would just say as a good example of, you know, the kind of software that I think helps go, upstack automate some of that data management and just help customers focus really specifically on the things that they want to accomplish for their, their business. >>So this is, >>I mean, we've talked for well over a decade, how to get more value out of data. And it's been challenging for a lot of organizations, but we're seeing, we're seeing themes of scale automation, fine-grain tooling ecosystem participating, uh, on top of that data and then extracting that, that data value who Kevin, I'm really excited to see you face to face at re-inventing and learn more about some of the announcements that you're going to make. We'll see you there. >>Yeah. Stay tuned. Looking forward to seeing in person absolutely >>Have Kevin on, keep it right there because in a moment we're going to get the customer perspective on how a leading practitioner is applying chaos search on top of S3 to create a business value from data you're watching the cube, your leader, digital high tech coverage.
SUMMARY :
Good to see you again. So stored in S3 customers are looking to do more with the platform. And I'll just say, I look forward to some more announcements at reinvent that will extend, that business, when you meet with customers, Kevin, what do they tell you that they're And so it's really that business transformation and all, everything around it that I think is really exciting. So w when I first met the folks at, at chaos search, you know, And so that's one of the things that I excites So my question to you, Kevin, is how do you see the evolution of this, And so I think that, you know, cast search again, I would just say as a good example of, you know, I'm really excited to see you face to face at re-inventing and learn more about some Looking forward to seeing in person absolutely of S3 to create a business value from data you're watching the cube,
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
David Nicholson | PERSON | 0.99+ |
Chris | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Joel | PERSON | 0.99+ |
Jeff Frick | PERSON | 0.99+ |
Peter | PERSON | 0.99+ |
Mona | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
David Vellante | PERSON | 0.99+ |
Keith | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Jeff | PERSON | 0.99+ |
Kevin | PERSON | 0.99+ |
Joel Minick | PERSON | 0.99+ |
Andy | PERSON | 0.99+ |
Ryan | PERSON | 0.99+ |
Cathy Dally | PERSON | 0.99+ |
Patrick | PERSON | 0.99+ |
Greg | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Stephen | PERSON | 0.99+ |
Kevin Miller | PERSON | 0.99+ |
Marcus | PERSON | 0.99+ |
Dave Alante | PERSON | 0.99+ |
Eric | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Dan | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Greg Tinker | PERSON | 0.99+ |
Utah | LOCATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Raleigh | LOCATION | 0.99+ |
Brooklyn | LOCATION | 0.99+ |
Carl Krupitzer | PERSON | 0.99+ |
Lisa | PERSON | 0.99+ |
Lenovo | ORGANIZATION | 0.99+ |
JetBlue | ORGANIZATION | 0.99+ |
2015 | DATE | 0.99+ |
Dave | PERSON | 0.99+ |
Angie Embree | PERSON | 0.99+ |
Kirk Skaugen | PERSON | 0.99+ |
Dave Nicholson | PERSON | 0.99+ |
2014 | DATE | 0.99+ |
Simon | PERSON | 0.99+ |
United | ORGANIZATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Southwest | ORGANIZATION | 0.99+ |
Kirk | PERSON | 0.99+ |
Frank | PERSON | 0.99+ |
Patrick Osborne | PERSON | 0.99+ |
1984 | DATE | 0.99+ |
China | LOCATION | 0.99+ |
Boston | LOCATION | 0.99+ |
California | LOCATION | 0.99+ |
Singapore | LOCATION | 0.99+ |