Ed Bailey, Cribl | AWS Startup Showcase S2 E2
(upbeat music) >> Welcome everyone to theCUBE presentation of the AWS Startup Showcase. The theme here is Data as Code. This is season two, episode two of our ongoing series covering the exciting startups from the AWS ecosystem, and we talk about the future of data, the future of analytics, the future of development and all kinds of cool stuff in multicloud. I'm your host, John Furrier. Today we're joined by Ed Bailey, Senior Technical Evangelist at Cribl. Thanks for coming on theCUBE here. >> Thank you for the invitation, thrilled to be here. >> The theme of this session is the observability lake, which I love by the way, I'm getting into that in a second. A breach investigation's best friend, which is a great topic. Couple of things, one, I like the breach investigation angle, but I also like this observability lake positioning, because I think this is a teaser of what's coming, more and more data usage where it's actually being applied specifically for things, here it's the observability lake. So first, what is an observability lake? Why is it important? >> Why it's important is technology professionals, especially security professionals, need data to make decisions. They need data to drive better decisions. They need data to understand, just to achieve understanding. And that means they need everything. They don't need just what they can afford to store. They don't need just what a vendor is going to let them store. They need everything. And that's the point of the observability lake: you couple an observability pipeline with the lake to bring your enterprise of data together, to make it accessible for analytics, to be able to use it, to be able to get value from it. And I think that's one of the things that's missing right now in the enterprises. Admins are being forced to make decisions about, okay, we can afford to keep this, we can't afford to keep that, and they're missing things. They're missing parts of the picture. And by being able to bring it together, to be able to have your cake and eat it too, where I can get what I need and I can do it affordably, I think that's the future, and it just drives value for everyone. >> And it just makes a lot of sense, the data lake, or the earlier concept, throw everything into the lake, and you can figure it out, you can query it, you can take action on it real time, you can stream it. You can do all kinds of things with it. Observability is important because it's the most critical thing people are doing right now for all kinds of things, from QA to administration to security. So this is where the breach piece comes in. I like that part of the talk because the breach investigation's best friend, it implies that you've got the secret sauce behind it, right? So, what is the state of the breach investigation today? What's going on with that? Because we know breaches, we see 'em out there, but like, why is this the best friend of a breach investigator? >> Well, and this is unfortunate, but typically there's an enormous delay between breach and detection. There's an IBM study, I think it's 287 days from the actual breach to detection and containment. It's an enormous amount of time. And the key is, when you do detect a breach, you're bringing in your incident response team, and typically without an observability lake, without Cribl's solutions around the observability pipeline, you're going to have an incomplete picture. The incident response team has to first understand what's the scope of the breach.
Is it one server? Is it three servers? Is it all the servers? You've got to understand what's been compromised, what's the impact? How did the breach occur in the first place? And they need all the data to stitch that together, and they need it quickly. The more time it takes to get that data, the more time it takes for them to finish their analysis and contain the breach. Hence the, I think, 87 to 90 days to contain a breach. And so by being able to remove the friction, by being able to make it easier to achieve these goals, what shouldn't be hard in the first place, you speed up the containment and resolution time. Not to mention, many system administrators simply don't have the data, because they can't afford to store the data in their SIEM. Or they have to go to their backup team to get a restore, which can take days. And so that's-- It's just so many obstacles to getting resolution right now. >> I mean, you're crawling through glass there, right? Because you think about it, just the timing aspect. Where is the data? Where is it stored, and is it relevant, and-- >> And do you have it at all? >> And do you have it at all, and then, you know, that person doesn't work there anymore, they changed jobs. I mean, who is keeping track of all this? You guys now have this capability where you can come in and do the instrumentation with the observability lake without a lot of change to the environment, which is not the way it used to be. Used to be, buy a tool, build a platform. Cribl has a solution that eases the struggles within the enterprise. What specifically is that pain point? And what do you guys do specifically? >> Well, I'll start out with kind of an example of what drew me to Cribl, back in 2018. I'm running the Splunk team for a very large multinational, and the complexity of the data, the demands we were getting from security and operations, were just an enormous issue to overcome. I had vendors come to me all the time saying they'd solve my problems, but that meant moving to their platform, where you have to get rid of Splunk or you have to do this, and I'm losing something. And what Cribl Stream brought was that I could put it between my sources and my destinations and manage my data. I would have flow control over the data. I don't have to lose anything. I could continue to use our existing analytics tools, and that sense of power and control, and I don't have to lose anything; I was like, there's something wrong here. This is too good to be true. And so what we're talking about now in terms of breach investigation is that with Cribl Stream, I can create a clone of my data to an object store. And this is almost any object store. So it can be AWS, it could be the other vendors' object stores, it could be on-prem object stores. And then I can house all my data at the cheapest possible price. So instead of eating up my most expensive storage, I put all my data in my object store, and I only put the data I need for the detections in my SIEM. So if, and hopefully never, but if you do have a breach, LogStream has a wonderful UI that makes it trivial to then pick my data out of my object store and restore it back into my SIEM, so that my IR team can develop a complete picture of how the breach happened. What's the scope? What was the lateral movement? And answer those questions. And it just takes the friction away.
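A rough sketch of the pattern Ed describes, purely illustrative: this is not Cribl's actual API or configuration format, and every name below is hypothetical. The idea is simply to clone everything to cheap object storage while forwarding only detection-relevant events to the SIEM.

```python
import json

# Hypothetical routing logic -- not Cribl's API. The writers are assumed
# to be any objects with a write() method (e.g., an S3 uploader and a
# SIEM HTTP forwarder supplied by your own plumbing).
DETECTION_SOURCETYPES = {"auth", "firewall", "endpoint"}  # made-up set

def route_event(event: dict, object_store, siem) -> None:
    # Full-fidelity clone of every event goes to cheap object storage.
    object_store.write(json.dumps(event))
    # Only the events (and fields) the detections need reach the SIEM,
    # keeping the expensive storage tier small.
    if event.get("sourcetype") in DETECTION_SOURCETYPES:
        slim = {k: v for k, v in event.items()
                if k in ("_time", "host", "src_ip", "sourcetype", "msg")}
        siem.write(json.dumps(slim))
```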
Just like you said, no more crawling over glass. You're running to your solution. >> You mentioned object store, and you're streaming that in. You talk about the Cribl Stream tool. I'm assuming that's where you're streaming the pipeline stuff, but is there a schema involved? Are there database challenges? How do you guys look at that? I know you're vendor agnostic. I like that piece, you plug in and you leverage all the tools that are out there, Splunk, Datadog, whatever. But how about on the database side, what's the impact there? >> Well, so I'm assuming you're talking about the object store itself, so we don't have to apply a schema. We can fit the data to whichever object store it is. We structure the data so it makes it easier to understand. For example, if I want to see communications from one IP to another IP, we structure it to make it easier to see that and query that. Yeah, it's completely vendor neutral, and this makes it so simple, so simple to enable, I think-- >> So no pre-defined schema needed. >> No, not at all. And this made it so much easier. I think it took us three hours to enable this for the enterprise, and we were able to then start cutting our retention costs dramatically. >> Yeah, it's great when you get that kind of value. Time to value is critical, and all the skeptics fall to the side pretty quickly. (chuckles) I got to ask you, well, go ahead. >> So I'd say, I mean, previously, I would have to go to our backup team. We'd have to open up a ticket, we'd have to have a bridge, then we'd have to go through the process of pulling tape, and it could take, you know, hours if not days to restore the amount of data we needed. And, you know, now we were able to run to our goals and solve business problems instead of focusing on the process steps of getting things done. >> Right, so take me through the architecture here and some customer examples, 'cause you have Cribl Stream there, the observability pipeline. That's key, you mentioned that. >> Yes. >> And then they build out these observability lakes from that. So what is the impact of that? Can you share the customers that are using that solution? What are they seeing for benefits? What are some of the impacts? Can you give us some specifics? >> I mean, I can't share all the exact customer names, but I can definitely give you some examples. A referenceable customer would be TransUnion; I came from TransUnion. I was one of the first customers, and it solved an enormous number of problems for us. Autodesk is another great example. The idea is that we're able to automate our data practices. I mean, just for example, what we were talking about with backups. You have to put a lot of time into managing your backups and your analytics platforms. And then you're locked into custom database schemas, you're locked into vendors. And it's still expensive. So being able to spend a few hours, dramatically cut your costs, but still have the data available, that's the key. I didn't have to make compromises, 'cause before I was having to say, okay, we're going to keep this, we're going to just drop this and hope for the best. And we just didn't have to do that anymore.
I think it's the same thing for TransUnion and Autodesk: the idea that we're going to lower our cost, we're going to make it easier for our administrators to do their job, so they can spend more time on business value fundamentals, like responding to a breach. You're going to spend time working with your teams, getting value out of observability solutions, and stop spending time writing custom solutions using open source tools. 'Cause your engineering time is the most precious asset for any enterprise, and you've got to focus your engineering time where it's needed the most. >> Yeah, and you can't underestimate the hassle and cost of ownership of swapping out pre-existing stuff just for the sake of having a functionality. I mean, that's a big-- >> It's painful, and that's a big thing about LogStream: being vendor neutral is so important. If you want to use the Splunk universal forwarder, that's great. If you want to use Beats, that's awesome. If you want to use Fluentd, even better. If you want to use all three, you can do that too. It's the customer's choice, and we're saying to people, use what suits your needs. And if you want to write some of your data to Elastic, that's great. Some of your data to Splunk, that's even better. Some of it to, take your pick, or Exabeam, fine as well. You have the choice to put your own solutions together and put your data where you need it to be. We're not asking you to work only in our ecosystem, only with our partners. We're letting you pick and choose what suits your business. >> Yeah, you know, that's the direction. I was just talking with the Amazon folks about their serverless. You know, you can use any tool. They have that core architecture for everything, the S3, and then you pick whatever you want to use, SageMaker, the other things. This is the new way. That's the way it has to be to be effective. How do you guys handle that? What's been the reaction from customers? Do they, like, roll their eyes and doubt you guys, or can you do it? Are they skeptical? How fast can you convert 'em over? (chuckles) >> Right, and that's always the challenge. And, I mean, the best part of my day is talking to customers. I love hearing the feedback, what they like, what they don't, and what they need. And of course I was skeptical. I didn't believe it when I first saw it, because I was used to being locked in. I was used to having to put in a lot of effort, a lot of custom code, like, what do you mean? It's this easy? This was 2018, and in our first demo, like 30 minutes in, I cut about half a million dollars out of our license. And I was stunned, because I mean, it's like, this is easy. >> Yeah, I mean-- >> Yeah, exactly. I mean, this is the future. And then for example, the security team wanted to bring in a UBA solution that wasn't part of the vendor ecosystem that we were in. And I was like, not a problem. We're going to use LogStream. We're going to clone a copy of our data to the UBA solution. We were able to get value from this UBA solution in weeks, when typically it's a six month cycle to start getting value. It was just too easy. And the thing that just struck me was, my engineers can now spend their time on delivering value instead of integrations and moving data around.
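A hedged sketch of that vendor-neutral fan-out, again with made-up names rather than Cribl's real configuration: one cloned stream, several destinations, each receiving only the slice it needs.

```python
# Hypothetical fan-out: each destination declares a predicate deciding
# whether it receives a given event. None of these names are Cribl's;
# this is just the shape of the pattern Ed describes.
ROUTES = {
    "s3_archive": lambda e: True,                          # everything
    "splunk":     lambda e: e.get("index") == "security",  # detections
    "elastic":    lambda e: e.get("app") == "web",         # app logs
    "uba":        lambda e: "user" in e,                   # behavior data
}

def fan_out(event: dict, writers: dict) -> None:
    # `writers` maps destination names to objects with a write() method.
    for destination, wants in ROUTES.items():
        if wants(event):
            writers[destination].write(event)
```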
>> Yeah, and also we can spend more time preventing breaches. But what's interesting, and counterintuitive here, is that as you add more flexibility and choice, you'd think it'd be harder to handle a breach, right? So, now let's go back to the scenario. Say an organization has a breach, and they have the observability pipeline, they've got the lake in place, your observability lake. Take me through the investigation. How easy is it? What happens? How do they start it? What goes on? >> So, once your SOC detects a breach, typically you're going to bring in your incident response team. So what we did, and this is one more way that we removed that friction, we cleaned up the glass, is we delegate to the incident response team the ability to restore. Cribl calls it Replay: we replay data from your object store back into your SIEM. There's a very nice UI that gives you the ability to say, "I want data from this time period to this time period, and I want it to be all the data." Or the ability to filter and say, "I want just this IP." For example, if I detected, okay, this IP has been breached, then I'm going to pull all the data that mentions this IP in this timeframe, hit a button, and it just starts. And then it's going to restore as fast as the IOPS of your solution allow, and then it's back in your tool. One of the things I also want to mention is we have an amazing enrichment capability. So one of the things that we would do is have pipelines so that as the data comes out of the object store, it hits the pipeline, and then we enrich it. We use GeoIP information, reverse DNS, and it gets processed through a threat intel feed. So the data's already enriched and ready for the incident response people to do their job. And it just removes the friction of getting to the point where I can start doing my job. >> You know, the theme of this episode for this showcase is Data as Code. Which is, you know, I've been saying this on theCUBE since around 13 years ago, that developers are going to be dealing with data like they deal with software code. And you're starting to see it, you mentioned enrichment. Where do you see Data as Code going? How relevant is it now? Because when you add machine learning in here, that has to be enriched and iterated on too. We're talking about taking things off a branch and putting it back into the core. This is a data discussion, this isn't software, but it sounds the same. >> Right, and the irony is that I remember the first time saying it to an auditor. I was constantly working with auditors, and that's what I described: I'm going to show you the code that manages the data. This is the data's code, and it's going to show you how we transform it, how we secure it, where the data goes, how it's enriched. So you can see the whole story, the data life cycle, in one place. And that's how we handled our audits. And I think that is enormously positive, because it's so easy to be confused, it's so easy to have complexity get in the way of progress. And by being able to represent your data as code, it's a step forward, 'cause the amount of data and the complexity of data, it's not getting simpler, it's getting more complex. So we need to come up with better ways to handle it. >> Now you've been on both sides of the fence.
You've been in the trenches as a customer, now you're a supplier with a great solution. What are people doing with these data engineering roles? Because there's not enough data engineering. I mean, 'cause if you say Data as Code, if you believe that to be true, and many people do, we do, and you look at the history of infrastructure as code, that enabled DevOps, AIOps, MLOps, DataOps. It's happening, right? So data stack ops is coming. Obviously security is huge in this. How does that data engineering role evolve? Because it just seems more and more that there's going to be a big push towards an SRE version of data, right? >> I completely agree. I was working with a customer yesterday, and I spent a large part of our conversation talking about implementing development practices for administrators. It's a new role. It's a new way to think of things, 'cause traditionally your Splunk or Elastic administrators are talking about operating systems and memory, and about how to use the vendor's proprietary tools; that's just not quite the same. And so we started talking about, you need to start getting used to code reviews. The idea of getting used to making sure everything has a comment. One thing I told him was, you know, if you have a function, it has to have a comment, just by default, it just has to. The standards of how you write things, how you name things, all really start to matter. And also you've got to start considering your skillset. And probably one of the best hires I ever made was a guy with a math degree, because I needed his help to understand how machine learning works, how to pick the best type of algorithm. And I think this is going to evolve, where you're going to move from the gray-bearded administrator to some other gray-bearded administrator with a math degree. >> It's interesting, it's a step function. You have a data engineer who's got that kind of capability, like what the SRE did with infrastructure. The step function of enablement, the value creation from really good data engineering, puts the democratization playbook on the table, and changes, >> Thank you very much John. >> And changes that entire landscape. How do you, what's your reaction to that? >> I completely agree, 'cause operational data, operational security data, is the most volatile data in the enterprise. It changes on a whim. You have developers who change things and don't tell you what happened, the vendor doesn't tell you what happened. And so that idea, that life cycle of managing data, the same types of standards and disciplines that database administrators have applied for years, has to filter down into the operational areas. And you need tooling that's going to give you the ability to manage that data, manage it in flight in real time, in order to drive detections, in order to drive response. All those business value things we've been talking about. >> So I've got to ask you about the larger role that you see with observability lakes. We were talking before we came on camera live here about how exciting this kind of concept is, and you were attracted to the company because of it. I love the observability lake concept because it puts all that data in one spot, you can manage it. But you've got machine learning and AI around the corner that also can help. How has all this changed the landscape of data security and things? Because it makes a lot of sense, and I can only see it getting better with machine learning.
>> Yeah, definitely does. >> Totally, and so the core issue, when you talk about observability, is that most people assume observability is only an operational or application support process. It's also a security process. The idea is that you're looking for your unknown unknowns. This is what keeps security administrators up at night: I'm being attacked by something I don't know about. How do you find those unknowns? And that's where your machine learning comes in. And that's where you have to understand there are so many different types of machine learning algorithms; the guy that I hired started educating me about the umpteen number of algorithms, how they apply to different data, how you get different value, and how you have to test your data constantly. There's no such thing as a magical black box of machine learning that gives you value. You have to implement it, just like developer practices: keep testing, over and over again, like data scientists do, for example. >> The best friend of a machine learning algorithm is data, right? You've got to keep feeding it data, and when the data sets are baked and secure and vetted, even better. All cool. Great stuff, great insight. Congratulations Cribl, great solution. Love the architecture, love the pipelining of the observability data and streaming that into a lake. Great stuff. Give a plug for the company, where you guys are at, where people can get information. I know you guys have got a bunch of live feeds on YouTube, Twitch, here in theCUBE. Where else can people find you? Give the plug. >> Oh, please, please join our Slack community, go to cribl.io/community. We have an amazing community. This was another thing that drew me to the company: a large group of people who are genuinely excited about data, about managing data. If you want to try Cribl out, we have some great tools to try out. We have a cloud platform with one terabyte of free data. So go to cribl.io/cloud or cribl.cloud and sign up; it never times out. It's not a 30-day trial, it's forever, up to one terabyte. Try out our new products as well, like Cribl Edge. And then finally, come watch Nick Decker and I, every Thursday, 2:00 PM Eastern. We have live streams on Twitter, LinkedIn and YouTube Live. My Twitter handle is EBA 1367. Love to chat, love to have these conversations. And also, we are hiring. >> All right, good stuff. Great team, great concepts, right? Of course, we're theCUBE here. We've got our video lake coming on soon. I love this idea of having these videos. Hey, video's data too, right? I mean, we've got to keep coming to you. >> I love it, I love videos, it's awesome. It's a great way to communicate, it's a great way to have a conversation. That's the best thing about us, having conversations. I appreciate your time. >> Thank you so much, Ed, for representing Cribl here on the Data as Code showcase. This is season two, episode two of the ongoing series covering the hottest, most exciting startups from the AWS ecosystem, talking about the future of data. I'm John Furrier, your host. Thanks for watching. >> Ed: All right, thank you. (slow upbeat music)
Mark Lyons, Dremio | AWS Startup Showcase S2 E2
(upbeat music) >> Hello, everyone, and welcome to theCUBE's presentation of the AWS Startup Showcase, Data as Code. This is season two, episode two of the ongoing series covering the exciting startups from the AWS ecosystem. Here we're talking about operationalizing the data lake. I'm your host, John Furrier, and my guest here is Mark Lyons, VP of product management at Dremio. Great to see you, Mark. Thanks for coming on. >> Hey John, nice to see you again. Thanks for having me. >> Yeah, we were talking before we came on camera here on this showcase. We're going to spend the next 20 minutes talking about the new architectures of data lakes and how they expand and scale. But we were kind of reminiscing about the old big data days, and how this really changed. There are a lot of hangovers from (mumbles) kind of falling through, Cloud took over, now we're in a new era, and the theme here is Data as Code. It really highlights that data is now in the developer cycles of operations. So infrastructure as code led the DevOps movement for Cloud programmable infrastructure. Now you've got data as code, which is really accelerating DataOps, MLOps, DatabaseOps, and more developer focus. So this is a big part of it. You guys at Dremio have a Cloud platform, a query engine and data tier innovation. Take us through the positioning of Dremio right now. What's the current state of the offering? >> Yeah, sure, happy to, and thanks for kind of introing into the space where we're headed. I think the world is changing, and databases are changing. So today, Dremio is a full database platform, a data lakehouse platform, on the Cloud. So we're all about keeping your data in open formats in your Cloud storage, but bringing the full functionality that you would want to access the data, as well as manage the data. All the functionality folks would be used to, from ANSI SQL compatibility to inserts, updates and deletes on that data, keeping that data in Parquet files in the Iceberg table format, another level of abstraction so that people can access the data in a very efficient way. And going even further than that, what we announced with Dremio Arctic, which is in public preview on our Cloud platform, is a full Git-like experience for the data. So just like you said, data as code, right? We went through waves of source code and infrastructure as code, and now we can treat the data as code, which is amazing. You can have development branches, you can have staging branches, ETL branches, which are separate from production. Developers can do experiments. You can make changes, you can test those changes before you merge back to production and let the consumers see that data. Lots of innovation on the platform, super fast velocity of delivery, and lots of customers adopting it in just the first month here since we announced Dremio Cloud generally available. The adoption's been amazing. >> Yeah, and I think we're going to dig into a lot of the architecture, but I want to highlight the point you made about branching, taking a branch off, like with Git. This is what developers do, right? Developers use GitHub, Git; they make branches from code. They build on top of other code. That's open source. This is what's been around for generations. Now for the first time we're seeing data sets being taken out of production to be worked on and coded and tested, even doing look-backs or forward-looking analysis. This is data being programmed. This is data as code. This is really, you couldn't get any closer to data as code. >> Yeah.
It's all done through metadata, by the way. So there's no actual copying of these data sets, 'cause in these big data systems, Cloud data lakes and stuff, these tables are billions of records, trillions of records, super wide, hundreds of columns wide, thousands of columns wide. You have to do this all through metadata operations, so you can control which version of the data an individual is working with and which version of the data the production systems are seeing, because these data sets are too big. You don't want to be moving them. You can't be moving them. You can't be copying them. It's all metadata and manifest files and pointers to basically keep track of what's going on. >> I think this is the most important trend we've seen in a long time, because if you think about what Agile did for developers, okay, speed, DevOps, Cloud scale, now you've got agility on the data side of it, where you're basically breaking down the old proprietary ways of doing data warehousing, but not killing the functionality of what data warehouses did. Data warehouses were proprietary, not open. They were different use cases, single application developers using data warehouse queries, not a lot of volume. But as you get volume, these things are inadequate. And now you've got the new open Agile. Is this Agile data engineering at play here? >> Yeah, I think it totally is. It's bringing it as far forward as possible. We're talking about making the data engineering process easier and more productive for the data engineer, which ultimately makes the consumers of that data much happier as well, and way more experiments can happen. Way more use cases can be tried. If it's not a burden and it doesn't require building a whole new pipeline and defining a schema and adding columns and data types and all this stuff, you can do a lot more with your data much faster. So it's really going to be super impactful to all these businesses out there trying to be data driven, especially when you're looking at data as code and branching. With a branch, you can de-risk your changes. You're not worried about messing up the production system, messing up that data, having it seen by an end user. For some businesses, data is their business, so that data would be going all the way to a consumer, a third party. And then it gets really scary. There's a lot of risk if you show the wrong credit score to a consumer or you do something like that. So it's really de-risking... >> Even updating machine learning algorithms. So for instance, if the data sets change, you can always be iterating on things like machine learning algorithms. This is kind of new. This is awesome, right? >> I think it's going to change the world, because this stuff was so painful to do. The data sets had gotten so much bigger, as you know, but we were still doing it in the old way, which was typically moving data around for everyone. It was copying data down, sampling data, moving data, and now we're just basically saying, hey, don't do that anymore. We've got to stop moving the data. It doesn't make any sense. >> So I've got to ask you Mark, data lakes are growing in popularity. I was originally down on data lakes. I called them data swamps. I didn't think they were going to be as popular, because at that time, distributed file systems like Hadoop and object stores in the Cloud were really cool. So what happened between that promise of distributed file systems and object stores, and data lakes? What made data lakes popular?
What made that work, in your opinion? >> Yeah, it really comes down to the metadata, which I already mentioned once. We went through these waves, John, you saw it: we did the EDWs, then the data lakes, then the Cloud data warehouses. I think we're at the start of a cycle back to the data lake. And it's because the data lakes this time around, with the Apache Iceberg table format, with project (mumbles), and what Dremio's working on around metadata, these things aren't going to become data swamps anymore. They're actually going to be functional systems that do inserts, updates and deletes. You can see all the commits. You can time travel them. And all the files are actually managed and optimized: you have to partition the data, you have to merge small files into larger files. By the way, this is stuff that all the warehouses have done behind the scenes, all the housekeeping they do, but people weren't really aware of it. And the data lakes the first time around didn't solve all these problems, so those files landing in a distributed file system do become a mess. If you just land JSON, Avro, Parquet or CSV files into HDFS or an S3-compatible object store, it doesn't matter which, if you're just parking files and you're going to deal with it as schema-on-read instead of schema-on-write, you're going to have a mess. If you don't know which tool changed the files, which user deleted a file, updated a file, you will end up with a mess really quickly. So to take care of that, you have to put a table format on top, so everyone's looking at Apache Iceberg or the Databricks Delta format, which is an interesting conversation, similar to the Parquet and ORC file format race that we saw play out. And then you track the metadata. So you have those manifest files. You know which files changed when, which engine, which commit. And you can actually make a functional system that's not going to become a swamp.
Everything from the Apache Arrow project, which is the in-memory counterpart to Parquet, so you're not serializing and de-serializing the data back and forth, to what we call reflections, which is basically a re-indexing or pre-computing of the data. But we leave it in Parquet format, an open format, in the customer's account, so that you can have aggregates and other things that are really popular in these dashboards pre-computed. So millisecond response, lightning fast, like tricks that a warehouse would do, that the warehouses have been doing forever. Right? >> Yeah, more data coming in. And obviously the architecture, we'll get into that now, has to handle the growth. As your customers and practitioners see the volume and the variety and the velocity of the data coming in, how are they adjusting their data strategies to respond to this? Again, Cloud is clearly the answer, not the data warehouse, but what are they doing? What's the strategy adjustment? >> It's interesting. When we start talking to folks, I think sometimes it's a really big shift in thinking about data architectures and data strategies when you look at the Dremio approach. It's very different from what most people are doing today around ETL pipelines, bringing stuff into a warehouse, and oh, the warehouse is too overloaded, so let's build some cubes and extracts into the next tier of tools to speed up those dashboards. And Dremio has totally flipped this on its head and said, no, let's not do all those things. That's time consuming. It's brittle, it breaks. And actually your agility and the scope of what you can do with your data decreases. You go from all your data and all your data sources to smaller and smaller. We actually call it the perimeter of doom, and a lot of people look at this and say, yeah, that kind of looks like how we're doing things today. So from a Dremio perspective, it's really about no copies: try to keep as much data in one place, keep it in one open format, and less data movement. And that's a very different approach for people. I think they don't realize how much you can accomplish that way. And your latency shrinks down too. Your actual latency from data created to insight is much shorter. And it's not because of the query response time; that latency is mostly because of data movement and copies and all these things. So you really want to shrink your time to insight. It's not about getting a faster query from a few seconds down, it's about changing the architecture. >> The data drift, as they say. Interesting there. I've got to ask you on the personnel side, the team side: you've got the technical side, you've got the non-technical consumers of the data, and data science or data engineering is ramping up. We mentioned earlier data engineering being Agile is a key innovation here. As you've got to blend the two personas of technical and non-technical people playing with data, coding with data, where are the bottlenecks in this process today? How can data teams overcome these bottlenecks? >> I think we see a lot of bottlenecks in the process today: a lot of data movement, a lot of change requests. Update this dashboard; oh, well, that dashboard update requires an ETL pipeline update, which requires a column to be added to this warehouse. So then you've got these personas, like you said, some more technical, less technical, the data consumers, the data engineers. Well, the data engineers are getting totally overloaded with requests and work.
And it's not even super value-add work to the business. It's not really driving big changes in their culture and insights and new use cases for data. It's churning through kind of small changes, but it's taking too much time. It's taking days, if not weeks, for these organizations to manage small changes. And then the data consumers, the less technical folks, can't get the answers that they want. They're waiting and waiting and waiting, and they don't understand why things are so challenging, how things could take so much time. So from a Dremio perspective, it's amazing to watch these organizations unleash their data, get the data engineers' productivity up, stop dealing with some of the last-mile ETL and small changes to the data. And Dremio actually says, hey, data consumers, here's a really nice GUI. You don't need to be a SQL expert; the tool will write the joins for you. You can click on a column and say, hey, I want to calculate a new field, and calculate that field. And it's all done virtually, so it's not changing the physical data sets. The actual data engineering team doesn't even really need to care at that point. So you get happier data consumers at the end of the day. They're doing things more self-service. They're learning about the data, and the data engineering teams can go do value-add things. They can re-architect the platform for the future. They can do POCs to test out new technologies that could support new use cases and bring those into the organization. Things that really add value, instead of just churning through backlogs of, hey, can we get a column added or changed... Everyone's doing app development, A/B testing, and those developers are king. Those pipelines stream all this data down when the JSON files change. You need agility, and if you don't have that agility, you just get this endless backlog that you never... >> This is data as code in action. You're committing data back into the main branch that's been tested. That's what developers do. So this is really kind of the next step function. I've got to put the customer hat on for a second and ask you kind of the pessimist question. Okay, we've had data lakes, I've got data lakes, data lakes have been around, I've got query engines here and there, they're all over the place. What's missing? What's been missing from the architecture to fully realize the potential of a data lakehouse? >> Yeah, I think that's a great question. The customers say exactly that, John. They say, "I've got 22 databases, you've got to be kidding me. You showed up with another database." Or, hey, let's talk about a Cloud data lake or a data lake. Again, I did the data lake thing. I had a data lake, and it wasn't everything I thought it was going to be. >> It was bad. It was a data swamp. >> Yeah, so customers really think this way, and you say, well, what's different this time around? Well, in the original data lake world, and I'm just going to focus on data lakes, everything was still direct attached storage, so you had to scale your storage and compute out together. And we built these huge systems, thousands and thousands of HDFS nodes and stuff. Well, the Cloud brought separated compute and storage, but data lakes had never seen separated compute and storage until now. We went from the data lake with direct attached storage to the Cloud data warehouse with separated compute and storage.
So the Cloud architecture, getting compute and storage separated, is a huge shift in the data lake world. And that agility of, well, I'm only going to apply the compute that I need for this question, for this answer, right now, and not keep 5,000 servers of compute sitting around for some peak moment, or 5,000 compute servers just because I have five petabytes or 50 petabytes of data that need to be stored on the disks attached to them. So I think the Cloud architecture, separating compute and storage, is the first thing that's different this time around about data lakes. But more important than that is the metadata tier, the data tier, and having sufficient metadata to have the functionality that people need on the data lake. Whether that's for governance and compliance standpoints, to actually be able to do a delete on your data lake, or whether that's for productivity and treating that data as code, like we're talking about today, and being able to time travel it, version it, branch it. And now these data lakes, the data lakes back in the original days, were getting to 50 petabytes. Think about how big these Cloud data lakes could be. Even larger, and you can't move that data around, so we have to be really intelligent and really smart about the data operations, and versioning all that data: knowing which engine touched the data, which person made the last commit, and being able to track all that is ultimately what's going to make this successful. Because if you don't have the governance in place these days with data, the projects are going to fail. >> Yeah, and I think separating the query layer, or SQL layer, and the data tier is another innovation that you guys have. Also it's a managed Cloud service, Dremio Cloud, now. And you've got the open source angle too, which is also going to open up more standardization around some of these awesome features, like you mentioned the joins, and I think you guys built on top of Parquet and some other cool things. And you've got a community developing, so you get the Cloud and community kind of coming together. So it's the real world that is coming to light, saying, hey, I need real-world applications, not the theory of old school. So what use cases do you see suited for this kind of new way, new architecture, new community, new programmability? >> Yeah, I see people doing all sorts of interesting things, and I'm sure what we've introduced with Dremio Arctic and data as code is going to open up a whole new world of things that we don't even know about today. But generally speaking, we have customers doing very interesting things, very data application things, like building really high performance data into use cases, whether that's a supply chain and manufacturing use case, a pharma or biotech use case, a banking use case, and really unleashing that data right into an application. We also see a lot of traditional data analytics use cases, more in the traditional business intelligence or dashboarding world. That stuff is totally achievable, no problems there. But I think the most interesting stuff is companies really figuring out how to bring that data, when we offer the flexibility and the agility that we're talking about, back into the apps, into the work streams, into the places where the business gets more value out of it. Not in a dashboard that some person might have access to, or a set of people have access to.
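Taken together, the workflow Mark describes — branch, change, verify, merge, then audit the commits — looks something like the sketch below. The syntax is hypothetical, loosely modeled on Git-for-data catalogs like the one behind Dremio Arctic; the actual commands, time-travel clauses, and metadata table names vary by engine, and `conn` is assumed to be any SQL connection from your client library.

```python
# Hedged sketch only -- not Dremio's exact dialect. Each statement is a
# metadata operation: no table data is copied when branching.
workflow = [
    "CREATE BRANCH etl_april FROM main",   # metadata-only, no data copied
    "USE BRANCH etl_april",
    "INSERT INTO sales.orders SELECT * FROM staging.orders_new",
    "SELECT COUNT(*) FROM sales.orders",   # validate before anyone sees it
    "MERGE BRANCH etl_april INTO main",    # consumers now see the change
    # Time travel and auditing ride on the same commit metadata:
    "SELECT * FROM sales.orders AT TIMESTAMP '2022-04-01 00:00:00'",
]
for statement in workflow:
    conn.execute(statement)  # `conn`: assumed SQL connection object
```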
So even in the Dremio Cloud announcement, the press release, there was a customer in Europe called Garvis AI, and they do AI for supply chains. It's an intelligent application, and it's showing customers transparently how they're getting to these predictions. And they stood this all up in a very short period of time, because it's a Cloud product. They don't have to deal with provisioning, management, upgrades. I think they had their stuff going in like 30 minutes or something, super quick, which is amazing. The data was already there, and for a lot of organizations, their data's already in these Cloud storages. And if that's the case... >> If they have data, they're a use case. This is agility. This is agility coming to the data engineering field, making data programmable, enabling the data applications, the DataOps, for everybody, for coding... >> For everybody. And for so many more use cases at these companies. These data engineering teams, these data platform teams, whether they're in marketing or ad tech or FinServ or Telco, they have a list, a roadmap of use cases that they're waiting to get to. And if they're drowning underwater in the current tooling and barely keeping that alive, and oh, by the way, John, you can't go hire 30 new data engineers tomorrow and bring on the team to get capacity, you have to innovate at the architecture level to unlock more data use cases, because you're not going to go triple your team. That's not possible. >> It's going to unlock a tsunami of value. Because everyone's clogged in the system and it's painful, right? >> Yeah. >> They've got delays, you've got bottlenecks, you've got people complaining it's hard, scar tissue. So now I think this brings ease of use and speed to the table. >> Yeah. >> I think that's what we're all about, is making the data super easy for everyone. This should be fun and easy, not really painful and really hard and risky. In a lot of these old ways of doing things, there's a lot of risk. You start changing your ETL pipeline, you add a column to the table, and all of a sudden you've got potential risk that things are going to break, and you don't even know what's going to break. >> Proprietary, not a lot of volume and usage, and on-premises; versus open, Cloud, Agile. (John chuckles) Come on, which path? The curtain or the box, what are you going to take? It's a no-brainer. >> Which way do you want to go? >> Mark, thanks for coming on theCUBE. Really appreciate you being part of the AWS Startup Showcase, Data as Code. Great conversation. Data as code is going to enable the next wave of innovation and impact the future of data analytics. Thanks for coming on theCUBE. >> Yeah, thanks John, and thanks to the AWS team. A great partnership between AWS and Dremio too. Talk to you soon. >> Keep it right there, more action here on theCUBE. As part of the showcase, stay with us. This is theCUBE, your leader in tech coverage. I'm John Furrier, your host, thanks for watching. (downbeat music)
Javier de la Torre, Carto | AWS Startup Showcase S2 E2
(upbeat music) >> Hello, and welcome to theCUBE's presentation of the AWS Startup Showcase, data as code is the theme. This is season two, episode two of the ongoing series covering the exciting startups from the AWS ecosystem, and we talk about data analytics. I'm your host, John Furrier, with theCUBE, and we have Javier De La Torre, who's the founder and chief strategy officer of Carto, which is doing some amazing innovation around geographic information systems, or GIS. Javier, welcome to theCUBE for this showcase. >> Thank you. Thank you for having me. >> So, you know, one of the things that you guys are bringing to the table is spatial analytics data that now moves into spatial relations, which is, you know, we know about geofencing. You're seeing more data coming from satellites, ground stations, you name it. Things are coming into the market from a data perspective that's across the board, and geo's one of them, GIS systems. This is what you guys are doing, in the rise of SQL, in particular with spatial. This is a huge new benefit to the world. Can you take a minute to explain what Carto's doing and what spatial SQL is? >> Sure. Yeah. So like you said, data, obviously we know, is growing very fast and is now being leveraged by many organizations in many different ways. There's one dimension of data that is location. We like to say that everything happens somewhere, so therefore everything can be analyzed and understood based on the location. We like to give an example: if all your neighbors get an alarm in their homes, the likelihood that you will get an alarm increases, right? That's obvious; we are all affected by our surroundings. What spatial analytics does is try to uncover those spatial relations, so that you can model, you can predict where something is going to happen, or optimize it, you know, like where else you want it to happen, right? So that's at the core of it. Now, this is something that as an industry has been done for many years; GIS, or geographic information systems, have existed for a long time. But now, and this is what Carto really brings to the table, we're looking at really democratizing it, so that it's in the hands of any analyst. Our vision is that you don't need to go five years to a geography school to be able to do this type of spatial analysis. And the way that we want to make that happen is what we call the rise of spatial SQL. We add these capabilities around spatial analytics on top of a language that is very, very popular for analysts, which is SQL. So what we do is enable you to do this spatial analysis on top of the well-known and well-used SQL methods. >> It's interesting, the cloud native and cloud scale wave, and now data as code, have shown that the old school, the old guard, the old way of doing things, you mentioned data warehousing as one, and BI tools in particular, have always been limited. And the scope of the limitation was that the environment was different. You had to have domain expertise, rich knowledge of the syntax. Usually it's for an application developer, not for real time and building it into the CI/CD pipeline, or just, from a workflow standpoint, making it available. The so-called democratization, this is where this connects. And so I've got to ask you, what are you most excited about in the innovations at Carto?
Can you share some of the things that people might know about, or might not know about, that's happening at Carto that takes advantage of this cloud native wave? Because companies are now on this bandwagon. >> Yeah, no, it is. And cloud native analytics is probably the most disruptive trend that we've seen over the last few years, and in our particular space, on the spatial side, it has tremendous effects on the way that we provide our service. So I'd like to highlight four main reasons why cloud native is super important to us. The first one, obviously, is scalability. Working with the sizes of data that we work with now, in terms of location, was just not possible before. So for someone performing analysis on autonomous cars, or anything that has a sensorized GPS device and is collecting hundreds of billions of points, if you want to do analysis on that type of data, cloud native allows you to do that in a scalable way. But it's also very cost effective. That is something you'll see very quickly when your data grows a lot, which is this compute-storage separation: the idea that you store your data at cloud prices, but then use it with these data warehouses that we work with. It makes for a very, very cost effective solution. But then there are two others, one of them being SQL, and spatial SQL; we like to say that SQL is becoming the lingua franca for analytics, so it's used by many products that you can connect through the usage of SQL. But coming toward why I think it's even more interesting: in the cloud, the fact that we are all living on the same infrastructure enables us to distribute a spatial data set to a customer so that they can join it on their database with SQL, without having to move the data from one place to another. Like in the case of Redshift, or Amazon Redshift: Carto connects, using something called Spectrum, and we can connect live to data that is stored on S3. And I think that is going to disrupt a lot of the way we think about data distribution, and how cost effective it is. I think it has a lot of potential. And in that sense, what Carto is providing on top of it, in formats like Parquet, which is a very popular big data format: we are adding GeoParquet, we are specializing this big data technology for doing spatial analysis. And that to me is very exciting, because it's putting some of the best tools in the hands of those doing spatial analytics, for something we were not able to do before. So to me, this is one area that I'm very, very excited about. >> Well, I want to back up for a second. So you mentioned Parquet and the standards around that format. And you also mentioned Redshift, so let me get this right. You're saying that you can connect into Redshift. So I'm a customer and I have Redshift, I've got my S3, I'm using Redshift for analysis. You're saying you can plug right into Redshift. >> Yes. And this is a very, very, very important part, because what Carto does is leverage Redshift's computing infrastructure to essentially do all the analysis. So what we do is bring the spatial analysis to where the data is, where Redshift is, versus in the past, where we would take the data to where the analysis was. And in that sense, it's at the core of cloud native. >> Okay.
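To make the "spatial SQL" idea concrete, here is a minimal sketch of the kind of query Javier is describing: a spatial join expressed in ordinary SQL, run from Python. It assumes a warehouse exposing PostGIS-style ST_* functions (as Carto's tooling layers onto cloud warehouses); the connection string, table names, and columns are hypothetical.

```python
# Count delivery orders that fall within 2 km of each store using a
# spatial join. Assumes PostGIS-style ST_* functions are available in
# the warehouse; tables and columns here are hypothetical.
import psycopg2  # any DB-API driver for your warehouse works the same way

QUERY = """
SELECT s.store_id,
       COUNT(o.order_id) AS orders_nearby
FROM stores AS s
JOIN orders AS o
  ON ST_DWithin(                          -- spatial predicate: within 2 km
       s.location::geography,
       ST_Point(o.lon, o.lat)::geography,
       2000)
GROUP BY s.store_id
ORDER BY orders_nearby DESC;
"""

conn = psycopg2.connect("dbname=analytics")  # hypothetical connection
with conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for store_id, n in cur.fetchall():
        print(store_id, n)
```

The point of the pattern is that the spatial predicate (ST_DWithin) runs inside the warehouse, next to the data, rather than pulling rows out to a separate GIS tool.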
This is really where I see the exciting shift, where data as code now becomes a reality: you bring the... it redefines the architecture, the script is flipped. The architecture has been redefined. You're making the data move to the environments it needs to move to when it has to, and if it doesn't have to move, you bring compute to it. So you're seeing new kinds of use cases. So I have to ask you about the use cases and examples for Carto AWS customers with spatial analytics. What are some examples of how your clients are using cloud native spatial analytics with Carto? >> Yeah. So one, for example, that we've seen a lot in the AWS ecosystem, obviously because of its suite of services and its position: we work together with another service in the AWS ecosystem called Amazon Location. That provides you access to maps and SDKs for navigation. So say you are a company delivering food or any other goods in the city. You have hundreds or thousands of drivers around the city, moving, doing all these deliveries. Each of these drivers has an app, and they're actively collecting their location, their position, right? So you get all that data, and then it gets stored on something like a Redshift data cluster, or on S3 as well; there are different architectures there. But now you essentially have a full log of the activity happening on the ground in your business. So what Carto does on top of that data: you connect your data into Carto, and now you can do analysis, for example, to find where you might place another distribution center, to optimize your delivery routes, or, if you're in the restaurant business, where you might want to open a new dark kitchen, right? All this type of analysis: since I know where you're doing your operations, I can analyze the data after the fact and then give you a different way to think about solving your operations. So that's an example of a great use case we're seeing right now. >> Talk to me about the traditional BI tools out there, because you mentioned earlier that they lack the specific capabilities; you guys bring those to the table. What about the scalability limitations? Can you talk about where that is? Are there limitations there? Obviously, if they don't have the capabilities, you can't scale, that's one, but as you start plugging into Redshift, scale and performance matter. What's the issue there? Can you unpack that a little bit, real quick? >> Yeah. It goes back to the particulars of spatial data, location data. In a use case like the one I was describing, you very quickly end up with terabytes, if not petabytes, of data, because it gets created by sensors. So volumes in our world tend to grow a lot. So when you work with BI tools, there are two things you have to take into consideration. BI tools are great for seeing things. For example, if all you want to see is where your customers are, a BI tool is great: creating a map and seeing your customers, that's totally in the world of BI. But if you want to understand why your customers are there, or where else they could be, you're going to need to perform what we call a spatial analysis. You're going to have to create a spatial model.
And for that, BI tools will not give you what you need. That's one side. The other is about the volumes I was describing. Most of these BI tools can handle certain aggregations. For example, if you're connecting, let's say, a 10 billion row data set to a BI tool, the BI tool will do some aggregations, because you cannot display 10,000 rows in a BI tool, and that's okay: you get aggregations, and that works. But when it comes to a map, you cannot aggregate the data on the map. You actually want to see all the data on the map, and that's what Carto provides. It allows you to make maps that show all the data, not just data aggregated by county or by some other kind of area; you see all your data on the map. >> You know, what's interesting is that location based services have been around for a long time. When mobile started hitting the scene, you saw the mashups get better: Google Maps, all those Google API mashups, things like that. Developers are used to it, but they could never get to the promised land on the big data side, because they just didn't have the compute. But now you add in geofencing, geo information, and you have access to this new edge-like data, right? So I have to ask you on the mobile side: are you guys working with any 5G or edge providers? Because I can almost imagine that the spatial equation gets more complicated, and more data-full, when you start blowing out edge data, like with 5G. You've got more things happening at the edge; it's only going to fill in more data points. Can you share how that use case is going with mobile carriers or 5G? >> Yeah, that's totally the case. Well, first, even before we get there, we're actually helping a lot of telcos with planning their 5G deployments. Where you place your antennas is a very, very important topic when you're talking about 5G, because 5G networks require a lot of density. So it's a lot about, okay, where do I start deploying my infrastructure to ensure customers have the best service, and which places do I want to go first? So like... >> You mean like the RF maps, like understanding how RF propagates. >> Well, that's one signal, but the other is: imagine that your telco is more interested in, let's say, a certain consumer profile, like young people using one type of service. Well, we know where those demographics live, so you might want to start deploying your 5G in those areas, right? Versus if you go to more commercial or more residential areas, there might be other demographics. So that's one part, around market analysis. Then the second part is, once these 5G networks are in place, you're right: one of the promises these new technologies give us is that because the network is much smarter, you can have all these edge cases; there's much more location data that can be collected. So what we see now is a rise in the amount of what we call telemetry that, for example, the IoT space can generate around location. And that's now enabled because of 5G. So I think 5G is going to be one of those trends that makes more and more location data available for analysis.
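A small sketch of the distinction Javier draws between BI-style row limits and map-scale analysis: instead of truncating the data, you can bin every raw telemetry point into small grid cells so that all points still contribute to what is rendered. The input records are hypothetical, and the binning is deliberately simplified.

```python
# Bin raw location telemetry into ~100 m grid cells so a map layer can
# show density for every point, rather than a truncated sample of rows.
# The telemetry records below are hypothetical.
from collections import Counter

points = [
    {"lat": 40.4168, "lon": -3.7038},  # e.g. driver pings, IoT telemetry
    {"lat": 40.4170, "lon": -3.7041},
    {"lat": 40.4300, "lon": -3.6900},
]

CELL = 0.001  # cell size in degrees, roughly 100 m at this latitude

def cell_of(lat: float, lon: float) -> tuple[int, int]:
    """Snap a coordinate to its grid cell."""
    return (int(lat // CELL), int(lon // CELL))

density = Counter(cell_of(p["lat"], p["lon"]) for p in points)
for cell, count in density.most_common():
    print(cell, count)  # feed cell counts to the map instead of raw rows
```

In practice a production system would use a proper global grid (this is what spatial indexes, discussed later in the interview, formalize), but the principle is the same: every point lands in a cell, and nothing is thrown away.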
>> So how does that... I mean, this is a great conversation, because everyone can relate: they're at a stadium, they see multiple bars, but they can't get bandwidth. So there's a backhaul problem, or not enough signal. Everyone knows it when they're driving their car; they can relate to the consumer side of it. So I get how the spatial data grows. What's the impact to Carto, and specifically the cloud? Because if you have more data coming in, you need the actionable insight. I can see the use case: oh, put the antenna here. That's an actionable business decision: more content, more revenue, more happy customers. But where else is the impact to you guys and the spatial piece of it? >> Yeah. Well, there are many, many factors, right? One of them, for example, on the telco side: one place where we see impact is that it gives the operator visibility into quality of service. Like, okay, are my customers getting the quality of service that I want? Or, like you said, if they're sitting outside a concert and the quality of service in one particular area is dropping very fast, the idea of being able to detect location issues in real time, like "I'm having an issue in this place," means that I can act: I can drive up bandwidth, put in more capacity, et cetera, right? So I think the biggest impact we are seeing, and are going to see in the upcoming years, is more and more use cases going toward real time. Before, it was like, well, now that it has happened, I'm going to analyze it, I'm going to look at how I could do better next time. We're moving toward an industry where Carto ourselves, we are embedded in more real time types of analytics, where it's: if this happens, then do that, right? So it's going to be more personalized, to the level that, in the coding environment, it has to be part of a full pipeline type of analysis that's already programmatically prepared to act in real time. >> That's great, and it's a good segue to my next question. As more and more companies adopt cloud native analytics, what trends are you seeing, what are the keys to watch? Obviously you're seeing more developers coming onto the scene, open source is growing. What are the big cloud native analytics trends for Carto and geographic information? >> Yeah. So, like we were talking about before, cloud native now is unstoppable, but one of the things we're seeing that still needs to be developed, and where we are seeing progress, is standardization, for example around data sets that are provided by different providers. What I mean by that is: you as an organization are going to be responsible for your data, the data you create on your cloud, right, on S3, and then you're going to have a computing engine like Redshift, and you're going to have all that set up. But then you also have to think about, okay, how do I ingest data from third party providers that are important for my analysis? For example, Carto provides a lot of demographics, human mobility; we aggregate, clean up, and prepare a lot of spatial data so that we can enrich your business. So for us, how we deliver that into your cloud native solution is a very important factor, and we haven't yet seen enough standardization around that.
And that's one of the things we are pushing with the concept of GeoParquet: standardizing that format. That's one. Then there is another, which is more what I like to say: we are helping companies figure out their own geographies. What we mean by that is, most companies, when they start thinking about how they interact with space, with location, some of them will work by zip codes, others by cities; they organize their operations based on a geography, or technically, what we call a geographic support system. Well, nowadays the most advanced companies are defining their geographies as a continuous spectrum, in what we call global grid systems, or spatial indexes, that allow them to understand the business not just as a set of regions, but as a continuous space. And that is now possible because of the technologies we are introducing around spatial indexes on cloud native infrastructure. It provides a great way to match data with resources and operate at scale. To me, those two trends are going to be very, very important, because of the capabilities that cloud native brings to our spatial industry. >> So it changes the operation. So it's data ops, data as code; data ops the way infrastructure as code means cloud DevOps. So I've got to ask you, because that's cool: spatial indexes are a whole other way to think of it. Rather than going hyper local, super local, you get Local Zones for AWS, and Regions; things are getting down to the granular levels, I see that. So I have to ask you, what does data as code mean to you, and what does it mean to Carto? Because you're kind of teasing out this new way; it's redefining the operation, the data operations, the data engineering. So data as code is real. What does that mean to you? >> No, I think we're already seeing it happening. To me, and to Carto, what I would describe as data as code is when an organization has moved from doing analysis after the fact, post-hoc analysis in a way, to actually putting analytics into their operational cycle. So then they need to really code it. They need to take these analyses and insert them into the architecture bus, if you want to call it that, of the organization. So: if a customer happens to be in this location, I'm going to trigger this, and then this is going to do that. Or if this happens, I need to open up that. And this is where, if an organization is going to react in more real time, and we know organizations need to move in that direction, the only way they can make that happen is if they operationalize analytics in their daily operations. And that can only happen with data as code. >> Yeah, and that's interesting. Look at MLOps, AIOps, people talk about that. This is data: developers meets operations, that's the cloud; data meets code, that's operations, that's data business. >> You got it. And add to that the spatial, with Carto, and we've got it. >> Yeah, because every piece of data now is important. And the spatial's key. Real quick, before we close out: what is the index thing? Explain the benefit, real quick, of a spatial index. >> Yes. So for the spatial index: well, everybody can understand how we organize societies politically, right? You have countries, you have states, and then you have counties; you have all these different kinds of what we call administrative boundaries, right? That's a way that we organize information too, right?
A spatial index is when you divide the world not into administrative boundaries, but you actually make a grid. Imagine that you essentially make a grid of the world, right? And you make that grid so that every cell can then be split into, let's say, four more cells. So you now have an organization: you split the world into a grid that can have multiple resolutions. Think of Google Maps: you see the entire world, but you can zoom in and end up seeing one particular place. So that's one thing. What a spatial index allows you to do is place your location not on a coordinate, but on one grid cell, on an index. And we use that later to correlate, let's say, your data with someone else's data; we can use these spatial indexes to do joins very, very fast, and we can do a lot of operations with them. So it is a new way to do spatial computing based on this type of index. But for an organization, more than anything, what a spatial index allows is that you don't need to work with zip codes or with artificial boundaries. I mean, your customer doesn't change because he crosses from one side of the road to the other; it's the same place. Those are arbitrary locations. A spatial index breaks out of all of that. You break with your zip codes, and you essentially have a continuous geography, which is actually a much closer match to reality. >> It's like the forest and the trees and the bark of the tree. (Javier laughing) You can see everything. >> That's it, you can get a look at everything. >> Javi, great to have you on. In real quick closing, give a quick plug for the company. Summarize what you do, what you're looking into, how many people you've got, whether you're hiring, and the key goals for the company. >> Yeah, sure. Carto is a company of around 200 people now. Our vision is that spatial analytics is something every organization should do. So we really try to enable organizations with the best data and analysis around spatial, and we do all that cloud native, on top of your data warehouse. What we are really enabling these organizations to do is take that cloud native approach they're already embracing and extend it to spatial analysis. >> Javi, founder and chief strategy officer of Carto. Great to have you on. Data as code: all data's real, all data has impact; operational impact with data is the new big trend. Thanks for coming on and sharing the company story and all your key innovations. Thank you. >> Thanks to you. >> Okay. This is the Startup Showcase, data as code, season two, episode two of the ongoing series. Every episode will explore new topics and new exciting companies pioneering this next cloud native wave of innovation. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)
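To illustrate the hierarchical grid Javier describes at the close, here is a toy sketch of a quadkey-style spatial index: each cell splits into four children, so a location becomes a string of quadrant digits, and nearby points typically share a key prefix, which is what makes index-based joins cheap. This is a simplified illustration, not Carto's actual implementation (production systems use grids like H3 or quadbin).

```python
# Encode a lat/lon as a hierarchical grid cell: at each step the current
# cell splits into four quadrants and one digit is appended to the key.
def spatial_index(lat: float, lon: float, resolution: int = 12) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    key = ""
    for _ in range(resolution):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        quadrant = (2 if lat < lat_mid else 0) + (1 if lon >= lon_mid else 0)
        key += str(quadrant)
        lat_lo, lat_hi = (lat_lo, lat_mid) if lat < lat_mid else (lat_mid, lat_hi)
        lon_lo, lon_hi = (lon_lo, lon_mid) if lon < lon_mid else (lon_mid, lon_hi)
    return key

# Two points across the street typically land in the same cell, even
# where a zip code boundary would split them.
print(spatial_index(40.41680, -3.70380))
print(spatial_index(40.41682, -3.70379))
```

Because equality on cell keys replaces expensive geometric intersection tests, joining two huge data sets "by location" becomes an ordinary join on an indexed column.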
Gian Merlino, Imply.io | AWS Startup Showcase S2 E2
(upbeat music) >> Hello, and welcome to theCUBE's presentation of the AWS Startup Showcase: Data as Code. This is season two, episode two of the ongoing series covering exciting startups from the AWS ecosystem, and we're going to talk about the future of enterprise data analytics. I'm your host, John Furrier, and today we're joined by Gian Merlino, CTO and co-founder of Imply.io. Welcome to theCUBE. >> Hey, thanks for having me. >> Building analytics apps with Apache Druid and Imply is the focus of this talk, and your company is being showcased today. So thanks for coming on. You guys have been in streaming data at large scale for many, many years, pioneers going back; this past decade has been the key focus. Druid's unique position in that market has been key, and you guys have been powering it. Take a minute to explain what you guys are doing over there at Imply. >> Yeah, for sure. So I guess to talk about Imply, I'll talk about Druid first. Imply is an open source based company, and Apache Druid is the open source project that the Imply product is built around. What Druid's all about is: it's a database to power analytical applications. And there are a couple things I want to talk about there. The first is, why do we need that? And the second is, why are we good at it? I'll give a little flavor of both. So why do we need a database to power analytical apps? It's the same reason we need databases to power transactional apps. The requirements of these applications are different: analytical applications, apps where you have tons of data coming in, you have lots of different people wanting to interact with that data, seeing what's happening both real time and historical. The requirements of that kind of application have given rise to a new kind of database, of which Druid is one example. There are others, of course, out there in both the open source and non open source world. And what makes Druid really good at it: people often ask, what is Druid's big secret? How is it so good? Why is it so fast? And I never know what to say to that. I always go to: well, it's just getting all the little details right. It's a lot of pieces that individually need to be engineered. You build up software in layers, you build up a database in layers, just like any other piece of software. And to have really high performance, and to do really well at a specific purpose, you have to get each layer right and have each layer carry as little overhead as possible. So: just a lot of nitty gritty engineering work. >> What's interesting about the trends over the past 10 years, and maybe you can go back 10, 15 years: the state of the art database was stream a bunch of data, put it into a pile, index it, interrogate it, get some reports. Pretty basic stuff. And then all of a sudden, with cloud, you have thousands of databases out there, potentially hundreds of databases living in the wild. So now, with Kafka and Kinesis, these kinds of technologies, streaming data is happening in real time, and you don't have time to put it in a pile or index it. You want real time analytics. And so, whether it's mobile apps, the Instagrams of the world, this is now what people want in the enterprise. You guys are at the heart of this. Can you talk about that dynamic of getting data quickly at scale? >> So our thinking is that actually both things matter. Realtime data matters, but historical context matters too.
And the best way to get historical context out of data is to put it in a pile and index it, so to speak, and the best way to get realtime context on what's happening right now is to be able to operate on these streams. So one of the things we do in Druid, and I wish I had more time to talk about it, is we integrate this realtime processing and this historical processing. We actually have a system we call the historical system, which does what you're saying: take all this data, put it in a pile, index it, for all your historical data. And we have a system we call the realtime system, which is pulling data in from things like Kafka and Kinesis, or getting data pushed into it, as the case may be. This system is responsible for all the data that's recent; maybe the last hour or two of data will be handled by it, and then the older stuff is handled by the historical system. And our query layer blends these two together seamlessly, so a user never needs to think about whether they're querying realtime data or historical data. It's presented as one blended view. >> It's interesting, and you know, a lot of people just say, hey, I don't really have the expertise, and now they're trying to learn it, so their default was to throw everything into a data lake. So that brings back the historical piece. The rise of the data lake: you're seeing Databricks and others out there doing very well with data lakes. How do you guys fit into that? Because that makes a lot of sense too, since that looks like historical information. >> So data lakes are great technology. We love that kind of stuff. I would say that with Druid there are actually two very popular patterns. One is what I would call streaming first: stream focused, where you connect up to something like Kafka, you load data from the stream, and then we store all the historical data that came from the stream and blend the two together. The other pattern that's also very common is the data lake pattern. You have a data lake, and you're mirroring that data from the data lake into Druid. This is really common when you have a data lake that you want to build an application on top of. You want to say: I have this data in the data lake, I have my table, and I want to build an application on it that has hundreds of people using it, that has really fast response times, that is always online. So I mirror that data into Druid and build my app on top of that. >> Gian, take me through the progression of the maturity cycle here. As you look back even a few years, the pioneers, the hardcore streaming-data users doing data analytics at scale the way you do with Druid, were really a small percentage of the population. Then as hyperscale became mainstream, it moved into the enterprise. How stable is it? What's the current state of the art relative to the stability and adoption of the techniques you guys are seeing?
>> I think what we're seeing right now at this stage in the game, and this is something we see on the commercial side of Imply, is the realization that you actually can get a lot of value out of data by building interactive apps around it, by allowing people to slice and dice it and play with it, and by just getting it out there to everybody. There is a lot of value here, and it is actually very feasible to do with current technology. I've been working on this problem, just in my own career, for the past decade. Ten years ago, even the most high tech of tech companies were like, well, I can sort of see the value; it seems like it might be difficult. And we've gone from there to the high tech companies realizing that it is valuable and it is very doable. I think there was a tipping point that I saw a few years ago, when Druid and databases like it really started to blow up, and now we're seeing it go beyond the high tech companies, which is great to see. >> And a lot of people see the value of the data, and they see the application side: data as code means the application developers really want to have that functionality. Can you share the roadmap for the next 12 months for you guys? What's coming next, what's coming around the corner? >> Yeah, for sure. I mentioned the Apache open source community; we're one member of that community, a very prominent one, but one member. So I'll talk a bit about what we're doing for the Druid project, as part of our effort to make Druid better and take it to the next level, and then I'll talk about some of the stuff we're doing on the Imply commercial side. On the Druid side, the big thing is something we really started writing about a few weeks ago: the multi-stage query engine we're working on. If you're interested, the full details are on the blog on our website and also on the Apache Druid GitHub, but the short version is that we're extending Druid's query engine to support more and varied kinds of queries, with a focus on reporting queries, more complex queries. Druid's core query engine has classically been extremely good at doing rapid fire queries very quickly: think thousands of queries per second, where each query is maybe something that involves a filter and a group by, a relatively straightforward query, but we're doing thousands of them constantly. Historically, folks have not reached for technologies like Druid for really complex, thousand-line SQL queries, complex reporting needs, although people really do need to do both interactive stuff and complex stuff on the same dataset, and that's why we're building out these capabilities in Druid. And then on the Imply commercial side, the big effort for this year is Polaris, which is our cloud based Druid offering. >> Talk about the relationship between Druid and Imply. Share with the folks out there how that works. >> So Druid, like I mentioned before, is Apache Druid, a community based project. It's not a project that is owned by Imply; some open source projects are owned or sponsored by a particular organization, and Druid is not. Druid is an independent project. Imply is the biggest contributor to Druid.
So the Imply engineering team is contributing tons of stuff constantly, and we're really putting a lot of work in to improve Druid, although it is a community effort. >> You guys are launching a new SaaS service on AWS. Can you tell me what's happening there, what it's all about? >> Yeah, so we actually launched that a couple weeks ago. It's called Polaris. It's very cool. Historically, there have been two ways to get started: with Apache Druid, which is open source and you install it yourself, or with Imply Enterprise, which is our enterprise offering. Those were the two ways you could get started. One of the issues with getting started with Apache Druid is that it is a very complicated distributed database. It's simple enough to run on a single server, but once you want to scale things out, once you get all these things set up, you may want someone to take some of that operational burden off your hands. And on the Imply Enterprise side, it says right there in the name: it's an enterprise product. It's something that may take a little bit of time to get started with; it's not something you can just roll up to with a credit card and sign up for. So Polaris is really about having a cloud product that's designed to be really easy to get started with, really self-service, that kind of stuff. It provides a really nice getting-started experience that takes the maintenance burden and operational burden away from you, but is also as easy to get started with as a cloud service should be. >> So, more developer friendly from an onboarding standpoint, classic. >> Exactly. Much more developer friendly is what we're going for with that product. >> So take me through the state of the art of data as code in your mind, because infrastructure as code, DevOps, has been awesome; that's cloud scale, we've seen that. Data as code is a term we coined, but it means data's in the developer process. How do you see data being integrated into the workflow for developers in the future? >> Great question. I mean, all kinds of ways. Part of the reason, and I alluded to this earlier: building analytical applications, building applications based on data and based on letting people do analysis, is incredibly valuable. In the developer context, there are two big ways we see these things getting pushed out. One is developers building apps for other people to use. So think: I want to build something like Google Analytics, something that tracks my web traffic and then lets the marketing team slice and dice through it and make decisions about how well the marketing is doing. You can build something like that with databases like Druid and products like what we have at Imply. The other way is things that actually help developers do their own job: use your own product, or use it for yourself. And in this world you have things like... I'll just talk about my favorite use case. I'm really into performance; I've spent the last 10 years of my life working on a high performance database, so obviously I'm into this kind of stuff. I love when people use our product to help make their own products faster. So: this concept of performance monitoring and performance management for applications.
One thing I've seen some of our customers and users do that I really love is when you take that performance data from your own app as far as it can possibly go, take it to the next level. The basic level of using performance data is: I collect performance data from my application deployed out there in the world, and I just use it for monitoring. I can say, okay, my response times are getting high in this region, maybe there's something wrong with that region. One of the very original use cases for Druid was Netflix doing performance analysis. Performance analysis is more exciting than monitoring, because you're not just understanding whether performance is good or bad in some region; you're getting very fine grained. You're saying: in this region, on this server rack, for these devices, I'm seeing a degradation, or I'm seeing an increase. You can see things like: Apple just rolled out a new version of iOS, and on that new version my app is performing worse than on the older version. And even though not many devices are on the new version yet, I can see that, because I have the ability to get really deep into the data, and then I can start slicing and dicing more. I can say: for those new iOS users, is it all iOS devices? Is it just the iPhone? Is it just the iPad? That kind of stuff is just one example, but it's an example that I really like. >> It's kind of like data about the data; it was always good to have that context. It's data analytics for data analytics, to see how it's working at scale. This is interesting, because now you're bringing up the classic finding the needle in a haystack of needles, so to speak, where you have so much data out there: edge cases, edge computing. You have devices sending data off; there's so much data coming in that scale is a big issue. This is where you guys seem to be a nice fit: large scale data ingestion, large scale data management, large scale data insights, all rolled into one. Is that about right? >> Yeah, for sure. One of the things we knew we had to do with Druid was: we were building it for the internet age, so we knew it had to scale well. The original use case for Druid, the very first one we ended up building for, the reason we built it in the first place, had massive scale, and we struggled to find something that could handle it. We were literally trying to do what we see people doing now, which is building an app on a massive data set, and struggling to do it. So we knew it had to scale to massive data sets. And a little flavor of how that works: like I was mentioning earlier, there's the realtime system and the historical system. The realtime system scales out; if you're reading from Kafka, we scale out just like any other Kafka consumer. And the historical system is all based on what we call segments, which are files with a few million rows per file. A really big cluster might have thousands of servers and millions of segments, but it's a design that does scale to these multi-trillion-row tables. >> It's interesting; you go back to when you probably started: you had Twitter, Netflix, Facebook, a handful of companies at that scale. Now the trend is, you're on this wave where those hyperscalers, these unique huge scale app companies, are now mainstream enterprise.
So as you guys roll out the enterprise version of building analytics into applications with Druid and Imply, enterprises are going to get religion on this. And I think it's not hard, because it's distributed computing, which they're used to. So how is that enterprise transition going? I can imagine people want it, and are kicking the tires or learning, and then trying to put it into action. How are you seeing the adoption of the enterprise piece of it? >> The thing that's driving the interest is, for sure, doing more and more stuff on the internet. Anything that happens on the internet, whether it's apps or web based, there's more and more happening there, and anything connected to the internet, anything serving customers on the internet, is going to generate an absolute mountain of data. The question is not whether you're going to have that much data; you do, if you're doing anything on the internet. The only question is: what are you going to do with it? That, I think, is what drives the interest: people want to get value out of this. And then what drives actual adoption is, I think, and I don't want to talk about specific folks, but within every industry there are people that are leaders, organizations that are leaders, teams that are leaders. What drives a lot of interest is seeing someone in your own industry that has adopted a new technology and gotten a lot of value out of it. A big part of what we do at Imply is identify those leaders, work with them, and then be able to talk about how it's helped them in their business. And then there's also, I guess, the classic enterprise thing: what they're looking for is a sense of stability, supportability, robustness, and that's something that comes with maturity. The super high tech companies are comfortable using open source software that rolled off the presses a few months ago; the big enterprises are looking for something with corporate backing, something that's been around for a while, and I think Druid and technologies like it are reaching that level of maturity right now. >> It's interesting that supply chain has come up on the software side. That conversation is big now; you're hearing about open source being great, but at cloud scale, getting the data in there to identify opportunities, and also potentially vulnerabilities, is a big discussion. Question for you on the cloud native side: how do you see cloud native, cloud scale, with services like serverless, Lambda, edge, emerging? It's easier to get to cloud scale now. How do you see the enterprise being hardened out with Druid and Imply? >> I think the cloud stuff is great. We love using it to build all of our own stuff; our product is, of course, built on other cloud technologies, and I think these technologies build on each other. Like I mentioned earlier, all software is built in layers, and cloud architecture is the same thing. What we see ourselves doing is building the next layer of that stack: the analytics database layer. When people first started doing this in the public cloud, the very first two services that came out were: you can get a virtual machine, and you can store some data and retrieve that data. But there were no real analytics on it; there was just storage and retrieval.
And then as time goes on, higher and higher levels get built out, delivering more and more value, and the levels mature as they go up. The bottom layers are incredibly mature; the topmost layers are cutting edge, and there's a maturity gradient between the two. What we're doing is building out one of those layers. >> Awesome. Abstraction layers, faster performance, great stuff. Final question for you, Gian: what's your vision for the future? Where do you see Imply and Druid going? What does it look like five years from now? >> I think, for sure, there are two big trends happening in the world, and it's going to sound a little self serving for me to say it, but I believe in what we're doing here, I'm here because I believe it: I believe in open source, and I believe in the cloud. That's why I'm really excited that what we're doing is building a great cloud product based on a great open source project. That's the kind of company I would want to buy from if I weren't at this company and I was just building something: a great cloud product that's backed by a great open source project. So the way I see the industry going, the way I see us going, and what I think would be a great place to end up, just as an engineering world, as an industry, is a lot of these really great open source projects, doing what Kubernetes is doing with containers, what we're doing with analytics, et cetera, and then really first class, really well done cloud versions of each one of them. And so you can choose: do you want to get down and dirty with the open source, or do you want the abstraction of the cloud? >> That's awesome. Cloud scale, cloud flexibility, community, getting down and dirty with open source: the best of both worlds. Great solution. Gian, thanks for coming on and thanks for sharing here in the Showcase. Thanks for coming on theCUBE. >> Thank you too. >> Okay, this is theCUBE Showcase, season two, episode two. I'm John Furrier, your host. Data as code is the theme of this episode. Thanks for watching. (upbeat music)
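To ground the "analytics database layer" Gian describes, here is a minimal sketch of how an application would query Druid: the broker or router exposes a SQL endpoint (POST /druid/v2/sql) that accepts a JSON body containing the query, and the blended realtime/historical view comes back as ordinary rows. The host address, datasource, and column names here are assumptions for illustration.

```python
# Query Druid's SQL API the way an analytics app would. The endpoint
# path is Druid's standard SQL endpoint; the datasource and columns
# below are hypothetical.
import json
import urllib.request

BROKER = "http://localhost:8888/druid/v2/sql"  # router/broker address varies

payload = {
    "query": """
        SELECT region, COUNT(*) AS requests, AVG(latency_ms) AS avg_latency
        FROM app_telemetry              -- hypothetical datasource
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY region
        ORDER BY avg_latency DESC
    """
}

req = urllib.request.Request(
    BROKER,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for row in json.loads(resp.read()):
        print(row)  # realtime and historical data arrive as one result
```

This is exactly the "filter and a group by" shape Gian says Druid's core engine is built to serve thousands of times per second.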
AWS Heroes Panel | AWS Startup Showcase S2 E2 | Data as Code
>> Hi, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. The theme of this episode is data as code, and this is season two, episode two of the ongoing series covering exciting startups from the AWS ecosystem, in cloud and the future of data analytics. I'm your host, John Furrier. We've got a great featured panel here of AWS Heroes: Lynn Langit, CEO of Lynn Langit Consulting; Peter Hanssens, founder of Cloud Cedar; and Alex DeBrie, principal of DeBrie Advisory. Great to see all of you here, remotely, and I look forward to seeing you in person at the next re:Invent or another event. >> Thanks for having us. >> So Lynn, you're doing a lot of work in healthcare; Peter, you're in the middle of all the action on data as code; Alex, you're in deep on databases. We've got a good roundup of topics here, ranging from healthcare to getting under the hood on databases. So Alex, we'll start with you. What are you working on right now? What trends do you see in the database space? >> Yeah, sure. So I do a lot of consulting work with different people, often with DynamoDB or just general serverless technology type stuff. If you want to talk about trends I'm seeing right now, I would say the big trend is a lot more serverless native databases, or cloud native databases, where you're seeing these cool databases come out that really take advantage of this new cloud environment, right? Where you have scalability, you have the elasticity of the cloud. So you're not in instance-based environments anymore. You're paying for capacity, you're paying for throughput, you're able to scale up and down, and you're not managing individual instances. So there's a lot of cool stuff we're seeing with this new generation of infrastructure, and in particular, databases taking advantage of this new cloud world. >> And you're really deep into the database side in terms of cloud native impact, the diversity of database types; when to use certain databases, that's also a big deal. >> Yeah, absolutely. I totally agree. I love seeing the different types of databases, and AWS has this whole purpose-built database strategy, which I think makes a lot of sense. I don't want to go too far with it; I would think more about purpose-built categories and things like that. You know, standardize on an OLTP database within your organization, whether that's DynamoDB or DocumentDB or a relational database like Aurora. But then also choose some sort of analytics database, whether it's Druid or Redshift or Athena. And then, if you have some specialized needs: if you want to show some real time stuff to your users, check out Rockset; if you want to do some graph analytics, fraud detection, check out TigerGraph. A lot of cool stuff we're seeing from the startup showcase here. >> Looking forward to unpacking that. Lynn, you've been in a lot of the healthcare action with cloud ops; the pandemic pushes hard core on everybody. What are you working on? >> Yeah, it's all COVID data, all the time. Before the pandemic I was supporting research groups for cancer genomics, which I still do, but what's impactful is the explosive data volumes.
You know, there's big data, and then there's genomic data. I've worked with clients that have broken data centers, broken public cloud provider data centers, because of the daily volume they're putting in. So there's the volume aspect, and then there's the collaboration, particularly around COVID research, because of the pandemic. So you have this explosive volume, and you have this need for computational complexity, and that means cloud. The challenge is, you know, pedal to the metal: you've got all these bioinformatics researchers who are used to a single machine, and suddenly they have to deal with distributed compute. So it's a wild time to be in this space. >> What was the big change you've seen with the pandemic, in cloud genomics specifically? What's the big change that has happened? >> The amount of data being put into the public cloud. Previously, people would have their data on their local capacity, and then they would publish their paper, and the data may or may not become available for reproducing the research. To accelerate drug discovery, and even variant identification, the data sets are now being pushed to public cloud repositories, which brings a whole new set of concerns. You're dealing not only with the volume and cost, but with security: federated security is non-trivial and not well understood by this domain. So there's so much work available here. >> Awesome. Peter, you're doing a lot with data as a platform kind of view, and platform engineering; data as code is something that's being kicked around. What are you working on, and how does platform engineering change as data becomes so much more prevalent in its value proposition? >> Yeah. So I'm the founder of Cloud Cedar, and we built this company, this consultancy, all around the challenges a lot of companies have with getting their data sorted: getting it organized, getting it ready for other use cases, such as analytics and machine learning, AI workloads, and the like. Typically, a platform engineering team looks after the organization of a company's infrastructure, making sure it's coherent across the company, and data platform engineering teams are doing something similar in that sense: they're making sure that data teams have a solid foundation to build upon, that everything's quite predictable, and what that enables is faster velocity and the ability to use data as code as a way of specifying and onboarding data, translating and transforming it out into its specific domains, and then on to data products. >> I have to ask you while you're here: there's a big trend around data meshes right now. We've had a lot of stuff on theCUBE about it. What are the practical ways people are using data mesh? First of all, is it relevant, and how are people looking at this data mesh conversation? >> I think it becomes more and more relevant the bigger the organization you're dealing with. Oftentimes in the enterprise, you've got projects with timelines of five to 10 years, often outlasting technology life cycles. The technology you're building on is probably irrelevant by the time you complete it.
And what we're seeing is that data engineering teams, and data teams more broadly, are this organizational bottleneck, and data mesh is all about breaking down that bottleneck and decentralizing the work: shifting the work back onto development teams, who oftentimes have more of the context than a centralized data engineering team. And we're seeing a lot of velocity increases as a result of that. >> It's interesting. There are so many different aspects of how data is changing the world. Lynn talks about the volume with the cloud and genomics; we're hearing data engineering at a platform level; you're talking about slicing and dicing and real-time information. You mentioned Rockset, Alex. So I'd like to ask each of you to answer this next question, which is: how have team dynamics changed with data engineering? Because every single company is impacted. If you're researchers, Lynn, you're pumping more data into the cloud; that's got a little bit of data engineering to it. Do they even understand that? Is it impacting them? So how has data changed the responsibilities or roles in this new emerging area of data engineering, or whatever you want to call it? Lynn, we'll start with you. What do you see as the impact? >> Well, you know, DevOps becomes DataOps and MLOps, and this is a whole emergent area of work, and it starts with an understanding of container technologies, which in different verticals, like FinTech, is a given, right? But in bioinformatics, building an appropriately optimized Docker container is something I'm still working with customers on now, because their concept of a Docker container is that it's just a virtual machine, which obviously it isn't, or shouldn't be. So you have, again, as I mentioned previously, this humongous skill gap: concepts that are prevalent in ad tech and FinTech are not available yet to most of my customers. So those are the things I'm building. The whole ops space is a wide open area, and really it's a question of practicality. You know, I have a lot of experience with data lakes, with containerizing and using the data lake platform, but a lot of my customers are going to move to an interim PaaS-based solution. If they're using Spark, for example, they might use a managed Spark solution as an interim step up to the cloud before they build their own containers, because the amount of knowledge needed to do that effectively is non-trivial. >> Peter, you mentioned data lakes, onboarding data into lakehouse architectures, for instance, something you're familiar with. This is not obvious to some verticals, obvious to others. What do you see as the data engineering impact from a personnel standpoint, and then, ultimately, how things get built? >> You know, are you directing that to me? >> Peter? >> Yeah. So I think, first and foremost, the workload that data engineering teams are dealing with is ever increasing. Usually there's a 10x ratio of software engineers to data engineers within a business, and usually double the number of analysts to data engineers again. And so they're fighting an ever increasing backlog of tasks to do and tickets to churn through.
And so what we're seeing is that data engineering teams are becoming data platform engineering teams, where they're building capability instead of constantly spinning on the hamster wheel, if you will. With that in mind, when onboarding data into a lakehouse architecture or a data lake, where data engineering teams are getting wins is in developing a very good baseline of structure: getting the categorization, the data tagging — whether this data is of a particular domain, does it contain PII, for instance — and then the security aspects, and also the mechanisms by which to do the data transformations. >> Alex, on the database side, those are known personas in an enterprise — the database team — but now the scale is so big, and there's so much going on in databases. How does data engineering impact organizations from your standpoint? >> Yeah, absolutely. Gone are the days where you have a single relational database that is serving operational queries for your users and can also serve analytics queries for your internal teams. It's now split up into those purpose-built databases, like we've said. But now you've got two different teams managing it, and they're designing their data models for different things. So OLTP might have a more denormalized model — something that works for very fast operations and is optimized for that — but now you need to get that data out and somewhere else so that your PM or your business analyst, or whoever, can crunch through some of it, and now it needs to be in a more normalized format. How do you bridge that gap? That's a tough one. I think you need to build empathy on each side for what the other side is doing, and build the tools to say: hey, OLTP team, this is going to help you — if we know what users are actually doing, and if you can get the data to us in the right format, then we can analyze it on the backend. So I think building empathy across those teams is helpful. >> Lynn, I want to come back to you — you mentioned health and informatics is coming back. It's interesting: I look at the database world and the solutions that are out there, and a lot of companies that build data solutions don't have a data problem — they're not swimming in a lot of data. But then you look at the fields you're working in right now, genomics and health and quantum, and they're dealing with data all the time. So the people who deal with a lot of data all the time are breaking new ground, and the people who don't have that experience are now becoming data-full, right? So for people now, either it's a first-time problem, or they've always been swimming in a ton of data. It's either "what's the new playbook," or "wow, I've never had to deal with a lot of data before." What's your take? >> It's interesting, because bioinformatics hires grad students, and grad students use their R scripts with their file on their laptop. So getting those folks to understand distributed, container-based computing is, like I said, a non-trivial problem.
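To make that transition concrete, here is a minimal sketch of the kind of job those researchers end up writing once a laptop-scale script moves to a managed Spark service. It is purely illustrative: the bucket, file layout, and column names are all made up, not from any real pipeline discussed here.

```python
# Illustrative only: a laptop-scale aggregation rewritten for managed Spark
# (e.g., a hosted Spark service). All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("variant-counts").getOrCreate()

# Read sample data from object storage instead of a local file;
# Spark parallelizes the scan across the cluster automatically.
variants = spark.read.parquet("s3://example-genomics-lake/variants/")

# The same group-by a grad student might do in an R script on one machine,
# now distributed across however many nodes the cluster has.
counts = (
    variants
    .groupBy("chromosome", "gene")
    .agg(F.count("*").alias("n_variants"))
)

counts.write.mode("overwrite").parquet(
    "s3://example-genomics-lake/summaries/variant_counts/"
)
spark.stop()
```

The point of the managed, PaaS-style interim step Lynn describes is that this code runs unchanged whether the cluster has five nodes or five hundred — the researcher never touches the container or infrastructure layer.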
What's been really interesting, with the money pouring into COVID research, is that when I first started, some of the workflows would take literally 500 hours — and that was just okay. Coming out of FinTech, I was blown away; FinTech is like, "could that please take a millisecond rather than a second?" What has now happened, which makes it even more fun to work in this domain, is that the research dollars have really gone up because of the pandemic, and so there's this blending of people like me, with more of a big data background, coming into bioinformatics and working side by side. It's this interesting translation, because you have the whole taxonomy of bioinformatics — genomics and sequencers and all the weird file types that you get — and then you have the whole taxonomy of DevOps and DataOps — containers and Kubernetes and all that — and you're trying to get that into pipelines that can actually be efficient, given the constraints. Of course, we on the tech side always want to make it super optimized. I had a customer where we got a workflow down from 500 hours to minutes, but they wanted to stay with the PaaS solution because it was easier for them: going from 500 hours to five hours was good enough, even though the techies want to get it down to five minutes. >> We've seen this movie before — DevOps, edge and operations, the IoT world — we've seen the convergence of cultures. Now you have data, and old-school operations kind of coming up. So this supports the thesis that data as code is the next infrastructure as code. What's the reaction there for you guys? What does data as code mean? If infrastructure as code was cloud and DevOps, what is data as code? What does that mean? >> I can take it if you like. I think data teams and organizations have long been this bottleneck within the organization, and there's this dark matter of untapped energy and potential waiting to be unleashed. With the advent of open source projects like dbt, data teams have slowly been embracing software development lifecycle practices, and this is producing a big, steep increase in their velocity. This is only going to increase and improve as we see data teams embrace data as code. I think the future is bright for data, so I'm very excited. >> Lynn, Peter — reactions? I mean, agility: data as code is a developer concept, CI/CD pipelines. You mentioned new operational workflows coming into traditional operations. Reactions? >> Yeah, I think Peter's right on there. Some of those tools we're seeing come in from software — like dbt, basically giving you that infrastructure-as-code idea but applied to the data realm. There have also been a few Git-for-data type things — Pachyderm, I believe, is one, and a few others — where you bring that in, and you also see a lot of immutability concepts flowing into the data realm. So seeing some of those software engineering concepts come over to the data world has been pretty interesting. >> We're literally just versioning data sets, and identifying what's in a data set and what's not in a data set. Some of this is around ethical AI as well — a whole area that has come out of research groups, mostly AI research groups, but it's being applied to medical data, as it needs to be.
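A stripped-down illustration of the versioning idea being described — not any particular tool like Pachyderm or dbt, just the core mechanic of pinning down exactly what is and isn't in a data set by content-hashing it into an immutable manifest:

```python
# Minimal sketch of dataset versioning: hash every file into a manifest,
# so any change to the data produces a new, comparable version ID.
# Pure standard library; real tools (Pachyderm, DVC, etc.) do far more.
import hashlib
import json
from pathlib import Path

def dataset_version(data_dir: str) -> dict:
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(data_dir))] = digest
    # The version ID is a hash of the manifest itself, like a Git tree hash.
    version_id = hashlib.sha256(
        json.dumps(manifest, sort_keys=True).encode()
    ).hexdigest()[:12]
    return {"version": version_id, "files": manifest}

# Usage: snapshot the data set before training and store the manifest with
# the model, so you can later prove exactly which data the model saw --
# the reproducibility requirement raised below. Directory name is made up.
snapshot = dataset_version("data/cohort_2021")
print(snapshot["version"])
```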
So this metadata and versioning around data sets is really, I think, a very of-the-moment area. >> Yeah, I think you guys are bringing up a really good direction that's happening in data, and it's something you're seeing on the software side with open source and DevOps, and now it's coming to data: the supply chain challenges. We've been talking about it here on theCUBE — we've seen the Ukraine war, and some open source malware hitting data sets. Is data secure? What is that going to look like? You're starting to get into this question of the supply chain: is it verified data sets? If data sets have to be managed, a whole other level of data supply chain comes up. What do you guys think about that? >> I'll jump in again. I think that some of the compliance requirements around financial data are going to be applied to other types of data, probably health data — so immutability and reproducibility that is legally required. Also, some of the privacy requirements that originated in Europe with GDPR are going to be replicated for more and more types of data; again, I'll always speak for health, but there are other types as well, coming out of personal devices and that kind of stuff. So this idea of data as code comes down to versioning and controlling — that's a real succinct way to say it — and we didn't used to think about that. We just put it in our relational database and we were good to go. But versioning and controlling in the global ecosystem is where I'm focusing my efforts. >> It brings up a good question. If data is going to be part of the development process, it has to be addressable, which means horizontally scalable. That means it has to be accessible and open. How do you make that work and not foreclose it with a lot of restrictions? >> I think the use of data catalogs and appropriate tagging and categorization. Everyone's heard of the term data swamp, and I think that came about because everyone saw, "oh, wow, S3, infinite storage — we'll just throw whatever in there for as long as we want." And at times, with the proliferation of S3 buckets, we've seen security not maintained as well as it could have been. That's where data platform engineering teams have really come to the fore, creating a governed set of buckets with Lake Formation on top. But I think that's where we need to see a lot more work: appropriate tags, and the automatic publishing of metadata into data catalogs, so that folks can easily search and address particular data sets and also control the access. For instance, if you've got some PII data, perhaps only your marketing folks should be looking at email addresses and the like — not your finance folks. So I think there's a lot to be leveraged there in Lake Formation and other solutions.
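The flow Peter describes maps closely to tag-based access control in AWS Lake Formation. A hedged sketch of what that might look like with boto3 — the tag names, table names, and role ARN here are all invented for illustration, and error handling is omitted:

```python
# Illustrative sketch of tag-based governance with Lake Formation (boto3).
# All identifiers are hypothetical; not a drop-in configuration.
import boto3

lf = boto3.client("lakeformation")

# 1. Define a tag that marks data containing PII.
lf.create_lf_tag(TagKey="sensitivity", TagValues=["pii", "public"])

# 2. Attach the tag to a cataloged table as it is onboarded into the lake.
lf.add_lf_tags_to_resource(
    Resource={"Table": {"DatabaseName": "marketing", "Name": "contacts"}},
    LFTags=[{"TagKey": "sensitivity", "TagValues": ["pii"]}],
)

# 3. Grant SELECT only on data tagged 'pii' to the marketing role, so
#    finance principals never see email addresses and the like.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier":
            "arn:aws:iam::123456789012:role/marketing-analysts"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [
                {"TagKey": "sensitivity", "TagValues": ["pii"]}
            ],
        }
    },
    Permissions=["SELECT"],
)
```

The design point is the one made above: define the guardrail once on the tag, then any newly onboarded table inherits the policy just by being tagged, rather than per-bucket, per-team permission sprawl.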
>> Alex, let's back up and talk about what's in it for the customer. Let's zoom back: the reality is, I've just got to get my data secure, always on, and not hackable, and I've got to get my data available and delivering performance. Then I've got to start thinking about, okay, how do I intersect it? So what should teams be thinking about right now as they look at all their data options and databases across their enterprise? >> Yeah, it's a good question. I think Peter made some good points there, and you can think of history as ebbing and flowing between centralization and decentralization. When storage was expensive, data was going to be centralized and maintained by the people in charge of it. But then S3 comes along and really decreases the cost of storage: now we can do a lot more experiments on data, store a lot more of it, keep it around, and do different things with it. Now we've got regulations again, and we've got to be more realistic about keeping that data secure and making sure we're doing the right things with it. So we'll probably go through a period of centralization as we work out some of this tooling around tagging and ethical AI that both Peter and Lynn were talking about here, and maybe that gets us into that next world of decentralization again. But I think that ebb and flow is going to be natural, in response to the problems of the other extreme. >> Where are we in the market right now from a progress standpoint? Because data lakes don't want to be data swamps — you're seeing Lake Formation as a data architecture, as an example. Where are we with customers? What are they doing right now? Where would you put them on the progress bar of evolution towards the Nirvana of having this data sovereignty and this data-as-code environment? Are they just now at the data lake — store everything, real-time and historical? >> Well, I can jump in there. SQL on files is the driver. When Amazon brought out Athena, that really drove customers to look realistically at data lake technologies. But data warehouses are not going away, and the integration between the two is not seamless — we're partners with AWS, but we don't work for them, so we can tell you the truth here: there's work to it. For my customers, it really upped the ante around the data lake, because Athena and technologies like that — the serverless SQL queries, the familiar query libraries — really drove a movement away from OLTP or OLAP structures that were more expensive and more cumbersome. >> But they still need that OLTP. If they have high-latency issues, they want to be low latency. Can they have the best of both worlds? That's the question. >> I would say we're getting closer. Technology is always going to keep moving forward, and then we'll just move the goalposts again in terms of what we're asking from it. But the technology that's getting out there is really good. I work in the DynamoDB world, so you can get really great low latency — single-digit-millisecond OLTP response times. Some of the analytics stuff has been a problem with that, but there are different solutions out there: you can export Dynamo to S3, and then you can be doing SQL on your files with Athena, like Lynn's talking about. Or now you see Rockset, a partner here, that will just ingest your DynamoDB data and pick up all those changes — so if you're making a lot of changes to your data in Dynamo, they're reflected in Rockset, and then you can do analytics queries, complex filters, different things like that. So I think we continue to push the envelope, and then we move the goalposts again — but we're in a lot better place than we were a few years ago, for sure.
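The export-and-query pattern Alex describes looks roughly like the following — a hedged sketch using DynamoDB's point-in-time export plus an Athena query, with the table ARN, buckets, and schema all invented for illustration:

```python
# Rough sketch of the DynamoDB -> S3 -> Athena pattern (boto3).
# ARNs, bucket names, and table schemas are placeholders, not a real setup.
import boto3

# 1. Export the OLTP table to S3 without consuming read capacity
#    (requires point-in-time recovery to be enabled on the table).
dynamodb = boto3.client("dynamodb")
dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    S3Bucket="example-analytics-exports",
    ExportFormat="DYNAMODB_JSON",
)

# 2. Once the export is cataloged as a table, run the analytics SQL over
#    the files with Athena -- the complex filters that don't belong on
#    the low-latency OLTP side.
athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="""
        SELECT customer_id, COUNT(*) AS orders_last_30d
        FROM orders_export
        WHERE order_date > date_add('day', -30, current_date)
        GROUP BY customer_id
        ORDER BY orders_last_30d DESC
        LIMIT 100
    """,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```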
>> Where do you guys see this going relative to the next level? If data as code becomes that next agile, software-defined environment with open source — all of these new tools, with serverless things happening, with data lakes built in with nice architectures, with data warehouses — where does it go next? What happens next? If this becomes an agile environment, what's the impact? >> Well, I don't want to be so dominant, but I feel strongly, so I'm going to jump in here. For my most computationally intensive workloads, I'm using GPUs — I'm bursting to GPU for TensorFlow neural networks. And I've been doing quite a bit of exploration around Amazon Braket for QPUs. It's early, and it's specialty — it's not for everybody, and the learning curve, again, is pretty daunting — but there are some use cases out there. I got hold of a paper where some people built a QCNN — a quantum convolutional neural network — for lung cancer images from COVID patients, and the QPU algorithm pipeline performed more accurately and faster. So I think bursting to quantum is something to pay attention to. >> Awesome. Peter, what's your take on what's next? >> Well, that was absolutely fascinating from Lynn, but I think there's also some more low-level, low-hanging fruit available in the data stack. There are still a lot of challenges around transformation: getting our data from raw, landed data into business domains — and that speaks to a lot of what data mesh is all about. I think if we can somehow make that a little more frictionless, because that's really where the labor-intensive work is. That's what's dominating data engineering teams, and it's where we're trying to push that workload back onto software engineering teams. >> Alex, we'll give you the final word. What's the impact? What's the next step? What does it look like in the future? >> Yeah, for sure. I've never had the breaking-a-data-center problem that Lynn's had, or the bursting-to-quantum problem. But if you're in the pool I swim in — terabytes of data and below, things like that — I think it's a good time. Just like we saw with DevOps, allowing software engineers to handle more of the operations stuff, I think the same thing can happen with data, where software engineering teams handle not just their code — not just deploying and operating it — but also think about their data around the code.
And that doesn't mean you won't have people assisting you within your organization — you'll still have some specialists in there — but pushing more of this onto the individual development teams, where they have ownership of it and they're thinking about it through the whole life cycle: I'm pretty bullish on that, and I think that's an exciting development. >> That's that shift-left idea; what we shifted left was security. What does that mean to you? >> We shifted so much stuff left that now the things that were at the end are back at the end again — but at least we can think about that stuff early in the process, which is good. >> Great conversation — very provocative, very realistic, and great impact on the future. Data as code is real; the developers, I do believe, will have a great operational role in the data stack concept, and it's impacting things like quantum — it's all lining up nicely. And it's a great opportunity to be in this field from a science and policy standpoint. Data engineering is legit, and it's going to continue to grow. Thanks for unpacking that here on theCUBE. Appreciate it. Okay — great panel of AWS Heroes. They work with AWS and the ecosystem independently out there; they're in the trenches on the front lines, cracking the code here with Data as Code, season two, episode two of the ongoing series of AWS startup showcases. I'm John Furrier, your host. Thanks for watching.
Rahul Pathak Opening Session | AWS Startup Showcase S2 E2
>> Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. Season two, episode two — the theme is Data as Code, the future of analytics. I'm John Furrier, your host. We have a great lineup for you today: fast-growing startups, a great lineup of companies, founders, and stories around data as code. And we're going to kick it off here with our opening keynote with Rahul Pathak, VP of Analytics at AWS and a CUBE alumni. Rahul, thank you for coming on and being the opening keynote for this awesome event. >> It's great to see you, and it's great to be part of this event — excited to help showcase some of the great innovation that startups are doing on top of AWS. >> We last spoke at AWS re:Invent, and a lot's happened — serverless at the center of the action. And all these startups — Rockset, Dremio, Cribl, Ahana, Imply, and others — all doing great stuff. Data as code has a lot of traction, so there's still a lot of momentum going on in the marketplace. Pretty exciting. >> No, it's awesome. There's so much innovation happening, and the wonderful part of working with data is that the demand for services and products that help customers drive insight from data is just skyrocketing, with no sign of slowing down. So it's a great time to be in the data business. >> It's interesting to see the theme of the show getting traction, because you start to see data being treated almost like how developers write software: taking things out of branches, working on them, putting them back in; machine learning getting iterated on; more models being trained differently, with better insights and actions — all kind of working like code. And this is a whole other way people are reinventing their businesses. This has been a big, huge wave. What's your reaction to that? >> I think it's spot on. The idea of data as code, and bringing some of the repeatability of processes from software development into how people build applications, is absolutely fundamental — and especially so in machine learning, where you need to think about the explainability of a model: what version of the world was it trained on? When you build a better model, you need to be able to explain and reproduce it. So I think your insights are spot on, and these ideas are showing up in all stages of the data workflow, from ingestion to analytics to ML. >> This next wave is about modernization and going to the next level with cloud scale. Thank you so much for coming on and being the keynote presenter here for this great event. I'll let you take it away: reinventing businesses with AWS analytics. Rahul, take it away. >> Okay, perfect. Well, folks, we're going to talk about reinventing your business with data. If you think about it, the first wave of reinvention was really driven by the cloud, as customers were able to really transform how they thought about technology, and that's well on its way — although if you stop and think about it, I think we're only about five to 10% of the way done in terms of IT spend being on the cloud. So lots of work to do there. But we're seeing another wave of reinvention, which is companies reinventing their businesses with data: really using data to transform what they're doing, to look for new opportunities, and to look for ways to operate more efficiently.
And the past couple of years of the pandemic have really only accelerated that trend. What we're seeing is really the survival of the most informed: folks with the best data are able to react more quickly to what's happening. We've seen customers being able to scale up if they're in, say, the delivery business, or scale down if they were in the travel business at the beginning of all of this, and then use data to find new opportunities and new ways to serve customers. So it's really foundational, and we're seeing this across the board. It's great to see the innovation that's happening to help customers make sense of all of this. Our customers are really looking at ways to put data to work: it's about making better decisions, finding new efficiencies, and really finding new opportunities to succeed and scale. When it comes to good examples of this, FINRA is a great one. You may not have heard of them, but they're the US equities regulator: all trading that happens in equities, they keep track of. They look at about 250 billion records per day, and they examine it all on EMR, which is our Spark and Hadoop service, processing 20 terabytes of data running across tens of thousands of nodes, looking for fraud and bad actors in the market. So it's been a huge transformation journey for FINRA over the years — a customer I've gotten to work with personally since really 2013 onward, and it's been amazing to see their journey. Pinterest is another great customer I'm sure everyone's familiar with: they're about visual search and discovery and commerce, and they're able to scale their daily log searches by a factor of three X or more and drive down their costs, using the Amazon OpenSearch Service. And really, what we're trying to do at AWS is give our customers the most comprehensive set of services for the end-to-end journey around data, from ingestion to analytics and machine learning. We want to provide a comprehensive set of capabilities for ingestion, cataloging, analytics, and then machine learning — and all of these are things that our partners, and the startups that run on us, have available to them to build on as they build and deliver value for their customers. The way we think about this is: we want customers to be able to modernize what they're doing and their infrastructure, and we provide services for that. It's about unifying data wherever it lives — connecting it so that customers can build a complete picture of their customers and business. And then it's about innovation, really using machine learning to bring all of this unified data to bear on driving new innovation and new opportunities for customers. What we're trying to do at AWS is provide a scalable and secure cloud platform that customers and partners can build on. Unifying is about connecting data, and it's also about providing well-governed access to data. One of the big trends we see is customers looking for the ability to make self-service data available to their customers and users, and the key to that is good foundational governance.
In fact, about 80% of the data being generated today, uh, is unstructured. And you want to be able to connect data that's in data lakes with data that's in purpose-built data stores, whether that's databases on AWS databases, outside SAS products, uh, as well as things like data warehouses and machine learning systems, but really connecting data as key. Uh, and then, uh, innovation, uh, how can we bring to bear? And we imagine all processes with new technologies like AI and machine learning, and AI is also key to unlocking a lot of the value that's in unstructured data. If you can figure out what's in an imagine the sentiment of audio and do that in real-time that lets you then personalize and dynamically tailor experiences, all of which are super important to getting an edge, um, in, uh, in the modern marketplace. And so at AWS, we, when we think about connecting the dots across sources of data, allowing customers to use data, lakes, databases, analytics, and machine learning, we want to provide a common catalog and governance and then use these to help drive new experiences for customers and their apps and their devices. And then this, you know, in an ideal world, we'll create a closed loop. So you create a new experience. You observe our customers interact with it, that generates more data, which is a data source that feeds into the system. >>And, uh, you know, on AWS, uh, thinking about a modern data strategy, uh, really at the core is a data lakes built on us three. And I'll talk more about that in a second. Then you've got services like Athena included, lake formation for managing that data, cataloging it and querying it in place. And then you have the ability to use the right tool for the right job. And so we're big believers in purpose-built services for data because that's where you can avoid compromising on performance functionality or scale. Uh, and then as I mentioned, unification and inter interconnecting, all of that data. So if you need to move data between these systems, uh, there's well-trodden pathways that allow you to do that, and then features built into services that enable that. >>And, um, you know, some of the core ideas that guide the work that we do, um, scalable data lakes at key, um, and you know, this is really about providing arbitrarily scalable high throughput systems. It's about open format data for future-proofing. Uh, then we talk about purpose-built systems at the best possible functionality, performance, and cost. Uh, and then from a serverless perspective, this has been another big trend for us. We announced a bunch of serverless services and reinvented the goal here is to really take away the need to manage infrastructure from customers. They can really focus about driving differentiated business value, integrated governance, and then machine learning pervasively, um, not just as an end product for data scientists, but also machine learning built into data, warehouses, visualization and a database. >>And so it's scalable data lakes. Uh, data three is really the foundation for this. One of our, um, original services that AWS really the backbone of so much of what we do, uh, really unmatched your ability, availability, and scale, a huge portfolio of analytics services, uh, both that we offer, but also that our partners and customers offer and really arbitrary skin. We've got individual customers and estimator in the expert range, many in the hundreds of petabytes. And that's just growing. 
As I mentioned, we see roughly a 10X increase in data volume every five years — an exponential increase in data volumes. From a purpose-built perspective, it's the right tool for the right job: Redshift for data warehousing, Athena for querying all your data, EMR as our managed Spark and Hadoop, OpenSearch for log analytics and search, and then Kinesis and Amazon MSK for Kafka and streaming. And that's been another big trend: real-time data has been exploding, and customers wanting to make sense of that data in real time is another big deal.
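As a small illustration of that streaming side — pushing events into a Kinesis stream as they happen, with the stream name and event shape invented for the sketch:

```python
# Minimal sketch of real-time ingestion into Kinesis (boto3).
# Stream name and payload are hypothetical; batching, retries, and
# error handling are omitted for brevity.
import json
import time
import boto3

kinesis = boto3.client("kinesis")

def publish_event(event: dict) -> None:
    # Partitioning by user ID keeps a given user's events ordered within
    # a shard; downstream consumers (Lambda, Flink, etc.) read the stream
    # and make sense of it in near real time.
    kinesis.put_record(
        StreamName="example-clickstream",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["user_id"]),
    )

publish_event({"user_id": 42, "action": "search", "ts": time.time()})
```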
And, um, and really it's about plugging all of your data into the AWS ecosystem and into our partner ecosystem. So this API is all available for integration as well, but then from an AML perspective, what we're really trying to do is bring machine learning closer to data. And so with our databases and warehouses and lakes and BI tools, um, you know, we've infused machine learning throughout our, by, um, the state of the art machine running that we offer through SageMaker. >>And so you've got a ML in Aurora and Neptune for broths. Uh, you can train machine learning models from SQL, directly from Redshift and a female. You can use free inference, and then QuickSight has built in forecasting built in natural language, querying all powered by machine learning, same with anomaly detection. And here are the ideas, you know, how can we up our systems get smarter at the surface, the right insights for our customers so that they don't have to always rely on smart people asking the right questions, um, and you know, uh, really it's about bringing data back together and making it available for innovation. And, uh, thank you very much. I appreciate your attention. >>Okay. Well done reinventing the business with AWS analytics rural. That was great. Thanks for walking through that. That was awesome. I have to ask you some questions on the end-to-end view of the data. That seems to be a theme serverless, uh, in there, uh, Mel integration. Um, but then you also mentioned picking the right tool for the job. So then you've got like all these things moving on, simplify it for me right now. So from a business standpoint, how do they modernize? What's the steps that the clients are taking with analytics, what's the best practice? How do they, what's the what's the high order bit here? >>Uh, so the basic hierarchy is, you know, historically legacy systems are rigid and inflexible, and they weren't really designed for the scale of modern data or the variety of it. And so what customers are finding is they're moving to the cloud. They're moving from legacy systems with punitive licensing into more flexible, more systems. And that allows them to really think about building a decoupled, scalable future proof architecture. And so you've got the ability to combine data lakes and databases and data warehouses and connect them using common KPIs and common data protection. And that sets you up to deal with arbitrary scale and arbitrary types. And it allows you to evolve as the future changes since it makes it easy to add in a new type of engine, as we invent a better one a few years from now. Uh, and then, uh, once you've kind of got your data in a cloud and interconnected in this way, you can now build complete pictures of what's going on. You can understand all your touch points with customers. You can understand your complete supply chain, and once you can build that complete picture of your business, you can start to use analytics and machine learning to find new opportunities. So, uh, think about modernizing, moving to the cloud, setting up for the future, connecting data end to end, and then figuring out how to use that to your advantage. >>I know as you mentioned, modern data strategy gives you the best of both worlds. And you've mentioned, um, briefly, I want to get a little bit more, uh, insight from you on this. You mentioned open, open formats. One of the themes that's come out of some of the interviews, these companies we're going to be hearing from today is open source. The role opens playing. 
Um, how do you see that integrating in? Because again, this is just like software, right? Open, uh, open source software, open source data. It seems to be a trend. What does open look like to you? How do you see that progressing? >>Uh, it's a great question. Uh, open operates on multiple dimensions, John, as you point out, there's open data formats. These are things like JSI and our care for analytics. This allows multiple engines tend to operate on data and it'll, it, it creates option value for customers. If you're going to data in an open format, you can use it with multiple technologies and that'll be future-proofed. You don't have to migrate your data. Now, if you're thinking about using a different technology. So that's one piece now that sort of software, um, also, um, really a big enabler for innovation and for customers. And you've got things like squat arc and Presto, which are popular. And I know some of the startups, um, you know, that we're talking about as part of the showcase and use these technologies, and this allows for really the world to contribute, to innovating and these engines and moving them forward together. And we're big believers in that we've got open source services. We contribute to open-source, we support open source projects, and that's another big part of what we do. And then there's open API is things like SQL or Python. Uh, again, uh, common ways of interacting with data that are broadly adopted. And this one, again, create standardization. It makes it easier for customers to inter-operate and be flexible. And so open is really present all the way through. And it's a big part, I think, of, uh, the present and the future. >>Yeah. It's going to be fun to watch and see how that grows. It seems to be a lot of traction there. I want to ask you about, um, the other comment I thought was cool. You had the architectural slides out there. One was data lakes built on S3, and you had a theme, the glue in lake formation kind of around S3. And then you had the constellation of, you know, Kinesis SageMaker and other things around it. And you said, you know, pick the tool for the right job. And then you had the other slide on the analytics at the center and you had Redshift and all the other, other, other services around it around serverless. So one was more about the data lake with Athena glue and lake formation. The other one's about serverless. Explain that a little bit more for me, because I'm trying to understand where that fits. I get the data lake piece. Okay. Athena glue and lake formation enables it, and then you can pick and choose what you need on the serverless side. What does analytics in the center mean? >>So the idea there is that really, we wanted to talk about the fact that if you zoom into the analytics use case within analytics, everything that we offer, uh, has a serverless option for our customers. So, um, you could look at the bucket of analytics across things like Redshift or EMR or Athena, or, um, glue and league permission. You have the option to use instances or containers, but also to just not worry about infrastructure and just think declaratively about the data that you want to. >>Oh, so basically you're saying the analytics is going serverless everywhere. Talking about volumes, you mentioned 10 X volumes. Um, what are other stats? Can you share in terms of volumes? What are people seeing velocity I've seen data warehouses can't move as fast as what we're seeing in the cloud with some of your customers and how they're using data. 
How does the volume and velocity community have any kind of other kind of insights into those numbers? >>Yeah, I mean, I think from a stats perspective, um, you know, take Redshift, for example, customers are processing. So reading and writing, um, multiple exabytes of data there across from each shift. And, uh, you know, one of the things that we've seen in, uh, as time has progressed as, as data volumes have gone up and did a tapes have exploded, uh, you've seen data warehouses get more flexible. So we've added things like the ability to put semi-structured data and arbitrary, nested data into Redshift. Uh, we've also seen the seamless integration of data warehouses and data lakes. So, um, actually Redshift was one of the first to enable a straightforward acquiring of data. That's sitting in locally and drives as well as feed and that's managed on a stream and, uh, you know, those trends will continue. I think you'll kind of continue to see this, um, need to query data wherever it lives and, um, and, uh, allow, uh, leaks and warehouses and purpose-built stores to interconnect. >>You know, one of the things I liked about your presentation was, you know, kind of had the theme of, you know, modernize, unify, innovate, um, and we've been covering a lot of companies that have been, I won't say stumbling, but like getting to the future, some go faster than others, but they all kind of get stuck in an area that seems to be the same spot. It's the silos, breaking down the silos and get in the data lakes and kind of blending that purpose built data store. And they get stuck there because they're so used to silos and their teams, and that's kind of holding back the machine learning side of it because the machine learning can't do its job if they don't have access to all the data. And that's where we're seeing machine learning kind of being this new iterative model where the models are coming in faster. And so the silo brake busting is an issue. So what's your take on this part of the equation? >>Uh, so there's a few things I plan it. So you're absolutely right. I think that transition from some old data to interconnected data is always straightforward and it operates on a number of levels. You want to have the right technology. So, um, you know, we enable things like queries that can span multiple stores. You want to have good governance, you can connect across multiple ones. Uh, then you need to be able to get data in and out of these things and blue plays that role. So there's that interconnection on the technical side, but the other piece is also, um, you know, you want to think through, um, organizationally, how do you organize, how do you define it once data when they share it? And one of the asylees for enabling that sharing and, um, think about, um, some of the processes that need to get put in place and create the right incentives in your company to enable that data sharing. And then the foundational piece is good guardrails. You know, it's, uh, it can be scary to open data up. And, uh, the key to that is to put good governance in place where you can ensure that data can be shared and distributed while remaining protected and adhering to the privacy and compliance and security regulations that you have for that. And once you can assert that level of protection, then you can set that data free. And that's when, uh, customers really start to see the benefits of connecting all of it together, >>Right? 
And then we have a batch of startups here on this episode that are doing a lot of different things. Uh, some have, you know, new lake new lakes are forming observability lakes. You have CQL innovation on the front end data, tiering innovation at the data tier side, just a ton of innovation around this new data as code. How do you see as executive at AWS? You're enabling all this, um, where's the action going? Where are the white spaces? Where are the opportunities as this architecture continues to grow, um, and get traction because of the relevance of machine learning and AI and the apps are embedding data in there now as code where's the opportunities for these startups and how can they continue to grow? >>Yeah, the, I mean, the opportunity is it's amazing, John, you know, we talked a little bit about this at the beginning, but the, there is no slow down insight for the volume of data that we're generating pretty much everything that we have, whether it's a watch or a phone or the systems that we interact with are generating data and, uh, you know, customers, uh, you know, we talk a lot about the things that'll stay the same over time. And so, you know, the data volumes will continue to go up. Customers are gonna want to keep analyzing that data to make sense of it. They're going to want to be able to do it faster and more cheaply than they were yesterday. And then we're going to want to be able to make decisions and innovate, uh, in a shorter cycle and run more experiments than they were able to do. >>And so I think as long as, and they're always going to want this data to be secure and well-protected, and so I think as long as we, and the startups that we work with can continue to push on making these things better. Can I deal with more data? Can I deal with it more cheaply? Can I make it easier to get insight? And can I maintain a super high bar in security investments in these areas will just be off. Um, because, uh, the demand side of this equation is just in a great place, given what we're seeing in terms of theater and the architect for forum. >>I also love your comment about, uh, ML integration being the last leg of the equation here or less likely the journey, but you've got that enablement of the AIP solves a lot of problems. People can see benefits from good machine learning and AI is creating opportunities. Um, and also you also have mentioned the end to end with security piece. So data and security are kind of going hand in hand these days, not just the governments and the compliance stuff we're talking about security. So machine learning integration kind of connects all of this. Um, what's it all mean for the customers, >>For customers. That means that with machine learning and really enabling themselves to use machine learning, to make sense of data, they're able to find patterns that can represent new opportunities, um, quicker than ever before. And they're able to do it, uh, dynamically. So, you know, in a prior version of the world, we'd have little bit of systems and they would be relatively rigid and then we'd have to improve them. Um, with machine learning, this can be dynamic and near real time and you can customize them. So, uh, that just represents an opportunity to deepen relationships with customers and create more value and to find more efficiency in how businesses are run. So that piece is there. Um, and you know, your ideas around, uh, data's code really come into play because machine learning needs to be repeatable and explainable. 
And that means versioning, uh, keeping track of everything that you've done from a code and data and learning and training perspective >>And data sets are updating the machine learning. You got data sets growing, they become code modules that can be reused and, uh, interrogated, um, security okay. Is a big as a big theme data, really important security is seen as one of our top use cases. Certainly now in this day and age, we're getting a lot of, a lot of breaches and hacks coming in, being defended. It brings up the open, brings up the data as code security is a good proxy for kind of where this is going. What's your what's take on that and your reaction to that. >>So I'm, I'm security. You can, we can never invest enough. And I think one of the things that we, um, you know, guide us in AWS is security, availability, durability sort of jobs, you know, 1, 2, 3, and, um, and it operates at multiple levels. You need to protect data and rest with encryption, good key management and good practices though. You need to protect data on the wire. You need to have a good sense of what data is allowed to be seen by whom. And then you need to keep track of who did what and be able to verify and come back and prove that, uh, you know, uh, only the things that were allowed to happen actually happened. And you can actually then use machine learning on top of all of this apparatus to say, uh, you know, can I detect things that are happening that shouldn't be happening in near real time so they could put a stop to them. So I don't think any of us can ever invest enough in securing and protecting my data and our systems, and it is really fundamental or adding customer trust and it's just good business. So I think it is absolutely crucial. And we think about it all the time and are always looking for ways to raise >>Well, I really appreciate you taking the time to give the keynote final word here for the folks watching a lot of these startups that are presenting, they're doing well. Business wise, they're being used by large enterprises and people buying their products and using their services for customers are implementing more and more of the hot startups products they're relevant. What's your advice to the customer out there as they go on this journey, this new data as code this new future of analytics, what's your recommendation. >>So for customers who are out there, uh, recommend you take a look at, um, what, uh, the startups on AWS are building. I think there's tremendous innovation and energy, uh, and, um, there's really great technology being built on top of a rock solid platform. And so I encourage customers thinking about it to lean forward, to think about new technology and to embrace, uh, move to the cloud suite, modernized, you know, build a single picture of our data and, and figure out how to innovate and when >>Well, thanks for coming on. Appreciate your keynote. Thanks for the insight. And thanks for the conversation. Let's hand it off to the show. Let the show begin. >>Thank you, John pleasure, as always.
SUMMARY :
And we're going to kick it off here with our opening keynote with um, to help showcase some of the great innovation that startups are doing on top of AWS. service loss of serverless as the center of the, of the action, but all these start-ups rock set Dremio And so it's a great time to be in the data business. It's interesting to see the theme of the show getting traction, because you start to see data being treated and especially so in machine learning where you need to think about the explainability of a model, Uh, thank you so much for coming on and being the keynote presenter here for this great event. And so what we're seeing is, uh, you know, it's really about the survival And so, um, you know, it's great to see the innovation that's happening to help customers make So, um, you know, huge, uh, transformation journey for FINRA over the years of customer And the key to that is good foundational governance. And you want to be able to connect data that's in data lakes with data And then you have the ability to use the right tool for the right job. And, um, you know, some of the core ideas that guide the work that we do, um, scalable data lakes at And that's been another big trend is, uh, real time. and freeing customers from the need to think about capacity management. those only have access to the new data that's been tagged with the new tags, and it allows you to And time-travel, uh, you know, John talked about data as code And here are the ideas, you know, how can we up our systems get smarter at the surface, I have to ask you some questions on the end-to-end Uh, so the basic hierarchy is, you know, historically legacy systems are I know as you mentioned, modern data strategy gives you the best of both worlds. And I know some of the startups, um, you know, that we're talking about as part of the showcase And then you had the other slide on the analytics at the center and you had Redshift and all the other, So the idea there is that really, we wanted to talk about the fact that if you zoom about volumes, you mentioned 10 X volumes. And, uh, you know, one of the things that we've seen And so the silo brake busting is an issue. side, but the other piece is also, um, you know, you want to think through, Uh, some have, you know, new lake new lakes are forming observability lakes. And so, you know, the data volumes will continue to go up. And so I think as long as, and they're always going to want this data to be secure and well-protected, Um, and also you also have mentioned the end to end with security piece. And they're able to do it, uh, that can be reused and, uh, interrogated, um, security okay. And then you need to keep track of who did what and be able Well, I really appreciate you taking the time to give the keynote final word here for the folks watching a And so I encourage customers thinking about it to lean forward, And thanks for the conversation.
Wen Phan, Ahana & Satyam Krishna, Blinkit & Akshay Agarwal, Blinkit | AWS Startup Showcase S2 E2
(gentle music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase. The theme is Data as Code; The Future of Enterprise Data and Analytics. This is season two, episode two of the ongoing series covering the exciting startups in the AWS ecosystem around data analytics and cloud computing. I'm your host, John Furrier. Today we're joined by great guests here. Three guests. Wen Phan, who's a Director of Product Management at Ahana, Satyam Krishna, Engineering Manager at Blinkit, and we have Akshay Agarwal, Senior Engineer at Blinkit as well. We're going to get into the relationship there. Let's get into it. We're going to talk about how Blinkit's using the open data lakehouse with Presto on AWS. Gentlemen, thanks for joining us. >> Thanks for having us. >> So we're going to get into the deep dive on the open data lake, but I want to just quickly get your thoughts on what it is for the folks out there. Set the table. What is the open data lakehouse? Why is it important? What's in it for the customers? Why are we seeing adoption around this? Because this is a big story. >> Sure. Yeah, the open data lakehouse is really being able to run a gamut of analytics, whether it be BI, SQL, machine learning, data science, on top of the data lake, which is based on inexpensive, low cost, scalable storage. And more importantly, it's also on top of open formats. And this to the end customer really offers a tremendous range of flexibility. They can run a bunch of use cases on the same storage and get great price performance. >> Do you guys have any other thoughts? What's your reaction to the lakehouse? What is your experience with it? What's going on with Blinkit? >> No, I think for us also, it has been the primary driver of how, as a company, we have completely shifted our delivery model from delivering in one day to delivering in 10 minutes, right? And a lot of this was made possible by having this kind of architecture in place, which helps us to be more open-source, where the tools are open-source, and we have an open table format, which helps us be very modular in nature, meaning we can pick solutions which work best for us, right? And that is the kind of architecture that we want to be in. >> Awesome. Wen, you know, last time we chatted with Ahana, we had a great conversation around Presto and data. The theme of this episode is Data as Code, which is interesting, because the conversations in these episodes are all around developers, with administrators turning into developers; there's a developer vibe with data. And with open-source, it's software. Now you've got data taking a similar trajectory to how software development was with code, but the people running data, they're not developers, they're administrators, they're operators. Now they're turning into DataOps. So it's kind of a similar vibe going on with branches, and taking stuff out and putting it back in, and testing it. Datasets are becoming much more stable, iterating on machine learning algorithms. This is a movement. What's your reaction to this Data as Code movement? >> Yeah, so I think the folks at Blinkit are doing a great job there. I mean, they have a pretty compact data engineering team, and they have some pretty stringent SLAs, as well as in terms of time to value and reliability. And what that ultimately translates to for them is not only flexibility but reliability.
So they've done some very fantastic work on a lot of automation, a lot of integration with code, and their data pipelines. And I'm sure they can give the details on that. >> Yes. Satyam and Akshay, you guys are software engineers, but this is becoming a whole other paradigm where frontline engineering work, data engineering, is implementing the operations as well. It's kind of like DevOps for data. >> For sure. Right. And I think whenever you're working, even as a software engineer, the understanding of business is equally important. You cannot be working on something and be away from business, right? And that's where, like I mentioned earlier, we realized that we had to completely move our stack and start giving analytics at 10 minutes, right. Because when you're delivering in 10 minutes, your leaders want to make decisions in real time. That means you need to move with them. You need to move with business. And when you do that, the kind of flexibility these softwares give is what enables the businesses at the end of the day. >> Awesome. This is really kind of like, is there going to be a book called agile data warehouses? I don't think so. >> I think so. (laughing) >> The agile cloud data. This is cool. So let's get into what you guys do. What is Blinkit up to? What do you guys do? Can you take a minute to explain the company and your product? >> Sure. I'll take that. So Blinkit is India's biggest 10 minute delivery platform. It pioneered the delivery model in the country, with over 10 million Indians shopping on our platform, ranging from everything: grocery staples, vegetables, emergency services, electronics, and much more, right. It currently delivers over 200,000 orders every day, and is in a hurry to bring the future of commerce to everyone in India. >> What's the relationship with Ahana and Blinkit? Wen, what's the tie-in? >> Yeah, so Blinkit had a pretty well formed stack. They needed a little bit more flexibility and control. They thought a managed service was the way to go. And here at Ahana, we provide a SaaS managed service for Presto. So they engaged us and they evaluated our offering. And more importantly, we were able to partner. As an early stage startup, we really rely on very strong partners with great use cases that are willing to collaborate. And the folks at Blinkit have been really great in helping us push our product, develop our product. And we've been very happy about the value that we've been able to deliver to them as well. >> Okay. So let's unpack the open data lakehouse. What is it? What's under the covers? Let's get into it. >> Sure. So let me bring up a slide. Like I said before, it's really a paradigm of being able to run a gamut of analytics on top of the open data lake. So what does that mean? How did it come about? So on the left hand side of the slide, we are coming out of this world where, for the last several decades, the primary workhorse for SQL based processing, reporting, and dashboarding use cases was really the data warehouse. And what we're seeing is a shift, due to the trends in inexpensive, scalable cloud storage, the proliferation of open formats to facilitate using this storage with certain amounts of reliability and performance, and the adoption of frameworks that can operate on top of this cloud data lake. So while here at Ahana we're primarily focused on SQL workloads and Presto, this architecture really allows for other types of frameworks. And you see the ML and AI side.
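To ground Wen's description in something concrete, here is a minimal sketch of what "SQL on the open data lake" looks like from Python, using the open-source presto-python-client. The coordinator host, catalog, schema, and table names are hypothetical placeholders, not Blinkit's or Ahana's actual setup.

```python
# A minimal sketch, assuming a reachable Presto coordinator and the
# open-source client (pip install presto-python-client). The host,
# catalog, schema, and table names below are hypothetical.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com",  # hypothetical coordinator endpoint
    port=8080,
    user="analyst",
    catalog="hive",        # a catalog backed by open formats on S3
    schema="analytics",
)
cur = conn.cursor()

# The same low-cost storage can serve BI dashboards, ad hoc SQL,
# and ML feature extraction, which is the lakehouse pitch.
cur.execute("""
    SELECT store_id, count(*) AS orders_today
    FROM orders              -- hypothetical open-format table
    WHERE order_date = current_date
    GROUP BY store_id
    ORDER BY orders_today DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```

The point of the sketch is that the query engine and the storage are independent: the table lives in an open format on inexpensive object storage, and any engine that speaks that format can read it.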
And, to Satyam's point earlier, this architecture offers a great amount of flexibility and modularity for many use cases in the cloud. So that's really the lakehouse, and people like it for the performance, the openness, and the price performance. >> How's the open-source side of it playing in? It's kind of open formats. What is the open-source angle on this? Because there's a lot of different approaches. I'm hearing open formats. You know, you have data stores which are a big part of seeing that. You've got SQL, you mentioned SQL. There's a mishmash of opportunities. Is it all coexisting? Is it one tool to rule the world, or is it interchangeable? What's the open-source angle? >> There's multiple angles, and I'll definitely let Satyam add to what I'm saying. This was definitely a big piece for Blinkit. So on one hand, you have the open formats. And what the open formats really enable is multiple compute engines to work on that data. And that's very huge. 'Cause it's open, you're not locked in. I think the other part of open that is important, and I think it was important to Blinkit, was the governance around that. So in particular, Presto is governed by the Linux Foundation. And so, as a customer of open-source technology, they want some assurances for things like, how is it governed? Is the license going to change? So there's that aspect of openness that I think is very important. >> Yeah. Blinkit, what's the data strategy here with the lakehouse and you guys? Why are you adopting this type of architecture? >> So, adding to what Wen said, right. When we are thinking in terms of all these open stacks, you have got these open table formats, everything which is deployed over cloud; the primary reason there is modularity. It's as simple as that, right. You can plug and play so many different table formats, from one thing to another, based on the use case that you're trying to serve, so that you get the most value out of data. Right? I'll give you a very simple example. So for us, we don't even use one single table format. It's not that one thing solves for everything, right? We use both Hudi and Iceberg to solve for different use cases. One is good when you're working with a certain data size; Iceberg works well when you're in the SQL kind of interface, right. Hudi's still trying to reach there. It's going to go there very soon. So having the ability to plug and play different formats based on the use case helps you to grow faster, helps you to make decisions faster, because now you're not stuck on one thing that you will have to implement. Right. So I think that's what is great about this data lake strategy. Keeping yourself cost effective. Yeah, please. >> So the enablement is basically use case driven. You don't have to be re-architecting for use cases. You can simply plug and play based on what you need for the use case. >> Yeah. And again, you can focus on your business use case. You can figure out what your business users need and not worry about these things, because that's where Presto comes in: it helps you stitch that data together with multiple data formats, gives you the performance that you need, and it works out the best there. And that's something that you don't get with a traditional warehouse these days. Right? The kind of thing that we need, you don't get that. >> I do want to add. This is just to riff on what Satyam said. I think it's pretty interesting.
So, it really allowed him to take the best-of-breed of what he was seeing in the community, right? So in the case of table formats, you've got Delta, you've got Hudi, you've got Iceberg, and they've all got their own roadmaps, and it's kind of organic how these different communities want to evolve, and I think that's great. But you have these end consumers like Blinkit who have different, maybe overlapping, use cases, and they're not forced to pick one. When you have an open architecture, they can really put together best-of-breed. And as these projects evolve, they can continue to monitor it, make decisions, and continue to remain agile based on the landscape and how it's evolving. >> So the agility is a key point. Flexibility and agility, and time to value with your data. >> Yeah. >> All right. Wen, I've got to get into why Presto is important here. Where does that fit in? Why is Presto important? >> Yeah. For me, it all comes down to the use cases and the needs. And reporting and dashboarding is not going to go away anytime soon. It's a very common use case. Many of our customers like Blinkit come to us for that use case. The difference now is that today, people want to do that particular use case on top of the modern data lake, on top of scalable, inexpensive, low cost storage. Right? In addition to that, there's a need for this low latency, interactive ability to engage with the data. This often arises when you need to do things on an ad hoc basis, or you're in the developmental phase of building things up. So if that's what your need is, and latency's important, and getting your arms around the problems is very important, you have a certain SLA, I need to deliver something: that puts some requirements on the technology. And Presto is perfect for that use case. It's distributed, it's scalable, it's in memory. And so it's able to really provide that. I think the other benefit for Presto, and why we're betting on Presto, is that it works well on the data lakes, but you have to think about how these organizations are maturing with this technology. So it's not necessarily all or nothing. You have organizations that have maybe the data lake, and it's augmented with other analytical data stores like Snowflake or Redshift. So a core aspect of Presto is its ability to federate, or connect and query across different data sources. So this can be a permanent thing. This could also be a transitionary thing. We have some customers that are moving and slowly shifting their data portfolio from maybe all data warehouse into 80% data lake. But it gives that optionality, it gives that ability to transition over a timeframe. But for all those reasons, the latency, the scalability, the federation, is why Presto for this particular use case. >> And you can connect with other databases. It can be a purpose-built database, could be whatever. Right? >> Sure. Yes, yes. Presto has a very pluggable architecture. >> Okay. Here's the question for the Blinkit team: Why did you choose Presto, and what led you to Ahana? >> So I'll take this. Where Presto sits well in this architecture is in how it is designed. Basically, Presto decouples your storage from the compute. People can use any storage, and Presto just works as a query engine for them.
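Akshay picks up the connector story next; as a companion to Wen's federation point, here is a hedged sketch of what a single federated query looks like through the same client. The catalog, schema, and table names are hypothetical: "hive" stands in for a lake catalog on S3 and "postgresql" for an operational database, each mapped to a Presto connector.

```python
# A hedged sketch of federation: one SQL statement joining tables from
# two different catalogs. Catalog, schema, and table names are invented.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com", port=8080, user="analyst",
    catalog="hive", schema="analytics",
)
cur = conn.cursor()
cur.execute("""
    SELECT s.city, sum(o.amount) AS revenue
    FROM hive.analytics.orders AS o          -- open-format table on S3
    JOIN postgresql.public.stores AS s       -- live operational database
      ON o.store_id = s.store_id
    GROUP BY s.city
    ORDER BY revenue DESC
""")
print(cur.fetchall())
```

Because each data source is addressed as catalog.schema.table, the same interface covers a permanent federated setup or a gradual migration from warehouse to lake.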
So basically, Presto has a set of connectors where you can connect with real-time databases like Pinot or Druid, along with your warehouses like Redshift, along with your data lake that's based on Hudi or Iceberg. So it's a very broad landscape that you can use with Presto. And the consumers of the analytics don't need to learn different querying paradigms for different sources. They just need to learn a single interface, and they get a single place to consume from, and a single destination to write to as well. So it's a homogeneous architecture, which allows you to put central security in place, which Presto integrates with. It's also based on an open architecture; it's an Apache-licensed engine. And it also has certain innovative features, such as caching, which reduces a lot of the cost. And since you have decoupled your storage from the compute, you can further reduce your cost, because the biggest cost of a traditional warehouse is storage, and that cost goes massively upwards with the amount of data that you've added. Basically, each time you add more data, you require more storage, and warehouses ask you to write the data in their own format. Over here, since we have decoupled that, the storage costs have gone down. You just pay for the compute, and you can scale in and scale out based on the requirements. If you have high traffic, you scale out. If you have low traffic, you scale in. So all those. >> So huge cost savings. >> Yeah. >> Yeah. Cost effectiveness, for sure. >> Cost effectiveness, and you get a very good price value out of it. For each query, you can estimate what the cost is for you, based on that tracking and all those things. >> I mean, if you think about the classic iceberg, and what's under the water you don't know, it's the hidden cost. You think about the tooling, right, and also the time it takes to do stuff. So if you have flexibility on choice, when we were riffing on this last time we chatted with you guys, and you brought it up earlier, you can have the open formats to serve different use cases on different tools or different platforms. You can use Redshift here, or use something over there. You don't have to get locked in. >> Absolutely. >> Satyam & Akshay: Yeah. >> Lock-in is a huge problem. How do you guys see that? 'Cause it sounds like here there's not a lot of lock-in. You got the open formats, and you got choice. >> Yeah. So you get the best of both worlds. With Ahana, or with Presto, you can get the best of both worlds. Since it's cloud native, you can deploy your clusters very easily; within like five minutes your cluster is up, and you can start working on it. You can deploy multiple clusters for multiple teams. You also get the flexibility of adding new connectors, since it's open, and further, it's also much more secure, since it's cloud native. So basically, you can control your security endpoints very well. So all those things come together with this architecture. So you can definitely go more toward the lakehouse architecture than warehousing when you want to deliver data value faster. And basically, you get much higher value out of your data in a shorter time. >> So Satyam, it sounds like the old warehousing was like the application person, not a lot of usage, old, a lot of latency. Okay. Here and there.
But now you've got more speed to deploy clusters, scale up, scale down. Application developers are everyone now. It's not one person. It's not one group. It's whenever you want. So, you've got speed. You've got more diversity in the data opportunities, and your coding. >> Yeah. I think data warehouses are a way to start for every organization that is getting into data, and data warehousing is still a solution, and will be a solution, for a lot of teams which are just getting into data. But as soon as you start scaling, as you start seeing the cost going up, as you start seeing the number of use cases adding up, having an open format definitely helps. So, I would say that's where we are also heading, and that's how our journey started with Presto as well, and why we even thought about Ahana, right. >> (John chuckles) >> So, like you mentioned, one of the things that happened was, as we were moving to the lakehouse and the open table format, I think Ahana was one of the first ones in the market to have Hudi as a first class citizen, completely supported, with things which were not even present at the time, even with Presto, right. So we see Ahana working behind the scenes, improving even some of the things already in the open-source ecosystem. And that's where we get the most value out of Ahana as well. >> This is the convergence of open-source magic and commercialization. Wen, because you think about Data as Code, it reminds me, I hear, "Data warehouse, it's not going to go away." But you've got cloud scale. It reminds me of the old, "Oh yeah, I have a data center." Well, here comes the cloud. It doesn't really kill the data center, although Amazon would say that the data center's going to be eliminated. No, you just use it for whatever you need it for. You use it for specific use cases, but all the action goes to the cloud for scale. The same thing happens with data, and look at the open-source community. It's kind of coming together. Data as Code is coming together. >> Yeah, absolutely. >> Absolutely. >> I do want to connect another dot in terms of cost. You know, we've been talking a little bit about price performance, but there's an implicit cost, and I think this was also very important to Blinkit, and also why we're offering a managed service. So that's one piece of it. And it really revolves around the people, right? So outside of the technology and the performance, one thing that Akshay brought up, and it's another important piece that I should have highlighted a little bit more, is that Presto exposes the ability to interact with your data in a widely adopted way, which is basically ANSI SQL. So the ability for your practitioners to use this technology is huge. That's just regular Presto. In terms of a managed service, the folks at Blinkit are a great high performing team, but they have to be very efficient with their time and what they manage. And what we're trying to do is provide leverage for them: take a lot of the heavy lifting away, but at the same time figure out the right things to expose so that they have that same flexibility. And that's been the balance we've been trying to strike at Ahana, but that goes back to cost. What's my total cost of ownership? And that doesn't include just the actual query processing time, but the ability for the organization to go ahead and absorb the solution. And what does it cost in terms of the people involved? >> Yeah. Great conversation.
I mean, this brings up the question of, back in the data center days and into the cloud days, you had the concept of an SRE, which is now popular: site reliability engineer. One person does all the clusters and manages all the scale. Is the data engineer the new SRE for data? Are we seeing a similar trajectory? Just want to get your reaction. What do you guys think? >> Yes, I would say definitely. It depends on the teams and their sizes. We are a high performing team, so each automation takes away bits and pieces of the architecture work, like where you want to invest. And it comes down to the value of the engineer's time: basically, how much they can invest, how much they need to configure the architecture, and how much time it'll take to get to market. So this is what I would also highlight as an engineer. I found Ahana, I would say, as the Presto in a cloud native environment, the one in the market that seamlessly scales in and scales out. And further, with a team of our size, like three to four engineers, managing clusters day in, day out, configuring, tuning, and all those things takes a lot of time. And Ahana came in, takes it off our plate, and hands us a solution which works out of the box. So that's where this comes in. Ahana is also based on the open-source community. >> So the engineer's time is so valuable. >> Yeah. >> My take on it, really, in terms of the data engineer being the SRE: I think that can work. It depends on the actual person, and we definitely try to make the process as easy as possible. I think in Blinkit's case, there are data platform owners, but they definitely are aware of the pipelines. >> John: Yeah. >> So they have very intimate knowledge of what data engineers do, but I think in their case, they're managing a ton of systems. So it's not just even Presto. They have a ton of systems, and surfacing that interface so they can cater to all the data engineers across their data systems, I think, is the big need for them. I know you guys want to chime in. I mean, we've seen the architecture and things like that. I think you guys did an amazing job there. >> So, adding to Wen's point, right. I generally think that what DevOps is to the tech team, the data engineers or the data teams are to the data organization, right? They play a very similar role: you have to act as a guardrail to ensure that everyone has access to the data, so the democratizing and everything is there, but that has to also come with security, right? And when you do that, there are (indistinct) a lot of points where someone can interact with data. And again, there's a mixed batch of open-source tools that work well, and there are some paid tools as well. So for us, for visualization, we use Redash for our ad hoc analysis, and we use Tableau as well whenever we want to give very concise reporting. We have Jupyter notebooks in place, and we have EMRs as well. So we always have a mixed batch of things where people can interact with data. And most of our time is spent acting as that guardrail, to ensure that everyone should have access to data, but it shouldn't be exploited, right. And I think that's where we spend most of our time. >> Yeah. And I think the time is valuable, but your point about the democratization aspect of it, there seems to be a bigger step-function value that you're enabling that needs to be talked about.
The 10x engineer, it's more like 50x, right? If you get it done right, the enablement downstream, at the scale that we're seeing with this new trend, is significant. It's not just, oh yeah, visualization and getting some data quicker; there are actually real advantages on a multiple with that engineering. And we saw that with DevOps, right? You do this right, and then magic happens on the edges. So, yeah, it's interesting. You guys, congratulations. Great environment. Thanks for sharing the insight, Blinkit. Wen, great to see you. Ahana again with Presto, congratulations. Open-source meets data engineering. Thanks so much. >> Thanks, John. >> Appreciate it. >> Okay. >> Thanks John. >> Thanks. >> Thanks for having us. >> This is season two, episode two of our ongoing series. This one is Data as Code. This is theCUBE. I'm John Furrier. Thanks for watching. (gentle music)
Venkat Venkataramani, Rockset & Doug Moore, Command Alkon | AWS Startup Showcase S2 E2
(upbeat music) >> Hey everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. This is Data as Code, The Future of Enterprise Data and Analytics. This is also season two, episode two of our ongoing series with exciting partners from the AWS ecosystem, who are here to talk with us about data and analytics. I'm your host, Lisa Martin. Two guests join me, one a CUBE alumni. Venkat Venkataramani is here, CEO & Co-Founder of Rockset. Good to see you again. And Doug Moore, VP of cloud platforms at Command Alkon. You're here to talk to me about how Command Alkon implemented real-time analytics in just days with Rockset. Guys, welcome to the program. >> Thanks for having us. >> Yeah, great to be here. >> Doug, give us a little bit of an overview of Command Alkon: what type of business you are, what your mission is, that good stuff. >> Yeah, great. I'll preface it by saying I've been in this industry for only three years. The 30 years prior, I was in financial services. So this was really exciting and eye opening. It actually plays into the story of how we met Rockset, so that's why I wanted to preface that. But Command Alkon is in what's called the heavy building materials industry, and I had never heard of it until I got here. But if you think about large projects, like building buildings, cities, roads, anything that requires concrete, asphalt, or just really big trucks full of bulky materials, that's the heavy building materials industry. So for over 40 years, Command Alkon has been the North American leader in providing software to quarries and production facilities, to help mine and load these materials, and to produce them and then get them to the job site. So that's what our supply chain is: from the quarry, through the development of these materials, then out to a heavy building materials job site. >> Got it. And how, historically, has the movement of construction materials been coordinated? What was that like before you guys came on the scene? >> You'll love this answer. 'Cause, again, it's like a step back in time. When I got here, as we were trying to come up with the platform, people told me that there are 27 industries studied globally, and our industry is second to last in terms of automation, which meant that literally everything is still being done with paper, and a lot of paper. So when one of those materials, say concrete or asphalt, is produced and then needs to get to the job site, they start by creating a five-part printed ticket, or delivery description, that then goes to multiple parties. It ends up getting touched physically over 50 times for every delivery. And to give you some idea of what kind of scale it is, there are over 330 million of these types of deliveries in North America every year. So it's really a lot of paper and a lot of manual work. So that was the state of where we were. And obviously there are compelling reasons, certainly today, but even three, four, five years ago, to automate that and digitize it. >> Wow, tremendous potential to go nowhere but up, with the amount of paper and the lack of automation. So, you guys at Command Alkon built a platform, a cloud construction software platform. Talk to me about that. Why you built it, what was the compelling event? I mean, I think you've kind of already explained the compelling event of all the paper, but give us a little bit more context. >> Yeah. That was the original.
And then we'll get into what happened two years ago, which has made it even more compelling. But essentially, with everything on premises, there's really a huge amount of inefficiency. So, people have heard the enormous numbers it takes to build a highway or a really large construction project, and a lot of that is tied up in these inefficiencies. So we felt like, with our significant presence in this market, if we could figure out how to automate getting this data into the cloud, so that at least the partners in the supply chain could begin sharing information that's not on paper, a little bit closer to real time, we could make an impact on everything from the time it takes to do a project, to even the amount of carbon dioxide that's emitted, for example, from trucks running around being delayed and not being coordinated well. >> So you built the CONNEX platform, you started on Amazon DynamoDB, and ran into some performance challenges. Talk to us about some of those performance bottlenecks, and how you found Venkat and Rockset. >> So from the beginning, we were fortunate. If you start building a cloud platform three years ago, you have a lot of opportunity to use some of what we call the more fully managed or serverless offerings from Amazon. All the cloud vendors have them, but Amazon is the one we're most familiar with from the past 10 years. So we went head first into saying, we're going to do everything we can to not manage infrastructure ourselves, so we can really focus on solving this problem efficiently. And it paid off great. And so we chose Dynamo as our primary database, and it still was a great decision. We have obviously hundreds of millions, even billions, of these data points in Dynamo, and it's great from a transactional perspective. But at some point, you need to get the data back out. And what plays into the story, from the beginning, when I came here with basically no background in this industry, as did most of the other people on my team, is that we weren't really sure what questions were going to be asked of the data. And that's super, super important with a NoSQL database like Dynamo. You sort of have to know in advance what those usage patterns are going to be, and what people are going to want to get back out of it. And that's what really began to strain us, on both performance and just availability of information. >> Got it. Venkat, let's bring you into the conversation. Talk to me about some of the challenges that Doug articulated: this industry with so little automation, so much paper. Are you finding that still out there in quite a few industries that really have nowhere to go but up? >> I think that's a very good point. We talk about digital transformation 2.0 as this abstract thing, and then you meet disruptors and innovators like Doug, and you realize how much impact it has on the real world. But now it's not just about disrupting and digitizing all of these records, but doing it at a faster pace than ever before, right. I think this is really what digital transformation in the cloud enables you to do: a small team with a very, very big mission and responsibility, like what Doug's team has been shepherding here, is able to move very, very fast, to kind of accelerate this. And they're not only at the forefront of digitizing and transforming a very big, paper-heavy kind of process; real-time analytics and real-time reporting is a requirement, right?
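Doug's point about access patterns is worth making concrete. Below is a minimal sketch, using boto3 against DynamoDB, of the difference between the key-based lookup a NoSQL table is designed for and an ad hoc question nobody anticipated at design time. The table name, key schema, and attribute names are hypothetical, not Command Alkon's actual data model.

```python
# A minimal sketch of the access-pattern constraint, using boto3.
# Table name, key schema, and attributes are hypothetical.
import boto3
from boto3.dynamodb.conditions import Attr, Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("deliveries")  # hypothetical table

# The access pattern the table was designed for: all tickets for one
# job site, fetched by partition key. Fast and cheap.
known_pattern = table.query(
    KeyConditionExpression=Key("job_site_id").eq("JOB-1001")
)

# A question nobody anticipated at design time: every late delivery
# across all job sites on a given day. With no matching index, this
# degrades to a full table scan that reads the entire table, which is
# the strain Doug describes once questions stop matching the keys.
ad_hoc_question = table.scan(
    FilterExpression=Attr("status").eq("LATE")
    & Attr("delivery_date").eq("2022-04-01")
)
print(len(known_pattern["Items"]), len(ad_hoc_question["Items"]))
```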
Nobody's wondering, where was my supply chain three days ago? One of the most important things in heavy construction is to keep running on a schedule. If you fall behind, there's no way to catch up, because so many things fall apart. Now, how do you make sure you don't fall behind? Real-time analytics and real-time reporting on how many trucks are supposed to deliver today. Halfway through the day, are they on track? Are they getting behind? Not just being able to manage the data, but also being able to get reporting and analytics on it, is an extremely important aspect of this. So this is a combination of digital transformation happening in the cloud in real time, and real-time analytics being at the forefront of it. And so we are very, very happy to partner with digital disruptors like Doug and his team, to be part of this movement. >> Doug, as Venkat mentioned, access to real-time data is a requirement; that is just simple truth these days. I'm just curious, compelling-event-wise, was COVID an accelerator? 'Cause we all know of the supply chain challenges that we're all facing in one way or the other. Was that part of the compelling event that had you guys go and say, we want to do DynamoDB plus Rockset? >> Yeah, that is a fantastic question. In fact, more so than you can imagine. So anytime you come into an industry and you're going to try to completely change or revolutionize the way it operates, it takes a long time to get the message out. Sometimes years. I remember in insurance, it took almost 10 years really to get that message out and get great adoption. And then COVID came along. And when COVID came along, we all of a sudden had a situation where drivers and the foreman on the job site didn't want to exchange the paperwork. I heard one story of a driver taping the ticket for signature to a broomstick and putting it out his window, so that the foreman didn't get too close. It really was that dramatic. And again, this is the early days: no one really has any idea what's happening, and we're all working from home. So we saw that as an opportunity to really help people solve that problem, and understand more what this transformation would mean in the long term. We launched internally what we called Project Lemonade, obviously from "make lemonade out of lemons"; that's the situation that we were in. And we immediately made some enhancements to a mobile app and launched that to the field, so that basically there's now a digital acceptance capability, where the driver can just stay in the vehicle, and the foreman can be anywhere, look at the material, say it's acceptable for delivery, and go from there. So yeah, it actually immediately caused many of our customers, hundreds, to want to push their data to the cloud for that reason, just to take advantage of that one capability. >> Project Lemonade sounds like it's made a lot of lemonade out of a lot of lemons. Can you comment, Doug, on kind of the larger trend of real-time analytics in logistics? >> Yeah, obviously, and this is something I didn't think about much either, not knowing anything about concrete, other than it was in my driveway before I got here. It's a perishable product: you've got basically no more than about an hour and a half from the time you mix it, put it in the drum, and get it to the job site and pour it. And then the next one has to come behind it.
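Venkat's halfway-through-the-day question maps naturally onto an ad hoc SQL query over the continuously synced delivery data. Here is a hedged sketch of that kind of query issued against Rockset's REST query endpoint; the regional API host, collection name, and field names are assumptions for illustration, not Command Alkon's actual schema, so check the current API reference before reusing any of it.

```python
# A hedged sketch only: the host below is an assumed regional Rockset
# API endpoint, and the collection/field names are invented.
import os
import requests

ROCKSET_HOST = "https://api.rs2.usw2.rockset.com"  # assumed region host

SQL = """
    SELECT job_site_id,
           SUM(CASE WHEN status = 'DELIVERED' THEN 1 ELSE 0 END) AS delivered,
           SUM(CASE WHEN status = 'SCHEDULED' THEN 1 ELSE 0 END) AS remaining
    FROM commons.deliveries          -- collection synced from DynamoDB
    WHERE delivery_date = CURRENT_DATE()
    GROUP BY job_site_id
"""

resp = requests.post(
    f"{ROCKSET_HOST}/v1/orgs/self/queries",
    headers={"Authorization": f"ApiKey {os.environ['ROCKSET_API_KEY']}"},
    json={"sql": {"query": SQL}},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json().get("results", []):
    print(row)
```

The contrast with the earlier boto3 sketch is the point: the question is expressed as SQL at query time, rather than being baked into the table's key design up front.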
And I remember, the trend is that we can't really do that on paper anymore and stay on top of what has to be done out in the field. I recall a foreman saying that when you're in the field waiting on a delivery, with people standing around and preparing the site, ready to make a pour, two minutes is an eternity. And so, "real time" is always a controversial word, because it means something different to everyone, but that gave real clarity to what it really means to have real-time analytics: how are we doing, where are my vehicles, and how is this job performing today? And I think that a lot of people are still trying to figure out how to do that. And fortunately, we found a great tool set that's allowing us to do that at scale, thanks to Rockset, primarily. >> Venkat, talk about it from your perspective, the larger trend of real-time analytics, not just in logistics, but in other key industries. >> Yeah. I think we're seeing this across the board. We see a huge trend even within an enterprise: different teams, from the marketing team to the support teams to more and more business operations teams to the security team, really moving more and more of their use cases to real time. The industries that are the innovators and the pioneers here are the ones for whom real time is the requirement, like Doug and his team here, where if it is old news, it's no news, it's useless, right? But across all industries, whether it is gaming, whether it is fintech-related companies, e-learning platforms, ed tech, and so many different platforms, there is always this need for business operations: certain teams within large organizations that have to tell me how to win the game, and not play Monday-morning quarterback after the game is over. >> Right. Doug, let's go back to you. I'm curious, with CONNEX, have you been able to scale the platform since you integrated with Rockset? Talk to us about some of the outcomes that you've achieved so far. >> Yeah, we have. And of course, we knew when we made our database selection with Dynamo that it really doesn't have a top end in terms of how much information we can throw at it. But that's very, very challenging when it comes to using that information for reporting. We've found the same thing as we've scaled the analytics side, with Rockset indexing and searching of that database. So the scale, in terms of the number of customers and the amount of data we've been able to take on, has not been a problem. And honestly, for the first time in my career, I can say this: we've always had to add people every time we add a certain number of customers, and that has absolutely not been the case with this platform. >> Well, and I imagine the team that you do have is far more, sorry Venkat, far more strategic and able to focus on bigger projects. >> It is, and you'd be amazed. Venkat hit on a couple of points in terms of the adoption of analytics. What we found is that we are as big a customer of this analytics engine as our customers are, because our marketing team and our sales team are always coming to us: how many customers are doing this? How many partners are connected in this way? Which feature flags are turned on in the platform? And the way this works is, all data that we push into the platform is automatically just indexed and ready for reporting and analytics.
So there's really no additional work to answer these questions, which has been phenomenal. >> I think the thing I want to add here is the speed at which they were able to build a scalable solution, and also how little operational and administrative overhead it has cost their teams, right. This is, again, real-time analytics. If you go and ask a hundred people, do you want fast analytics on real-time data or slow analytics on stale data, no one would say, give me slow and stale. So I think it goes back, again, to our fundamental premise that you have to remove all the cost and complexity barriers for real-time analytics to be the new default, right? Today, companies try to get away with batch, and the pioneers and the innovators are forced to address some of these real-time analytics challenges. With a real-time analytics platform like Rockset, we want to completely flip that on its head. You can do everything in real time. And there may be some extreme situations where you're dealing with hundreds of petabytes of data and you just need an analyst to generate quarterly reports out of that; go ahead and use some really, really good batch-based system. But you should be able to get anything and everything you want, without additional cost or complexity, in real time. That is really the vision. That is what we are really enabling here. >> Venkat, I want to also get your perspective, and Doug, I'd like your perspective on this as well: the role of cloud native and serverless technologies in digital disruption. What do you see there? >> Yeah, I think it's huge. Again and again, every customer we meet, and Command Alkon and Doug and his team are a great example of this, really wants to spend all the time, energy, and calories they have helping their business, right? What are we trying to accomplish as a business? How do we build better products? How do we grow revenue? How do we eliminate risk that is inherent in the business? That is really where they want to spend all of their energy, not trying to install some backend software, administer it, build ETL pipelines, and so on and so forth. And so, doing serverless on the compute side, as things like AWS Lambda do, is a very important innovation, but that doesn't complete the story; your data stack also has to become serverless. And that is really the vision with Rockset: your entire real-time analytics stack can be operated and managed as simply as a serverless stack for your compute environments, like your app servers and what have you. And so I think that is here to stay. This is a path towards simplicity, and simplicity scales really, really well, right? Complexity will always be the killer that limits how far you can take a solution and how many problems you can solve with it. So simplicity is a very, very important aspect here, and serverless helps you deliver that. >> And Doug, your thoughts on cloud native and serverless in terms of digital disruption? >> Great point. And there are two parts to the scalability part. The second one is the one that's more subtle, unless you're in charge of the budget.
And that is, with enough effort and enough money, you can make almost any technology scale, whether it's multiple copies of it; it may take a long time to get there, but you can get there with most technologies. But what is least scalable, at least as I see it in this industry, is the people. Everybody knows we have a talent shortage, and these other ways of getting real-time analytics and scaling infrastructure for compute and database storage really take a highly skilled set of resources. And the more your company grows, the more of those you need. And that is what we really can't find. And that's actually what drove our team, in our last industry, to even go this way: we reached a point where our growth was limited by the people we could find. And so we really wanted to break out of that. So now we have the best of both: scalable people, because we don't have to scale them, and scalable technology. >> Excellent. The best of both worlds. Isn't it great when those two things come together? Gentlemen, thank you so much for joining me on theCUBE today, talking about what Rockset and Command Alkon are doing together, better together, and what you're enabling from a supply chain digitization perspective. We appreciate your insights. >> Great. Thank you. >> Thanks, Lisa. Thanks for having us. >> My pleasure. For Doug Moore and Venkat Venkataramani, I'm Lisa Martin. Keep it right here for more coverage of theCUBE, your leader in high tech event coverage. (upbeat music)
Saket Saurabh, Next | AWS Startup Showcase S2 E2
[Music] >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase: Data as Code. This is season two, episode two of our ongoing series covering exciting startups in the AWS ecosystem, to talk about data and analytics. I'm your host, Lisa Martin. I have a CUBE alumni here with me, Saket Saurabh, the CEO and founder of Nexla. He's here to talk about the future of automated data engineering. Saket, welcome back. >> Great to see you, Lisa. Thank you for having me. Pleasure to be here again. >> Let's dig into Nexla's mission: ready-to-use data in the hands of every user. What does that mean? >> That means that, you know, every organization, what are they trying to do with data? They want to make use of data. They want to make decisions from data. They want to make data a part of their business, right? The challenge is that every function in an organization today needs to leverage data, whether it is finance, whether it is HR, whether it is marketing, sales, or product. The problem for companies is that, for each of these users and each of these teams, the data is not ready for them to use as it is. There is a lot that goes on before the data can be in their hands and in the tools that they like to work with. And that's where a lot of data engineering happens today. I would say that is by far one of the biggest bottlenecks today for companies in accelerating their business and being, you know, truly data-driven. >> So talk to me about what makes Nexla unique. When you're in customer conversations, as every company these days, in every industry, has to be a data company, what do you tell them about what differentiates you? >> Yeah. One of the biggest challenges out there is that the variety of data that companies work with is growing tremendously. You know, every SaaS application you use becomes a data source. Every type of database, every type of user event, anything can be a source of data now. It is a tremendous engineering challenge for companies to make the data usable, and the biggest challenge there is people. Companies just cannot have enough people to write that code, to make the data engineering happen. And where we come in with a very unique value is in how to start thinking about making this whole process much faster, much more automated. At the end of the day, Lisa, time to value and time to results is by far the number one thing on top of mind for customers. >> Time to value is critical. We're all thin on patience these days, whether we're in our consumer or our business lives, and being able to get access to data to make intelligent decisions, whether it's on something that you're going to buy or a product or service you're going to deliver, is really critical. Give me a snapshot of some of the users of Nexla. >> Yeah. The users of Nexla are actually across different industries. One of the interesting things is that the data challenges, whether you are in financial services, whether you are in retail and e-commerce, whether you are in healthcare, are very similar: it is basically getting connected to all these data systems and having the data. Now, what people do with the data is very specific to their industry. So, for example, within the e-commerce or retail world, companies from the likes of Bed Bath & Beyond and Forever 21 and Poshmark, which are retailers or e-commerce companies, use Nexla today to bring a lot of data in. So do delivery companies like DoorDash and Instacart, and so do, for example, logistics providers like, you know, Narwhal, and customer loyalty and customer data companies like Yotpo. So across the board, for example, just in retail, we cover a whole bunch of companies.
>> Got it. Now let's dig in; you're here to talk about the future of automated data engineering. Talk to me about data engineering: what is it? Let's define it and crack it open. >> Yeah. Data engineering is, I would say, by far one of the hottest areas of work today, and data engineers are some of the hardest people to hire if you're looking for one. Data engineering is basically all the code, you know, the process and the people that is connecting to data systems. So just to give a very practical example, for somebody in e-commerce, let's take the case of DoorDash, right? It's extremely important for them to have data as to which stores have what products, what is available: is this something they can list for people to go and buy, is this something that they can therefore deliver, right? This is data that changes all the time. Now imagine them getting data from hundreds of different merchants across the board. So it is the task of data engineering to consume that data from all these different places, different formats, different APIs, different systems, and then somehow unify all the data so that it can be used by the applications that they are building. So data engineering, in this case, becomes taking data from different places and making it useful; again, back to what I was talking about: ready-to-use data. It is a lot of code, it's a lot of people, and not just that, it is something that runs every single day. So it means it has monitoring, it has reliability, it has performance; it has every aspect of engineering, as we know it, going into it. >> You mentioned it's a hot topic, which it is, but it's also really challenging to accomplish. How does Nexla help enable that? >> Yeah. Data engineering is quite interesting in that it is difficult to implement, you know, the necessary sort of pieces, but it is also very repetitive at some level, right? I mean, when you connect to, say, 10 systems and get data from them, that's not the end of it. You have 10 more, and 10 more, and 10 more, and then at some point you have thousands of such, you know, data connections and data flows happening. It's hard to maintain them as well, right? So the way Nexla gets into the whole picture is looking at: what can we understand about data, what can we observe about the data systems, what can be done from that, and then starting to automate certain pieces of data engineering, so that we are helping those teams just accelerate a lot faster. And I would say it comes down to more people being able to do these tasks, rather than only very, very specialized people. >> More people being able to do the tasks, more users, kind of a democratization of data, really, there. Can you talk to us in more detail about how Nexla is automating data engineering? >> Yeah, I think this is best shared through a visual, so let me walk you through that a little bit, as to how we automate data engineering. So if we think about data engineering, there are many parts to it, but three of the most core components are integrating with data systems, preparing and transforming data, and then monitoring that, right? So automating data engineering happens in, you know, three different ways. First of all, connecting. Connecting to data is basically about the gateway to data, the ability to read and write data from different systems. This is where the data journey starts, but it is extremely complex, because people have to write code to connect to different systems.
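As a concrete illustration of that hand-written connector work, here is a minimal sketch: pulling paginated inventory JSON from one hypothetical merchant API and normalizing each record. The endpoint, auth scheme, pagination fields, and record shape are all made up for illustration; real pipelines repeat a variant of this, each with its own quirks, for every source system, which is exactly the repetitive code Saket describes.

```python
# An illustrative hand-written connector: fetch paginated inventory
# JSON from a single hypothetical merchant API and normalize records.
# Endpoint, auth, pagination, and field names are all invented.
import requests

BASE_URL = "https://api.example-merchant.com/v1/inventory"  # made up

def fetch_inventory(api_key: str):
    """Yield normalized inventory records, one page at a time."""
    page = 1
    while True:
        resp = requests.get(
            BASE_URL,
            params={"page": page},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        resp.raise_for_status()
        payload = resp.json()
        for item in payload["items"]:
            # Reshape into whatever the downstream application expects.
            yield {
                "store_id": item["storeId"],
                "sku": item["sku"],
                "in_stock": item["quantity"] > 0,
            }
        if not payload.get("next_page"):  # pagination varies per API
            break
        page += 1
```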
>> You mentioned it's a hot topic, which it is, but it's also really challenging to accomplish. How does Nexla help enable that? >> Yeah, data engineering is quite interesting in that, one, it is difficult to implement, you know, the necessary sort of pieces, but it is also very repetitive at some level. I mean, when you connect to, say, 10 systems and get data from them, you know that's not the end of it. You have 10 more, and 10 more, and 10 more, and then at some point you have thousands of such data connections and data flows happening, and it's hard to maintain them as well. So the way Nexla gets into the whole picture is looking at what can we understand about data, what can we observe about the data systems, what can be done from that, and then starting to automate certain pieces of data engineering, so that we are helping those teams just accelerate a lot faster. And it, I would say, comes down to more people being able to do these tasks, rather than only very, very specialized people. More people being able to do the tasks, more users, kind of a democratization of data, really. >> Can you talk to us in more detail about how Nexla is automating data engineering? >> Yeah, I think this is best shared through a visual, so let me walk you through that a little bit as to how we automate data engineering. If we think about data engineering, there are many parts to it, but three of the most core components are integrating with data systems, preparing and transforming data, and then monitoring that. So automating data engineering happens in three different ways. First of all, connecting. Connecting to data is basically about the gateway to data, the ability to read and write data from different systems. This is where the data journey starts, but it is extremely complex, because people have to write code to connect to different systems. One part that we have automated is generating these connectors, so that you don't have to write code for that. Also making them bidirectional is extremely valuable, because now you can read and write from any system. The second part is that the gateway, the connector, has read the data, but how do you represent it to the user so anybody can understand it? That's where the concept of a data product comes in. So we also look at auto-generating data products. These become the common language and entity that people can understand and work with. And then the third part is taking all this automation and bringing the human into the loop. No automation is perfect, and therefore bringing the human into the loop means that somebody who is an expert in data, who can look at it and understand it, can now do things which only data systems experts were able to do before. So bringing that user of data directly into the picture is one important part. But let's not forget, data challenges are very diverse and very complex, so the same system also becomes accessible to the engineers who are experts in that, and now both of these can work together. While an engineer will come in through APIs and SDKs and command interfaces, a data user comes in through a nice no-code user interface, and all of these things coming together are what is accelerating, back to that time to value that really everybody cares about.
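The connector idea above, read-and-write gateways that are generated rather than hand-coded at every call site, can be sketched roughly as follows. The interface and registry below are assumptions made for illustration; Nexla's actual SDK surface is not shown here.

```python
# An illustrative sketch (not Nexla's actual interfaces) of a
# bidirectional connector plus a registry that stands in for the
# "auto-generated connectors" described above: adding a system means
# registering one class, not writing glue code at every call site.
from abc import ABC, abstractmethod
from typing import Iterable

class Connector(ABC):
    """Gateway to one data system: both the read and the write side."""

    @abstractmethod
    def read(self) -> Iterable[dict]: ...

    @abstractmethod
    def write(self, records: Iterable[dict]) -> None: ...

class PostgresConnector(Connector):
    def __init__(self, dsn: str, table: str):
        self.dsn, self.table = dsn, table

    def read(self) -> Iterable[dict]:
        # Placeholder: a real connector would stream rows from the table.
        return iter(())

    def write(self, records: Iterable[dict]) -> None:
        # Placeholder: a real connector would batch and upsert records.
        for _ in records:
            pass

REGISTRY: dict[str, type[Connector]] = {"postgres": PostgresConnector}

def connect(kind: str, **config) -> Connector:
    # Users pick a system and supply config; no per-system code is written.
    return REGISTRY[kind](**config)

# Usage: source = connect("postgres", dsn="postgres://...", table="orders")
```

The two "front doors" described in the answer, an SDK for engineers and a no-code UI for data users, would both sit on top of the same registry idea.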
>> So if I'm in marketing and I'm a data user, I'm able to have a collaborative workflow with the data engineer? >> Yeah, yeah, for the first time that is actually possible, and everybody focuses on their expertise and their know-how. So somebody who, for example in financial services, really understands portfolios and transactions and different types of asset classes, they have the data in front of them. The engineers who understand the underlying real-time data feeds, they are still involved in the loop, but now they are not doing that back and forth. You know, as the user of data, I'm not going to the engineer saying, hey, can you do this for me, can you get the data here? That back and forth is not only time-consuming, it's frustrating, and it's the number one holdback. >> Yeah, and that's time that nobody has to waste, as we know, for many reasons. Talk to me about, when you look into your crystal ball, which I'm sure you have one, what is the future of data engineering from Nexla's perspective? You talked about the automation. What does the future hold? >> I think the future of data engineering becomes that we up-level this to a point where companies don't have to be slowed down by it. I think a lot of tooling is already happening. The way to think about this is that here in 2022, if we think that our data challenges are, you know, like X, they will be a thousand X in five years. I mean, this complexity is just increasing very rapidly. So we think that this becomes one of those fundamental layers. As I was saying maybe the last time, this is like the road: you don't feel it, you just move on it, you do your job, you build your products, deliver your services as a company, and this just works for you. That's where I think the future is, and that's where I think the future should be. We all need to work towards that. >> We're not there yet. >> Not there yet. >> A lot of potential, a lot of opportunity, and a lot of momentum. Speaking of momentum, I want to talk about data mesh. That is a topic of a lot of excitement and a lot of discussion. Let's unpack that. >> Yeah, I think, you know, the idea that data should be democratized, that people should get access to the data, is all coming back to that sort of basic concept of scale. Companies can scale only when more people can do the relevant jobs without depending on each other. So the idea of data democratization has been there for a long time, but recently, in the last couple of years, the concept of data mesh was introduced by Zhamak Dehghani and ThoughtWorks, and that has really caught the attention of people, and the imagination of leadership as well. The idea that data should be available as a product, you know, that democratization can happen. What is the entity of that democratization? It's data presented as a product that people can use and collaborate on, and that is extremely powerful. I think a lot of companies are gravitating towards that, and that's why it's exciting. This is promising a future that is, you know, possible. >> So, speaking of data products, we talked a little bit about this last time, but can you really help us understand, see, smell, touch, feel what a data product is, and give us that context? >> Yeah, absolutely. I think it's best to orient ourselves with the general thinking of how we consider something a product. A product is something that we find ready to use. For example, this table that I'm using right now, made out of raw materials, wood, metal, screws: somebody designed it, somebody produced it, and I'm using it right now. When we think about data products, we think about data as the raw material. So, for example, a spreadsheet, an API, a database query, those are the raw materials. A data product is something that further enriches and enhances that entity to be much more usable, ready to use. Let me illustrate that with a little bit of a visual, actually, and that might help. The idea of the data product, and this is how a data product looks in Nexla for a user, is that, as you see, first of all, it's a logical entity. This simply means that it's not a new copy of data. Just like containers are logical compute units, these data products are logical entities, but they represent data in the same consistent fashion regardless of where the data comes from and what format it is in. They provide the user the idea of what the structure of the data is, what the sample data looks like, what the characteristics of the data are. They allow people to have some documentation around it: what does the data mean, what do these attributes mean, and how do you interpret them? How do you validate that data, something that users often know in an industry, how my data should look? Well, this value can never be negative, because it's a price, for example. Then there's the ability to take these data products, which, as I was mentioning earlier, we automate by generating, and use them to create new data products. Now, that's something that's very unique about data. You could take data about an order from a company and say, well, the order data has an order ID and a user ID, but I need to look up the shipping address, so I can combine user and order data to get that information in one place. So, creating new data products, giving people access: hey, I've designed a data product, I think you'll find it useful, you can go use that as it is, you don't have to start from scratch. So all of those things together make a data product something that people can find ready to use.
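A rough sketch of those ingredients, a logical entity carrying schema, documentation, sample data, and validation rules, plus one derived product, might look like the following. Everything here, the names, the fields, the `join_on` helper, is hypothetical, shown only to pin down the concept.

```python
# A hedged sketch of a "data product" as just described: a logical
# entity (no new copy of the data) carrying schema, documentation,
# a sample-producing fetch, and validation rules, plus a derived
# product built by combining two existing ones. Illustrative only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataProduct:
    name: str
    doc: str                               # what the data means
    schema: dict[str, type]                # attribute -> expected type
    fetch: Callable[[], list[dict]]        # logical: reads at the source
    rules: list[Callable[[dict], bool]] = field(default_factory=list)

    def validate(self, record: dict) -> bool:
        return all(rule(record) for rule in self.rules)

orders = DataProduct(
    name="orders",
    doc="One record per order; price_cents can never be negative.",
    schema={"order_id": str, "user_id": str, "price_cents": int},
    fetch=lambda: [{"order_id": "o1", "user_id": "u1", "price_cents": 1999}],
    rules=[lambda r: r["price_cents"] >= 0],   # "a price is never negative"
)

users = DataProduct(
    name="users",
    doc="One record per user, including shipping address.",
    schema={"user_id": str, "shipping_address": str},
    fetch=lambda: [{"user_id": "u1", "shipping_address": "1 Main St"}],
)

def join_on(left: DataProduct, right: DataProduct, key: str, name: str) -> DataProduct:
    """Derive a new product from two existing ones, like the
    order-plus-shipping-address example above."""
    def fetch() -> list[dict]:
        lookup = {r[key]: r for r in right.fetch()}
        return [row | lookup.get(row[key], {}) for row in left.fetch()]
    return DataProduct(name=name,
                       doc=f"{left.name} joined with {right.name} on {key}",
                       schema={**left.schema, **right.schema}, fetch=fetch)

orders_with_address = join_on(orders, users, key="user_id",
                              name="orders_with_shipping")
```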
>> And this is also usable by, again, that example where I'm in marketing or I'm in sales? This is available to me as a general user? >> As a general user, in the tool of your choice. So you can say, oh, I am most familiar with using data in a spreadsheet, I would like it there. Or, I prefer my data in Tableau or Looker to visualize it, and you can have it there. So these data products give multiple interfaces for the end user to make use of them. >> Got it, I like it. You're meeting users where they are, with relevant data that helps them understand so much more, contextually. I'm curious, when you're in customer conversations, customers come to you saying, Saket, we need to build the data mesh. How is Nexla relevant there? What is your conversation like? >> Yeah, when people want to build a data mesh, they're really looking for how their organization will scale into the future. There are multiple components to building a data mesh. There's the tooling part of it, the technology portion, and there are people and processes. I mean, unless you train people in certain processes and say, hey, when you build a data product, make sure you have taken care of privacy, or compliance to certain rules, or who you give access to is something you have to follow some rules about. So we provide the technology component of it, and then the people and process is something that companies, you know, then adopt and do. So the concept of the data product becomes core to building the data mesh, having governance on it, having all of this be self-serve; it's an essential part of that. So that's where we come into the picture, as a technology component to the whole story, working to deliver on that mission of getting data into the hands of every user. >> You mentioned, and I want to dig into in the last few minutes here that we have, the target audience. You mentioned a few big-name customers that Nexla has. I heard retail, I heard e-commerce, I think I heard logistics, but talk to me about the target customer for Nexla. Any verticals in particular, or any company sizes in particular as well? >> Yeah, you know, one of the top three banks in the country is a big user of Nexla as part of their data stack. We actually sit as part of their enterprise-wide AI platform, providing data to their data scientists. We're not allowed to share their name, unfortunately. But, you know, there are multiple other companies in the asset management area, for example; they work with a lot of data in markets, portfolios, and so on. One of the leading medical devices companies is using Nexla; data scientists there are using data coming in, real-time or streaming, from medical devices, to train on, and combine that with other data for the sort of clinical trial related research that they do. We have, you know, companies like LinkedIn, which is an excellent customer. LinkedIn is by far the largest social network, and their marketing team leverages Nexla to bring data from different types of systems together. There are also companies in the education space; Nerdy is a public company that uses Nexla for, you know, student enrollment and education data as they collaborate with school districts, for example. And there are companies across the board in marketing; Live Brand, for example, uses Nexla. So, as to who uses Nexla today, it is mostly mid-size to large to very large enterprises that leverage Nexla as a very critical component, often for mission-critical data.
>> Do you see that changing anytime soon? As every company these days has to be a data company, we expect as consumers, whether it's from my grocery store or my local coffee shop, that they've got to use data to deliver me that personalized experience. Do you see the target audience shifting down into the mid-market and SMB space for Nexla? >> Oh yeah, absolutely. Look, we started the journey of the company with the thinking that the most complex data challenges exist in the large enterprise, and if we can make it no-code, self-serve, and easy to use for them, we can bring the same high-end technology to everybody. And this is exactly why we recently launched in the Amazon Marketplace, so anybody can go there, get access to Nexla, and start to use it. And you will see more and more of that happen, where we will be bringing even some free versions of our product out. So you're absolutely right, every company needs to leverage data, and I think people are getting much better at it, you know, especially in the last couple of years; I've seen that teams have become much more sophisticated. Yes, even if you are a coffee shop and you're running campaigns, you know, getting people's Yelp reviews and so on, this is data that you can use to better understand your demographic and your customers, and run your business better. So one day, yes, we will absolutely be in the hands of every single person here. >> A lot more opportunity to delight a lot more consumers and customers. Saket, thank you so much for joining me on the program during the Startup Showcase. You did a great job of helping us understand the future of automated data engineering. We appreciate your insights. >> Thank you so much, Lisa. It's a pleasure talking to you. >> Likewise. For Saket Saurabh, I'm Lisa Martin. You're watching theCUBE's coverage of the AWS Startup Showcase, season two, episode two. Stick around, more great content coming up from theCUBE, the leader in hybrid tech event coverage. (upbeat music)
Jon Dahl, Mux | AWS Startup Showcase S2 E2
(upbeat music) >> Welcome, everyone, to theCUBE's presentation of the AWS Startup Showcase. This episode two of season two is called "Data as Code," the ongoing series covering exciting new startups in the AWS ecosystem. I'm John Furrier, your host of theCUBE. Today, we're excited to be joined by Jon Dahl, who is the co-founder and CEO of MUX, a hot new startup building cloud video for developers, video with data. Jon, great to see you. We did an interview on theCUBE Conversation, went into big detail of the awesomeness of your company and the trend that you're on. Welcome back. >> Thank you, glad to be here. >> So, video is everywhere, and "pivot to video," you hear all these kinds of terms in the industry, but now more than ever, video is everywhere and people are building with it, and it's becoming part of the developer experience in applications. So people have to stand up video in their code fast, and data is code, video is data. So you guys are specializing in this. Take us through that dynamic. >> Yeah, so video clearly is a growing part of how people are building applications. We see a lot of trends of categories that did not involve video in the past making a major move towards video. I think about what Peloton did five years ago to the world of fitness: that was not really a big category, and now video fitness is a huge thing. Video in education, video in business settings, video in a lot of places. I think Marc Andreessen famously said "software is eating the world" as a pretty, pretty good indicator of what the internet is actually doing to the economy. I think there's a lot of ways in which video right now is eating software. So categories that were not video-first are becoming video-first. And that's what we help with. >> It's not obvious to, like, most software developers when they think about video. The video industry has its shows around video, NAB, others; the video folks know what's going on in video, but when you start to bring it mainstream, it becomes an expectation in the apps. And it's not that easy; it's almost, provisioning video is hard for a developer, 'cause you got to know the full, I guess, stack of video. That's like low level, and then kind of just basic high level, just play something. So, in between, this is a media stack kind of dynamic. Can you talk about how hard it is to build video for developers? How is it going to become easier? >> Yeah, I mean, I've lived this story for too long, maybe 13 years now, since I first built my first video stack. And, you know, I'll sometimes say I think it's kind of a miracle every time a video plays on the internet, because the internet is not a medium designed for video. It's been hijacked by video; video is 70% of internet traffic today, in an unreliable, sort of untrusted network space, which is totally different than how television used to work, or cable, or things like that. So yeah, so video is hard because there are so many problems from top to bottom that need to be solved to make video work. You have to worry about video compression and encoding, which is a complicated topic in itself. You have to worry about delivering video around the world at scale, delivering it at low cost, at low latency, with good performance. You have to worry about devices and how every device, Android, iOS, web, TVs, handles video differently, and so there's a lot of work there. And at the end of the day, these are kind of unofficial standards that everyone's using.
So one of the miracles is, like, if you want to watch a video, somehow you have to get, like, Apple and Google to agree on things, which is not always easy. And so there's just so many layers of complexity behind it. I think one way to think about it is: if you want to put an image online, you just put an image online, and if you want to put video online, you build complex software. And that's the exact problem that MUX was started to help solve. >> It's interesting, you guys are almost creating a whole new category around video infrastructure. And as you look at, you mentioned stack, video stack, I'm looking at a market where the notion of a media stack is developing, and you're seeing these verticals having similar dynamics with cloud. If you go back to the early days of cloud computing, what was the developer experience, or entrepreneurial experience? You had to actually do a lot of stuff before you could even do anything, provision a server. And this has all kind of been covered in great detail in the glory of Agile and whatnot. It was expensive, and you had to actually engineer before you could even stand up any code. Now you've got video, and the same thing's happening. So the developers have two choices: go do a bunch of complex stuff, building their own infrastructure, which is like building a data center, or lean in on MUX and say, "Hey, thank you for taking all those years of experience building out the stacks to take that hard part away," and use the APIs that they have. This is a developer-focused problem that you guys are solving. >> Yeah, that's right. My last company was a company called Zencoder; that was an API to video encoding. So it was kind of an API to a small part of what MUX does today, just one of those problems. And I think the thing that we got right at Zencoder, that we're doing again here at MUX, was building for developers first. So our number one persona is a software developer, not necessarily a video expert. We just think any developer should be able to build with video. It shouldn't be like, yeah, you've got to go be a specialist to use this technology, because it should just become part of the internet. Video should just be something that any developer can work with. So yeah, so we build for developers first, which means we spend a lot of time thinking about API design, we spend a lot of time thinking about documentation, transparent pricing, the right features, great support, and all those kinds of things that tend to be characteristics of good developer companies. >> Tell me about the pipelining of the products. I'm a developer, I work for a company, my boss is putting pressure on me. We need video, we have all this library, it's all stacking up. We hired some people, they left. Where's the video? We've stored it somewhere. I mean, it's a nightmare, right? So I'm like, okay, I'm cloud native, I got an API. I need to get my product to market fast, 'cause that is what Agile developers want. So how do you describe that acceleration for time to market? You mentioned you guys are API first, video first. How do these customers get their product into the market as fast as possible? >> Yeah, well, I mean, the first thing we do is we put what we think is probably, on average, three to four months of hard engineering work behind a single API call. So if you want to build a video platform, we tell our customers, like, "Hey, you can do that." You probably need a team, you probably need video experts on your team, so hire them or train them.
And then it takes several months just to kind of get video flowing. One API call at MUX gives you on-demand video or live video that works at scale, works around the world, with good performance, good reliability, and a rich feature set. So maybe just a couple specific examples: we worked with Robinhood a few years ago to bring video into their newsfeed, which was hugely successful for them. And they went from talking to us for the first time to a big launch in, I think it was three months, but the actual code time there was really short. I want to say they had a proof of concept up and running in a couple days, and then the full launch in three months. Another customer of ours, Bandcamp, I think switched from a legacy provider to MUX in two weeks. So one of the big advantages of going a little bit higher in the abstraction layer than just building it yourself is that time to market. >> Talk about this notion of video pipeline, 'cause I know I've heard people talk about, "Hey, I just want to get my product out there. I don't want to get stuck in the weeds on video pipeline." What does that mean for folks that aren't understanding the nuances of video? >> Yeah, I mean, it's all the steps that it takes to publish video. So from ingesting the video, if it's live video, making sure that you have secure, reliable ingest of that live feed, potentially around the world, to the transcoding, which we talked a little bit about, but which, you know, on its own is a massively complicated problem. And doing that well is hard; part of the reason it's hard is you really have to know where you're publishing to, and you might want to transcode video differently for different devices, for different types of content. You know, the pipeline typically would also include all of the workflow items you want to do with the video. You want to thumbnail a video, you want to create clips of the video, maybe you want to restream the video to Facebook or Twitter or a social platform. You want to archive the video, you want it to be available for downloads after an event. Or if it's a VOD upload, if it's not live in the first place, you have all those things, and you might want to do simulated live with the video: you might want to actually record something and then play it back as a live stream. So, the pipeline typically refers to everything from the ingest of the video to the time that the bits are delivered to a device.
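To make the "single API call" point concrete, here is a hedged Python sketch of creating an asset with MUX's public create-asset endpoint. The endpoint shape and field names follow MUX's public documentation as of this period, but treat them as illustrative and check the current docs; the credentials and input URL are placeholders.

```python
# A hedged illustration of the "single API call" described above,
# using MUX's public create-asset endpoint. Credentials and the input
# URL are placeholders; verify field names against current MUX docs.
import requests

MUX_TOKEN_ID = "YOUR_TOKEN_ID"          # placeholder
MUX_TOKEN_SECRET = "YOUR_TOKEN_SECRET"  # placeholder

resp = requests.post(
    "https://api.mux.com/video/v1/assets",
    auth=(MUX_TOKEN_ID, MUX_TOKEN_SECRET),
    json={
        "input": "https://example.com/my-video.mp4",  # placeholder source
        "playback_policy": ["public"],
    },
    timeout=30,
)
resp.raise_for_status()
asset = resp.json()["data"]

# The returned playback ID is enough to stream the video anywhere HLS
# plays, e.g. https://stream.mux.com/{PLAYBACK_ID}.m3u8
playback_id = asset["playback_ids"][0]["id"]
print(f"https://stream.mux.com/{playback_id}.m3u8")
```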
>> You know, I hear a lot of people talking about video these days, whether it's events, training, or just a peer-to-peer experience. Video is powerful, but customers want to own their own platform, right? They want to have the infrastructure as a service. They kind of want platform as a service, this is cloud talk now, but they want to have their own capability to build it out. This allows them to get what they want. And so you see this, like, is it SaaS? Is it platform? People want customization? So does the kind of general-purpose video solution really exist or not? I mean, 'cause this is the question: can I just buy software and have it work, or is it always going to be customized? How do you see that? Because this becomes a huge discussion point. Is it a SaaS product, or is someone going to make a SaaS product? >> Yeah, so I think one of the most important elements of designing any software, but especially when you get into infrastructure, is choosing an abstraction level. So if you think of computing, you can go all the way down to building a data center, or you can go down to getting a colo and racking a server, like maybe some of us used to do, who are older than others. And that's one way to run a server. On the other extreme, just think of the early days of cloud computing: you had App Engine, which was a really fantastic, really incredible product. It was one-push deploy of, I think, Python code, if I remember correctly, and everything just worked. But right in the middle of those, you had EC2, which is basically an API to a server. And it turns out that that abstraction level, not colo, not the full App Engine kind of platform, but the API to a virtual server, was the right abstraction level for maybe the last 15 years. Maybe now some of the higher-level application platforms are doing really well, and maybe the needs will shift. But I think that's a little bit of how we think about video. What developers want is an API to video. They don't want an API to the building blocks of video, an API to transcoding, to video storage, to edge caching. They want an API to video. On the other extreme, they don't want a big application that's a drop-in, white-label video in a box, like a Shopify kind of thing. Shopify is great, but developers don't want to build on top of Shopify. In the payments world, developers want Stripe. And that abstraction level, the API to the actual thing you're getting, tends to be the abstraction level that developers want to build on. And the reason for that is it's the most productive layer to build on. You get maximum flexibility and also maximum velocity when you have that API directly to a function like video. So, we like to tell our customers: you own your video when you build on top of MUX. You have full control over everything, how it's stored, when it's stored, where it goes, how it's published. We handle all of the hard technology, and we give our customers all of the flexibility in terms of designing their products. >> I want to get back to some use cases, but you brought that up, so I might as well just jump to my next point. I'd like you to come back and circle back on some references, 'cause I know you have some. You said building on infrastructure that you own; this is a fundamental cloud concept. You mentioned API to a server, for the nerds out there that know that that's cool, but for the people who aren't super nerdy, that means you've basically got an interface into a server behind the scenes. You're doing the same for video. So, that is a big thing around building services. So what wide range of services can we expect beyond MUX? If I'm going to have an API to video, what could I possibly do? >> What sort of experience could you build? >> Yes, I've got a team of developers saying, I'm all in on API to video, I don't want to do all that transcoding, I want to go straight there, I want to build experiences, video experiences, on my app. >> Yeah, I mean, I think one way to think about it is, what's the range of key use cases that people address with video? We tend to think about six at MUX. One is kind of the places where the content is the product, so one of the things people use video for is creating great video content. Think of online courses or fitness or entertainment or news or things like that. That's kind of the first thing everyone thinks of: when you think video, you think Netflix, and that's great. But we see a lot of really interesting uses of video in the world of social media.
So customers of ours like VSCO, which is an incredible photo-sharing application, really for photographers who care about the craft, were able to bring video in and bring that same kind of VSCO experience to video using MUX. We think about B2B tools and video; when you think about it, all video is, is a high-bandwidth way of communicating. And so customers of ours like HubSpot use video for their marketing platform, for business collaboration; you'll see a lot of growth of video in terms of helping businesses engage their customers or engage with their employees. We see live events, obviously, have been a massive category over the last few years. You know, we were all forced into a world where we had to do live events two years ago, but I think now we're reemerging into a world where the online part of a conference will be just as important as the in-person component of a conference. So that's another big use case we see. >> Well, full disclosure, if you're watching this live right now, it's being powered by MUX. So shout out, we use MUX on theCUBE platform that you're experiencing in this, actually in real time, 'cause this is one application; there are many more. So video as code, as data as code is the theme, and that's going to bring up the data ops. Video also is code, because (laughs) it's just like you said, it's just communicating, but it gets converted to data. So data ops, video ops, could be its own new category. What's your reaction to that? >> Yeah, I mean, I have a couple thoughts on that. The first thought is, because video is the way that companies interact with customers or users, it's really important to have good monitoring and analytics of your video. And so the first product we ever built was actually a product called MUX Video, sorry, MUX Data, which is the best way to monitor a video platform at scale. So we work with a lot of the big broadcasters, we work with CBS and Fox Sports and Discovery, we work with big tech companies like Reddit and Vimeo, to help them monitor their video. And you just get a huge amount of insight when you look at robust analytics about video delivery, insight that you can use to optimize performance, to make sure that streaming works well globally, especially in hard-to-reach places or on every device. We actually built the MUX Data platform first because, when we started MUX, we spent time with some of our friends at companies like YouTube and Netflix and got to know how they use data to power their video platforms. They do really sophisticated things with data to ensure that their video streams well, and we wanted to build the product that would help everyone else do that. So, that's one use. I think the other obvious use is just really understanding what people are doing with their video: who's watching what, what's engaging, those kinds of things.
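As a toy illustration of the delivery analytics described here, the sketch below computes a per-session rebuffering ratio, one common quality-of-experience signal. The event records and their shape are hypothetical, not MUX Data's actual schema.

```python
# A toy illustration (hypothetical event data, not MUX Data's schema)
# of one quality-of-experience metric: the fraction of each playback
# session spent rebuffering, useful for spotting poorly performing streams.
from collections import defaultdict

# Each event: (session_id, event_type, duration_seconds)
events = [
    ("s1", "playing", 120.0),
    ("s1", "rebuffering", 3.0),
    ("s2", "playing", 300.0),
    ("s2", "rebuffering", 0.5),
]

totals: dict[str, dict[str, float]] = defaultdict(
    lambda: {"playing": 0.0, "rebuffering": 0.0})
for session, kind, seconds in events:
    totals[session][kind] += seconds

for session, t in totals.items():
    watched = t["playing"] + t["rebuffering"]
    ratio = t["rebuffering"] / watched if watched else 0.0
    print(f"{session}: rebuffering ratio {ratio:.2%}")
```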
>> Yeah, data is definitely there. You guys mentioned some great brands that are working with you, and they're doing it because of the developer experience. And I'd like you to explain, if you don't mind, in your words, why is the MUX developer experience so good? What are some of the results you're seeing from your customers? What are they saying to you? Obviously, when you win, you get good feedback. What are some of the things that they're saying, and what specific developer experiences do they like the best? >> Yeah, I mean, I think that the most gratifying thing about being a startup founder is when your customers like what you're doing. And so we get a lot of this, and we always pay attention to what customers say. But yeah, the number one thing developers say when they think about MUX is that the developer experience is great. I think when they say that, what they mean is two things. First is it's easy to work with, which helps them move faster; software velocity is so important. Every company in the world is investing and wants to move quickly and to build quickly, and so if you can help a team speed up, that's massively valuable. The second thing, I think, when people like our developer experience, is that in a lot of ways we get out of the way and we let them do what they want to do. So well-designed APIs are a key part of that, coming back to abstraction, making sure that you're not forcing customers into decisions that they actually want to make themselves. Like, if our video player only had one design, that would not work for most developers, 'cause developers want to bring their own design and style and workflow and feel to their video. And so, yeah, I think the way we do that is to think comprehensively about how APIs are designed, think about the workflows that users are trying to accomplish with video, and make sure that we have the right APIs, make sure there's the right information, we have the right webhooks, we have the right SDKs, all of those things in place so that they can build what they want. >> We were just having a conversation on theCUBE, Dave Vellante and I, and our team, and I'd love to get your reaction to this; it's more of a riff, real quick. We're seeing a trend where video as code, data as code, the media stack, where you're starting to see the emergence of the media developer, where the application of media looks a lot like software development: media as an app. It could be a chat, it could be peer-to-peer video, it could be part of an event platform, but with all the recent advances in UX design and coding, the front end looks like an emergence of these creators that are essentially media developers; for all intents and purposes, they're coding media. What's your reaction to that? How do you see that evolving? >> I think the... >> Or do you agree with it? >> It's okay. >> Yeah, yeah. >> Well, I think a couple things. I think one thing, and maybe this goes without saying, or maybe it's disagreement, is that we don't think you should have to be an expert at video or at media to create and produce or publish good video, good audio, good images, those kinds of things. And so, you know, if you look at software overall, think of 10 years ago, the kind of DevOps movement, where there was a movement away from specialization in software, where the same software developer could build and deploy, where the same software developer maybe could do front end and back end. We want to bring that to video as well, so you don't have to be a specialist to do it. On the other hand, I do think that investments in tooling, all the way from video creation, which is not our world, but there are a lot of amazing companies out there making it easier to produce video, to shoot video, to edit, a lot of interesting innovations there, all the way to what we do, which is helping people stream and publish video and video experiences. You know, I think another way to think about it is that that tool set, and the companies doing that, let anyone be a media developer, which I think is important.
>> It's like DevOps turning into low-code, no-code; eventually it's just composability, almost like, you know, "Hey Siri, give me some video," that kind of thing. Final question for you while I got you here: at the end of the day, a lot of people face the decision between build versus buy. "I got to get a developer; why not just roll my own?" You mentioned data center, "I want to build a data center." So why MUX versus do it yourself? >> Yeah, I mean, part of the reason we started this company is we have a pretty, pretty strong opinion on this. When you think about it, when we started MUX five years ago, six years ago, if you were a developer and you wanted to accept credit cards, if you wanted to bring payment processing into your application, you didn't go build a payment gateway; you just probably used Stripe. And if you wanted to send text messages, you didn't build your own SMS gateway; you probably used Twilio. But if you were a developer and you wanted to stream video, you built your own video gateway, you built your own video application, which was really complex. Like we talked about, you know, probably three, four months of work to get something basic up and running, probably not live video, probably only on-demand video at that point. And you get no benefit by doing it yourself; you're no better than anyone else because you rolled your own video stack. What you get is risk that you might not do a good job, maybe you do worse than your competitors, and you also get distraction, where you've just taken 10 engineers and 10 sprints and applied them to a problem that doesn't actually give you differentiated value for your users. So we started MUX so that people would not have to do that. It's fine if you want to build your own video platform once you get to a certain scale, if you can afford a dozen engineers for a VOD platform and you have some really massively differentiated use case, you know, maybe. Live is, I don't know, I don't have a rule of thumb, but live video is maybe five times harder than on-demand video to work with. But in general, there's such a shortage of software engineers today, and software engineers, frankly, are in such high demand; you see what happens in the marketplace and the hiring markets, how competitive it is. You need to use your software team where they're maximally effective, and where they're maximally effective is building differentiation into your products for your customers. And video is just not that; very few companies actually differentiate on their video technology. So we want to be that team for everyone else. We're 200 people building the absolute best video infrastructure as APIs for developers and making that available to everyone else. >> Jon, great to have you on with the showcase. Love the company, love what you guys do. Video as code, data as code, great stuff. Final plug for the company, for the developers out there and prospects watching: why should they go to MUX? What are you guys up to? What's the big benefit? >> I mean, first, just check us out. Try our APIs, read our docs, talk to our support team. We put a lot of work into making our platform the best. As you dig deeper, I think you'd be looking at the performance, the global performance of what we do, looking at our analytics stack and the insight you get into video streaming.
We have an emerging open-source video player that's really exciting, and I think it's going to be the direction that open-source players go for the next decade. And then, you know, we're a quickly growing team. We were 60 people at the beginning of last year, we were 150 at the beginning of this year, and we're going to grow really quickly again this year. And this whole team is dedicated to building the best video infrastructure for developers. >> Great job, Jon. Thank you so much for spending the time sharing the story of MUX here on the show, AWS Startup Showcase season two, episode two. Thanks so much. >> Thank you, John. >> Okay, I'm John Furrier, your host of theCUBE. This is season two, episode two, the ongoing series covering the most exciting startups from the AWS Cloud ecosystem. Talking data analytics here, video cloud, video as a service, video infrastructure, video APIs, the hottest thing going on right now, and you're watching it live here on theCUBE. Thanks for watching. (upbeat music)