Karl Mattson, Noname Security | AWS re:Inforce 2022
>> Hello, everyone. Welcome back to AWS re:Inforce, here live in Boston, Massachusetts. I'm John Furrier, host of theCUBE. We're here with Karl Mattson, CISO at Noname Security. That's right, Noname Security. Noname Security is also a featured partner at season two, episode four of our upcoming AWS Startup Showcase, a security-themed event happening at the end of August. Look for that at this URL, AWS startups.com. But we're here at re:Inforce. Karl, thanks for joining me today. Good to see you.

>> Thank you for having us, John.

>> So this show, security: it's not as packed as the AWS Summit was in New York, which just happened two weeks ago with 19,000 people. This is a more focused crowd. A lot is at stake. Operations are under pressure, and the security teams are under a lot of pressure, as apps drive more and more cloud-native goodness. As we say, the genie's out of the bottle; people want more cloud-native apps. That's put a lot of pressure on the ops teams and the security teams. That's the core theme here. How do you see it happening? How do you see this unfolding? Do you agree with that, and how would you describe today's event?

>> Well, I think you're spot on. I think the future of IT is increasingly becoming the story of developers and APIs becoming the hero: the hero of digital transformation, the hero of public cloud adoption. So this is really becoming much more of a developer-centric discussion about where we're moving our applications and where they're hosted, but also how they're designed. There's a lot of energy right now around focusing security capabilities on what really appeals to the sensibilities and the needs of modern applications.

>> I want to get to Noname Security in a second and let you explain what you guys do. Then I'll have a few good questions for you to kind of unpack that.
But the thing about the structural change that's happened with cloud computing is that it's kind of in the past now: DevOps, cloud scale, large-scale data, the rise of the supercloud companies like Snowflake and Capital One. There are examples of companies that don't even have CapEx investments, building on the cloud. And in a way, the success of DevOps has created another sea of problems and opportunities, which is more complexity, as the benefits of DevOps and open source continue to rise. Agile applications that have value can be quantified; there's no doubt, with the pandemic, that the value is there. Now you have the collateral damage of success: a new opportunity to abstract away more complexity, to go to the next level. This is a big industry thing. What are the key opportunities and areas in this new environment, since that's the structural change happening now? What are the key dynamics driving this new innovation, and what are some of those problem areas that are going to be abstracted away, as you see it?

>> Well, the first thing I'd suggest is to lean into those structural changes and take advantage of them where they become an advantage for governance, security, and risk. A perfect example is automation. What we have in microservices applications, cloud infrastructures, and new workloads like Snowflake is workloads that want to talk; they want to be interoperated with. And because of that, we can develop new capabilities that take advantage of those characteristics. What we want on our security teams in particular is the talent and the tools that are leaning into and capitalizing on exactly those strengths of the underlying capabilities you're securing, rather than countering that trend. The security professional needs to get ahead of it and be a part of that discussion with the developers and the infrastructure teams.
>> And again, the structural change could kill you, too. I mean, there are benefits, you know, data's the new oil, but at the end of the day it could be a problematic thing. All right, so let's get to Noname. Talk about the company, what you guys do. You have an interesting approach: heavily funded, good success, good buzz. What's going on with the company? Give us the quick overview.

>> Well, we're a company that's just under three years old, and APIs go back, of course, a decade or more; we've all been using APIs for a long time. But what's really shifted over the last couple of years is the transition of applications, and especially business-critical processes, to now riding on top of public-facing APIs, where the API used to be the behind-the-scenes interconnection between systems. Now those APIs are exposed; they're public-facing. So what we focus on as a company is looking at that API as a software endpoint, just like any other endpoint in our environments that we're historically used to. That's an endpoint that needs full-lifecycle protection. It needs to be designed well, with secure coding standards for APIs, and tested well. It also has to be deployed into production, configured well, and operated well. And when there's a misuse or an attack in progress, we have to be able to protect against and identify the risks to that API in production. So when you add that up, we're looking at a full-lifecycle view of the API. And it's about time, because while the API is not new, we're just now starting to apply actual discipline and practices that help keep that API secure.

>> Yeah, it's interesting. It's like what I was saying earlier: they're not going anywhere. They're the underpinning, the underlying benefit of cloud and cloud native. So it's just more operational stability, scale, and growth.
What are some of the challenges that are there, and what do you guys do in particular to solve them? Are you protecting it? Are you scaling it? What specifically are you addressing?

>> Sure. So I think API security, even as a discipline, is not new, but the traditional look at API security considers only the quality of the source code. Certainly the quality of an API's source code is step one. But what we see in practice is that most of the publicly known API compromises weren't because of bad source code; they were because of network misconfiguration or the misapplication of policy during runtime. A great example would be a developer who designs an API in such a way that it enforces authentication, well designed and strong, and then in production those authentication policies are not applied at a gateway. So what we add to the conversation on API security is helping fill all those little gaps from design and testing through production, so we can see all of the moving parts in the context of the API, how it can be exploited, and how we can reduce risk.

>> So this is really about hardening the infrastructure around the API, because the developer did their job in that example. The API is well formed and working, but something didn't happen on the network or the gateway box or the app: some sort of network configuration or middleware configuration.

>> Absolutely. So in our platform we essentially have three functional areas. There's API code testing, and then what we call posture management. Posture management is a real thing. If we're talking about a laptop, we ask: is it up to date with patches? Is it configured well? Is its network connectivity secure? The same is true with APIs. They have to be managed and cared for by somebody who's looking at their posture on the network.
And then of course there's threat defense and runtime protection. That posture management piece is really a new entrant into the discussion on API security, and that's really where we started as a company: focusing on that acute gap of information.

>> Posture, protection.

>> Posture and protection, absolutely.

>> Define that. What does posture protection mean? How would you define it?

>> Sure. It's identifying the inherent risk exposure of an API. A great example would be an API that is addressable by internal systems and external systems at the same time. Almost always, that is an error, a mistake that's been made. By identifying that misconfiguration of posture, we can protect that API by restricting the external internet connectivity. That's just one great example of posture; we see it in almost every organization, and it's never intended.

>> Great call-out, thanks for sharing. All right, so I'm a customer. Okay, look, hey, I already got an app firewall and an API gateway. Why do I need another tool?

>> Well, first of all, web application firewalls are essential parts of a security ecosystem, and an API management gateway is usually the brain of an API economy. What we do is augment those platforms with what they don't do well, and also cover the cases where they're not used. For example, in any environment we aspire to have all of our applications and APIs protected by a web application firewall. The first question is: are they even behind the WAF at all? The WAF doesn't know if it's not protecting something. And then secondarily, there are attack types, business logic abuse in particular, such as authentication policy abuse, that a WAF is not going to be able to see. So the WAF and the API management plane are the key control points, and we can help make those better.
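As a toy illustration of the dual-exposure posture error described above (an API addressable by both internal and external systems), a simple pass over an API inventory can flag it. This is a minimal sketch assuming a hypothetical inventory format with per-endpoint `zone` fields; it is not Noname's product logic.

```python
def find_dual_exposure(api_inventory):
    """Flag APIs whose endpoints are reachable from both internal
    and external networks, which is almost always unintended."""
    findings = []
    for api in api_inventory:
        zones = {endpoint["zone"] for endpoint in api["endpoints"]}
        if {"internal", "external"} <= zones:
            findings.append(api["name"])
    return findings

# Hypothetical inventory; in practice this would come from discovery
# against gateways and cloud configuration, not a hand-built list.
inventory = [
    {"name": "billing-api", "endpoints": [
        {"zone": "internal", "host": "10.0.1.5"},
        {"zone": "external", "host": "api.example.com"},
    ]},
    {"name": "reports-api", "endpoints": [
        {"zone": "internal", "host": "10.0.2.9"},
    ]},
]
print(find_dual_exposure(inventory))  # ['billing-api']
```

The remediation Mattson describes, restricting the external connectivity, would then follow from each finding.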
>> You know what I think is cool, Karl: you're bringing up a point that we're seeing here, and we've seen before, but now it's coming into visibility. It was mentioned in the keynote by one of the presenters, Kurt, I think it was, who runs the platform: this idea of reasoning coming into security. So the idea of knowing the topology, knowing that there's dynamic stuff going on. I mean, topologies aren't static anymore, and now you have more microservices, more APIs being turned on and off. This runtime is interesting. So you're starting to see this holistic view of, hey, the secret sauce is you've got to be smarter, and that's either machine learning or AI. So how does that relate to what you guys do? Because it sounds like you've got something like that going on with the product. Is that fair?

>> Yeah, absolutely. So we talked about posture; that's really the inherent quality or secure posture of an API. Now let's talk about sending traffic through that API, the request and response. When we're talking about organizations that have more APIs than they have employees, tens of thousands in some customers we're seeing, the only way to identify anomalous traffic is through machine learning. So we apply a machine learning model to each and every API independently, because we want to learn how that API is supposed to behave. Where is it supposed to be talking? What kind of data is it supposed to be trafficking in, in all its facets? We can model that activity and then identify the anomaly: there's a misuse, there's an attacker event, there's an insider employee doing something with that API that's different. And that's really key with APIs: no two APIs are alike. They really do have to be modeled individually. I can't share my threat signatures for my API with your organization, because your APIs are different.
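The per-API modeling described above, where each API gets its own behavioral baseline rather than shared signatures, can be sketched with a trivial statistical stand-in. A production system would model far more facets (callers, data types, call sequences) with real machine learning; this toy version baselines only request size, and all names here are illustrative.

```python
from statistics import mean, stdev

class ApiBaseline:
    """Learn one API's normal request size, then flag outliers.
    Each API gets its own instance: no two APIs share a model."""
    def __init__(self, threshold=3.0):
        self.sizes = []
        self.threshold = threshold

    def observe(self, request_bytes):
        self.sizes.append(request_bytes)

    def is_anomalous(self, request_bytes):
        if len(self.sizes) < 2:
            return False  # not enough history to judge yet
        mu, sigma = mean(self.sizes), stdev(self.sizes)
        if sigma == 0:
            return request_bytes != mu
        return abs(request_bytes - mu) > self.threshold * sigma

# One model per API, trained only on that API's own traffic.
login_api = ApiBaseline()
for size in [510, 495, 505, 500, 490]:
    login_api.observe(size)

print(login_api.is_anomalous(498))     # False: typical request
print(login_api.is_anomalous(250000))  # True: wildly outside the baseline
```

The design choice mirrors the point in the conversation: the baseline is learned per endpoint, so a request that is perfectly normal for one API can be a glaring anomaly for another.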
And so we have to have that machine-learning approach in order to really identify that anomaly.

>> And watch the credentials, permissions, all those things. All right, take me through the lifecycle of an API. There's pre-production and post-production. What do I need to know about those two areas with respect to what you guys do?

>> Sure. The pre-production activities are really about putting in the hands of a developer or an AppSec team the ability to test that API during its development. Source code testing is one piece, but also in pre-production: are we modeling production variables enough to know what's going to happen when I move it into production? It's one thing to have secure source code, of course, but it's also: do we know how that API is going to interact with the world once it's set free? So the testing capabilities early in the lifecycle are really how we de-risk in the long term. But we all have API ecosystems that already exist, so in production we're applying all of that same testing of posture and configuration issues at runtime. It may sound cliché to say we want to shift security left, but with APIs that's a hundred percent true. We want to keep moving our issue detection to the earliest possible point in the development of an API, and that gives us the greatest return on the API, which is what we're all looking for: to capitalize on it as an agent of transformation.

>> All right, let's take the customer perspective. I'm the customer. Karl, why do I need you? How are you different from the competition? And if I like it, how do I get started?

>> Sure. The first thing that differentiates us from our competitors is really looking at the API as an entire lifecycle of activities.
So whether it's the documentation, the design, and the secure source code testing that we can provide pre-development or pre-deployment, through production posture, through runtime, the differentiator for us is being a one-stop shop for an entire API security program. And that's very important. As that one-stop shop, the great thing in a customer conversation is that not every customer is at the same point in their journey. If a customer discussion focuses on their perhaps lack of confidence in their code testing, or somebody else has a lack of confidence in their runtime detection, we can say yes to those conversations, deliver value, and then consider other things we can do with that customer along the whole continuum of the lifecycle. It allows us to have a customer conversation where we don't need to say, "No, we don't do that." If it's an API, the answer is: yes, we do that. And that's really where we have an advantage, I think, in looking at this space and being able to talk with pretty much any customer in any vertical, and having a solution that gives them value right away.

>> And how do I get started? I like it. You sold me on operationalizing it. I like the one-stop shop. My APIs are super important. I know there could be potential exposure, maybe access, and then lateral movement to a workload; all kinds of stuff could happen. How do I get started? What do I do to solve this?

>> Well, nonamesecurity.com, of course. Most customers do sandboxing POVs as part of a trial period with us. And being here at AWS is wonderful, because these are customers with whom we can integrate in a matter of minutes; we're talking about literally updating an IAM role.
Permissioning is the extent of the implementation complexity, because cloud-friendly workloads really allow us to do proofs of concept and value in a matter of minutes. So whether it's a dedicated sandbox for one customer, or a full-blown POC for a complicated organization, whether it's here at the AWS conference or at nonamesecurity.com, we would love to do a free demo, test drive, and assessment.

>> Awesome. And now you guys are part of the elite alumni of our Startup Showcase, where we feature the hot startups; obviously the security-focused episodes are about security. You guys have been recognized by the industry and AWS as, you know, making it happen. What specifically is your relationship with AWS? Are you guys doing stuff together? Because they're clearly integrating with their partners. I mean, they're going to companies and saying, hey, you know what, the more we're integrated, the better security everyone gets. What are you doing with Amazon? You don't have to share any confidential information, but can you give us a little taste of the relationship?

>> Well, I think we have the best-case scenario in our relationship with AWS. As a very, very small company, most of our first customers were AWS customers. So to develop the initial integrations with AWS, what we were able to do is have our customers, oftentimes large public corporations, go to AWS and say: we need that company to be available through your marketplace; we need them to be a partner. And so that partnership with AWS has really gone from zero to sixty, so to speak, in a very short period of time.
And now, being part of the startup program, we have a variety of ways that a customer can work with us, from a direct purchase through the AWS Marketplace, through channel partners and VARs. We really have that footprint now in AWS, because our customers are there, and they brought us to AWS with them.

>> That's nice. The customers pull you to AWS, and AWS pulls you more customers. You get kind of intermingled there and provide the value. And certainly they're hyperscale, so...

>> Well, that creates depth in the relationship. For example, as AWS itself evolves and changes and new services become available, we are a part of that inner circle, so to speak, to make sure our technology is calibrated in advance of that service offering going out to the rest of the world. It's a really great vantage point to be in as a startup.

>> Well, Karl, the CISO for Noname Security, you're here on the ground, you partner with AWS. What do you think of the show this year? What's the theme? What are the one or two most important stories that people should know about happening here in the security world?

>> Well, I don't think it's any surprise that almost every booth in the exhibit hall has the words "cloud native" associated with it. But I also think that's the best thing about it: we're seeing companies, and I think Noname is part of that trend, that have designed capabilities and technologies to take advantage of and lean into what the cloud has to offer, rather than compensating for it. Five years ago we were all maybe wondering whether the cloud would ever be as secure as our own data centers. Those days are over. We now have companies in the exhibit hall that have built highly sophisticated capabilities, remarkable improvements in securing the cloud applications in our environments.
So it's a real win for the cloud. It's something of a victory lap. If you hadn't already been there, you should be there at this point.

>> Yeah. And the structural change is happening now, that's clear. I'd love to get your reaction, if you agree with it: the ops and security teams are now being pulled up to the level that the developers are succeeding at, meaning that they have to be in the boat together.

>> Yes. Lines of reporting responsibility are becoming less and less meaningful, and that's a good thing. We're having just as many conversations with developers, API management center-of-excellence teams, and cloud infrastructure teams as we are with security teams. And that's a good thing, because we're finally starting to have some degree of convergence around where our interests lie in securing cloud assets.

>> So developers, ops, and security, all in the boat together, in sync, winning together.

>> We win together, but we don't win on day one. We have to practice. As organizations, we have to rethink our tech stack, but we also have to rethink our organizational models and our processes to get there.

>> And keep the boat steady in rough waters. Karl, thanks for coming on. Noname Security: why the name? Just curious. I love that name, because of the No Name restaurant here in Boston, for all the people who know it. Noname Security: why "no name"?

>> Well, it was sort of accidental. In the company's first few weeks, there was an advisory board of CISOs who provide feedback to seed-stage companies on the concepts of where they're going to build platforms. And so, in the absence of a name, the founders and the original investor filled out a form putting "no name" as the name of this company that was about to develop an API security solution.
Well, amongst this board of CISOs there was basically unanimous feedback that what they needed to do was keep the name. If nothing else, keep the name: "no name" is a brilliant name. And that was very much accidental, really just a circumstance of not having picked one. But a few weeks passed, and all of a sudden they were locked in, sort of by popular vote. Noname was formed.

>> Yeah. And now the origination story is known here on theCUBE. Karl, thanks for coming on. Really appreciate it.

>> Thank you, John.

>> Okay, we're here live on the show floor of AWS re:Inforce in Boston, Massachusetts. I'm John Furrier, with Dave Vellante out and about getting the stories in the trenches and in the analyst meetings. He'll be right back with me shortly. Stay tuned for more CUBE coverage after this short break.
Greg Rokita, Edmunds.com & Joel Minnick, Databricks | AWS re:Invent 2021
>> Welcome back to theCUBE's coverage of AWS re:Invent 2021, the industry's most important hybrid event. Very few hybrid events, of course, in the last two years, and theCUBE is excited to be here. This is our ninth year covering AWS re:Invent, and this is the tenth re:Invent. We're here with Joel Minnick, the vice president of product and partner marketing at smoking-hot company Databricks, and Greg Rokita, who is executive director of technology at Edmunds. If you're buying a car or leasing a car, you've got to go to Edmunds. We're going to talk about busting data silos, guys. Great to see you again.

>> Welcome. Glad to be here.

>> All right. So Joel, what the heck is a lakehouse? This is all over the place; everybody's talking about the lakehouse. What is it?

>> Well, in a nutshell, a lakehouse is the ability to have one unified platform to handle all of your traditional analytics workloads, so your BI and reporting, the workloads that you would have for your data warehouse, on the same platform as the workloads that you would have for data science and machine learning. If you think about the way that most organizations have built their infrastructure in the cloud today, generally customers will land all their data in a data lake, and a data lake is fantastic because it's low cost, it's open, and it's able to handle lots of different kinds of data. But the challenge that data lakes have is that they don't necessarily scale very well for analytics. It's very hard to govern data in a data lake; it's very hard to manage that data in a data lake.

And so what happens is that customers then move the data out of the data lake into downstream systems, and what they tend to move it into is data warehouses, to handle those traditional reporting kinds of workloads that they have.
They do that because data warehouses are really great at delivering scale and performance. The challenge, though, is that data warehouses really only work for structured data, and regardless of what kind of data warehouse you adopt, all data warehouse platforms today are built on some kind of proprietary format. Once you've put your data into the data warehouse, that is what you're locked into. The promise of the data lakehouse was to say: what if we could strip away all of that complexity of having to move data back and forth between all these different systems, and keep the data exactly where it is today, and where it is today is in the data lake?

Then you apply a transaction layer on top of that. In the Databricks case, we do that through an open-source technology called Delta Lake. What Delta Lake allows us to do is apply, when you need it, the performance, reliability, quality, and scale that you would expect out of a data warehouse directly on your data lake. And if I can do that, then I'm able to operate from one single source of truth that handles all of my analytics workloads, both my traditional analytics workloads and my data science and machine learning workloads. Having all of those workloads on one common platform means that not only does my infrastructure get much simpler, and therefore able to operate at much lower cost, I'm also able to get things to production much, much faster.
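The transaction layer described above can be illustrated with a toy sketch: plain Python and JSON standing in for the Parquet data files and transaction log that Delta Lake actually uses. The class and file layout here are illustrative only, not Delta Lake's real protocol or API.

```python
import json
import os
import tempfile

class ToyDeltaTable:
    """Toy stand-in for a Delta Lake table: data files plus an
    ordered JSON transaction log that defines the table's state."""
    def __init__(self, path):
        self.path = path
        os.makedirs(os.path.join(path, "_log"), exist_ok=True)

    def _log_dir(self):
        return os.path.join(self.path, "_log")

    def _versions(self):
        return sorted(int(f.split(".")[0]) for f in os.listdir(self._log_dir()))

    def append(self, rows):
        # Write the data file first, then commit it to the log.
        # Readers only see files referenced by committed log entries,
        # which is what makes the append atomic.
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        data_file = os.path.join(self.path, f"part-{version}.json")
        with open(data_file, "w") as f:
            json.dump(rows, f)
        with open(os.path.join(self._log_dir(), f"{version}.json"), "w") as f:
            json.dump({"add": os.path.basename(data_file)}, f)

    def read(self, as_of=None):
        # Replay the log, optionally stopping at a past version
        # ("time travel" in miniature).
        rows = []
        for v in self._versions():
            if as_of is not None and v > as_of:
                break
            with open(os.path.join(self._log_dir(), f"{v}.json")) as f:
                commit = json.load(f)
            with open(os.path.join(self.path, commit["add"])) as f:
                rows.extend(json.load(f))
        return rows

table = ToyDeltaTable(tempfile.mkdtemp())
table.append([{"vin": "A1", "price": 21000}])
table.append([{"vin": "B2", "price": 35000}])
print(len(table.read()))         # 2
print(len(table.read(as_of=0)))  # 1
```

The point of the log is that readers only ever see committed files, which is what gives a plain file store the atomicity, and even a version history, of a warehouse table.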
And finally, probably one of the most, uh, lasting benefits of a lake house is that all the roles that have to take that have to touch my data for my data engineers, to my data analyst, my data scientists, they're all working on the same data, which means that collaboration that has to happen to go answer really hard problems with data. I'm now able to do much, much more easy because those silos that traditionally exist inside of my environment no longer have to be there. And so Lakehouse is that is the promise to have one single source of truth, one unified platform for all of my data. Okay, >>Great. Thank you for that very cogent description of what a lake house is now. Let's I want to hear from the customer to see, okay, this is what he just said. True. So actually, let me ask you this, Greg, because the other problem that you, you didn't mention about the data lake is that with no schema on, right, it gets messy and Databricks, I think, correct me if I'm wrong, has begun to solve that problem, right? Through series of tooling and AI. That's what Delta liked us. It's a man, like it's a managed service. Everybody thought you were going to be like the cloud era of spark and Brittany Britain, a brilliant move to create a managed service. And it's worked great. Now everybody has a managed service, but so can you paint a picture at Edmonds as to what you're doing with, maybe take us through your journey the early days of a dupe, a data lake. Oh, that sounds good. Throw it in there, paint a picture as to how you guys are using data and then tie it into what y'all just said. >>As Joel said, that they'll the, it simplifies the architecture quite a bit. Um, in a modern enterprise, you have to deal with a variety of different data sources, structured semi-structured and unstructured in the form of images and videos. And with Delta lake and built a lake, you can have one system that handles all those data sources. 
So what that does is basically remove the issue of multiple systems that you have to administer. It lowers the cost, and it provides consistency. If you have multiple systems that deal with data, the question always arises as to which data has to be loaded into which system, and then you have issues with consistency. Once you have issues with consistency, business users and analysts will stop trusting your data. So it was very critical for us to unify the handling of data in one place. >>Additionally, you have massive scalability. I went to the talk from Apple saying that they can process two years' worth of data instead of just two days. At Edmunds, we have this use case of backfilling the data. Often we change the logic, and then we need to reprocess massive amounts of data. With the lakehouse, we can reprocess months' worth of data in a matter of minutes or hours. Additionally, the data lakehouse is based on open standards, like Parquet, which allowed us to put open source and third-party tools on top of the Delta lakehouse. For example, we use Amundsen for data discovery. And finally, the lakehouse approach allows for different skill sets of people to work on the same source data. We have analysts, we have data engineers, we have statisticians and data scientists, using their own programming languages but working on the same core data sets, without worrying about duplicating data and consistency issues between the teams. >>So what are the primary use cases where you're using the lakehouse and Delta? >>We have several use cases. One of the more interesting and important use cases is vehicle pricing. You have used Edmunds.
So, you know, you go to our website and you use it to research vehicles, but it turns out that pricing, and knowing whether you're getting a good or a bad deal, is critical for our business. So with the lakehouse, we were able to develop a data pipeline that ingests the transactions, curates them, cleans them, and then feeds that curated feed into the machine learning model that is also deployed on the lakehouse. So you have one system that handles this huge complexity. And, as you know, it's very hard to find unicorns that know all those technologies, but because we have the flexibility of using Scala, Java, Python, and SQL, we have different people working on different parts of that pipeline on the same system and on the same data. So having the lakehouse really enabled us to be very agile, and allowed us to deploy new sources easily when they arrived, and to fine-tune the model to decrease the error rates for the price prediction. So that process is ongoing, and it's a very agile process that takes advantage of the different skill sets of different people on one system. >>Because you guys democratized car buying, well, at least the data around car buying, because as a consumer now I know what they're paying, and I can go in, of course, but they changed their algorithms as well. I mean, the dealers got really smart, and then they got kickbacks from the manufacturer. So you had to get smarter. So it's a moving target, I guess. >>Right. The pricing is actually very complex. I don't have time to explain it to you, but especially in this crazy inflationary market, where used car prices are like 38% higher year over year and new car prices are like 10% higher, and they're changing rapidly, having a very responsive pricing model is extremely critical. I don't know if you're familiar with Zillow.
I mean, they almost went out of business because they mispriced their houses. So if you owned their stock, you probably wish you had shorted it. But, you know... >>No, but it's true, because my lease came up in the middle of the pandemic, and I went to Edmunds and said, what's this car worth? It was worth like $7,000 more than the buyout cost, the residual value. I said, I'm taking it, can't pass up that deal. And so you have to be flexible. The premise, though, is that open source technology, and Delta Lake and the lakehouse, enabled that flexibility. >>Yes. We are able to ingest new transactions daily, recalculate our model within less than an hour, and deploy the new model with new pricing almost in real time. So in this environment it's very critical that you keep up to date, ingest the latest transactions as prices change, and recalculate the model that predicts the future prices. >>Because the business lines inside of Edmunds interact with the data teams, you mentioned data engineers, data scientists, analysts. How do the business people get access to their data? >>Originally we only had a core team that was using the lakehouse, but because the usage was so powerful and easy, we were able to democratize it across our units. So other teams within software engineering picked it up, and then analysts picked it up, and then even business users started using the dashboarding, and seeing, you know, how the prices have changed over time, and seeing other metrics. >>What did that do for data quality? Because I feel like, if I'm a business person, I might have context on the data that an analyst might not have if they're part of a team that's servicing all these lines of business. Did you find that the collaboration affected data quality? >>The biggest thing for us was the fact that we don't have multiple systems now, so you don't have to load the data.
Whenever you have to load the data from one system to another, there is always a lag, there's always a delay, there is always a problematic job that didn't do the copy correctly, and the quality is uncertain; you don't know which system tells you the truth. Now we just have one layer of data. Whether you do reports, data processing, or modeling, they all read the same data. And the second thing is that, with the dashboarding capabilities, people that were not very technical can now use it. Before, we could only use Tableau, and Tableau is not the easiest thing to use if you're not technical. So anyone can see how our pricing data looks, whether you're an executive, an analyst, or a casual business user. >>Hey, so many questions; you guys are gonna have to come back. I'm gonna run out of time, but you now allow a consumer to buy a car directly. Yes? Right? So that's a new service that you launched. I presume that required new data. >>We give consumers offers, yes. And that offer... >>The offer to buy my lease. >>Exactly. And that offer leverages the pricing that we developed on top of the lakehouse. So the most important thing is accurately giving you a very good offer price, right? If we give you a price that's not so good, you're going to go somewhere else. If we give you a price that's too high, we're going to go bankrupt, like Zillow did, right? >>So to enable that, you're working off the same dataset. Did you have to spin up, did you have to inject new data? Was there a new data source that you're working on? >>Once we curate the data sources and once we clean the data, we feed it directly to the model. And all of those components are running on the lakehouse, whether you're curating the data, cleaning it, or running the model. The nice thing about the lakehouse is that machine learning is a first-class citizen.
If you use something like Snowflake, and I'm not going to slam Snowflake here, but... >>You have two different use cases. >>You have to load it into a different system later. You have to load it into a different system. So, like, good luck doing machine learning on Snowflake, right? >>Whereas Databricks, that's kind of your raison d'être. >>And you're the data engineer; I feel like I should be a salesman or something. I'm not saying that just because, you know, I was told to; I'm saying it because that's our use case. >>Your use case. So, a question for each of you: what business results did you see when you went from pre-lakehouse to post-lakehouse? Are there any metrics you can share? And then I wonder, Joel, if you could share a sort of broader view of what you're seeing across your customer base. But Greg, what can you tell us? >>Well, before the lakehouse we had two different systems. We had one for processing, which was still Databricks, and a second one for serving, and we iterated over Netezza and Redshift. But we figured that maintaining two different systems and loading data from one to the other was a huge overhead: administration, security, costs. And you had consistency issues. So the fact that you can have one system with centralized data solves all those issues. You have one security mechanism, one administrative mechanism, and you don't have to load the data from one system to the other. You don't have to make compromises. >>And scale is not a problem, because of the cloud. >>Because you can spin up clusters at will for different use cases. So your clusters are independent; you have processing clusters that are not affecting your serving clusters. In the past, if you were serving, say, on Netezza or Redshift, and you were doing heavy processing, your reports would be affected. But now all those clusters are separated.
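Greg's point about independent clusters, heavy processing never queuing behind serving because each workload gets its own compute over shared storage, can be caricatured in a few lines. Thread pools stand in for clusters and a Python list stands in for the shared table; this is a conceptual sketch, not how Databricks actually schedules compute:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative compute/storage separation: the "processing cluster" and
# the "serving cluster" are independent worker pools that read the same
# shared, immutable table, so batch jobs never block serving queries.

shared_table = [{"vin": v, "price": p} for v, p in
                [("A", 27_000), ("B", 21_000), ("C", 13_500)]]

processing = ThreadPoolExecutor(max_workers=4)  # heavy batch jobs
serving = ThreadPoolExecutor(max_workers=4)     # low-latency lookups

def batch_average_price(_: int) -> float:       # a "heavy" batch job
    return sum(r["price"] for r in shared_table) / len(shared_table)

def lookup(vin: str) -> int:                    # a "serving" query
    return next(r["price"] for r in shared_table if r["vin"] == vin)

batch = processing.map(batch_average_price, range(10))
quotes = serving.map(lookup, ["A", "C"])
print(list(quotes), round(next(iter(batch))))   # [27000, 13500] 20500
processing.shutdown()
serving.shutdown()
```

Because each pool sizes and fails independently while reading one copy of the data, the old Netezza/Redshift problem, reports slowing down whenever a big job ran, simply does not arise.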
>>So the data consumer can take that data from the producer independently. >>Using its own cluster. >>Okay. Yeah. I'll give you the final word, Joel. I know it's been quick; I said you guys have got to come back. What have you seen broadly? >>Yeah, well, I think Greg's point about scale is an interesting one. So if you look across the entire Databricks platform, the platform is launching 9 million VMs every day, and in total we're processing over nine exabytes a month. So in terms of just how much data the platform is able to flow through it, while still maintaining extremely high performance, it is bar none out there. And then, in terms of the macro environment of what's happening out there, I think what's been most exciting to watch is what customers are experiencing on traditional data warehousing kinds of workloads, because I think that's where the promise of the lakehouse really comes into its own: saying, yes, I can run these traditional data warehousing workloads that require high concurrency, high scale, and high performance directly on my data lake.
And probably the two most salient data points to raise there are these. Just last month, Databricks announced it set the world record for the TPC-DS 100-terabyte benchmark. That benchmark is built to measure data warehouse performance, and the lakehouse architecture beat data warehouses at their own game in terms of overall performance. And then, from a price-performance standpoint, customers on Databricks right now are able to enjoy that level of performance at 12x better price-performance than what cloud data warehouses provide. So not only are we hitting this extremely high scale and performance, we're able to do it much, much more efficiently.
>>We're gonna need a whole other segment to talk about benchmarking, guys. Thanks so much, really interesting session. Thank you both, and best of luck; thanks for joining the show. >>Thank you for having us. >>You're very welcome. Okay, keep it right there. Everybody, you're watching theCUBE, the leader in high-tech coverage, at AWS re:Invent 2021.