Ed Bailey, Cribl | AWS Startup Showcase S2 E2
(upbeat music) >> Welcome everyone to theCUBE presentation of the AWS Startup Showcase, the theme here is Data as Code. This is season two, episode two of our ongoing series covering the exciting startups from the AWS ecosystem. And talk about the future of data, future of analytics, the future of development and all kind of cool stuff in Multicloud. I'm your host, John Furrier. Today we're joined by Ed Bailey, Senior Technology, Technical Evangelist at Cribl. Thanks for coming on the queue here. >> I thank you for the invitation, thrilled to be here. >> The theme of this session is the observability lake, which I love by the way I'm getting into that in a second. A breach investigation's best friend, which is a great topic. Couple of things, one, I like the breach investigation angle, but I also like this observability lake positioning, because I think this is a teaser of what's coming, more and more data usage where it's actually being applied specifically for things here, it's observability lake. So first, what is an observability lake? Why is it important? >> Why it's important is technology professionals, especially security professionals need data to make decisions. They need data to drive better decisions. They need data to understand, just to achieve understanding. And that means they need everything. They don't need what they can afford to store. They don't need not what vendor is going to let them store. They need everything. And I think as a point of the observability lake, because you couple an observability pipeline with the lake to bring your enterprise of data, to make it accessible for analytics, to be able to use it, to be able to get value from it. And I think that's one of the things that's missing right now in the enterprises. Admins are being forced to make decisions about, okay, we can't afford to keep this, we can afford to keep this, they're missing things. They're missing parts of the picture. And by bringing, able to bring it together, to be able to have your cake and eat it too, where I can get what I need and I can do it affordably is just, I think that's the future, and it just drives value for everyone. >> And it just makes a lot of sense data lake or the earlier concert, throw everything into the lake, and you can figure it out, you can query it, you can take action on it real time, you can stream it. You can do all kinds of things with it. Verb observability is important because it's the most critical thing people are doing right now for all kinds of things from QA, administration, security. So this is where the breach piece comes in. I like that's part of the talk because the breached investigation's best friend, it implies that you got the secret sourced to behind it, right? So, what is the state of the breach investigation today? What's going on with that? Because we know breaches, we see 'em out there, but like, why is this the best friend of a breach investigator? >> Well, and this is unfortunate, but typically there's an enormous delay between breach and detection. And right now, there's an IBM study, I think it's 287 days, but from the actual breach to detection and containment. It's an enormous amount of time. And the key is so when you do detect a breach, you're bringing in your instant, your response team, and typically without an observability lake, without Cribl solutions around observability pipeline, you're going to have an incomplete picture. The incident response team has to first to understand what's the scope of the breach. Is it one server? Is it three servers? Is it all the servers? You got to understand what's been compromised, what's been the end, what's the impact? How did the breach occur in the first place? And they need all the data to stitch that together, and they need it quickly. The more time it takes to get that data, the more time it takes for them to finish their analysis and contain the breach. I mean, hence the, I think about an 87, 90 days to contain a breach. And so by being able to remove the friction, by able to make it easier to achieve these goals, what shouldn't be hard, but making, by removing that friction, you speed up the containment and resolution time. Not to mention for many system administrators, they don't simply have the data because they can afford to store the data in their SIEM. Or they have to go to their backup team to get a restore which can take days. And so that's-- It's just so many obstacles to getting resolution right now. >> I mean, it's just, you're crawling through glass there, right? Because you think about it like just the timing aspect. Where is the data? Where is it stored and relevant and-- >> And do you have it at all? >> And you have it at all, and then, you know, that person doesn't work anywhere, they change jobs. I mean, who is keeping track of all this? You guys have now, this capability where you can come in and do the instrumentation with the observability lake without a lot of change to the environment, which is not the way it used to be. Used to be, buy a tool, build a platform. Cribl has a solution that eases the struggles with the enterprise. What specifically is that pain point? And what do you guys do specifically? >> Well, I'll start out with kind of example, what drew me to Cribl, so back in 2018. I'm running the Splunk team for a very large multinational. The complexity of that, we were dealing with the complexity of the data, the demands we were getting from security and operations were just an enormous issue to overcome. I had vendors come to me all the time that will solve your problems, but that means you got to move to our platform where you have to get rid of Splunk or you have to do this, and I'm losing something. And what Cribl stream brought into, was I could put it between my sources and my destinations and manage my data. And I would have flow control over the data. I don't have to lose anything. I could keep continuing use our existing analytics tools, and that sense of power and control, and I don't have to lose anything. I was like, there's something wrong here. This is too good to be true. And so what we're talking about now in terms of breach investigation, is that with Cribl stream, I can create a clone of my data to an object store. So this is in, this is almost any object store. So it can be AWS, it could be the other vendor object stores. It could be on-prem object stores. And then I can house my data, I can house all my data at the cheapest possible price. So instead of eating up my most expensive storage, I put all my data in my object store. And I only put the data I need for the detections in my SIEM. So if, and hopefully never, but if you do have a breach, lock stream has a wonderful UI that makes a trivial to then pick my data out of my object store and restore it back into my SIEM so that my IR team has to develop a complete picture of how the breach happen. What's the scope? What is their lateral movement and answer those questions. And it just, it takes the friction away. Just like you said, just no more crawling over glass. You're running to your solution. >> You mentioned object store, and you're streaming that in. You talk about the Cribble stream tool. I'm assuming there when you're streaming the pipeline stuff, but is there a schema involved? Is there database challenges? What, how do you guys look at that? I know you're vendor agnostic. I like that piece, you plug in and you leverage all the tools that are out there, Splunk, Datadog, whatever. But how about on the database side, what's the impact there? >> Well, so I'm assuming you're talking about the object store itself, so we don't have to apply the schema. We can fit the data to whichever the object store is. We structure the data so it makes it easier to understand. For example, if I want to see communications from one IP to another IP, we structure it to make it easier to see that and query that, but it is just, we're-- Yeah, it's completely vendor neutral and this makes it so simple, so simple to enable, I think-- >> So no pre-defined schema needed. >> No, not at all. And this, it made it so much easier. I think we enabled this for the enterprise. I think it took us three hours to do, and we were able to then start, I mean, start cutting our retention costs dramatically. >> Yeah, it's great when you get that kind of value, time to value critical and all the skeptics fall to the sides pretty quickly. (chuckles) I got to ask you, well, go ahead. >> So I say, I mean, previously, I would have to go to our backup team. We'd have to open up a ticket, we'd have to have a bridge, then we'd have to go through the process of pulling tape and being, it could take, you know, hours, hours if not days to restore the amount of data we needed. And just it, you know, we were able to run to our goals, and solve business problems instead of focusing on the process steps of getting things done. >> Right, so take me through the architecture here and some customer examples, 'cause you have the Cribble streaming there, observability pipeline. That's key, you mentioned that. >> Yes. >> And then they build out these observability lakes from that. So what is the impact of that? Can you share the customers that are using that solution? What are they seeing for benefits? What are some of the impact? Can you give us some specifics? >> I mean, I can't share with all the exact customer names. I can definitely give you some examples. Like referenceable conference would be TransUnion, so that I came from TransUnion. I was one of the first customers and it solved enormous number of problems for us. Autodesk is another great example. The idea that we're able to automate and data practices. I mean, just for example, what we were talking about with backups. We'd have to, you have to put a lot of time into managing your backups in your inner analytics platforms, you have to. And then you're locked into custom database schemas, you're locked into vendors. And it's also, it's still, it's expensive. So being able to spend a few hours, dramatically cut your costs, but still have the data available, and that's the key. I didn't have to make compromises, 'cause before I was having to say, okay, we're going to keep this, we're going to just drop this and hope for the best. And we just don't, we just didn't have to do that anymore. I think for the same thing for TransUnion and Autodesk, the idea that we're going to lower our cost, we're going to make it easier for our administrators to do their job and so they can spend more time on business value fundamentals, like responding to a breach. You're going to spend time working with your teams, getting value observability solutions and stop spending time on writing custom solutions using to open source tools. 'Cause your engineering time is the most precious asset for any enterprise and you got to focus your engineering time on where it's needed the most. >> Yeah, and they can't underestimate the hassle and cost of ownership, of swapping out pre-existing stuff, just for the sake of having a functionality. I mean that's a big-- >> It's pain and that's a big thing about lock stream is that being vendor neutral is so important. If you want to use the Splunk universal forwarder, that's great. If you want to use Beats, that's awesome. If you want to use Fluentd, even better. If you want to use all three, you can do that too. It's the customer choice and we're saying to people, use what suits your needs. And if you want to write some of your data to elastic, that's great. Some of your data to Splunk, that's even better. Some of it to, pick your pick, fine as well or Exabeam. You have the choices to put together, put your own solutions together and put your data where you need it to be. We're not asking you only in our ecosystem to work with only our partners. We're letting you pick and choose what suits your business. >> Yeah, you know, that's the direction I was just talking about the Amazon folks around their serverless. You know, you can use any tool, you know, you can, they have that core architecture for everything, the S3 and then pick whatever you want to use. SageMaker, just that other thing. This is the new way. That's the way it has to be to be effective. How do you guys handle that? What's been the reaction from customers? Do they like, roll their eyes and doubt you guys, or can you do it? Are they skeptical? How fast can you convert 'em over? (chuckles) >> Right, and that's always the challenge. And that's, I mean, the best part of my day is talking to customers. I love hearing and feedback, what they like, what they don't and what they need. And of course I was skeptical. I didn't believe it when I first saw it because I was like this, you know, because I'm, I was used to being locked in. I was used to having to put a lot of effort, a lot of custom code, like, what do you mean? It's this easy? I believe I did the first, this is 2018, and I did our first demos, like 30 minutes in, and I cut about 1/2 million dollars out of our license in the first 30 minutes in our first demo. And I was stunned because I mean, it's like, this is easy. >> Yeah, I mean-- >> Yeah, exactly. I mean, this is, and then this is the future. And then for example, we needed to bring in so like the security team wanted to bring in a UBA solution that wasn't part of the vendor ecosystem that we were in. And I was like, not a problem. We're going to use log stream. We're going to clone a copy of our data to the UBA solution. We were able to get value from this UBA solution in weeks. What typically is a six month cycle to start getting value. And it just, it was just too easy and the best part of it. And the thing is, it just struck me was my engineers can now spend their time on delivering value instead of integrations and moving data around. >> Yeah, and also we can spend more time preventing breaches. But what's interesting is counterintuitive here is that, if you, as you add more flexibility and choice, you'd think it'd be harder to handle a breach, right? So, now let's go back to the scenario. Now you guys, say an organization has a breach, and they have the observability pipeline, They got the lake in place, your observability lake, take me through the investigation. How easy is it, what happens? How they start it, what goes on? >> So, once your SOC detects a breach, then they bring in the idea. Typically you're going to bring in your incident response team. So what we did, and this is one more way that we removed that friction, we cleaned up the glass, is we delegate to the instant response team, the ability to restore, we call it-- So if Cribl calls it replay, we play data at our object store back into your SIEM. There's a very nice UI that gives you the ability to say, "I want data from this time period, at this time period, I want it to be all the data." Or the ability to filter and say, "I want this, just this IP." For example, if I detected, okay, this IP has been breached then I'm going to pull all the data that mentions this IP and this timeframe, hit a button and it just starts. And then it's going to restore how as fast your IOPS are for your solution. And then it's back in your tool, it's back in your tool. One of the things I also want to mention is we have an amazing enrichment capability. So one of the things that we would do is we would've pipelines so as the data comes out of the object store, it hits the pipeline, and then we enrich it. We hit use GoIP information, perverse and NAS. It gets processed through threat Intel feed. So the data's already enriched and ready for the incident response people to do their job. And so it just, it bamboozle the friction of getting to the point where I can start doing my job. >> You know, at this theme, this episode for this showcase is about Data as Code. And which is, you know, we've been, I've been saying this on theCUBES for since it was being around 13 years ago, that developers are going to be dealing with data like they deal with software code, and you're starting to see, you mentioned enrichment. Where do you see Data as Code going? How relevant in it now, because we really talking about when you add machine learning in here, that has to be enriched, and iterated on too. We're talking about taking things off a branch and putting it back into the core. This is a data discussion, this isn't software, but it sounds the same. >> Right, and this is something that the irony is that, I remember first time saying it to an auditor. I was constantly going with auditors, and that's what I described is I'm going to show you the code that manages the data. This is the data's code that's going to show you how we transform it, how we secure it, where the data goes, how it's enriched. So you can see the whole story, the data life cycle in one place. And that's how we handled our orders. And I think that is enormously, you know, positive because it's so easy to be confused. It's so easy to have complexity to get in the way of progress. And by being able to represent your Data as Code, it's a step forward 'cause the amount of data and the complexity of data, it's not getting simpler, it's getting more complex. So we need to come up with better ways to handle it. >> Now you've been on both sides of the fence. You've been in the trenches as customer, now you're a supplier with Great Solution. What are people doing with this data engineering roles? Because it's not enough data engineering. I mean, 'cause if you say Data as Code, if you believe that to be true and many people do, we do. And you looked at the history of infrastructure risk code that enabled DevOps, AIOps, MLOps, DataOps, it's happening, right? So data stack ops is coming. Obviously security is huge in this. How does that data engineering role evolve? Because it just seems more and more that there's going to be a big push towards an SRE version of data, right? >> I completely agree. I was working with a customer yesterday, and I spent a large part of our conversation talking about implementing development practices for administrators. It's a new role. It's a new way to think of things 'cause traditionally your Splunk or elastic administrators is talking about operating systems and memory and talking about how to use proprietary tools in the vendor, that's just not quite the same. And so we started talking about, you need to have, you need to start getting used to code reviews. Yeah, the idea of getting used to making sure everything has a comment, was one thing I told him was like, you know, if you have a function has to have a comment, just by default, just it has to. Yeah, the standards of how you write things, how you name things all really start to matter. And also you got to start adding, considering your skillset. And this is some mean probably one of the best hire I ever made was I hired a guy with a math degree, because I needed his help to understand how do machine learning works, how to pick the best type of algorithm. And I think this is going to evolve, that you're going to be just away from the gray bearded administrator to some other gray bearded administrator with a math degree. >> It's interesting, it's a step function. You have a data engineer who's got that kind of capabilities, like what the SRA did with infrastructure. The step function of enablement, the value creation from really good data engineering, puts the democratization playback on the table, and changes, >> Thank you very much John. >> And changes that entire landscape. How do you, what's your reaction to that? >> I completely agree 'cause so operational data. So operational security data is the most volatile data in the enterprise. It changes on a whim, you have developers who change things. They don't tell you what happens, vendor doesn't tell you what happened, and so that idea, that life cycle of managing data. So the same types of standards of disciplines that database administrators have done for years is going to have, it has to filter down into the operational areas, and you need tooling that's going to give you the ability to manage that data, manage it in flight in real time, in order to drive detections, in order to drive response. All those business value things we've been talking about. >> So I got to ask you the larger role that you see with observability lakes we were talking before we came on camera live here about how exciting this kind of concept is, and you were attracted to the company because of it. I love the observability lake concept because it puts all that data in one spot, you can manage it. But you got machine learning in AI around the corner that also can help. How has all this changed in the landscape of data security and things because it makes a lot of sense, and I can only see it getting better with machine learning. >> Yeah, definitely does. >> Totally, and so the core issue, and I don't want to say, so when you talk about observability, most people have assumptions around observability is only an operational or an application support process. It's also security process. The idea that you're looking for your unknown, unknowns. This is what keeps security administrators up at night is I'm being attacked by something I don't know about. How do you find those unknown? And that's where your machine learning comes in. And that's where that you have to understand there's so many different types of machine learning algorithms, where the guy that I hired, I mean, had started educating me about the umpteen number of algorithms and how it applies to different data and how you get different value, how you have to test your data constantly. There's no such thing as the magical black box of machine learning that gives you value. You have to implement, but just like the developer practices to keep testing and over and over again, data scientists, for example. >> The best friend of a machine learning algorithm is data, right? You got to keep feeding that data, and when the data sets are baked and secure and vetted, even better, all cool. Had great stuff, great insight. Congratulations Cribl, Great Solution. Love the architecture, love the pipelining of the observability data and streaming that in to a lake. Great stuff. Give a plug for the company where you guys are at, where people can get information. I know you guys got a bunch of live feeds on YouTube, Twitch, here in theCUBE. Where else can people find you? Give the plug. >> Oh, please, please join our slack community, go to cribl.io/community. We have an amazing community. This was another thing that drew me to the company is have a large group of people who are genuinely excited about data, about managing data. If you want to try Cribl out, we have some great tool. Try Cribl tools out. We have a cloud platform, one terabyte up free data. So go to cribl.io/cloud or cribl.cloud, sign up for, you know, just never times out. You're not 30 day, it's forever up to one terabyte. Try out our new products as well, Cribl Edge. And then finally come watch Nick Decker and I, every Thursday, 2:00 PM Eastern. We have live streams on Twitter, LinkedIn and YouTube live. And so just my Twitter handle is EBA 1367. Love to have, love to chat, love to have these conversations. And also, we are hiring. >> All right, good stuff. Great team, great concepts, right? Of course, we're theCUBE here. We got our video lake coming on soon. I think I love this idea of having these video. Hey, videos data too, right? I mean, we've got to keep coming to you. >> I love it, I love videos, it's awesome. It's a great way to communicate, it's a great way to have a conversation. That's the best thing about us, having conversations. I appreciate your time. >> Thank you so much, Ed, for representing Cribl here on the Data as Code. This is season two episode two of the ongoing series covering the hottest, most exciting startups from the AWS ecosystem. Talking about the future data, I'm John Furrier, your host. Thanks for watching. >> Ed: All right, thank you. (slow upbeat music)
SUMMARY :
And talk about the future of I thank you for the I like the breach investigation angle, to be able to have your I like that's part of the talk And the key is so when Where is the data? and do the instrumentation And I only put the data I need I like that piece, you We can fit the data to for the enterprise. I got to ask you, well, go ahead. and being, it could take, you know, hours, the Cribble streaming there, What are some of the impact? and that's the key. just for the sake of You have the choices to put together, This is the new way. I believe I did the first, this is 2018, And the thing is, it just They got the lake in place, the ability to restore, we call it-- and putting it back into the core. is I'm going to show you more that there's going to be And I think this is going to evolve, the value creation from And changes that entire landscape. that's going to give you the So I got to ask you the Totally, and so the core of the observability data and that drew me to the company I think I love this idea That's the best thing about Cribl here on the Data as Code. Ed: All right, thank you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
John | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
Ed | PERSON | 0.99+ |
Ed Bailey | PERSON | 0.99+ |
TransUnion | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
2018 | DATE | 0.99+ |
Autodesk | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
three hours | QUANTITY | 0.99+ |
287 days | QUANTITY | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
30 day | QUANTITY | 0.99+ |
six month | QUANTITY | 0.99+ |
first demo | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
Cribl | ORGANIZATION | 0.99+ |
first demos | QUANTITY | 0.99+ |
YouTube | ORGANIZATION | 0.99+ |
Twitch | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
both sides | QUANTITY | 0.99+ |
three servers | QUANTITY | 0.99+ |
Splunk | ORGANIZATION | 0.99+ |
one spot | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
30 minutes | QUANTITY | 0.98+ |
Cribl | PERSON | 0.98+ |
UBA | ORGANIZATION | 0.98+ |
one place | QUANTITY | 0.98+ |
one terabyte | QUANTITY | 0.98+ |
first 30 minutes | QUANTITY | 0.98+ |
ORGANIZATION | 0.98+ | |
SRA | ORGANIZATION | 0.97+ |
Today | DATE | 0.97+ |
one more way | QUANTITY | 0.97+ |
about 1/2 million dollars | QUANTITY | 0.96+ |
one server | QUANTITY | 0.96+ |
ORGANIZATION | 0.96+ | |
Beats | ORGANIZATION | 0.96+ |
Nick Decker | PERSON | 0.96+ |
Cribl | TITLE | 0.95+ |
today | DATE | 0.94+ |
Cribl Edge | TITLE | 0.94+ |
first customers | QUANTITY | 0.94+ |
87, 90 days | QUANTITY | 0.93+ |
Thursday, 2:00 PM Eastern | DATE | 0.92+ |
around 13 years ago | DATE | 0.91+ |
first time | QUANTITY | 0.89+ |
three | QUANTITY | 0.87+ |
cribl.io/community | OTHER | 0.87+ |
Intel | ORGANIZATION | 0.87+ |
cribl.cloud | TITLE | 0.86+ |
Datadog | ORGANIZATION | 0.85+ |
S3 | TITLE | 0.84+ |
Cribl stream | TITLE | 0.82+ |
cribl.io/cloud | TITLE | 0.81+ |
Couple of things | QUANTITY | 0.78+ |
two | OTHER | 0.78+ |
episode | QUANTITY | 0.74+ |
AWS Startup Showcase | EVENT | 0.72+ |
lock | TITLE | 0.72+ |
Exabeam | ORGANIZATION | 0.71+ |
Startup Showcase S2 E2 | EVENT | 0.69+ |
season two | QUANTITY | 0.67+ |
Multicloud | TITLE | 0.67+ |
up to one terabyte | QUANTITY | 0.67+ |