Shia Liu, Scalyr | Scalyr Innovation Day 2019

>> From San Mateo, it's theCUBE, covering Scalyr Innovation Day. Brought to you by Scalyr.
>> I'm John Furrier with theCUBE. We are here in San Mateo, California, for a special Innovation Day with Scalyr at their new headquarters. I'm here with Shia Liu, who's on the software engineering team. Good to see you. Thanks for joining.
>> Thank you.
>> So tell us, what do you do here? What kind of programming? What kind of engineering?
>> Sure. So I'm a back-end software engineer at Scalyr. What I work on day to day is building our highly scalable distributed systems and serving our customers fast queries.
>> What's the feature that you're building?
>> Yeah. So one of the projects I'm working on right now will help our infrastructure move towards a more stateless infrastructure. The project itself is a metadata storage component and a series of APIs that tell our back-end servers where to find a log file. That might sound really simple, but at our massive scale it is actually a significant challenge to do it fast and reliably.
>> And data is a big challenge. Everyone knows data is the new oil, data is the gold, whatever people are saying; data is super important. You guys have a unique architecture around data ingest. What's so unique about it? Do you mind sharing?
>> Of course. So there are a lot of things that we do, or don't do, uniquely. I would like to start with the ingestion front of things and what we don't do on that front. We don't do keyword indexing, which most other existing solutions do. By not doing that, by not keeping the index files up to date with every single incoming log message, we save a lot of time and resources. Actually, from the moment our customers' applications generate a log line to that log line becoming available for search in Scalyr, it takes just a couple of seconds. On other existing solutions, it's a different story.
That can take hours.
>> So that's the ingest side. What about the query side? You've got ingest, now query. What's that all about?
>> Yeah, of course. Actually, do you mind if we go to the blackboard a little bit?
>> Take a look.
>> Okay. Let me grab a chalk real quick. So we have a lot of servers around here. We have queue servers.
>> Let's see.
>> These are queue servers, and a lot of back-end servers. Just to reiterate on the ingest side a little bit: when logs come in, they will hit one of these queue servers, any one of them. The queue server will kind of batch the log messages together, then pick one of the back-end servers at random and send the batch of logs to it. Any queue can reach any back-end server. That's how we're able to handle gigs of logs, however many logs you give us: we ingest dozens of terabytes of data on a daily basis. And then it is this same farm of back-end servers that's helping us on the query front. Our goal is, when a query comes in, we summon all of these back-end servers at once. We get all of their computation power, all of their CPU cores, to serve this one query. That is just a massively scalable multi-tenant model, and in my mind it is really economies of scale at its best.
>> So scale is huge here. You've got the decoupled back end and queue system, but they're still talking to each other. So what's the impact for the customer? What's the order of magnitude of scale we're talking about here?
>> Absolutely. So on the log side, we talked about seconds of response time from logs being generated to seeing the logs show up. On the query side, the median response time of our queries is under 100 milliseconds. We define that response time from the moment the customer hits the return button on their laptop to when they see results show up. And more than 90% of our queries return results in under one second.
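The ingest and query paths described above (no keyword index at ingest, queue servers batching logs and handing each batch to a random back end, and every back end scanning its share of the data for each query) can be illustrated with a small Python sketch. It is not Scalyr's code: `Backend`, `QueueServer`, `query`, the batch size of 3 and the thread pool standing in for a farm of servers are all hypothetical, and matching is a plain substring scan.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class Backend:
    """Hypothetical back-end server. Ingest is append-only: no keyword index."""

    def __init__(self, name):
        self.name = name
        self.lines = []

    def store(self, batch):
        # No per-token index to update, so each line is searchable
        # as soon as this call returns.
        self.lines.extend(batch)

    def scan(self, term):
        # The cost moves to query time: a brute-force scan of this server's share.
        return [line for line in self.lines if term in line]


class QueueServer:
    """Hypothetical queue server: batches lines, ships each batch to a random back end."""

    def __init__(self, backends, batch_size=3, rng=None):
        self.backends = backends
        self.batch_size = batch_size  # illustrative value only
        self.pending = []
        self.rng = rng or random.Random()

    def receive(self, line):
        self.pending.append(line)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # Any queue server can reach any back end, so just pick one at random.
            self.rng.choice(self.backends).store(self.pending)
            self.pending = []


def query(backends, term):
    """Scatter the query to every back end at once, then gather and merge results."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        partials = list(pool.map(lambda b: b.scan(term), backends))
    return sorted(line for part in partials for line in part)


backends = [Backend(f"backend-{i}") for i in range(4)]
queues = [QueueServer(backends, rng=random.Random(seed)) for seed in range(2)]
for i in range(10):
    status = "500" if i % 5 == 0 else "200"
    queues[i % 2].receive(f"req {i} status {status}")
for q in queues:
    q.flush()  # ship any partially filled batch
print(query(backends, "status 500"))  # ['req 0 status 500', 'req 5 status 500']
```

One appeal of random selection in this sketch is that queue servers need no knowledge of back-end state, and because every query touches every back end, adding servers spreads the same scan across more CPU cores.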
>> So what's the deployment model for customers? Say I'm a customer: oh, that sounds great. Latency is a huge issue; everyone wants low latency. Latency is really the lag issue for data. Do I buy it as a service, or am I deploying boxes? What does this look like?
>> Nope. Absolutely. We're 100 percent cloud native. All of this is actually in our cloud infrastructure, and as a customer you just start using us as software as a service. When you submit a query, all of our back-end servers are at your service. And what's best about this model is that as Scalyr's business grows, we will add more back-end servers and more computation power, and you as a customer still get all of that, and you don't need to pay us anything extra for that added query power.
>> What's the customer use case for this? Give an example of who would benefit from this.
>> Absolutely. So imagine you're an e-commerce platform and you're having huge Black Friday sales. Seconds of time might mean millions in revenue to you, and you don't want to waste any time on the logging front when you debug your system, look at your monitoring and see where the problem is, if you ever have a problem. So we give you query response times on the magnitude of seconds, versus other existing solutions where maybe you need to wait for minutes, anxiously, in front of your computer.
>> Shia, what's the unique thing here? This looks like a really good architecture; decoupling things makes sense. But what's the secret sauce? What's the big magic here?
>> Yeah, absolutely. So anyone can kind of do a huge-server-farm, brute-force query approach. But the first 80% of a brute-force algorithm is easy. It's really the last 20% that is more difficult and challenging, and that really differentiates us from the rest of the other solutions. So to start with, we make every effort we can to identify and skip the work that we don't have to do. So maybe we can go back to our seats.
>> Cut.
>> Okay, so it's exciting.
>> Yeah. So there are a couple of things we do here to skip the work that we don't have to do. As we always say, the fastest queries are those we don't even have to run, which is very true. We have this columnar database that we built in house, highly performant for our use case, that lets us scan only the columns the customer cares about and skip all the rest. We also build a data structure called Bloom filters, and if a query term does not occur in those Bloom filters, we can just skip the whole data set that they represent.
>> So that helps on the speed performance.
>> Absolutely. Absolutely, if we don't even have to look at that data set.
>> You know, I love talking to software engineers, people on the cutting edge, because, you know, you guys are a startup. Attracting talent is a big thing, and people love to work on hard problems. What's the hard problem that you guys are solving here?
>> Yeah, absolutely. So we have this huge server farm at our disposal. However, as we always say, the key to brute-force algorithms is really to recruit as much force as possible, as fast as we can. If you have hundreds of thousands of cores lying around but you don't have an effective way to summon them when you need them, then there's no help having them around. One of the most interesting things that my team does is we developed this customized scatter-gather algorithm to assign the work in a way that faster back-end servers will dynamically compensate for slower servers without any prior knowledge. And I just love that.
>> How fast is it going to get?
>> Well, I have no doubt that we will one day reach light speed.
>> Light speed. Physics is a good thing, but it's also a bottleneck. So what's your story? How did you get into this?
>> Yeah, so I joined Scalyr about eight months ago as an API server... actually, sorry, as an API engineer. So during my API days,
I used Scalyr, the product, very heavily, and I just became increasingly fascinated by the speed at which our queries run. I was like, I really want to get behind the scenes and see what's going on in the back end that gives us such fast queries. So here I am. Two months ago, I switched to the back-end team.
>> Well, congratulations, and thanks for sharing that insight.
>> Thank you, John. Thank you.
>> John Furrier here with theCUBE Insights at Innovation Day here in San Mateo. Thanks for watching.
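Two of the work-skipping techniques from the interview, scanning only the columns a query needs and skipping any data set whose Bloom filter rules out the query term, can be combined in a short Python sketch. Scalyr's in-house columnar format is not described beyond that, so `Chunk`, `search`, one filter per column, whole-token matching, the 1024-bit filter size and the three SHA-256-derived hash positions are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Minimal Bloom filter: each item sets k bits in an m-bit array."""

    def __init__(self, m_bits=1024, k=3):
        self.m = m_bits
        self.k = k
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (simple, not fast).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False: definitely absent. True: possibly present (false positives happen).
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


@dataclass
class Chunk:
    """Hypothetical columnar chunk: column name -> values, plus one filter per column."""

    columns: dict
    blooms: dict = field(default_factory=dict)

    def __post_init__(self):
        for name, values in self.columns.items():
            bloom = BloomFilter()
            for value in values:
                for token in str(value).split():
                    bloom.add(token)
            self.blooms[name] = bloom


def search(chunks, column, token):
    """Return matching values of one column and how many chunks were actually scanned."""
    hits, scanned = [], 0
    for chunk in chunks:
        bloom = chunk.blooms.get(column)
        if bloom is None or not bloom.might_contain(token):
            continue  # skip the whole chunk without reading any of its data
        scanned += 1
        # Read only the requested column; the other columns are never touched.
        hits.extend(v for v in chunk.columns[column] if token in str(v).split())
    return hits, scanned


chunks = [
    Chunk({"message": ["GET /cart ok", "GET /home ok"], "host": ["web1", "web2"]}),
    Chunk({"message": ["POST /checkout timeout"], "host": ["web3"]}),
]
hits, scanned = search(chunks, "message", "timeout")
print(hits)     # ['POST /checkout timeout']
print(scanned)  # usually 1: chunk 0 is skipped unless its filter gives a false positive
```

A Bloom filter never reports a false negative, so skipping a chunk on a negative answer cannot drop a match; a false positive only costs one unnecessary scan.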

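The customized scatter-gather algorithm Liu's team built is not spelled out in the interview. A pull-based work queue is one well-known way to get the behavior she describes, where faster back-end servers automatically take on more work without any prior knowledge of server speeds. The deterministic simulation below compares it with a static equal split; `simulate_pull`, `simulate_static`, the fixed per-chunk costs and the assumption that any server can process any chunk are hypothetical.

```python
import heapq


def simulate_pull(chunk_count, seconds_per_chunk):
    """Pull model: whichever server goes idle first takes the next chunk.

    The dispatch decision never looks at seconds_per_chunk; it only reacts
    to completions, so faster servers naturally end up doing more chunks.
    """
    idle_at = [(0.0, s) for s in range(len(seconds_per_chunk))]  # (idle time, server)
    heapq.heapify(idle_at)
    done = [0] * len(seconds_per_chunk)
    finish = 0.0
    for _ in range(chunk_count):
        t, s = heapq.heappop(idle_at)  # first server to become idle pulls a chunk
        t += seconds_per_chunk[s]      # ...and works on it until time t
        done[s] += 1
        finish = max(finish, t)
        heapq.heappush(idle_at, (t, s))
    return done, finish


def simulate_static(chunk_count, seconds_per_chunk):
    """Static split: every server gets an equal share up front."""
    n = len(seconds_per_chunk)
    shares = [chunk_count // n + (1 if s < chunk_count % n else 0) for s in range(n)]
    finish = max(share * cost for share, cost in zip(shares, seconds_per_chunk))
    return shares, finish


costs = [1.0, 1.0, 1.0, 4.0]  # one server is four times slower
print(simulate_pull(40, costs))    # ([13, 12, 12, 3], 13.0)
print(simulate_static(40, costs))  # ([10, 10, 10, 10], 40.0)
```

In this toy setup the static split waits on the slow server for 40 time units, while the pull model lets the three fast servers absorb most of the chunks and finishes at 13.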
Published Date: May 30, 2019

SUMMARY:

From Scalyr Innovation Day in San Mateo, California, John Furrier of theCUBE talks with Shia Liu, a back-end software engineer at Scalyr, about building highly scalable distributed systems that serve customers fast queries. Liu explains that Scalyr skips keyword indexing at ingest, that queue servers batch log messages and send each batch to a randomly chosen back-end server, and that every query summons all back-end servers at once. She describes how a columnar database and Bloom filters let Scalyr skip work it doesn't have to do, and how her team's customized scatter-gather algorithm lets faster back-end servers compensate for slower ones. She also explains how using the product as an API engineer led her to switch to the back-end team.

