Jeff Mathis, Scalyr & Steve Newman, Scalyr | Scalyr Innovation Day 2019
From San Mateo, it's theCUBE, covering Scalyr Innovation Day, brought to you by Scalyr. I'm John Furrier with theCUBE. We are here in San Mateo, California, for an official Innovation Day at Scalyr's headquarters with Steve Newman, the founder of Scalyr, and Jeff Mathis, a software engineer. Guys, thanks for joining me today. Thanks for having us. Great to have you here. So, you guys introduced Power Queries. What is this all about?

So the vision for Scalyr is to become the platform users trust when they want to observe their systems, and Power Queries is a really important step along that journey. Power Queries provide new insights into data with a powerful and expressive query language that's still easy to use.

So why is this important?

At Scalyr we like to think that we're all about speed, and a lot of what we're known for is the raw performance of the query engine we've built underneath this product. That's one measure of speed, but really we like to think of speed as the time from a question in someone's head to an answer on their screen, so the whole user journey is part of that. Traditionally in our product we've provided a set of basic capabilities for searching, counting, and graphing that are very easy for people to access, so you can get in quickly, pose your question, and get an answer without even having to learn a query language. That's been great, but sometimes the need goes a little beyond that: the question someone wants to ask is a little more complicated, or the data needs a little massaging, and it goes beyond the boundaries of what you can do with that basic set of predefined abilities. That's where we wanted to take a step forward and create a more advanced language for those more advanced cases.

I love the name Power Queries; people want power, and it's got to be fast and good. Queries have been around; people know search engines, search technology, discovery, finding stuff. But as AI comes around, and as systems scale, there seems to be a lot more focus on inference, on intuiting what's happening. This has been a big trend. What's your opinion on that? This has become a big opportunity using data; we've seen companies in this space go public, we know who they are, and there's more data coming; it's not stopping anytime soon. What's the innovation that's going to take Power Queries to the next level?

One of the features that I'm really excited about in the future of Power Queries is our autocomplete feature. We've taken a lot of inspiration from what the navigation bar in your browser does. The idea is to have a context-sensitive, predictive autocomplete feature that takes into account a number of signals: the syntactic context of where you are in the query, what fields you have available to you, what fields you've searched recently, those kinds of factors.

Steve, what's your take? Before we get to the customer impact, what's different, and where is Power Queries going to shine today and tomorrow?

It was both an interesting and fun challenge for us to design and build this, because by definition this is for the more advanced use cases, for when you need something more powerful.
And so a big part of the design question for us was: how do we let people do more sophisticated things with their logs when they have that use case, while still preserving the speed and ease of use that we like to think we're known for? If step one had been "go read this 300-page reference manual and learn this complicated query language," then we would have failed before we started. We had the benefit of a lot of hindsight: there's a long history of people manipulating data and working with different kinds of systems. We have users coming to us who are used to working with other log management tools, we have users who are more comfortable in SQL, and we have users whose focus is really more conventional programming languages, especially because one of the constituencies we serve, and it's a trend nowadays, is development engineers who are also responsible for keeping their code working well in production. They're not log management experts, they're not telemetry experts, and we want them to be able to come casually to this tool and get something done. With all that context of the different languages people are used to, we came up with about a dozen use cases that we thought covered the spectrum of what would bring people into a scenario like this, and we gamed those out: how would you solve this particular question with an SQL-like approach, or an approach based on this tool or that tool? We did this big exploration and were able to boil everything down to about ten fairly simple commands that pretty much cover the gamut. By comparison, there are other solutions that have over a hundred commands, and that's obviously a lot to learn. At the other end of the spectrum, SQL really does all of this with one command, SELECT, which is incredibly powerful, but you sometimes have to be a wizard to shoehorn your problem into it.

Even though SQL is out there and people know it, people want it easier, and ultimately machines are going to be taking over. With the ten commands, you almost couldn't get to that efficiency level without simplifying the use cases. What does the customer scenario look like? Why is design important? What's in it for the customer?

Absolutely. The user experience was a really important focus for us when designing Power Queries. We knew from the start that if the tool takes you ten minutes to relearn every time you want to use it, then the query effectively takes ten minutes to execute, not seconds. So one of the ways we approached this problem was to make sure we're constantly giving the user feedback. That starts as soon as you load the page: you've immediately got access to some of the documentation you need to use the feature, and if you type incorrect syntax, you get feedback from the system about how to fix the problem. Really focusing on the user experience was a big part of it.
Yeah, people are going to factor in the time it takes to actually do the query: write it up, code it up, figure it out. That's time lag right there, and you want it to be as fast as possible. Interesting design point. So, Steve, how does it go fast? Jeff, how does it go fast? What are you guys looking at here? What's the magic?

Let me step over to the whiteboard here, chalk in one hand, mic in the other, and we'll evaluate my juggling skills. I want to start by showing an example of what one of these queries looks like. I talked about how we boiled everything down to about ten commands, so let's talk through a simple scenario. Let's say I'm running a tax site: people come to our website, they're putting their taxes together, and they're downloading forms. Tax laws are different in every state, so I have different code running for people in California versus people in Michigan or wherever. It's easy to do things like graph the overall performance and error rate for my site, but I might have a problem with the code for one specific state, and it might not show up very clearly in those overall statistics. So I want to get a sense of how I'm performing for each of the 50 states. I'm going to simplify this a little bit, but I might have an access log for this system where we'll see entries like: we're loading the tax form, it's for the state of California, and the status code was 200, which means it was successful; then we load the tax form, the state is Texas, and again that was a success; then we load the tax form for Michigan and the status was a 502, which is a server error. There are millions of these mixed in with other kinds of logs from other parts of my system, and I want to pull up a report of what percentage of requests are succeeding or failing, by state. Let me sketch first what the query would look like, and then I'll talk about how we execute it at speed.

First of all, I have to say which logs I care about. I've drawn just the relevant logs, but these are going to be mixed in with all the other logs for my system. Maybe it's as simple as calling out that they all have this page name in them, "tax form," so that's the first step of my query: I'm searching for "tax form." Now I want to count how many of these there are, how many of them succeeded or failed, and I want to cluster that by state. Clustering is done with the group command, so I'm going to say I want to count the total number of requests, which is just the count. Count is part of the language; "total" is what I'm choosing to name it. And I want to count the errors, which is also the count command, but now I'm going to give it a condition: only count where the status is at least 500 (I'm not sure you can see it behind the plant, but that's a 500). And I'm going to group that by state. So we're counting up how many of these values were 500 or above, and we're grouping by this field. What's going to come out of that is a table that will say, for each state, the total number of requests and the number of errors. I actually left out a couple of steps, but let's draw what this would give us so far: it's going to show me that for California maybe I had nine thousand one hundred and fifty-two requests and thirteen of them were errors, for Texas I had so many, and so on. But I'm still not quite there.
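To make the step Steve has sketched so far concrete, here is a minimal Python sketch of the same filter-and-group logic over a hypothetical list of log records. The field names and records are illustrative stand-ins, and this is an illustration of the idea, not Scalyr's actual query syntax or implementation.

```python
from collections import defaultdict

# Hypothetical access-log records; the fields (page, state, status) are
# illustrative stand-ins, not Scalyr's actual schema.
logs = [
    {"page": "tax form", "state": "California", "status": 200},
    {"page": "tax form", "state": "Texas", "status": 200},
    {"page": "tax form", "state": "Michigan", "status": 502},
    {"page": "login", "state": "California", "status": 200},
]

# Step 1: the search -- keep only the "tax form" entries.
tax_form_logs = [rec for rec in logs if rec["page"] == "tax form"]

# Step 2: the group -- per state, count total requests and error responses
# (status of at least 500), naming the counts "total" and "errors".
counts = defaultdict(lambda: {"total": 0, "errors": 0})
for rec in tax_form_logs:
    bucket = counts[rec["state"]]
    bucket["total"] += 1
    if rec["status"] >= 500:
        bucket["errors"] += 1

for state, c in sorted(counts.items()):
    print(state, c["total"], c["errors"])
```

The result is the intermediate table Steve draws on the whiteboard: one row per state with a total count and an error count.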
That table might show me that California had thirteen errors and Rhode Island had twelve errors, but there were only twelve requests for Rhode Island. Rhode Island is broken, I've broken my code for Rhode Island, but it's only twelve errors because it's a smaller population. So this analysis still isn't quite going to get me where I need to go. Now I can add another command. I've done this group; now I'm going to say "let," which triggers a calculation: let error rate equal errors divided by total. That gives me the fraction, so for California that might be 0.01 or whatever, but for Rhode Island it's going to be one: 100 percent of the requests are failing. Then I can add another command to sort by the error rate, and now my problem states pop to the top.

So it's a real easy-to-use language. It's great for data scientists digging in, for practitioners; you don't need to be a hardcore coder to get into this.

Exactly, that's the idea. Group, sort, very simple commands that directly match the English description of what you're trying to do. But then you asked a great question, which is how do we take this whole thing and execute it quickly? So I'm going to erase here.

You're getting into speed now, right? How do you get the speed? Speed is good. Simplicity of use, I get that; now speed becomes the next challenge.

Exactly, and the speed feeds into the simplicity, too, because step one for any tool like this is learning the tool, and that involves a lot of trial and error. If the trial and error involves waiting, and at the end of the wait for a query to run you learn that you did the query wrong, that's very discouraging. So we actually think of speed as part of ease of use. All right, so how do we actually do this? You've got your whole mass of log data: tax forms, other forms, internal services, database logs, maybe terabytes of log data. Somewhere in there are the really important records, the tax form errors, as well as all the other tax form logs, mixed in with a bigger pile of everything else. Step one is to filter from that huge pile of all your logs down to just the tax form logs, and for that we were able to leverage our existing query engine. There are a couple of things that make that engine as fast as it is. It's massively parallel: we segment the data across hundreds of our servers, so all this data is already distributed across all those servers.

And the database is yours; you guys built your own in-house?

Exactly, this is on our system. We're collecting the logs in real time, so by the time the user comes and types in that query, we already have the data, and it's already spread out across all these servers. The first step of the query was just a search for "tax form," and that's our existing query engine; that's not the new thing we built for Power Queries. With that existing, highly optimized engine, this server scans through its logs, this server scans through its logs, each server does its share, and collectively they produce a smaller set of data that is just the tax form logs. That set is still distributed, by the way; each server does this independently and continues locally with the next step. So we're harnessing the horsepower of all these servers, and each one only has to work with a small fraction of the data. The next step was that group command, where we were counting the requests, counting the errors, and rolling that up by state. That's the new engine we've built, but again each server does just its little share: this server takes whichever tax form logs it found and produces a little table of counts by state, this server does the same thing, so each produces a little grouping table from just its share of the logs. Then all of that funnels down to one central server where we do the later steps: the division, dividing the number of errors by the total count, and then the sort. By now, you might have started with trillions of log messages, gone down to millions or billions of messages that are relevant to your query, and here we have 50 records, just one for each state. Suddenly the amount of data is very small, so the later steps may be interesting from a processing perspective, but they're easy from a speed perspective.
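As a rough illustration of that scatter/gather flow, and under the same hypothetical record layout as the earlier sketch, the pattern Steve describes could look something like the Python below: each server filters and partially aggregates its own slice of the logs, and one central step merges the small tables, computes the error rate, and sorts. This is a sketch of the general idea, not Scalyr's actual engine.

```python
from collections import defaultdict

def partial_aggregate(shard):
    """Runs independently on each server: filter that server's slice of the
    logs and build a small per-state count table (same grouping logic as in
    the earlier sketch)."""
    table = defaultdict(lambda: {"total": 0, "errors": 0})
    for rec in shard:
        if rec["page"] != "tax form":          # the search/filter step
            continue
        bucket = table[rec["state"]]
        bucket["total"] += 1
        if rec["status"] >= 500:               # conditional error count
            bucket["errors"] += 1
    return table

def merge_and_finish(partial_tables):
    """Runs on one central server: merge the little grouping tables, compute
    the error rate (errors divided by total), and sort by it."""
    merged = defaultdict(lambda: {"total": 0, "errors": 0})
    for table in partial_tables:
        for state, c in table.items():
            merged[state]["total"] += c["total"]
            merged[state]["errors"] += c["errors"]
    rows = [
        (state, c["total"], c["errors"], c["errors"] / c["total"])
        for state, c in merged.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)  # problem states pop to the top
    return rows

# Hypothetical shards standing in for the slice of data each server holds.
shards = [
    [{"page": "tax form", "state": "California", "status": 200},
     {"page": "tax form", "state": "Rhode Island", "status": 502}],
    [{"page": "tax form", "state": "California", "status": 500},
     {"page": "login", "state": "Texas", "status": 200}],
]

for state, total, errors, rate in merge_and_finish([partial_aggregate(s) for s in shards]):
    print(f"{state}: {total} requests, {errors} errors, {rate:.0%} error rate")
```

The point of the split is the data-volume funnel Steve draws: the heavy, parallel work happens where the data already lives, and only tiny per-state tables ever travel to the central server.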
So you solve a lot of database challenges by understanding how things flow once you've got everything in the columnar database. Just to give some perspective on what the alternative would be: if I just threw this into a database and ran SQL over trillions of log records, it's not trivial; it's a database problem and then it's a user problem combined. What's the order-of-magnitude difference if I were to do it the old way?

The truth is there are a hundred old ways, and they all involve a lot of pain. If you try to just throw all of this into one SQL server, say MySQL or PostgreSQL, you're talking terabytes of data, and by the way, we're glossing over the fact that the data doesn't just have to exist, it also has to get into the system. When you're checking whether you're letting everyone in Rhode Island down on the night before the 15th, you need up-to-the-moment information, and your database, even if it could hold the data, isn't necessarily designed to be pulling it in in real time. So a simple approach like "let me spin up MySQL and throw all the data in" is just not going to happen. Now you're sharding the data, or you're looking at some other database solution, and either way it's a heavy lift, a lot of extra effort.

Taxing on the developers. You guys do the heavy lifting. Okay, what's next? Where do the scale features come in? What do you see this evolving into for customers?

Jeff talked about autocomplete, which we're really excited about, because again, a lot of this is for the casual user. They're a power user of, say, JavaScript or Java; they're building the code, and then they've got to come in, solve the problem, and get back to what they think of as their real job. With autocomplete, and the way we're doing it, we're really leveraging the context of what you're typing, the history of what you and your team have queried in the past, and the content of your data. Think of it a little bit like the browser location bar, which somehow, when you type about two letters, knows exactly which page you're looking for, because it's relying on all those different kinds of cues.
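Neither Steve nor Jeff goes into implementation detail here, so purely as a hedged sketch of how cues like these might be combined, and not a description of Scalyr's actual autocomplete, a toy ranking function could blend a prefix match on what the user is typing, whether the field exists in the ingested data, and how recently the user or team has queried it:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str         # a field name or command that could be suggested
    in_schema: bool   # does this field actually appear in the user's data?
    recent_uses: int  # how often the user/team has queried it recently

def score(prefix: str, cand: Candidate) -> float:
    """Toy scoring that combines the cues mentioned in the conversation.
    The weights are arbitrary placeholders, purely for illustration."""
    s = 0.0
    if cand.text.startswith(prefix):
        s += 2.0                     # syntactic/prefix match on the typed text
    if cand.in_schema:
        s += 1.0                     # the field exists in the ingested logs
    s += 0.1 * cand.recent_uses      # history of past queries
    return s

candidates = [
    Candidate("status", in_schema=True, recent_uses=8),
    Candidate("state", in_schema=True, recent_uses=3),
    Candidate("stack_trace", in_schema=False, recent_uses=0),
]

prefix = "sta"
for cand in sorted(candidates, key=lambda c: score(prefix, c), reverse=True):
    print(cand.text, score(prefix, cand))
```

The interesting design question, as with the browser location bar, is how to weight those signals so that two typed characters are usually enough.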
Yeah, it seems like this is the foundational heavy lifting: you minimize all that pain, then you get to autocomplete, AI and machine learning start to kick in, more intelligent reasoning, and you start to get a feel for the data.

Steve, thanks for sharing that; there it is on the whiteboard. I'm John Furrier. Thanks for watching this CUBE conversation.