Thomas Hazel, ChaosSearchJSON Flex on ChaosSearch

[Thomas Hazel] - Hello, this is Thomas Hazel, founder CTO here at ChaosSearch. And tonight I'm going to demonstrate a new feature we are offering this quarter called JSON Flex. If you're familiar with JSON datasets, they're wonderful ways to represent information. You know, they're multidimensional, they have ability to set up arrays as attributes but those arrays are really problematic when you need to expand them or flatten them to do any type of elastic search or relational access, particularly when you're trying to do aggregations. And so the common process is to exclude those arrays or pick and choose that information. But with this new Chaos Flex capability, our system uniquely can index that data horizontally in a very small and efficient representation. And then with our Chaos Refinery, expand each attribute as you wish vertically, so you can do all the basic and natural constructs you would have done if you had, you know, a more straightforward, two dimensional, three dimensional type representation. So without further ado, I'mma get into this presentation of JSON Flex. Now, in this case, I've already set up the service to point to a particular S3 account that has CloudTrail data, one that is pretty problematic when it comes down to flattening data. And again, if you know CloudTrail, one row can become 10,000 as data gets flattened. So without further ado, let me jump right in. When you first log into the ChaosSearch service, you'll see a tab called 'Storage'. This is the S3 account, and I have variety of buckets. I have the refinery, it's a data refinery. This is where we create views or lenses into these index streams that you can do analysis that publishes it in elastic API as an index pattern or relational table in SQL Now a particular bucket I have here is a whole bunch of demonstration datasets that we have to show off our capabilities and our offering. In this bucket, I have CloudTrail data and I'm going to create what we call a 'object group'. An object group is a entry point, a filter of which files I want to index that data. Now, it can be statically there or a live streaming. These object groups had the ability to say, what type of data do you want to index on? Now through our wizard, you can type in, you know, prefix in this case, I want to type in CloudTrail, and you see here, I have a whole bunch of CloudTrail. I'mma choose one file to make it quick and easy. But this particular CloudTrail data will expand and we can show the capability of this horizontal to vertical expansion. So I walked through the wizard, as you can see here, we discovered JSON, it's a gzip file. Leave flattening unlimited 'cause we want to be able to expand infinitely. But this case, instead of doing default virtual, I'm going to horizontally represent this information. And this uniquely compresses the data in a way that can be stored efficiently on disc but then expanded in our data refinery on Pond Query or search requests. So I'mma create this object group. Now I'm going to call this, you know, 'JSON Flex test' and I could set up live indexing, SQS pops up but I'mma skip that and skip Retention and just create it. Once this object group is created, you kind of think of it as a virtual bucket, 'cause it does filter the data as you can see here. When I look at the view, I just see CloudTrail, but within the console, I can say start indexing. Now this is static data there could be a live stream and we set up workers to index this data. Whether it's one file, a million files or one terabyte, or one petabyte, we index the data. We discover all the schema, and as you see here, we discovered 104 columns. Now what's interesting is that we represent this expansion in a horizontal way. You know, if you know CloudTrail records zero, record one, record two. This can expand pretty dramatically if you fully flatten it but this case we horizontally representing it as the index. So when I go into the data refinery, I can create a view. Now, if you know the data refinery of ChaosSearch, you can bring multiple data streams together. You can do transformations virtually, you can do correlations, but in this case, I'm just going to take this one particular index stream, we call 'JSON Flex' and walk through a wizard, we try to simplify everything and select a particular attribute to expand. Now, again, we represent this in one row but if you had arrays and do all the permutations, it could go one to 100 to 10,000. We had one JSON audit that went from one row to 1 million rows. Now, clearly you don't want to create all those permutations, when you're tryna put into a database. With our unique index technology, you can do it virtually and sort horizontally. So let me just select 'Virtual' and walk through the wizard. Now, as I mentioned, we do all these different transformations changed schema, we're going to skip all that and select the order time, records event and say, 'create this'. I'm going to say, you know, 'JSON Flex View', I can set up caching, do a variety of things, I'm going to skip that. And once I create this, it's now available in the elastic API as an index pattern, as well as SQL via our Presto API dialect. And you can use Looker, Tableau, et cetera. But in this case, we go to this 'Analytics tab' and we built in the Kibana, open search tooling that is Apache Tonetto. And I click on discovery here and I'm going to select that particular view. Again, it looks like, oops, it looks like an index pattern, and I'mma choose, let's see here, let's choose 15 years from past and present and make sure I find where actually was timed. And what you'll see here is, you know, sure. It's just one particular data set has a variety of columns, but you see here is unlike that record zero, records one, now it's expanded. And so it has been expanded like a vertical flattening that you would traditionally do if you wanted to do anything that was an elastic or a relational construct, you know, that fit into a table format. Now the 'vantage of JSON Flex, you don't have that stored as a blob and use these proprietary JSON API's. You can use your native elastic API or your native SQL tooling to get access to it naturally without that expense of that explosion or without the complexity of ETLing it, and picking and choosing before you actually put into the database. That completes the demonstration of ChaosSearch new JSON Flex capability. If you're interested, come to ChaosSearch.io and set up a free trial. Thank you.

Published Date : Nov 15 2021

SUMMARY :

and as you see here, we

ENTITIES

Entity	Category	Confidence
Thomas Hazel	PERSON	0.99+
10,000	QUANTITY	0.99+
one terabyte	QUANTITY	0.99+
one file	QUANTITY	0.99+
104 columns	QUANTITY	0.99+
one petabyte	QUANTITY	0.99+
1 million rows	QUANTITY	0.99+
JSON Flex	TITLE	0.99+
ChaosSearch	ORGANIZATION	0.99+
one row	QUANTITY	0.99+
a million files	QUANTITY	0.99+
tonight	DATE	0.98+
Tableau	TITLE	0.98+
each attribute	QUANTITY	0.98+
first	QUANTITY	0.98+
SQL	TITLE	0.98+
S3	TITLE	0.98+
100	QUANTITY	0.98+
JSON	TITLE	0.98+
15 years	QUANTITY	0.98+
Presto	TITLE	0.97+
one	QUANTITY	0.96+
Looker	TITLE	0.95+
two	QUANTITY	0.93+
JSON Flex View	TITLE	0.92+
JSON API	TITLE	0.91+
Flex	TITLE	0.87+
zero	QUANTITY	0.87+
SQS	TITLE	0.86+
ChaosSearchJSON	ORGANIZATION	0.8+
this quarter	DATE	0.8+
CloudTrail	COMMERCIAL_ITEM	0.79+
Apache Tonetto	ORGANIZATION	0.72+
JSON	ORGANIZATION	0.69+
Chaos Flex	TITLE	0.69+
CloudTrail	TITLE	0.6+
ChaosSearch	TITLE	0.58+
ChaosSearch.io	TITLE	0.57+
data set	QUANTITY	0.56+
Kibana	ORGANIZATION	0.45+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for JSON Flex View: