

Sizing and Configuring Vertica in Eon Mode for Different Use Cases


 

>> Jeff: Hello everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled "Sizing and Configuring Vertica in Eon Mode for Different Use Cases." I'm Jeff Healey, and I lead Vertica Marketing. I'll be your host for this breakout session. Joining me are Sumeet Keswani and Shirang Kamat, Vertica Product Technology Engineers and key leads on Vertica customer success needs. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait, just type your question or comment in the question box below the slides and click Submit. There will be a Q&A session at the end of the presentation, and we will answer as many questions as we're able to during that time. Any questions we don't address, we'll do our best to answer offline. Alternatively, visit the Vertica forums at forum.vertica.com and post your question there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, as a reminder, you can maximize your screen by clicking the double-arrow button in the lower-right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now let's get started. Over to you, Shirang.

>> Shirang: Thanks Jeff. For today's presentation, we have picked Eon Mode concepts: we are going to go over sizing guidelines for Eon Mode, some of the use cases that can benefit from Eon Mode, and at last we are going to talk about some tips and tricks that can help you configure and manage your cluster.

Okay. So, as you know, Vertica has two modes of operation, Eon Mode and Enterprise Mode, and the question that you may have is, which mode should I implement? Let's look at what's there in Enterprise Mode. In Enterprise Mode, you have a cluster with general-purpose compute nodes that have locally attached storage. Because of this tight integration of compute and storage, you get fast and reliable performance all the time. The amount of data that you can store in an Enterprise Mode cluster depends on the total disk capacity of the cluster. Enterprise Mode is suitable for both on-premise and cloud deployments.

Now, let's look at Eon Mode. To take advantage of cloud economics, Vertica implemented Eon Mode, which is getting very popular among our customers. In Eon Mode, compute and storage are separated by introducing an S3 bucket, or S3-compliant storage. Because of this separation of compute and storage, you get advantages like rapid, dynamic scale-out and scale-in, isolation of your workloads, and the ability to load data into your cluster without having to worry about the total disk capacity of your local nodes. It is obvious from this that Eon Mode is suitable for cloud deployments. Some of our customers who want the features of Eon Mode are also deploying it on premise, by introducing S3-compliant object storage.

Okay, so let's look at some of the terminology used in Eon Mode. The four things that I want to talk about are: communal storage, which is shared, S3-compliant storage, a bucket that is accessible from all the nodes in your cluster; a shard, which is a segment of data stored on the communal storage; a subscription, which is the binding between nodes and shards; and last, the depot.
The depot is a local copy, or a local cache, that helps improve query performance. So, a shard is a segment of data stored in communal storage. When you create an Eon Mode cluster, you have to specify the shard count. The shard count decides the maximum number of nodes that will participate in your query. Vertica also introduces a shard called the replica shard, which holds the data for replicated projections. A subscription, as I said before, is a binding between nodes and shards. Each node subscribes to one or more shards, and a shard has at least two nodes that subscribe to it for K-safety. Subscribing nodes are responsible for writing to and reading from the shard's data, and a subscriber node also holds up-to-date metadata for the catalog of files that are present in the shard. So, when you connect to a Vertica node, Vertica will automatically assign you a set of nodes and subscriptions that will process your query. There are two important system tables, NODE_SUBSCRIPTIONS and SESSION_SUBSCRIPTIONS, that can help you understand this a little bit more.

So let's look at what's on the local disk of your Eon Mode cluster. On local disk, you have the depot. The depot is a local file system cache that can hold a subset of the data, or a copy of the data, in communal storage. The other things that are there are temp storage and the catalog. Temp storage is used for storing data belonging to temporary tables, and the data that spills to disk when you are processing queries. And last is the catalog. The catalog is a persistent copy of the Vertica catalog that is written to disk; the writes happen at every commit, but you only need the persistent copy at node startup. There is also a copy of the Vertica catalog stored in communal storage, for durability. The local copy is synced to the copy in communal storage by a service, at an interval of five minutes.

So, let's look at the depot. As I said before, the depot is your file system cache. It helps reduce network traffic and improve the performance of your queries. We make the assumption that the data you load into Vertica is the data that you will most frequently query. So, all data that is loaded into Vertica first enters the depot, and then, as part of the same transaction, is also synced to communal storage for durability. When you run a query against Vertica, the query will first try to find the files it needs in the depot, and if the files are not found, it will access the files from communal storage. Whether new files should first enter the depot or skip the depot can be changed by a configuration parameter that lets you skip the depot when writing. When files are not found in the depot, we make the assumption that you may need those files for future runs of your query, which means we will fetch them asynchronously into the depot so that you have them for future runs. If that's not the behavior that you intend, you can change a configuration parameter to tell Vertica not to fetch them when you run your query. These configuration parameters can be set at the database level, session level, and query level, and we are also introducing a user-level parameter where you can change this behavior. Because the depot is limited in size compared to the amount of data that you may store in your Eon cluster, at some point in time your depot will be full, or hit its capacity. To make space for new data that is coming in, Vertica will evict some of the files that are least frequently used.

Hence, the depot is going to be your query performance enhancer, and you want to shape the contents of your depot; that is, you want to decide what should be in your depot. Vertica provides policies, called pinning policies, that can help you pin a table, or a partition of a table, into the depot, at the subcluster level or at the database level. Sumeet will talk about this a bit more in his later slides. There are also system tables that can help you understand the size of the depot, what's in your depot, what files were evicted, and what files were recently fetched into the depot. One of the important system tables that I have listed here is DC_FILE_READS. DC_FILE_READS can be used to figure out whether your transaction or query fetched data from the depot, from communal storage, or both.
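As a rough illustration of the pieces just described, here is a minimal sketch in Vertica SQL. The system tables NODE_SUBSCRIPTIONS, SESSION_SUBSCRIPTIONS, and DC_FILE_READS are the ones named in the talk; the specific columns selected, and the depot parameter names UseDepotForWrites and UseDepotForReads, are assumptions from my recollection of the documentation and may differ in your version.

```sql
-- Which nodes subscribe to which shards (column names may differ by version).
SELECT node_name, shard_name, subscription_state
FROM node_subscriptions
ORDER BY node_name, shard_name;

-- Which subscriptions the current session will use to answer queries.
SELECT * FROM session_subscriptions;

-- Did recent file reads come from the depot or from communal storage?
SELECT * FROM dc_file_reads ORDER BY time DESC LIMIT 20;

-- Assumed parameter names: bypass the depot on load, or stop fetching missed
-- files into the depot on read. Verify the names against your version's docs.
SELECT SET_CONFIG_PARAMETER('UseDepotForWrites', 0);
SELECT SET_CONFIG_PARAMETER('UseDepotForReads', 0);
```

Session-level and query-level overrides of the same parameters exist as well, as mentioned above, but the exact syntax depends on the release you are running.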
One of the important features of Eon Mode is the subcluster. Vertica lets you divide your cluster into smaller execution groups. Each execution group has a set of nodes that together subscribe to all the shards and can process your query independently. So when you connect to one node in a subcluster, that node, along with the other nodes in the same subcluster, will process your query, and only those nodes. Because of that, we can achieve workload isolation, as well as elastic scale-out and scale-in, without impacting what's happening on the rest of the cluster. The good thing about subclusters is that all the subclusters have access to the communal storage, and because of this, if you load data in one subcluster, it's accessible to the queries that are running in other subclusters.

When we introduced subclusters, we knew that our customers would really love this feature, and here are some of the things we were considering. We knew that our customers would dynamically scale out and in, adding and removing lots of subclusters on demand, so we had to provide the ability to add and remove subclusters in a fast and reliable way. We knew that during off-peak hours, our customers would shut down many of their subclusters, which means more than half of the nodes could be down, so we had to make adjustments to our quorum policy, which requires at least half of the nodes to be up for the database to stay up. We were also aware that customers would add hundreds of nodes to the cluster, which means we had to make adjustments to the catalog and commit policy. To take care of all three requirements, we introduced two types of subclusters: primary subclusters and secondary subclusters.

The primary subcluster is the one that you get by default when you create your first Eon cluster. The nodes in the primary subcluster are always up; they stay up and participate in the quorum. The nodes in the primary subcluster are responsible for processing commits, and they also maintain a persistent copy of the catalog on disk. This is the subcluster that you would use to process all your ETL jobs, because the Tuple Mover also runs on the nodes in the primary subcluster. If at this point you want another subcluster where you would like to run queries, and also scale it up and down depending on the demand or the workload, you would create a new subcluster, and this subcluster will be of type secondary. Secondary subclusters have nodes that don't participate in quorum, so if these nodes are down, there is no impact on the database.
These nodes are also not responsible for processing commits, and though they maintain up-to-date copies of the catalog in memory, they don't store the catalog on disk. These are subclusters that you can add and remove very quickly, without impacting what is running on the other subclusters. We have customers running hundreds of nodes, subclusters of 64 nodes, and they can bring these subclusters up and down, or add and remove them, within a few minutes.

Before I go into the sizing of Eon Mode, I just want to say one more thing here. We are working very closely with some of our customers who are running Eon Mode and getting feedback from them on a regular basis, and based on that feedback, we are making lots of improvements and fixes in every hotfix that we put out. So if you are running Eon Mode and want to be part of this group, I suggest that you keep your cluster current with the latest hotfixes and work with us to give us feedback, and get the improvements that you need to be successful.

So let's look at what we need to size Eon clusters. Sizing an Eon cluster is very different from sizing an Enterprise Mode cluster. When you are sizing a Vertica cluster running Enterprise Mode, you need to take into account the amount of data that you want to store and the configuration of your nodes; depending on those, you decide how many nodes you will need, and then start the cluster. In Eon Mode, to size a cluster, you need a few things: what your shard count should be (the shard count decides the maximum number of nodes that will participate in your query, and we'll talk about this a little bit more on the next slide), the number of nodes that you will need within a subcluster, the instance type you will pick for running that subcluster, how many subclusters you will need, how many of them should be running all the time, and how many should be running in a dynamic mode.

When it comes to shard count, you have to pick the shard count up front, and you can't change it once your database is up and running. You need to pick the shard count based on the number of nodes that you will need to process a query. One thing to remember here is that this is not the amount of data that you have in the database, but the amount of data your queries will process. You may have data for six years, but if your queries process the last month of data on most occasions, or if your dashboards are processing the last six weeks, or the last ten minutes, based on whatever your needs are, you will pick the shard count and node count based on how much data your queries process. Looking at most of our customers, we think that 12 is a good number that should work for most of them, and that means the maximum number of nodes in a subcluster that will process queries is going to be 12. If you feel that you need more than 12 nodes to process your query, you can pick other numbers like 24 or 48. But if you pick a higher number like 48, and you go with three nodes in your subcluster, each node subscribes to 16 primary and 16 secondary shards, which totals 32 subscriptions per node, and that will bloat your catalog.
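To make the subscription math concrete, here is a small worked sketch using the same numbers as above. It assumes K-safety 1, so each shard has one primary and one secondary subscriber; the NODE_SUBSCRIPTIONS column names are assumptions from my recollection of the documentation.

```sql
-- subscriptions_per_node ~= 2 * shard_count / nodes_in_subcluster  (K-safety 1)
SELECT 2 * 48 / 3  AS subs_per_node;   -- 32 subscriptions: far too many per node
SELECT 2 * 12 / 12 AS subs_per_node;   -- 2 subscriptions: one primary, one secondary

-- Check the actual distribution on a running cluster; the counts per node
-- within a subcluster should be roughly equal.
SELECT node_name, COUNT(*) AS shard_subscriptions
FROM node_subscriptions
GROUP BY node_name
ORDER BY node_name;
```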
So, pick your shard count appropriately, and don't pick prime numbers. We suggest 12 should work for most of our customers; if you think your queries process terabytes of data, then pick a number like 24. But don't pick a prime number. Okay? We are also coming up with features in Vertica, like crunch scaling, that will let you run queries on more nodes than the number of shards you picked, and that feature will be coming out soon. So if you have picked a smaller shard count, it's not the end of the story.

The next thing is, you need to pick how many nodes you need within your subclusters to process your queries. The ideal number would be a node count equal to the shard count, or, if you want to pick a smaller number, pick a node count such that each of the nodes has a balanced distribution of subscriptions. So over here, you have the option of 12 nodes and 12 shards, or two subclusters with 6 nodes each and 12 shards. Depending on your workload, you can pick either of the two options. The first option, with 12 nodes and 12 shards, is more suitable for batch applications, whereas two subclusters with six nodes each are more suitable for dashboard-type applications. Picking subclusters depends on your workload; you can add and remove subclusters for workload isolation or for elastic throughput scaling. Different subclusters can have nodes of different sizes, but you need to make sure that the nodes within a subcluster are homogeneous.

So this is my last slide before I hand over to Sumeet, and I think it is a very important slide that I want you to pay attention to. You are going to pick instances based on workload and query budget, and I want to make it clear that we want you to pay attention to the local disk, because you have the depot on your local disk, which is going to be your query performance enhancer for all kinds of deployments, in the cloud as well as on premise. So, despite what you may have read or heard, depots still play a very important role in every Eon deployment, and they act as performance enhancers. Most of our customers choose Vertica because they love the performance we offer, and we don't want you to compromise on that performance. So pick nodes with some amount of local disk; at least two terabytes is what we suggest. i3 instances in Amazon come with good local disks that are very helpful, and some of our customers are benefiting from them. With that said, I want to pass it over to Sumeet.

>> Sumeet: Hi everyone, my name is Sumeet Keswani, and I'm a Product Technology Engineer at Vertica. I will be discussing the various use cases that customers deploy in Eon Mode. After that, I will go into some technical details of SQL, and then I'll blend that into best practices in Eon Mode. And finally, we'll go through some tips and tricks. So let's get started with the use cases. A very basic use case that users encounter when they start Eon Mode for the first time is to have two subclusters. The first subcluster will be the primary subcluster, used for ETL, like Shirang mentioned, and this subcluster will be mostly on, or always on. And there will be another subcluster used purely for queries. This is the secondary subcluster, and it will be on sometimes, depending on the use case.
Maybe it runs from nine to five, or Monday to Friday, depending on what application is running on it, or what users are doing on it. So this is the most basic use case, something users get started with to get their feet wet. Now, as the deployment of Eon Mode with subclusters increases, users graduate into the second use case, the next level of deployment. In this situation, they still have the primary subcluster, which is used for ETL, typically a larger subcluster where heavier ETL is running pretty much non-stop. Then they have the usual query subcluster which they use for queries, but they may add another secondary subcluster for ad-hoc workloads. The motivation for this subcluster is to isolate the unpredictable workload from the predictable workload. So ad-hoc queries, users that are running larger queries, or batch workloads that occur once in a while can run on a different secondary subcluster, so as not to impact the more predictable workload running on the first subcluster. There is no reason why these two subclusters need to have the same instances: they can have a different number of nodes, different instance types, different depot configurations. Everything can be different. Another benefit is that they can be metered differently, they can be costed differently, so that the appropriate user or tenant can be billed for the cost of compute.

Now, as the use increases even further, this is what we see as the final state of a very advanced Eon Mode deployment. As you see, there is the primary subcluster, of course, used for very heavy ETL, and that's always on. There are numerous secondary subclusters: some for predictable applications that have a very fine-tuned workload and need definite performance; others with different usages, some for ad-hoc queries, others for demanding tenants; and there could be still more subclusters for different departments, like Finance, that need them maybe only at the end of the quarter. So very, very different applications, and this is the full and final promise of Eon, where there is workload isolation, there is different metering, and each app runs in its own compute space.

Okay, so let's talk about a very interesting feature in Eon Mode, which we call Hibernate and Revive. What is Hibernate? Hibernating a Vertica database is the act of dissociating all the compute from the database and shutting it down. At this point, you shut down all compute. You still pay for storage, because your data is in the S3 bucket, but all the compute has been shut down, and you do not pay for compute anymore. If you have reserved instances, or any other instances, you can use them for different applications while your Vertica database is shut down. So this is very similar to stopping a database; in Eon Mode, you're stopping all compute, the benefit of course being that you pay nothing for compute anymore. So what is Revive, then? Revive is the opposite of Hibernate, where you now associate compute with your S3 bucket, or your storage, and start up the database. There is one limitation here that you should be aware of: you must revive the database at the same size it had when you hibernated it. So if you have a 12-node primary subcluster when hibernating, you need to provision 12 nodes in order to revive.
So one best practice comes down to this: shrink your database to the smallest size possible before you hibernate, so that you can revive it at that same size, and you don't have to spin up a ton of compute in order to revive. Basically, what this means is, when you have decided to hibernate, we ask you to remove all your secondary subclusters and shrink your primary subcluster down to the bare minimum before you hibernate it. The benefit is that when you do revive, you will be able to do so with the minimum number of nodes. And of course, before you hibernate, you must cleanly shut down the database, so that all the data can be synced to S3.

Finally, let's talk about backups and replication. Backups and replication are still supported in Eon Mode. We sometimes get the question, "We're in S3, and S3 has nine nines of reliability, do we need a backup?" Yes, we highly recommend backups. You can back up your database to another bucket by using the vbr script, and you can also copy the bucket and revive a different instance of your database from it. This is very useful, because many times people want staging or development databases, and they need some of the data from production, and this is a nice way to get that. It also makes sure that if you accidentally delete something, you will be able to get your data back.

Okay, so let's go into best practices now. I will start with the depot, which is the biggest performance enhancer that we see for queries. I want to state very clearly that reading from S3, or any remote object store like S3, is slow, because data has to go over the network, and it's expensive: you pay an access cost. This is where S3 is not very cheap; every time you access the data, there is an API access cost levied. The depot is a performance-enhancing feature that improves the performance of queries by keeping a local cache of the data that is most frequently used. It also reduces the cost of accessing the data, because you no longer have to go to the remote object store to get the data, since it's available on a local volume. Hence, depot shaping is a very important aspect of performance tuning in an Eon database. What we ask you to do is, if you are going to use a specific table or partition frequently, you can choose to pin it in the depot, so that if your depot is under pressure or is highly utilized, the objects that are most frequently used are kept in the depot. Depot shaping, therefore, is the act of setting eviction policies, where you prevent the eviction of files that you believe you need to keep. For example, you may keep the most recent year's data, or the most recent partition, in the depot, and thereby all queries running on those partitions will be faster. At this time, we allow you to pin any table or partition in the depot, but it is not subcluster-based; future versions of Vertica will allow you to fine-tune the depot per subcluster.
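As a sketch of the pinning policies described above: the metafunction names and arguments shown here are from my recollection of the documentation and vary by Vertica version, and 'public.sales' with its partition range is a hypothetical example, so check your release's docs before using them.

```sql
-- Keep a frequently queried table resident in the depot under eviction pressure.
SELECT SET_DEPOT_PIN_POLICY_TABLE('public.sales');

-- Or pin just a range of that table's partitions (hypothetical partition keys).
SELECT SET_DEPOT_PIN_POLICY_PARTITION('public.sales', '2020-01-01', '2020-03-31');

-- Remove the pin when it is no longer needed.
SELECT CLEAR_DEPOT_PIN_POLICY_TABLE('public.sales');
```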
So, let's now go and understand a little bit of the internals of how a SQL query works in Eon Mode. Once I explain this, we will blend it into best practices, and it will become much clearer why we recommend certain things. S3 is our layer of durability, where data is persistent in an Eon database. When you run an insert query, like INSERT INTO table VALUES (1), or something similar, data is synchronously written to S3: before control returns to the client, a copy of the data is first stored in the local depot and then uploaded to S3, and only then do we hand control back to the client. This ensures that if something bad were to happen, the data will be persistent.

The second type of SQL transaction is what we call DDL, the catalog operations. For example, you create a table, or you add a column. These operations are actually working with metadata. Now, as you may know, S3 does not offer mutable storage; the storage in S3 is immutable, and you can never append to a file in S3. The way transaction logs work, though, is that they are append operations: when you modify the metadata, you are actually appending to a transaction log. This poses an interesting challenge, which we resolve by appending to the transaction log locally in the catalog, and then a service syncs the catalog to S3 every five minutes. This has an interesting consequence: if you were to destroy or delete an instance abruptly, you could lose the commits that happened in the last five minutes. I'll speak to this more in the subsequent slides.

Now, finally, let's look at drops and truncates in Eon. A drop or a truncate is really a combination of the first two things that we spoke about. When you drop a table, you are making a metadata change: you are telling Vertica that this table no longer exists, so we append to the transaction log that this table has been removed. This log, of course, will be synced every five minutes to S3, like we spoke about. There is also the secondary operation of deleting all the files that were associated with data in this table. Those files are on S3, and we could go about deleting them synchronously, but that would take a lot of time, and we do not want to hold up the client for that duration. So we do not synchronously delete the files; we put the files that need to be removed in a reaper queue and return control to the client. This has a performance benefit, in that drops appear to occur really fast. It also has a cost benefit: batching deletes in big batches is more performant and less costly. For example, on Amazon, you can delete 1,000 files at a time in a single call, so if you batch your deletes, you can delete them very quickly. The disadvantage is that if you were to terminate a Vertica cluster abruptly, you could leak files in S3, because the reaper queue would not have had the chance to delete those files.

Okay, so let's go into best practices now that we understand some technical details. As I said, reading and writing to S3 is slow and costly. So, the first thing you can do is avoid as many round trips to S3 as possible: the bigger the batches of data you load, the better the performance you get per commit. The next thing is, don't read from and write to S3 if you can avoid it. A lot of our customers have intermediate data processing, where they temporarily transform the data before finally committing it. There is no reason to use regular tables for this kind of intermediate data. We recommend using local temporary tables, and local temporary tables have the benefit of not having to upload data to S3.

Finally, there is another optimization you can make. Vertica has the concept of active partitions and inactive partitions. Active partitions are the ones where you have recently loaded data, and Vertica is lazy about merging these partitions into a single ROS container. Inactive partitions are historical partitions, like last year's data, or the year before that; those partitions are aggressively merged into a single container. And how do we know how many partitions are active and inactive? That's based on a configuration parameter. If you load into an inactive partition, Vertica is very aggressive about merging those containers, so we download the entire partition, merge the records that you loaded into it, and upload it back again. This creates a lot of network traffic, and as I said, accessing data on S3 is slow and costly. So we recommend you not load into inactive partitions. You should load into the most recent, active partitions, and if you do happen to load into older partitions, set your active partition count correctly.
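Here's a minimal sketch of the load practices just described: staging intermediate data in a local temporary table so it never touches S3, and loading in large batches. The table and file names are hypothetical, and the ActivePartitionCount parameter name is an assumption from my recollection of the documentation.

```sql
-- Intermediate data that never needs durability: keep it off S3 entirely.
CREATE LOCAL TEMPORARY TABLE stage_clicks (
    user_id  INT,
    click_ts TIMESTAMP,
    url      VARCHAR(2048)
) ON COMMIT PRESERVE ROWS;

-- Load a big batch in one round trip rather than many small ones.
COPY stage_clicks FROM '/data/clicks_2020_03_30_*.csv' DELIMITER ',' DIRECT;

-- Transform once, then commit the final result to a regular (communal) table.
INSERT /*+ DIRECT */ INTO public.clicks
SELECT user_id, click_ts, url
FROM stage_clicks
WHERE url IS NOT NULL;

-- Assumed parameter: keep the most recent partitions "active" so loads into
-- them don't trigger aggressive mergeout of historical partitions.
SELECT SET_CONFIG_PARAMETER('ActivePartitionCount', 2);
```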
Okay, let's talk about the reaper queue. Depending on the velocity of your ETL, you can pile up a lot of files that need to be deleted asynchronously. If you were to terminate a Vertica cluster without allowing enough time for these files to get deleted, you could leak files in S3. Of course, if you use local temporary tables this problem does not occur, because the files were never created in S3; but if you are using regular tables, you must allow Vertica enough time to delete these files, and you can change the interval at which we delete, and how much time we allow for deletion at shutdown, by setting the configuration parameters that I have mentioned here.

Okay, so let's talk a little bit about the catalog at this point. The catalog is synced every five minutes to S3 for persistence, and the catalog truncation version is the minimum viable version of the catalog to which we can revive. So, for instance, if somebody destroyed the entire Vertica cluster, the catalog truncation version is the minimum viable version that you will be able to revive to. In order to make sure that the catalog truncation version is up to date, you must always shut down your Vertica cluster cleanly; this allows the catalog to be synced to S3. Here are some SQL commands that you can use to see what the catalog truncation version is on S3. For the most part, you don't have to worry about this if you're shutting down cleanly; this only matters in cases of disaster, or some event where all nodes were terminated without the user's permission.

And finally, let's talk about backups. One more time: we highly recommend you take backups. S3 is designed for 99.9% availability, so there could be occasional downtime, and making sure you have backups will help you if you accidentally drop a table. S3 will not protect you against data that was deleted by accident, so having a backup helps you there. And why not back up? Storage is cheap. You can replicate the entire bucket and have that as a backup, or have a DR copy running in a different region, which also serves as a backup. So, we highly recommend that you make backups. With this I would like to end my presentation, and we're ready for any questions if you have them. Thank you very much.

Published Date : Mar 30 2020

The Next-Generation Data Underlying Architecture


 

>> Paige: Hello, everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled "Vertica Next-Generation Architecture." I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Chief Architect, Chuck Bear. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait, just type your question or comment in the question box that's below the slides and click Submit. So as you think about it, go ahead and type it in. There'll be a Q&A session at the end of the presentation, where we'll answer as many questions as we're able to during the time. Any questions that we don't get a chance to address, we'll do our best to answer offline. Or alternatively, you can visit the Vertica forums to post your questions there after the session. Our engineering team is planning to join the forum and keep the conversation going, so it's sort of like what the developers' lounge would be at a live conference: it gives you a chance to talk to our engineering team. Also, as a reminder, you can maximize your screen by clicking the double-arrow button in the lower-right corner of the slide. And before you ask, yes, this virtual session is being recorded, and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. Okay, now, let's get started. Over to you, Chuck.

>> Chuck: Thanks for the introduction, Paige. Vertica's vision is to help customers get value from structured data. This vision is simple. It doesn't matter what vertical the customer is in: they're all analytics companies. It doesn't matter what the customer's environment is, as data is generated everywhere. We also can't do this alone; we know that you need other tools and people to build a complete solution. Our database is key to delivering on the vision, because we need a database that scales. When you start a new database company, you aren't going to win against 30-year-old products on features. But from day one, we had something else: an architecture built for analytics performance. This architecture was inspired by the C-Store project, combining the best design ideas from academics and industry veterans like Dr. Mike Stonebraker. Our storage is optimized for performance, and we use many computers in parallel. After over 10 years of refinements against various customer workloads, much of the design held up, and serendipitously, the fact that we don't do in-place updates set Vertica up for success in the cloud as well. These days, there are other tools that embody some of these design ideas, but we have other strengths that are more important than the storage format: we're the only good analytics database that runs both on premise and in the cloud, giving customers the option to migrate their workloads to the most convenient and economical environment, and a full data management solution, not just a query tool. Unlike some other choices, ours comes with integration with the SQL ecosystem and full professional support. We organize our product roadmap into four key pillars, plus the cross-cutting concerns of open integration and performance and scale. We have big plans to strengthen Vertica while staying true to our core.
This presentation is primarily about the separation pillar, and performance and scale. I'll cover our plans for Eon, our data management architecture, smart analytic clusters, our fifth-generation query executor, and our data storage layer.

Let's start with how Vertica manages data. One of the central design points for Vertica was shared-nothing, a design that didn't utilize dedicated shared-disk hardware. This quote here is how Mike put it politely, but around the Vertica office, shared disk would only happen over Mike's dead body. And we did get some early field experience with shared disk; customers, well, in fact will run it on anything if you let them. There were misconfigurations that required certified experts, obscure bugs, and so on. Another thing about the shared-nothing design for commodity hardware, though, and this was in the papers, is that all the data management features like fault tolerance, backup and elasticity have to be done in software. And no matter how much you do, procuring, configuring and maintaining the machines with disks is harder. The software configuration process to add more servers may be simple, but capacity planning, racking and stacking is not. So the original allure of shared storage returned. This time, though, the complexity and economics are different. It's cheaper: you can provision storage with a few clicks and only pay for what you need. It expands and contracts, and it moves the maintenance of the storage to a team that is good at it. But there's a key difference: it's an object store, and object stores don't support the APIs and access patterns used by most database software. So another Vertica visionary, Ben, set out to exploit Vertica's storage organization, which turns out to be a natural fit for modern cloud shared storage. Because Vertica data files are written once and not updated, they match the object storage model perfectly. And so today we have Eon. Eon uses shared storage to hold Vertica data, with local-disk depots that act as caches, ensuring that we can get the performance that our customers have come to expect. Essentially, Eon and Enterprise behave similarly, but we have the benefit of flexible storage. Today Eon has the features our customers expect; it's been developed and tuned for years. We have successful customers such as Redpharma, and if you'd like to know more about how Eon has helped them succeed in the Amazon cloud, I highly suggest reading their case study, which you can find on vertica.com. Eon provides high availability and flexible scaling; sometimes on-premise customers with local disks get a little jealous of how recovery and sub-clusters work in Eon. Eon can also operate on premise, particularly on Pure Storage. But Enterprise also has strengths, the most obvious being that you don't need any shared storage to run it. So naturally, our vision is to converge the two modes back into a single Vertica: a Vertica that runs any combination of local disks and shared storage, with full flexibility and portability. This is easy to say, but over the next releases, here's what we'll do. First, we realize that the query executor, optimizer, client drivers and so on are already the same; just the transaction handling and data management is different. But there's already more going on: we have peer-to-peer depot operations and other internode transfers. And Enterprise also has a network; we could just get files from remote nodes over that network, essentially mimicking the behavior and benefits of shared storage with a layer of software.
The only difference at the end of it will be which storage holds the master copy. In Enterprise, the nodes can't drop the files, because they hold the master copy, whereas in Eon they can be evicted, because they're just a cache; the master is in shared storage. And in keeping with Vertica's current support for multiple storage locations, we can intermix these approaches at the table level. Getting there is a journey, and we've already taken the first steps. One of the interesting design ideas of the C-Store paper is the idea that redundant copies don't have to have the same physical organization: different copies can be optimized for different queries, sorted in different ways. Of course, Mike also said to keep the recovery system simple, because it's hard to debug, and whenever the recovery system is being used, it's always in a high-pressure situation. This turns out to be a contradiction, and the latter idea was better. Node-down performance suffers if you don't keep the storage the same, and recovery is harder if you have to reorganize data in the process. Even query optimization is more complicated. So over the past couple of releases, we got rid of non-identical buddies. But the storage files can still diverge at the file level, because tuple mover operations aren't synchronized: the same record can end up in different files on different nodes. The next step in our journey is to make sure both copies are identical. This will help with backup and restore as well, because the second copy doesn't need to be backed up, or if it is backed up, it deduplicates against the first copy in the backup system. Simultaneously, we're improving the Vertica networking service to support this new access pattern. In conjunction with identical storage files, we will converge to a recovery system in which recovering nodes can process queries immediately, by retrieving the data they need over the network from the redundant copies, as they do in Eon today, with even higher performance. The final step, then, is to unify the catalog and transaction model. Related concepts such as segment and shard, local catalog and shard catalog, will be coalesced, as they really represented the same concepts all along, just in different modes. In the catalog, we'll make slight changes to the definition of a projection, which represents the physical storage organization. The new definition simplifies segmentation and introduces valuable granularities of sharding to support evolution over time, and offers a straightforward migration path for both Eon and Enterprise.

There's a lot more to our Eon story than just the architectural roadmap. If you missed yesterday's Vertica in Eon Mode presentation about supported cloud and on-premise storage options, replays are available. Be sure to catch the upcoming presentation on sizing and configuring Vertica in Eon Mode. As we've seen with Eon, Vertica can separate data storage from the compute nodes, allowing machines to quickly fill in for each other, to rebuild fault tolerance. But separating compute and storage is used for much, much more. We now offer powerful, flexible ways for Vertica to add servers and increase access to the data. In Vertica nine, this feature is called sub-clusters. It allows computing capacity to be added quickly and incrementally, and isolates workloads from each other.
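For readers following along on a cluster, here is a minimal sketch for reviewing that sub-cluster layout. The SUBCLUSTERS system table is what recent releases expose for this feature, but the column names here are from my recollection and should be checked against your version's documentation.

```sql
-- Which sub-clusters exist, which nodes belong to them, and which are primary.
SELECT subcluster_name, node_name, is_primary
FROM subclusters
ORDER BY subcluster_name, node_name;
```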
If your exploratory analytics team needs direct access to the source data, they need a lot of machines, and not the same number all the time, and you don't 100% trust the kinds of queries and user-defined functions they might be using, sub-clusters are the solution. While there's much more extensive information available in our other presentation, I'd like to point out the highlights of our latest sub-cluster best practices. We suggest having a primary sub-cluster. This is the one that runs all the time if you're loading data around the clock; it should be sized for the ETL workloads, and it also determines the natural shard count. Additional read-oriented secondary sub-clusters can be added for real-time dashboards, reports and analytics. That way, sub-clusters can be added or deprovisioned without disruption to other users. The sub-cluster features of Vertica 9.3 are working well for customers. Yesterday, the Trade Desk presented their use case for Vertica, running 5 sub-clusters in the cloud; if you missed the presentation, check out the replay. But we have plans beyond sub-clusters: we're extending sub-clusters to real clusters. For the Vertica savvy, this means the clusters won't share the same spread ring network. This will provide further isolation, allowing clusters to control their own independent data sets, while replicating all or part of the data from other clusters using a publish-subscribe mechanism. Synchronizing data between clusters is a feature customers have asked for. This vision affects our designs for ancillary aspects: how we assign resource pools, security policies, and how we balance client connections. We will be simplifying our data segmentation strategy, so that when data that originated in different clusters meet, they'll still get fully optimized joins, even if those clusters weren't provisioned with the same number of nodes per shard.

Having a broad vision for data management is a key component of Vertica's success, but we also take pride in our execution strategy. When you start a new database from scratch, as we did 15 years ago, you won't compete on features. Our key competitive points were speed and scale of analytics; we set a target of 100x better query performance than traditional databases, with fast loads. Our storage architecture provides a solid foundation on which to build toward these goals. Every query starts with data retrieval. Keeping data sorted, organized by column, and compressed with adaptive encodings keeps the data retrieval time and IO to the bare minimum theoretically required. We also keep the data close to where it will be processed, and we cluster machines to increase throughput. We have partition pruning, a robust optimizer, and we actively use segmentation as part of the physical database design to keep records close to the other relevant records. So that's a solid foundation, but we also need optimal execution strategies and tactics. One execution strategy that we built a long time ago, but is still a source of pride, is how we process expressions. Databases and other systems with general-purpose expression evaluators parse a compound expression into a tree. Here I'm using A plus one, times B, as an example. During execution, the CPU traverses the tree and computes the sub-parts into the whole. Tree traversal often takes more compute cycles than the actual work to be done. Expression evaluation is a very common operation, so it's something worth optimizing.
One instinct that engineers have is to use what we call just-in-time, or JIT, compilation, which means generating code for the CPU for the specific expression at hand. This replaces the tree of boxes with a custom-made box for the query. This approach has complexity and bugs, but it can be made to work. It has other drawbacks, though: it adds a lot to query setup time, especially for short queries, and it pretty much eliminates the ability of mere mortals to develop user-defined functions. If you go back to the problem we're trying to solve, the source of the overhead is the tree traversal. If you increase the batch of records processed in each traversal step, this overhead is amortized until it becomes negligible. It's a perfect match for a columnar storage engine. This also sets the CPU up for efficiency: CPUs are particularly good at following the same small sequence of instructions in a tight loop. In some cases, the CPU may even be able to vectorize, and apply the same processing to multiple records with the same instruction. This approach is easy to implement and debug, user-defined functions are possible, and it's generally aligned with the other complexities of implementing and improving a large system. More importantly, the performance, both in terms of query setup and record throughput, is dramatically improved. You'll hear me say that we look to research and industry for inspiration, and in this case, our findings are in line with the academic findings. If you'd like to read papers, I recommend "Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask," though we did have this idea before we read that paper. However, not every decision we made in the Vertica executor stood the test of time as well as the expression evaluator. For example, sorting and grouping aren't as susceptible to vectorization, because sort decisions interrupt the flow. We have used JIT compiling on that for years, since Vertica 4.1, and it provides modest speedups, but we know we can do even better. So we've embarked on a new design for the execution engine, which I call EE5, because it's our fifth. It's really designed especially for the cloud. Now I know what you're thinking: I just put up a slide with an old engine, a new engine, and a sleek plane headed up into the clouds. But this isn't just marketing hype. Here's what I mean when I say we've learned lessons over the years and we're redesigning the executor for the cloud; and of course, you'll see that the new design works well on premises as well, these changes are just more important for the cloud. Starting with the network layer: in the cloud, we can't count on all nodes being connected to the same switch. Multicast doesn't work like it does in a custom data center, so, as I mentioned earlier, we're redesigning the network transfer layer for the cloud. Storage in the cloud is different, and I'm not referring here to the storage of persistent data, but to the storage of temporary data used only once during the course of query execution. Our new design takes into account the strengths and weaknesses of cloud object storage, where we can't easily do appends. Moving on to memory, many of our access patterns are reasonably effective on bare-metal machines but aren't the best choice on cloud hypervisors that have overheads like page faults. Here again, we found we can improve performance a bit on dedicated hardware, and even more in the cloud.
Finally, and this is true in all environments, core counts have gone up, and not all of our algorithms take full advantage. There's a lot of ground to cover here, but I think sorting is the perfect example to illustrate these points. I mentioned that we use JIT in sorting; we're getting rid of JIT in favor of a data format that can be treated efficiently, independent of what the data types are. We've drawn on the best, most modern technology from academia and industry, and we did our own analysis and testing. You know what we chose? We chose parallel merge sort. Anyone want to take a guess when merge sort was invented? It was invented in 1948, or at least documented that way, in a computing context. If you've heard me talk before, you know that I'm fascinated by how all the things I work with as an engineer were invented before I was born. In Vertica, we don't use the newest technologies, we use the best ones, and what is notable about Vertica is the way we've combined the best ideas together into a cohesive package. So all kidding about the 1940s aside, our redesign is actually state of the art. How do we know the sort routine is state of the art? It turns out there's a pretty credible benchmark, over at the appropriately named sortbenchmark.org. Anyone with resources looking for fame for their product or academic paper can try to set the record. The record was last set in 2016 with Tencent Sort: 100 terabytes in 99 seconds. Setting the record is hard; you have to come up with hundreds of machines on a dedicated high-speed switching fabric. There's a lot to a distributed sort, but they all have core sorting algorithms. The authors of the paper conveniently broke out the time spent in their sort: 67 out of 99 seconds went to local sorting. If we break this out, divided by two CPUs in each of 512 nodes, we find that each CPU sorts almost a gig and a half per second. This is for what's called an indy sort, which, like an Indy race car, isn't general purpose: it only handles fixed hundred-byte records with 10-byte keys. If the record length can vary, it's called a daytona sort, and Tencent's daytona sort is a little slower per CPU. Now, for Vertica, we have a wide variability in record sizes and more interesting data types, but still, no harm in setting a target comparable to the world record. On my 2017-era AMD desktop CPU, the Vertica EE5 sort sorts about two and a half gigabytes per second. Obviously, this isn't an apples-to-apples comparison, because they used their own OpenPOWER chips, but the number of DRAM channels is the same, so it's pretty close, and the number says we've hit on the right approach. And it performs this way on premise and in the cloud, and we can adapt it to cloud temp space. So what's our roadmap for integrating EE5 into the product? I compare replacing the query executor in the database to replacing the crankshaft and other parts of the engine of a car while it's being driven. We've actually done it before, between Vertica three and a half and five, and then we never really stopped changing it; now we'll do it again. The first part we're replacing is an algorithm called storage merge, which combines sorted data from disk. The first phase has two parts coming in the Vertica 10.0 patches: the EE5 storage merge, then resegmented storage merge, and after that we'll convert sorting and grouping as well.
Here are the performance results so far. In cases where the Vertica executor is doing well today, simple environments with simple data patterns, such as this simple analytic query, there's a lot of speedup. When we ship the resegmentation code, which didn't quite make the freeze, there will be more, and longer term, when we move grouping into the storage merge operation, we'll get to where we think we ought to be, given the theoretical minimum work the CPUs need to do. Now if we look at a case where the current executor isn't doing as well, we see there's a much stronger benefit from the code shipping in Vertica 10; in fact, I turned the bar chart sideways to try to help you see the difference better. This case also benefits from the improvements coming in 10.x point releases and beyond. There's a lot more happening to the Vertica query executor; that was just a taste.

But now I'd like to switch to the roadmap for our data storage layer. I'll start with a story about how our storage access layer evolved. If you go back to the academic ideas in the C-Store paper that persuaded investors to fund Vertica, the read-optimized store was the part that had substantiation in the form of performance data. Much of the paper was speculative, but we tried to follow it anyway. That paper talked about the WS and the RS, the write store and the read store, how they work together for transaction processing, and how there was a tuple mover. In all honesty, Vertica engineers couldn't figure out from the paper what to build, in case you want to try; we asked, but we never got enough clarification to build it that way. So here's what we built instead. We built the ROS, the read-optimized store. It's sorted, ordered, columnar and compressed, and it follows the table partitioning; it worked even better than the RS described in the paper. We also built the WOS, the write-optimized store; we actually built four versions of this over the years, and this was the best one. It's not a set of interrelated B-trees, it's just an append-only, insertion-order, in-memory store: no compression, no sorting, no partitioning. There is, however, a tuple mover, which does what we call moveout: it moves the data from WOS to ROS, sorting and compressing it. Let's take a moment to compare how they behave. When you load data directly to the ROS, there's a data parsing operation, then we do the sorting, and then compress and write the columnar data files out to stable storage. The next query executes against the ROS, and it runs as it should, because the ROS is read-optimized. Let's repeat the exercise for the WOS. The load operation responds before the sorting and compressing, and before the data is written to persistent storage. Now it's possible for a query to come along, and that query could be responsible for sorting the WOS data in addition to its other processing. The effect on queries isn't predictable until the tuple mover comes along and writes the data to the ROS. Over the years, we've done a lot of comparisons between ROS and WOS. ROS has always been better for sustained load throughput; it achieves much higher records per second without pushing back against the client, and it has been that way ever since we developed the first usable mergeout algorithm. ROS has always been better for predictable query performance, and the ROS has never had the same management complexity and limitations as the WOS: you don't have to pick a memory size and figure out which transactions get to use the pool.
The non-persistent nature of the WOS always caused headaches when there were unexpected cluster shutdowns. We also looked at field usage data, and we found that few customers were using the WOS much, especially among those that had studied the issue carefully. So we set out on a mission to improve the ROS to the point where it was always better than both the WOS and the ROS of the past. And now it's true: the ROS is better than the WOS, and better than the ROS of a couple of years ago. We implemented storage bundling, better catalog object storage, and better tuple mover mergeouts. And now, after extensive QA and customer testing, we've succeeded, and in Vertica 10, we've removed the WOS.
Let's talk for a moment about simplicity. One of the best things Mike Stonebraker said is: no knobs. Anyone want to guess how many knobs we got rid of when we took the WOS out of the product? Twenty-two. There were five knobs to control whether data went to the WOS or the ROS, six controlling the WOS itself, six more to set policies for the tuple mover moveout, and so on. In my honest opinion, it still wasn't enough control to achieve success in a multi-tenant environment, but the big reason to get rid of the WOS is simplicity: make the lives of DBAs and users better. We have a long way to go, but we're doing it. On my desk, I keep a jar with a knob in it for each knob in Vertica. When developers add a knob to the product, they have to add a knob to the jar. When they remove a knob, they get to choose one to take out. We have a lot of work to do, but I'm thrilled to report that, in 15 years, 10 is the first release where the number of knobs ticked downward.
Getting back to the WOS, I've saved the most important reason to get rid of it for last. We're getting rid of it so we can deliver our vision of the future to our customers. Remember how I said that in Eon Mode and sub-clusters we got all these benefits from shared storage? Guess what can't live in shared storage: the WOS. Remember how a big part of the future was keeping the secondary copies identical to the primary copy? Independent actions of the WOS were at the root of divergence between copies of the data. You have to admit it when you're wrong. The WOS was in the original design and held up as a selling point at the time, but we held onto the idea of a separate ROS and WOS for too long. In Vertica 10, we can finally bid it good riddance.
I've covered a lot of ground, so let's put all the pieces together. I've talked a lot about our vision and how we're achieving it, but we also still pay attention to tactical detail. We've been fine-tuning our memory management model to enhance performance. That involves revisiting tens of thousands of lines of code, much like painting the inside of a large building with small paintbrushes. We're getting results, as shown in the chart: in Vertica 9, concurrent monitoring queries used memory from the global catalog pool, and in Vertica 10, they don't. This is only one example of an important detail we're improving. We've also reworked the monitoring tables that involve network messages, splitting them into two parts. We've increased the data we're collecting and analyzing in our quality assurance processes. We're improving on everything.
As the story goes, I still have my grandfather's axe; of course, my father had to replace the handle, and I had to replace the head. Along the same lines, we still have Mike Stonebraker's Vertica. We've replaced the query optimizer twice, and the database designer and storage layer four times each.
The query executor is now on a fifth-generation design. I charted out how our code has changed over the years, and I found that we don't have much from a long time ago. I did some digging, and you know what we have left from 2007? We have the original curly braces, and a little bit of code for handling dates and times.
To deliver on our mission, to help customers get value from their structured data, with high performance, at scale, and in diverse deployment environments, we have a sound architectural roadmap, the best execution strategy, and solid tactics. On the architectural front, we're converging Eon and Enterprise, and we're extending smart analytic sub-clusters. In query processing, we're redesigning the execution engine for the cloud, as I've told you. And there's a lot more than just the fast engine: if you want to learn about our new support for complex data types, improvements to query optimizer statistics, or extensions to live aggregate projections and flattened tables, you should check out some of the other engineering talks at the Big Data Conference. We continue to stay on top of the details, from low-level CPU and memory use to monitoring and management, developing tighter feedback cycles between development, QA, and customers. And don't forget to check out the rest of the pillars of our roadmap. We have new, easier ways to get started with Vertica in the cloud. Engineers have been hard at work on machine learning and security. It's easier than ever to use Vertica with third-party products, as the variety of tool integrations continues to increase. Finally, the most important thing we can do is to help people get value from structured data, and to help people learn more about Vertica. So hopefully I left plenty of time for Q&A at the end of this presentation. I hope to hear your questions soon.

Published Date : Mar 30 2020


Roger Barga, AWS | AWS re:Invent 2018


 

>> From Las Vegas, it's theCUBE, covering AWS re:Invent 2018, brought to you by Amazon Web Services, Intel, and their ecosystem partners. >> Okay, welcome back everybody to theCUBE, live in Las Vegas for AWS, Amazon Web Services, re:Invent 2018, here with Dave, our sixth year covering AWS re:Invent. We've been at every one except the first year, and it's certainly been fun to watch the massive, massive rise of the wave of the cloud, and Amazon's discipline and execution. Our next guest is Roger Barga, general manager, robotics and autonomous services for Amazon Web Services. Great to have you, thank you for joining us. It's great to be here today. So, a lot of stuff to talk about. Amazon's got like this cult personality, where they do cool things. Uh, they innovate, as well as take care of the basic cloud needs: more compute, better networking, more storage, the core engine. Uh, robotics, autonomous, you think of cars, you think of flying drones, maybe in the future. >>
What's going on? What are you working on? Explain what your job is and what you're doing at Amazon, I think it's super important. We actually look at robots as being anything that senses, computes, and acts, and that opens up such a wide range in the definition of robot, from a washing machine to far more sophisticated systems. That full spectrum is what we're trying to address. And we've announced a new service called AWS RoboMaker. It is designed to support the end-to-end application development life cycle: building an intelligent robot, deploying it to one, ten, hundreds, thousands of robots out in the field, and monitoring them. We are really addressing the developer's need for how to build and scale and run a robotics business. You know, what really resonated with me, and, uh, with you guys, at Andy's keynote this morning, was he used the word builder a lot of times, um, the right tool for the right job. >>
I think that really connects with the culture that we're seeing in the world today. Maker Faire started it out. Robotics clubs in high schools are probably at an all-time high in terms of interest. It's not just a nerdy geek thing, it's actually kind of mainstream. People are attracted to robotics, people have wearables. So you're seeing a world where technology and robotics are colliding. So this kind of falls into the new persona of developer that's out there, who's building robotic stuff. It used to be some special group of people. Not anymore. Explain how you guys are going after the developers with this. Okay. So it is very focused on the developer. We started talking to our internal customers who are building robots, and we started talking to external customers building robots, to really understand the struggles that they had and have to face.
And we actually realized that roboticists tend to be deep in hardware, drivers, actuators, sensors, and they are forced to be software engineers at the same time, because there's just not ready-made software, and they have to go roll their own tooling. So we're actually providing them with the tools, so they can focus on the hardware and the innovation that goes on there, or on adding the intelligence to the robot to carry out the more meaningful tasks.
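To ground the "senses, computes, and acts" definition above, here is a tiny sketch of that loop in Python; the temperature sensor, threshold, and heater are made-up placeholders for illustration, not anything from RoboMaker.

```python
import random
import time

def read_temperature():
    # Stand-in for a real sensor driver (the "senses" part).
    return 20.0 + random.uniform(-5.0, 15.0)

def set_heater(on):
    # Stand-in for a real actuator driver (the "acts" part).
    print("heater", "ON" if on else "OFF")

def control_loop(setpoint=25.0, cycles=5):
    # The "computes" part: a trivial decision between sensing and acting.
    for _ in range(cycles):
        temp = read_temperature()
        set_heater(temp < setpoint)
        time.sleep(0.1)

if __name__ == "__main__":
    control_loop()
```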
And again, we've had conversations with companies that are building small appliances that, basically, they think of as a robot: a dishwasher that has sensors that actually sense how the water is flowing and the temperature, and then take action; all the way to a group that's actually putting a robot in the space station to take photographs; and underwater robots, air robots, and the drones. So, you know those robotic competitions, right?
You're familiar with those, right? It was all high school kids, and there's always a hardware team, which is kind of clear, and then the software team, which always struggled. So I'm envisioning these guys are now going to be using RoboMaker as part of that team. So if I understand it, the mission is kind of develop, secure, deploy, and manage robotic apps. That's really what you guys are doing, and a little bit more also, right? So we've actually bundled in our cloud services for machine learning, for analytics, and for monitoring. And so now, with Amazon Polly and Amazon Lex integration, you can talk to your robot, and your robot can respond to you. We can stream the video off the robot through Kinesis Video Streams and send it to Rekognition, so the robot can actually see. You'll be able to see what your robot is seeing, run it through Rekognition. >>
You can identify what it's seeing and be able to tell it, go to the refrigerator, and it knows where the refrigerator is. Something else we have done, which I think is interesting to share with you, is that we're actually working with something called the Robot Operating System, which is the most commonly used open source software framework for robotics, ROS. Um, we have contributed all of our cloud extensions as open source to the community. And we're also technical steering committee members for ROS 2, which is the next generation of ROS. We like to think of it as a commercial-grade version of ROS, the Linux for robots. And we're also contributing open source to that as well, because what you'll find is this is what developers are using and reusing. So if you have a sensor or an actuator for a robot you'd like to use, you're probably going to find a ROS package already out there to drive that sensor or drive that actuator that you can use. >>
And now you'll see new ones for our cloud services, so you can turn on monitoring and machine learning services as well. So you contribute to the open source community, so that's going to accelerate the adoption. You're also making it easier. I want you to explain how you guys are working to do that, because if this kind of thing continues on this track, it's going to remove some of the blockers or barriers to getting into this, and that's getting the applications up and running, which should have an impact on everything from fleet management to, you know, anything. I mean, that's really the problem statement here, isn't it? It really is, it's really what our mission is. We're always looking at developers and how we can accelerate them and make them more productive. Let's say the three of us wanted to go off and build a robotics application. >>
We'd have to make sure that the environment on all of our machines is the same, because you might have a different DLL or a different package, which means when we deploy to the robot, we're breaking it; we're not consistent. We actually offer a cloud development environment for robotics. With one click off the AWS management console, you can choose the operating system that you'd like to deploy to your robot.
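As a concrete example of the kind of reusable ROS package code discussed above, here is a minimal ROS 1 publisher node in Python using rospy, essentially the classic tutorial "talker"; the node name, topic name, and message contents are illustrative placeholders, and nothing here is specific to RoboMaker.

```python
#!/usr/bin/env python
# Minimal ROS 1 publisher node -- the kind of reusable package code
# developers share and reuse. Names and rate are illustrative only.
import rospy
from std_msgs.msg import String

def talker():
    # Register this process with the ROS master as a node named "talker".
    rospy.init_node('talker', anonymous=True)
    # Advertise a topic that any other node can subscribe to.
    pub = rospy.Publisher('chatter', String, queue_size=10)
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        msg = "status at %s" % rospy.get_time()
        pub.publish(msg)
        rate.sleep()

if __name__ == '__main__':
    try:
        talker()
    except rospy.ROSInterruptException:
        pass
```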
It'll download it, it'll configure that for you, it'll create scalable storage to store the artifacts as we build our robot and try different algorithms out, and it'll provision compute to compile our robot application. We even have pre-built applications to get you started, and you have access to all the ROS packages. And so within minutes, we could be up and working together, writing a robotics application. That's just part of it, though.
So again, I talked about the cloud service extensions, but simulation is such a huge thing, because we may not even have a robot built yet, and we want to simulate our robot. We offer pre-built worlds, like a room in a house, or a retail store, or a racetrack for the race car that you heard about today, and you can drop your robot in these environments and test it. You can turn a physics model on and say, my robot's carrying 500 pounds, simulate that. When you're happy with it, then you can deploy that over the air to your actual robot. And the simulations, you can actually run hundreds of them in parallel, faster than wall-clock time. So literally, we could do a thousand simulation hours, probably in 15 or 20 minutes, to test our robot, and all this compute, you spin up a supercomputer, basically, to bring it all together.
You mentioned the Formula One thing, that's interesting. What insights can come into this? And I want to get down to the intelligence piece, because when I met Andy, I just wrote an article yesterday on Forbes on my interview with him, he made a comment I want to add to the conversation. He said the cloud is the brains and on premise is the environment, so robots will need brains. So talk about the connection to AWS. Yes, so that's a key part, right? It connects to the cloud, which has a lot of brains, so you've got a lot of opportunities to connect services. What kinds of services do you envision connecting to the robots? Okay. So what was announced today with the race car, that car is actually trained in RoboMaker through simulation, through reinforcement learning. And so hundreds of simulations of the car trying to go around the track, all that information is being fed to SageMaker, which is using reinforcement learning to actually build an algorithm, a better algorithm, and then pull it back to the car and try it over and over again.
That's how you actually train the car, and you see that beautiful partitioning with the cloud: big compute, reinforcement learning, large data sets. Then you deploy the machine learning model to the car, and it can actually continue to send up signals for more information. So as the car is being used for racing, you're still learning, it's still updating the model. So again, this beautiful part, how does that data flow? So you have data coming off the car, you send it back up to the cloud, and that's where the heavy modeling occurs, and then you push it back down, the small machine learning model, back down. We have Kinesis data streams, we also have IoT MQTT messages we can send back up to the cloud, and you really start to see the role of the cloud. When we have hundreds of devices out, each one might make a mistake every once in a while, but collectively you're getting a large training set for retraining a model and pushing it back down.
It's where deep learning really adds value, too. It really is.
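Since the device-to-cloud path described above leans on MQTT, here is a minimal sketch of a robot publishing telemetry with the widely used paho-mqtt client (1.x-style API); the broker address, topic, and payload fields are placeholders, and a real AWS IoT Core connection would additionally need TLS certificates and an account-specific endpoint, which are omitted here.

```python
import json
import time

import paho.mqtt.client as mqtt  # generic MQTT client library (1.x-style API)

BROKER = "broker.example.com"      # placeholder; AWS IoT would use your account endpoint
TOPIC = "robots/car-01/telemetry"  # placeholder topic name

client = mqtt.Client(client_id="car-01")
# For AWS IoT Core you would also call client.tls_set(...) with device certificates.
client.connect(BROKER, 1883)
client.loop_start()  # handle network traffic in a background thread

for step in range(10):
    payload = json.dumps({"step": step, "speed": 1.2, "steering": -0.1, "ts": time.time()})
    client.publish(TOPIC, payload, qos=1)  # QoS 1: at-least-once delivery
    time.sleep(1)

client.loop_stop()
client.disconnect()
```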
And you mentioned adding more personality to it. Before we came on camera, the robot you saw, this is really where it's going to get personalized. It is. And in fact, Leah, it's a robot that's made by Robot Care Systems. And Leah is an intelligent robotic walker, absolutely brilliant, so the elderly and disabled can now live more independent, more agile lives. Um, it has 72 sensors; it senses, computes, and acts. It figures out what the user is trying to do. The user now can actually interact with it with voice, through our Amazon Polly and Amazon Lex integrations. So with the walker across the room, the user can say, Leah, come to me, and Leah will actually motor over to the user, and the user can get on.
Leah will sense that it's carrying a load, and the user can say, Leah, let's go to the front door, and Leah will start moving toward the front door. That's just so natural. And that's the real-life impact of that. People who live alone, whether it's diabetes or maybe something else as they get sick, the robot could be tied into a health meter. I mean, these are real-world scenarios that aren't far away. No, they're happening now, it's happening right now. And again, you're starting to see the value that robots are going to bring to our lives. And again, roboticists have such hard problems to solve with the hardware and the algorithms they're writing. We really don't want the other work to have to be a burden for them. We really want to simplify that. So let's talk about the TAM, the total addressable market here, because the Formula One, the developers, I get that, I get the Formula One. Is there a market for robots? Who's doing it? Where is it? Is it embryonic and early? How is this forming in your mind? Um, the marketplace, as we've looked at this, we have been amazed at all the places we're finding robots. Again, we see robots underwater, we see drones in the air, we see robotic arms in factories, we see them in education. I have yet to see an area where a robot can't assist or carry out tasks to help humans. How about doing interviews? >>
Yeah, we're not going to be replaced yet. Although we have had a
robot on theCUBE once. Despite the fact that we'd like to think how advanced robots are, you can't replace humans, not the, uh, mobility, our intelligence, or personality. But the number of things robots can do keeps growing.
Yeah, it wasn't that long ago robots couldn't climb stairs.
That's right, that's right. Amazing. Let's talk about your goals for the year. What are you trying to do with the service? Um, and what can people expect to see coming from AWS? We're definitely going to be listening to our customers now that we've launched, and we're working backwards to actually add features that they tell us they'd like to see. We're really pleased that we've got a partnership with FIRST Robotics. We want to work with FIRST to actually bring our service to allow students and learners of all ages to learn robotics. We have an education and research program with about 25 universities, with more signing on as well. They're very interested in using the service for teaching robotics and for education and research. So we really want to push hard there, because we think robotics has a great future.
And we think robo makers, the way that they're going to do, I can tell you from my four living in Palo Alto, which is again, a different zip code than middle America, robotics is hot. People like robotics. They like to play with the robotics. And it has now it's software democratization tools and frameworks. You don't need to be a rocket scientist to code sheet language. Yeah. Yeah. That's I think the power of our service is that basically the developers no longer limited to the code. They write in the software. They can hardware that can put on their robot that can take advantage of cloud services, glue them together and start building a robot. Well, we are very interested in covering, uh, what goes on with your area and certainly want to know more about how the community's developing. Certainly the open source I think, is going to be a very big part of your plan. We agree. We're committed. Roger. Thanks for coming on. Great insight, robo maker. One of the top announcements is a great demo on the keynote, uh, from, uh, the formula one, uh, spokesperson. I think the executive great demo that I think is worth watching. Congratulations on the success or cube coverage here. No robots here. We're live coverage. Re-invent 2018. We right back.

Published Date : Nov 28 2018
