

The Shortest Path to Vertica – Best Practices for Data Warehouse Migration and ETL


 

Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "The Shortest Path to Vertica – Best Practices for Data Warehouse Migration and ETL." I'm Jeff Healey, I lead Vertica marketing, and I'll be your host for this breakout session. Joining me today are Marco Gessner and Maurizio Felici, Vertica product engineers joining us from the EMEA region. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait: just type your question or comment in the question box below the slides and click Submit. As always, there will be a Q&A session at the end of the presentation, and we will answer as many questions as we are able to during that time. Any questions we don't address, we'll do our best to answer offline. Alternatively, visit the Vertica forums at forum.vertica.com to post your questions there after the session; our engineering team is planning to join the forums to keep the conversation going. Also, a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides, and yes, this virtual session is being recorded and will be available to view on demand this week — we'll send you a notification as soon as it's ready. Now let's get started. Over to you, Marco.

Hello everybody, this is Marco speaking, a sales engineer from the EMEA region, and I'll just get going. This is the agenda: part one will be done by me, part two by Maurizio. The agenda is, as you can see: big bang or piece by piece; migration of the DDL and the physical data model; migration of ETL, SQL and BI functionality; what to do with stored procedures; what to do with any existing user-defined functions; and migration of the data itself, which will be covered by Maurizio. Maurizio, do you want to introduce yourself?

Hello everybody, my name is Maurizio Felici and I'm a Vertica pre-sales engineer like Marco. I'm going to talk about how to optimize data warehouses using some specific Vertica techniques like table flattening and live aggregate projections. Let me start with a quick overview of the data warehouse migration process we are going to talk about today. Normally we suggest starting by migrating the current data warehouse as it is, with limited or minimal changes to the overall architecture. Clearly we will have to port the DDL and redirect the data access tools to the new platform, but we should minimize the amount of changes in this initial phase in order to go live as soon as possible. In a second phase we can start optimizing the data warehouse, again with no or minimal changes to the architecture as such: during this optimization phase we can create, for example, projections for some specific queries, optimize encoding, or change some of the resource pools. This is something we normally do only if and when needed. Finally, and again only if and when needed, we go through an architectural redesign using the full set of Vertica techniques, in order to take advantage of all the features we have in Vertica. This is normally an iterative approach, so we may go back to some of the specific features before returning to the architecture. We are going to walk through this process in the next few slides.

OK. In order to encourage everyone to keep using their common sense when migrating to a new database management system — people are often afraid of it — it's often useful to use the analogy of a house move.
In your old home you might have developed solutions for your everyday life that made perfect sense there. For example, if your old Saint Bernard dog can't walk anymore, you might be using a forklift to get him in through the window. In the new home, consider the elevator, and don't complain that the window is too small to fit the dog through. It is very much the same with Vertica.

But make the transition gentle. To remain in my analogy with the house move: picture your new house as your new holiday home. Begin to install everything you miss and everything you like from your old home, and once you have everything you need in your new house, you can shut down the old one. So move piece by piece and go for quick wins to make your audience happy. You do a big bang only if they are going to retire the platform you are sitting on, or you are really on a sinking ship; otherwise identify quick wins, implement and publish them quickly in Vertica, reap the benefits, enjoy the applause, use the gained reputation for further funding, and if you find that nobody is using the old platform anymore, you can shut it down. You can still go for a big bang in one go, but only if you absolutely have to; otherwise migrate by subject area and group similar areas together.

Having said that, you start off by migrating objects in the database — that's one of the very first steps. It consists of migrating first the places where you can put the other objects, that is, owners and locations, which usually means schemas. Then, whatever you have, you extract: tables, views. Then you convert the object definitions and deploy them to Vertica — and mind that you shouldn't do it manually. Never type what you can generate; automate whatever you can.

Users and roles: usually there are system tables in the old database that contain all the roles. You can export those to a file, reformat them, and then you have CREATE ROLE and CREATE USER scripts that you can apply to Vertica (a sketch of this kind of generated script follows below). If LDAP or Active Directory was used for authentication in the old database, Vertica supports anything within the LDAP standard.

Catalogs and schemas should be relatively straightforward, with maybe one difference: Vertica does not restrict you by defining a schema as a collection of all objects owned by a user, but it supports it — it emulates it for old times' sake. Vertica does not need the catalog, or, if you absolutely need the catalog for the old tools that you use, it is always set to the name of the database in the case of Vertica.

Having now the schemas, the catalogs, the users and the roles in place, take the data definition language of the tables. If you are allowed to, it's best to use a tool that translates the data types in the DDL it generates. You will see odb mentioned several times in this presentation — we are very happy to have it — and it can export the old database's table definitions because it works with ODBC: it takes what the old database's ODBC driver exposes and uses internal translation tables to map it to several target DBMS flavors, the most important of which is obviously Vertica. If you are forced to use something else, there are always tools like SQL*Plus in Oracle, the SHOW TABLE command in Teradata, and so on; each DBMS should have a set of tools to extract object definitions as they would be deployed in another instance of the same DBMS.
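A minimal sketch of the "never type what you can generate" idea for users and roles. The source system tables (old_catalog.roles, old_catalog.users, old_catalog.user_roles) and their columns are hypothetical placeholders for whatever your old database exposes; only the generated CREATE ROLE, CREATE USER and GRANT statements are Vertica syntax:

-- Run against the OLD database, spool the output to a file, review it,
-- then run that file on Vertica. old_catalog.* names are placeholders.
SELECT 'CREATE ROLE ' || role_name || ';'
  FROM old_catalog.roles
 WHERE role_name NOT LIKE 'SYS%';

SELECT 'CREATE USER ' || user_name || ' IDENTIFIED BY ''ChangeMe!'';'
  FROM old_catalog.users;

SELECT 'GRANT ' || role_name || ' TO ' || user_name || ';'
  FROM old_catalog.user_roles;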
If I talk about views: usually there is a view definition in the old database catalog as well. One thing that may need a bit of special care is synonyms. Synonyms are something that Vertica emulates in different ways depending on the specific need: you can put a view on top of the table to be referred to, or use something that is really neat but many other databases don't have — the search path. The search path works very much like the PATH environment variable in Windows or Linux: you specify an object name without the schema name, and it is searched first in the first entry of the search path, then in the second, then in the third, which makes synonyms largely unneeded.

When you generate the DDL — to remain in the analogy of moving house — dust and clean your stuff before placing it in the new house. If you see a table like the one at the bottom of this slide, it is usually the corpse of a bad migration in the past: an ID is usually an integer and not a floating-point data type, a first name hardly ever has 256 characters, and if a column is called HIRE_DT it is not necessarily needed to store the second when somebody was hired. So take good care, while you are moving, to dust off your stuff and use better data types.

The same applies especially to strings. How many bytes does a string of four euro signs contain? It's not 4, it's actually 12: in UTF-8, which is the way Vertica encodes strings, an ASCII character takes one byte, but the euro sign takes three. That means that when the source uses a single-byte character set, you very often have to pay attention: oversize the columns first, because otherwise data gets rejected or truncated, and then carefully check what the best size really is. The most promising approach is to initially dimension strings in multiples of their original length — odb, with the sizing option you see on the slide, will double the length of what would otherwise be a single-byte character column, or multiply it by four for wide-character columns in traditional databases — then load a representative sample of your source data, profile it with the tools we normally use to find the actual longest values, and then make the columns shorter. (A small sketch of the byte-versus-character point, and of the search path, follows below.)
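To make the two points above concrete — the schema, table and column names are illustrative:

-- The search path replaces synonyms: unqualified object names are resolved
-- against each schema in this order.
SET SEARCH_PATH TO sales, staging, public;

-- VARCHAR lengths in Vertica are bytes (octets), not characters:
SELECT CHARACTER_LENGTH('€€€€') AS characters,  -- 4
       OCTET_LENGTH('€€€€')     AS bytes;       -- 12 in UTF-8

-- So a column sized for 4 single-byte characters may need widening, e.g.:
ALTER TABLE staging.payments ALTER COLUMN currency_symbol SET DATA TYPE VARCHAR(12);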
Too long and too big data types also matter for projection design — we live and die with our projections. You might remember the rules on how default projections come to exist; the way we do it initially is, just like for the profiling, to load a representative sample of the data, collect a representative set of already-known queries, and run the Vertica Database Designer. You don't have to decide immediately — you can always amend things later — and otherwise follow the laws of physics: avoid moving data back and forth across nodes and avoid heavy I/O if you design your projections by hand initially.

Encoding matters. You may know that the Database Designer is a very tight-fisted thing: it optimizes to use as little space as possible. Keep in mind that if you compress very well, you might end up spending more time reading the data back. This is a test I ran once using several encoding types, and you can see that RLE (run-length encoding), if the data is sorted, is barely even visible, while the others are considerably slower. You can get the slides and look at the numbers in detail; I won't go through them now.

About BI migrations: usually you can expect 80% of everything to be able to be lifted and shifted. You don't need most of the pre-aggregated tables, because we have live aggregate projections. Many BI tools have specialized query objects for the dimensions and the facts, and we have the possibility to use flattened tables, which will be talked about later — those you might have to rewrite by hand. You will be able to switch off caching, because Vertica speeds up everything, with live aggregate projections among other things, and if you have worked with MOLAP cubes before, you very probably won't need them at all.

ETL tools: if you load row by row into the old database, consider changing everything to very big transactions, and if you use INSERT statements with parameter markers, consider writing to named pipes and using Vertica's COPY command instead of mass inserts (a sketch of that follows at the end of my part).

As for custom functionality: you can see on this slide that Vertica has by far the biggest number of functions of any database — we compare them regularly. You might find that many of the functions you have written won't be needed on the new database, so look in the Vertica catalog before trying to migrate a function that you don't need. Stored procedures are very often used in the old database to overcome shortcomings that Vertica doesn't have. Very rarely will you actually have to write a procedure that involves a loop — in our experience very, very rarely; usually you can just switch to standard scripting. This slide is basically repeating what Maurizio said, so in the interest of time I will skip it.

Look at this one here: most of a data warehouse migration should be automatic. You can automate DDL migration using odb, which is crucial. Data profiling is not crucial, but game-changing. The same goes for encoding, which you can automate using the Database Designer; physical data model optimization in general is game-changing, and you have the Database Designer for it. Use the provisioning, and use the old platform's own tools to generate the SQL. Having no objects without their owners is crucial. And as for functions and procedures, they are only crucial if they embody the company's intellectual property; otherwise you can almost always replace them with something else. That's it from me for now.
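A minimal sketch of the named-pipe-plus-COPY approach mentioned above — the pipe path, delimiter and table name are illustrative:

-- shell (illustrative): create the pipe and let the ETL process write to it
--   mkfifo /tmp/sales_pipe
--   etl_extract_job > /tmp/sales_pipe &
-- then one bulk COPY replaces thousands of single-row INSERTs:
COPY sales.transactions
FROM '/tmp/sales_pipe'
DELIMITER '|' NULL ''
DIRECT
REJECTED DATA '/tmp/sales_rejects.txt';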
Thank you, Marco. We will now continue our presentation by talking about some of the Vertica-specific data warehouse optimization techniques that we can implement in order to improve the general efficiency of the data warehouse. Let me start with a few simple messages. The first one is that you are supposed to optimize only if and when it is needed: in most cases, just a lift and shift from the old data warehouse to Vertica will give you the performance you were looking for, or even better, and in that case there is probably no real need to optimize anything. If you want to optimize, or need to, then keep in mind some of the Vertica peculiarities: for example, implement deletes and updates the Vertica way; use live aggregate projections in order to avoid — or better, to limit — GROUP BY executions at query time; use flattened tables in order to avoid or limit joins; and then you can also take advantage of some specific Vertica extensions, for example time series analysis or machine learning on top of your data.

Let's start by reviewing the first of these bullets: optimize if and when needed. If, when you migrate from the old data warehouse to Vertica without any optimization, the performance level you get is okay, then probably your job is already done. If this is not the case, one very easy optimization technique is to ask Vertica itself to optimize the physical data model using the Vertica Database Designer. DBD, the Vertica Database Designer, has several interfaces; here I'm going to use what we call the DBD programmatic API, so basically SQL functions. With other databases you might need to hire experts to look at your data, your data warehouse and your table definitions, creating indexes or whatever; in Vertica all you need is to run something as simple as six SQL statements to get a very well optimized physical data model. You see that we start by creating a new design, then we add to the design the tables and the queries we want to optimize, we set our target — in this case we are tuning the physical data model in order to maximize query performance, which is why we set the design objective to query; another possible goal would be to tune in order to reduce storage, or a mix between tuning storage and tuning queries — and finally we ask Vertica to produce and deploy the optimized design. In a matter of literally minutes, what you get is a fully optimized physical data model. It is something very easy to implement; a sketch of those six calls follows.
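The design name, schema wildcard and file paths in this sketch are placeholders:

SELECT DESIGNER_CREATE_DESIGN('dwh_design');
SELECT DESIGNER_ADD_DESIGN_TABLES('dwh_design', 'dwh.*');
SELECT DESIGNER_ADD_DESIGN_QUERIES('dwh_design', '/home/dbadmin/queries.sql', TRUE);
SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('dwh_design', 'QUERY');
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('dwh_design',
       '/home/dbadmin/dwh_design_ddl.sql', '/home/dbadmin/dwh_design_deploy.sql');
SELECT DESIGNER_DROP_DESIGN('dwh_design');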
Keep in mind some of the Vertica peculiarities. Vertica is very well tuned for load and query operations. Vertica writes rows into ROS containers on disk; a ROS container is a group of files, and we never, ever change the content of those files once they are written. The fact that ROS container files are never modified is one of the Vertica peculiarities, and this approach lets us use minimal locks: we can run multiple load operations in parallel against the very same table — assuming we don't have a primary key or unique constraint enforced on the target table — because they will simply end up in different ROS containers. A SELECT in read committed isolation requires no locks at all and can run concurrently with an INSERT...SELECT, because the SELECT works on a snapshot of the catalog taken when the transaction started; this is what we call snapshot isolation. And recovery, because we never change the ROS files, is very simple and robust. So we get a huge number of advantages from the fact that we never change the content of the ROS containers — but on the other side, deletes and updates require a little attention.

So, what about deletes? When you delete in Vertica, you basically create a new object called a delete vector, which ends up in ROS or in memory, and this vector points to the rows being deleted, so that when a query is executed Vertica simply ignores the rows listed in the delete vectors. And it's not just about deletes: an update in Vertica consists of two operations, a delete and an insert, and a merge consists of either an insert or an update, which in turn is made of a delete plus an insert. So if we tune how deletes work, we will also have tuned updates and merges. What should we do in order to optimize deletes? Remember what we said: every time we delete, we actually create a new object, a delete vector. So avoid committing deletes and updates too often — this reduces the work for the mergeout and delete-vector removal activities that run afterwards. Be sure that all the involved projections contain the columns used in the delete predicate: this lets Vertica access the projection directly, without having to go through the super projection in order to create the delete vector, and the delete will be much, much faster. And finally, another very interesting optimization technique is to try to segregate the update and delete operations from the rest of the workload, in order to reduce lock contention, and this can be done using partition operations.

This is exactly what I want to talk about now. Here you have a typical data warehouse architecture: data arrives in a landing zone, where it is loaded as is from the data sources; then we have a transformation layer writing into a staging area, which in turn feeds, partition by partition, the green data structures we have at the end — and those green data structures at the end are the ones used by the data access tools when they run their queries. Sometimes we might need to change old data, for example because we have late-arriving records, or because we want to fix some errors that originated in the feeds. What we do in this case is copy the partition we want to adjust from the green area at the end back into the staging area — a very fast partition copy operation — then we run our updates, or our adjustment procedure, or whatever we need in order to fix the errors, in the staging area, and at the very same time users continue to query the green data structures at the end, so we never have contention between the two operations. When the update in the staging area is completed, all we have to do is run a swap of partitions between the tables, in order to swap the data we just finished adjusting in the staging zone into the query area, the green one at the end. This swap is very fast — it is an atomic operation, and basically all that happens is an exchange of the pointers to the data. It is a very, very effective technique, and a lot of customers use it.
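A minimal sketch of the copy-back, fix, swap-back cycle just described — table names, the partition key values and the UPDATE are illustrative, and both tables are assumed to share the same definition and partition expression (here, partitioned by a date):

-- 1. Copy the partition to be fixed from the query table back to staging
SELECT COPY_PARTITIONS_TO_TABLE('dwh.sales_fact', '2020-03-01', '2020-03-01', 'stage.sales_fact');

-- 2. Fix the data in staging while users keep querying dwh.sales_fact
UPDATE stage.sales_fact SET amount = amount * 1.1 WHERE region = 'EMEA';
COMMIT;

-- 3. Atomically swap the corrected partition back into the query table
SELECT SWAP_PARTITIONS_BETWEEN_TABLES('stage.sales_fact', '2020-03-01', '2020-03-01', 'dwh.sales_fact');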
So why flattened tables and live aggregate projections? Basically, we use flattened tables and live aggregate projections to minimize or avoid joins — that is what flattened tables are used for — and GROUP BYs — that is what live aggregate projections are used for. Now, compared to traditional data warehouses, Vertica can store, process, aggregate and join orders of magnitude more data; it is a true columnar database, and joins and GROUP BYs are normally not a problem at all — they run faster than in any traditional data warehouse. But there are still scenarios where the data sets are so big — we are talking about petabytes of data — and growing so quickly that we need something to boost GROUP BY and join performance. This is why you can use live aggregate projections to perform aggregations at load time and limit the need for GROUP BYs at query time, and flattened tables to combine information from different entities at load time and avoid running joins at query time.

So, live aggregate projections. At this point in time we can build live aggregate projections using four built-in aggregate functions: SUM, MIN, MAX and COUNT. Let's see how this works. Suppose you have a normal table — in this case a table unit_sold with three columns, pid, date_time and quantity — which has been segmented in a given way, and on top of this base table, which we call the anchor table, we create a projection. We create the projection using a SELECT that aggregates the data: we take pid, the date portion of date_time, and the sum of quantity from the base table, grouping on the first two columns, pid and the date portion of date_time.

What happens in this case when we load data into the base table? All we have to do is load data into the base table. When we do, we will of course fill the projections — assuming we are running with K-safety 1, we will have two projections — and we will load into those two projections all the detailed data we are loading into the table, so pid, date_time and quantity. But at the very same time, without having to do any particular operation or run any ETL procedure, we also automatically get, in the live aggregate projection named total_quantity, the data pre-aggregated by pid and the date portion of date_time, with the sum of quantity. It is something you get for free, without running any specific procedure, and it is very efficient. The key concept is that the loading operation, from the DML point of view, is executed against the base table; we do not explicitly aggregate the data and we don't have any ETL — the aggregation is automatic, and it populates the live aggregate projection every time we write into the base table.

You see the two SELECTs we have on this slide: those two SELECTs produce exactly the same result, so running SELECT pid, date, SUM(quantity) grouped against the base table, or running SELECT * from the live aggregate projection, returns exactly the same data. This is of course very useful, but it is much more useful — and we can observe this if we run an EXPLAIN — that if we run the SELECT against the base table asking for this grouped data, what happens behind the scenes is that Vertica sees there is a live aggregate projection containing data that has already been aggregated during the loading phase, and rewrites the query to use the live aggregate projection. This happens automatically: here you see a query that ran a GROUP BY against unit_sold, and Vertica decided to rewrite it as something to be executed against the live aggregate projection, because doing so saves a huge amount of time — and spares you the effort of pre-aggregating in an ETL cycle. And it is not limited to exactly the information you chose to aggregate: another query, like the SELECT COUNT you see here, will also take advantage of the live aggregate projection, and again this happens automatically; you don't have to do anything to get it.
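A sketch of the anchor table and live aggregate projection described above, plus the two equivalent queries — the schema, names and segmentation clause are illustrative:

CREATE TABLE public.unit_sold (
    pid       INT,
    date_time TIMESTAMP,
    quantity  NUMERIC(10,2)
) SEGMENTED BY HASH(pid) ALL NODES;

CREATE PROJECTION public.total_quantity AS
SELECT pid,
       date_time::DATE AS sale_date,
       SUM(quantity)   AS total_qty
FROM public.unit_sold
GROUP BY pid, date_time::DATE;

-- These two return the same result; the first is rewritten by the optimizer
-- to read the live aggregate projection:
SELECT pid, date_time::DATE, SUM(quantity)
FROM public.unit_sold
GROUP BY pid, date_time::DATE;

SELECT * FROM public.total_quantity;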
One thing we have to keep very clear in mind: what Vertica stores in the live aggregate projection is partially aggregated data. In this example we have two inserts: the first insert enters four rows and the second insert five rows. For each of these inserts we get a partial aggregation — Vertica can never know, after the first insert, whether a second one will follow, so it calculates the aggregation of the data every time an insert runs. This is a key concept, and it also means that you maximize the effectiveness of this technique by inserting large chunks of data. If you insert data row by row, the live aggregate projection is not very useful, because for every row you insert you get one aggregation, so the live aggregate projection ends up containing the same number of rows as the base table. But if you insert a large chunk of data each time, the number of aggregated rows in the live aggregate structure is much smaller than the base data. You can see how this works by counting the rows in the live aggregate projection: if you run the SELECT COUNT(*) from the live aggregate projection — the query on the left side — you get four rows, but if you EXPLAIN this query you will see that it was actually reading six rows. That is because each of those two inserts wrote three partially aggregated rows into the live aggregate projection. So this is the key concept: live aggregate projections keep partially aggregated data, and the final aggregation always happens at runtime.

Another technique, very similar to live aggregate projections, is what we call Top-K projections. We don't actually aggregate anything in a Top-K projection; we just keep the last rows, or limit the number of rows that we keep, using the LIMIT ... OVER (PARTITION BY ... ORDER BY ...) clause. In this case we create two Top-K projections on top of the base table: one to keep the last quantity that has been sold, and the other to keep the maximum quantity. In both cases it is just a matter of how we order the data — in the first case by the date_time column, in the second by quantity — and in both cases we fill the projection with just the top row. Again, this is something that happens automatically when we insert data into the base table. If, after the insert, we run our SELECT against either the max-quantity or the last-quantity Top-K projection, we get just the top rows — you can see that we have far fewer rows in the Top-K projections.
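A sketch of the two Top-K projections just described, on the same illustrative unit_sold table:

-- Keep, per product, the most recently sold quantity
CREATE PROJECTION public.last_quantity AS
SELECT pid, date_time, quantity
FROM public.unit_sold
LIMIT 1 OVER (PARTITION BY pid ORDER BY date_time DESC);

-- Keep, per product, the largest quantity sold
CREATE PROJECTION public.max_quantity AS
SELECT pid, date_time, quantity
FROM public.unit_sold
LIMIT 1 OVER (PARTITION BY pid ORDER BY quantity DESC);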
We said at the beginning that we can use four built-in functions — you might remember: MIN, MAX, SUM and COUNT. What if I want to create my own specific aggregation on top of the loaded data? Some of our customers have very specific needs in terms of live aggregate projections. In this case you can code your own live aggregate projection user-defined functions: you can create a user-defined transform function (UDTF) to implement any sort of complex aggregation while loading data. After you have implemented these UDTFs, you can deploy them using a pre-pass approach, which basically means the data is aggregated at load time during data ingestion, or a batch approach, which means the aggregation runs afterwards on top of the loaded data. Things to remember about live aggregate projections: they are limited to the built-in functions — again SUM, MAX, MIN and COUNT — but you can code your own UDTF, so you can do whatever you want; they can reference only one table; and in Vertica versions before 9.3 it was impossible to update or delete on the anchor table — this limit has been removed in 9.3, so you can now update and delete data on the anchor table. A live aggregate projection follows the segmentation of the GROUP BY expression, and in some cases the optimizer can decide whether or not to pick the live aggregate projection, depending on whether the aggregation is actually worthwhile. Remember that if we insert and commit every single row to the anchor table, we end up with a live aggregate projection that contains exactly the same number of rows as the base table — in that case using the live aggregate projection or the base table would be the same.

So this is one of the two techniques we can implement in Vertica: live aggregate projections, basically to avoid or limit GROUP BYs. The other, which we are going to talk about now, is flattened tables, which are used to avoid the need for joins. Remember that Vertica is very fast at running joins, but when we scale up to petabytes of data we need a boost, and this is what we have in order to fix that problem regardless of the amount of data we are dealing with. So, what about flattened tables?

Let me start with normalized schemas. Everybody knows what a normalized schema is; there is nothing Vertica-specific in this slide. The main purpose of a normalized schema is to reduce data redundancy, and reducing redundancy is a good thing because we obtain fast writes: we only have to write small chunks of data into the right tables. The problem with normalized schemas is that when you run your queries you have to put together information that comes from different tables, and that requires running joins. Again, Vertica is normally very good at running joins, but sometimes the amount of data makes joins hard to deal with, and joins are sometimes not easy to tune. What happens in a traditional data warehouse is that we denormalize the schemas, normally either manually or using an ETL tool: on one side we have the normalized schema, where we get very fast writes, and on the other side we have the wide table, where all the joins and pre-aggregations have already been run in order to prepare the data for the queries. So we have fast writes on the normalized side and fast reads on the wide-table side — the problem sits in the middle, because we have pushed all the complexity into the ETL that has to transform the normalized schema into the wide table. The way we normally implement this, either manually with procedures that we code or with an ETL tool, is to build an ETL layer that runs the INSERT...SELECT which reads from the normalized schema and writes into the wide table at the end — the one used by the data access tools to run our queries. This approach is costly, because someone has to code the ETL; it is slow, because someone has to execute those batches, normally overnight after loading the data, and maybe someone has to check the following morning that everything went well; it is resource-intensive, of course, and also people-intensive, because of the people who have to code it and check the results; it is error-prone, because it can fail; and it introduces latency, because there is a gap on the time axis between time t0, when you load the data into the normalized schema, and time t1, when the data is finally ready to be queried.

What we do in Vertica to facilitate this process is create flattened tables. With flattened tables, first, you avoid data redundancy, because you don't need to keep the wide table alongside the normalized schema; second, it is fully automatic — you don't have to do anything, you just insert the data into the flattened table, and the ETL you would otherwise have coded is turned into an INSERT...SELECT by Vertica automatically; it is robust; and the latency is zero — as soon as you load the data into the flattened table, you get all the joins executed for you.
So let's have a look at how it works. In this case we have the table we are going to flatten, and basically we have to focus on two different clauses. You see that there is one column here, the dimension value, which can be defined either with DEFAULT followed by a SELECT, or with SET USING. The difference between DEFAULT and SET USING is when the data is populated: if we use DEFAULT, the data is populated as soon as we load into the base table; if we use SET USING, we have to run a refresh. But everything is there — you don't need an ETL, you don't need to code any transformation, because everything is in the table definition itself, it comes for free, and with DEFAULT the latency is zero: as soon as you load the other columns, you have the dimension value populated as well.

Let's see an example. Suppose we have a dimension table, customer_dimension, on the left side, and a fact table on the right. You see that the fact table uses columns like o_name and o_city, which are basically the result of a SELECT on top of the customer dimension. This is where the join is executed: as soon as we load data into the fact table — directly into the fact table, without of course loading the data that comes from the dimension — all the data from the dimension is populated automatically. Suppose we run this insert: as you can see, we insert directly into the fact table, loading o_id, customer_id and total. We are not loading name or city; those are automatically populated by Vertica for us because of the definition of the flattened table. This is all you need in order to have your wide table built for you, and it means that at runtime you won't need any join between the base fact table and the customer dimension we used in order to compute name and city, because the data is already there.

That was using DEFAULT. The other option is SET USING, and the concept is absolutely the same: you see that on the right side we have basically replaced o_name DEFAULT with o_name SET USING, and the same is true for city. The concept is the same, but in this case, with SET USING, we have to refresh: we run SELECT REFRESH_COLUMNS with the name of the table — in this case all columns are refreshed, or you can specify only certain columns — and this brings in the values for name and city, reading from the customer dimension. This technique is extremely useful. To summarize the most important difference between DEFAULT and SET USING: DEFAULT populates your target columns when you load, SET USING when you refresh, and in some cases you might want to use them both. In this example we define o_name using both DEFAULT and SET USING, which means the data gets populated either when we load the data into the base table or when we run the refresh. (A small sketch of a flattened fact table follows.)
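A sketch of a flattened fact table along the lines described above — table and column names are illustrative:

CREATE TABLE public.customer_dimension (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(64),
    city        VARCHAR(64)
);

CREATE TABLE public.orders_fact (
    o_id        INT,
    customer_id INT,
    total       NUMERIC(12,2),
    -- populated at load time (latency zero):
    o_name VARCHAR(64) DEFAULT (SELECT name FROM public.customer_dimension cd
                                WHERE cd.customer_id = orders_fact.customer_id),
    -- populated when REFRESH_COLUMNS is run:
    o_city VARCHAR(64) SET USING (SELECT city FROM public.customer_dimension cd
                                  WHERE cd.customer_id = orders_fact.customer_id)
);

-- Load only the fact columns; o_name is filled in by the DEFAULT at load time
INSERT INTO public.orders_fact (o_id, customer_id, total) VALUES (1, 42, 99.90);

-- Bring the SET USING column up to date (all columns, or a named subset)
SELECT REFRESH_COLUMNS('public.orders_fact', 'o_city', 'REBUILD');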
This is a summary of the techniques we can implement in Vertica in order to make our data warehouses even more efficient — and, well, this is basically the end of our presentation. Thank you for listening, and now we are ready for the Q&A session.

Published Date : Mar 30 2020


Gabriel Chapman, Pure Storage


 

Hi everybody, and welcome to this CUBE special presentation of the Vertica Virtual Big Data Conference. theCUBE is running in parallel with day 1 and day 2 of the Vertica big data event — by the way, theCUBE has been at every single Big Data Conference, and it's our pleasure to be here in the virtual/digital event as well. Gabriel Chapman is here; he is the Director of FlashBlade Product Solutions Marketing at Pure Storage. Gabe, great to see you, thanks for coming on.

Great to see you too. How's it going?

It's going very well. I wish we were meeting in Boston at the Encore Hotel, but hopefully we'll be able to meet at Accelerate at some point this year, or at one of the sub-shows, the regional shows, that you guys are doing — we've been covering those as well. But I really want to get into it. At the last Accelerate, in September 2019, Pure and Vertica announced a partnership. I remember Joy ran up to me and said, hey, you've got to check this out: the separation of compute and storage via Eon Mode, now available on FlashBlade — and I believe Pure is still the only company that can support that separation and independent scaling both on prem and in the cloud. So Gabe, I want to ask you: what were the trends in analytical databases and cloud that led to this partnership?

Realistically, I think what we're seeing is that there's been a larger shift when it comes to modern analytics platforms, towards moving away from the traditional Hadoop-type architecture, where we were leveraging a lot of direct-attached storage, primarily because of the limitations of how that solution was architected. When we look at the larger trends around how organizations want to do this type of work on premises, they're looking at solutions that allow them to scale the compute and storage pieces independently, and therefore the FlashBlade platform ended up being a great solution to support Vertica in their transition to Eon Mode, leveraging it essentially as an S3 object store.

OK, so let's circle back on that. In your announcement of FlashBlade, you make the claim that FlashBlade is the industry's most advanced file and object storage platform ever. That's a bold statement, so defend it.

Yeah, I'd like to go beyond that and just say: we've really looked at this from the standpoint of how we've developed FlashBlade as a platform — and keep in mind it's been a product that's been around for over three years now and has been very successful for Pure Storage. The reality is that fast file and fast object as a combined storage platform is a direction that many organizations are looking to go, and we believe that we're a leader in that fast-object, fast-file storage space. Realistically, we start to see more organizations looking at building solutions that leverage cloud storage characteristics but do so on prem, for a multitude of reasons, and we've built a platform that really addresses a lot of those needs: around simplicity — fast matters, and for us simple is smart — we can provide cloud integrations across the spectrum, and there's a subscription model that fits into that as well. That all falls under the umbrella of what we consider the modern data experience, and it's something that we've built into the entire Pure portfolio.

OK, so I want to get into the architecture of FlashBlade a little bit,
and then better understand the fit for analytic databases generally, but specifically Vertica. So: it is a blade, so you've got compute and a network included; it's a key-value-store-based system, so you're talking about scale-out, unlike Pure's initial products, which were scale-up; and it's a fabric-based system. I want to understand what all that means, so take us through the architecture, some of the quote-unquote firsts that you guys talk about. Let's start with the blade aspect.

Yeah, the blade aspect — we call it FlashBlade because if you look at the actual platform, you have primarily a chassis with built-in networking components: there's a fabric interconnect inside the platform that connects to each one of the individual blades, and the individual blades have their own compute that drives, basically, Pure Storage flash components inside. It's not like we're just taking SSDs and plugging them into a system, as you would with a traditional commodity off-the-shelf hardware design; this is very much an engineered solution, built towards the characteristics we believe are important for fast file and fast object: scalability, massive parallelization when it comes to performance, and the ability to grow and scale from essentially seven blades right now up to a hundred and fifty. That's the kind of scale customers are looking for, especially as we start to address these larger analytics workloads — they have multi-petabyte data sets, a single addressable object space, and file performance that is beyond what most traditional scale-up storage platforms are able to deliver.

Yes — I interviewed Coz last September at Accelerate, and he's been attacked by some of the competitors for not having scale-out. I asked him his thoughts on that, and he said, well, first of all, FlashBlade is scale-out, and, look, anything that adds complexity we avoid, but for the workloads associated with FlashBlade, scale-out is the right sort of approach. Maybe you could talk about why that is.

Well, realistically I think that approach is better when we're working with large unstructured data sets. FlashBlade is uniquely architected to allow customers to achieve superior resource utilization for compute and storage, while at the same time significantly reducing the complexity that has arisen around the bespoke or siloed nature of big data and analytics solutions. We really look at it from the standpoint that organizations have built and delivered applications in the public cloud space that address object storage and unstructured data, and for some organizations the important thing is bringing that on prem. We do see repatriation coming for a lot of organizations as data egress charges continue to expand and grow, and organizations that want even higher performance than what they're able to get in the public cloud are bringing that data back on prem. They're looking at it from the standpoint of: we still want to be able to scale the way we scale in the cloud, we still want to operate the same way we operate in the cloud, but we want to do it within the control of our own borders.
That's one of the bigger pieces to this: we look at how we can address cloud characteristics and dynamics, and consumption metrics or models, as well as the benefits and efficiencies of scale that the cloud is able to afford, while allowing customers to do that inside their own data center.

Yes. You were talking about the trends earlier — you had these cloud-native databases that allowed the scaling of compute and storage independently, and Vertica comes in with Eon. A lot of times we talk about these partnerships as Barney deals — you know, "I love you, you love me," here's a press release and then we move on — or they're just straight go-to-market. Are there other aspects of this partnership that are non-Barney-deal-like? In other words, any specific engineering, or other go-to-market programs? Can you talk about that a little bit?

Yeah, it's more than just what we'd consider a channel meet-in-the-middle or that Barney type of deal. Realistically, we've done some firsts with Vertica that I think are really important, if you look at the architecture and how we've brought this to market together. We have solutions teams on the back end who are subject-matter experts in this space. If you talk to Joy and the people from Vertica, they're very excited about the partnership, because it opens up a new set of opportunities for their customers to leverage Eon Mode and get into some of the nuanced aspects of how they use the depot inside each individual compute node, and make adjustments there to reach additional performance gains for customers on prem — while at the same time there's still the ability to go to that cloud model if they wish. So a lot of it is around how we partner as two companies: how we do joint selling motions, how we show up and do white papers and all the traditional marketing aspects that we bring together to the market, and then joint selling opportunities where they exist. Realistically, like any other organization going to market with a partner or an ISV they have a strong relationship with, you'll continue to see us talking about our mutually beneficial relationship and the solutions we're bringing to the market.
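As a side note on the depot point above: in Eon Mode the per-node depot is just a storage location that can be inspected and resized with SQL. A rough sketch — the node name is a placeholder, the 80% value is purely illustrative, and ALTER_LOCATION_SIZE applies to Vertica 9.3 and later:

-- Inspect the depot locations on each node
SELECT node_name, location_path, location_usage, max_size
FROM storage_locations
WHERE location_usage = 'DEPOT';

-- Resize the depot on one node (size can be absolute or a percentage of disk)
SELECT ALTER_LOCATION_SIZE('depot', 'v_verticadb_node0001', '80%');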
Now, you of course used to be a Gartner analyst before going over to the vendor side, and as an analyst you're obviously objective — you see it all. There are a lot of ways to skin a cat; there are strengths, weaknesses, opportunities and threats for every vendor. You have Vertica, which has a very mature stack, and talking to a number of the customers out there who are using Eon Mode, there are certain workloads where these cloud-native databases make sense — it's not just the economics of scaling compute and storage independently, there are flexibility aspects as well. But Vertica really has to play its trump card, which is: look, we've got a big on-premises estate, and we're going to bring that Eon capability both on prem and, embracing the cloud, to the cloud. Obviously they had to play catch-up in the cloud, but at the same time they've got a much more mature stack than a lot of these cloud-native databases that might have started just a couple of years ago. So there are trade-offs that customers have to make. How do you sort through that? Where do you see the interest in this, and what's the sweet spot for this partnership?

We've been really excited to build the partnership with Vertica, and we're really proud to provide pretty much the only on-prem storage platform that's validated with Vertica Eon Mode, to deliver a modern data experience for our customers together. It's that partnership that allows us to go into the on-prem space for customers where — not to say that everybody wants to avoid the cloud; there are aspects and solutions that work very well there — but for the vast majority, your data center is not going away, and you do want to have control over many of the different facets inside your operational confines. So we look at how we can deliver the best of what cloud offers, but on prem, and that's realistically where we see the stronger push: customers who still want to manage their data locally, and maybe even work around some of the restrictions they might have around cost, complexity, and hiring the different types of skill sets required to build applications purely cloud-native. It's still part of that larger digital transformation that many organizations are going forward with, and realistically they're weighing the pros and cons. We've been doing cloud long enough for people to recognize that it's not perfect for everything, and that there are certain things we still want to keep inside our own data centers. So as we move forward, that is the better option when it comes to a modern architecture: we can deliver and address a diverse set of performance requirements and allow the organization to continue to grow and model the data based on what they're actually trying to leverage. That's really what FlashBlade was built for: a platform that can address small files or large files, high throughput, low latency, and scale to petabytes in a single namespace — in a single rack, as we like to put it. We see customers that have put 150 FlashBlades into production as a single namespace. It's significant for organizations that are making that drive towards a modern data experience with modern analytics platforms, and Pure and Vertica have delivered an experience that can address that for a wide range of customers implementing the Vertica technology.

I'm interested in exploring the use cases a little bit further. You just gave some parameters and examples of the flexibility you have, but take us through what the customer discussions are like. Obviously you've got a big customer base, you and Vertica, that's on prem — that's the unique advantage of this — but there are other aspects; it's not just the economics of the granular scaling of compute and storage independently. So take us through a primary use case or two.

Yeah, I can give you a couple of customer examples. We have a large SaaS analytics company which uses Vertica on FlashBlade to authenticate the quality of digital media in real time, and for them it makes a big difference that, as they're doing their streaming,
they can fine-tune and granularly control it. That's one aspect we can address. We have a multinational car company which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous-vehicle decision-making trees — that's what these new modern analytics platforms were really built for. There's another healthcare organization that uses Vertica on FlashBlade to enable healthcare providers to make decisions in real time that impact lives — and especially when we look at the current state of affairs with COVID and the coronavirus, those types of technologies are really going to help us bend that curve downward. So there are all these different areas where we can address the goals organizations are working toward with real-time analytic decision-making tools like Vertica. Realistically, as we have these conversations with customers, they're looking to get beyond just a data scientist or a data architect pulling information out of the data — we were talking about Hadoop earlier, and we're going well beyond that now.

I guess what I'm saying is that in the first phase of cloud it was all about infrastructure — spinning up compute and storage with a little bit of networking in there. The new workload that's clearly emerging started with the cloud databases, but then it's bringing in AI and machine learning tooling on top of that, and being able to really drive these new types of insights. It's about taking this bog of data that we've collected over the last ten years, a lot of it driven by Hadoop, bringing machine intelligence into the equation, and scaling it — with public cloud, or by bringing that cloud experience on prem — across your organization and across your partner network. That really is a new emerging workload. Do you see that? Maybe talk a little bit about what you're seeing with customers.

Yeah, it really is. We see several trends. One of them is the ability to take this approach and move it out of the lab and into production, especially when it comes to data science and machine learning projects that traditionally start out as small proofs of concept, easy to spin up in the cloud. When a customer wants to scale and move towards deriving significant value from it, they do want to be able to control more characteristics, right? And we know machine learning needs to learn from massive amounts of data to provide accuracy — there's just too much data to retrieve from the cloud for every training job — and at the same time, predictive analytics without accuracy is not going to deliver the business advantage everyone is seeking. Data analytics has traditionally been deployed as a continuum, with the things we've been doing in the past — data warehousing, data lakes — on one end and AI on the other, but now we're starting to see it manifest in organizations that are looking to get more utility and better elasticity out of the data they're working with. They're not looking to just build silos of bespoke AI environments; they're looking to
leverage a platform that allows them to do AI for one thing, machine learning for another, and to use multiple protocols to access that data, because the tools are so different. It's a growing diversity of use cases that you can put on a single platform, and I think that's what organizations are looking for as they try to scale these environments. I think it's going to be a big growth area in the coming years.

Gabe, I wish we were in Boston together — you would have painted your little corner of Boston orange, I know. But I really appreciate you coming on theCUBE. This is wall-to-wall coverage, two days, of the Vertica Virtual Big Data Conference. Keep it right there; we'll be right back after this short break. [Music]

Published Date : Mar 30 2020

**Summary and Sentiment Analysis are not been shown because of improper transcript**

ENTITIES

Entity | Category | Confidence
Jim | PERSON | 0.99+
Dave | PERSON | 0.99+
John | PERSON | 0.99+
Jeff | PERSON | 0.99+
Paul Gillin | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
David | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
PCCW | ORGANIZATION | 0.99+
Dave Volante | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Michelle Dennedy | PERSON | 0.99+
Matthew Roszak | PERSON | 0.99+
Jeff Frick | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Mark Ramsey | PERSON | 0.99+
George | PERSON | 0.99+
Jeff Swain | PERSON | 0.99+
Andy Kessler | PERSON | 0.99+
Europe | LOCATION | 0.99+
Matt Roszak | PERSON | 0.99+
Frank Slootman | PERSON | 0.99+
John Donahoe | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Dan Cohen | PERSON | 0.99+
Michael Biltz | PERSON | 0.99+
Dave Nicholson | PERSON | 0.99+
Michael Conlin | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Melo | PERSON | 0.99+
John Furrier | PERSON | 0.99+
NVIDIA | ORGANIZATION | 0.99+
Joe Brockmeier | PERSON | 0.99+
Sam | PERSON | 0.99+
Matt | PERSON | 0.99+
Jeff Garzik | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
Joe | PERSON | 0.99+
George Canuck | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Apple | ORGANIZATION | 0.99+
Rebecca Night | PERSON | 0.99+
Brian | PERSON | 0.99+
Dave Valante | PERSON | 0.99+
NUTANIX | ORGANIZATION | 0.99+
Neil | PERSON | 0.99+
Michael | PERSON | 0.99+
Mike Nickerson | PERSON | 0.99+
Jeremy Burton | PERSON | 0.99+
Fred | PERSON | 0.99+
Robert McNamara | PERSON | 0.99+
Doug Balog | PERSON | 0.99+
2013 | DATE | 0.99+
Alistair Wildman | PERSON | 0.99+
Kimberly | PERSON | 0.99+
California | LOCATION | 0.99+
Sam Groccot | PERSON | 0.99+
Alibaba | ORGANIZATION | 0.99+
Rebecca | PERSON | 0.99+
two | QUANTITY | 0.99+

Gabriel Chapman


 

Hi everybody, and welcome to this Cube special presentation of the Vertica Virtual Big Data Conference. The Cube is running in parallel with day 1 and day 2 of the Vertica Big Data event; by the way, the Cube has been at every single Big Data event, and it's our pleasure to be here in the virtual/digital event as well. Gabriel Chapman is here; he is the Director of FlashBlade Product Solutions Marketing at Pure Storage. Gabe, great to see you, thanks for coming on.

Great to see you too. How's it going?

It's going very well. I wish we were meeting in Boston at the Encore Hotel, and hopefully we'll be able to meet at Accelerate at some point this year, or at one of the sub-shows, the regional shows you guys are doing, because we've been covering that show as well. But I really want to get into it. At the last Accelerate, in September 2019, Pure and Vertica announced a partnership. I remember Joy ran up to me and said, "Hey, you've got to check this out": the separation of compute and storage via Eon mode, now available on FlashBlade, and, I believe, you're still the only company that can support that separation and independent scaling both on prem and in the cloud. So Gabe, I want to ask you: what were the trends in analytical databases and cloud that led to this partnership?

Realistically, I think what we're seeing is that there's been a larger shift when it comes to modern analytics platforms, away from the traditional Hadoop-type architecture that leveraged a lot of direct-attached storage, primarily because of the limitations of how that solution was architected. When we look at the larger trends toward how organizations want to do this type of work on premises, they're looking at solutions that allow them to scale the compute and storage pieces independently, and the FlashBlade platform ended up being a great solution to support Vertica in their transition to Eon mode, leveraging it essentially as an S3 object store.

Okay, so let's circle back on that. In your announcements around FlashBlade you make the claim that FlashBlade is the industry's most advanced file and object storage platform ever. That's a bold statement, so defend it.

Yeah, I'd like to go beyond that and just say that we've really looked at this from the standpoint of how we've developed FlashBlade as a platform. Keep in mind it's a product that's been around for over three years now and has been very successful for Pure Storage. The reality is that fast file and fast object as a combined storage platform is a direction that many organizations are looking to go, and we believe we're a leader in that fast object and fast file storage space. Realistically, as we see more organizations building solutions that leverage cloud storage characteristics, but doing so on prem for a multitude of different reasons, we've built a platform that addresses a lot of those needs: simplicity, making sure that fast matters and that, for us, simple is smart, cloud integrations across the spectrum, and a subscription model that fits into that as well. That all falls under the umbrella of what we consider the modern data experience, and it's something we've built into the entire Pure portfolio.
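For readers who want to make the "Eon mode on an S3 object store" idea concrete, here is a minimal sketch that checks an S3-compatible bucket, of the kind Eon mode would use for communal storage, on an on-prem endpoint such as a FlashBlade data address. The endpoint URL, bucket name, and credentials are hypothetical placeholders, not values from the interview.

```python
# Minimal sketch: verify that an S3-compatible bucket intended for Eon mode
# communal storage is reachable and list a few objects in it.
# Endpoint, bucket, and credentials below are illustrative placeholders.
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://flashblade.example.com",   # hypothetical on-prem S3 endpoint
    aws_access_key_id="ACCESS_KEY_PLACEHOLDER",
    aws_secret_access_key="SECRET_KEY_PLACEHOLDER",
    config=Config(s3={"addressing_style": "path"}),  # path-style addressing is common for on-prem S3
)

bucket = "vertica-communal"                           # hypothetical communal-storage bucket

# Raises an exception if the bucket is missing or unreachable.
s3.head_bucket(Bucket=bucket)

# List up to ten objects to confirm read access.
resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```

The design point being illustrated is simply that the same S3 API a cloud deployment would use is what the database talks to on prem, which is what makes the compute/storage separation portable.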
Okay, so I want to get into the architecture of FlashBlade a little bit and then better understand the fit for analytic databases generally, but specifically Vertica. So it is a blade: you've got compute and networking included. It's a key-value-store-based system, so you're talking about scale-out, unlike Pure's initial products, which were scale-up. And it's a fabric-based system. I want to understand what all that means, so take us through the architecture and some of the quote-unquote firsts that you guys talk about. Let's start with the blade aspect.

Yeah, the blade aspect: we call it FlashBlade because if you look at the actual platform, you have primarily a chassis with built-in networking components, right? There's a fabric interconnect inside the platform that connects to each one of the individual blades, and the individual blades have their own compute that drives, basically, Pure Storage flash components inside. It's not like we're just taking SSDs and plugging them into a system the way you would with a traditional commodity off-the-shelf hardware design. This is very much an engineered solution, built toward the characteristics we believe are important for fast file and fast object: scalability, massive parallelization when it comes to performance, and the ability to grow and scale from essentially seven blades right now to a hundred and fifty. That's the kind of scale customers are looking for, especially as we start to address these larger analytics pools: they have multi-petabyte datasets, and they want that single addressable object space and file performance beyond what most traditional scale-up storage platforms are able to deliver.

Yes. I interviewed Coz last September at Accelerate, and he's been attacked by some of the competitors for not having scale-out. I asked him his thoughts on that, and he said, well, first of all, our FlashBlade is scale-out, and look, anything that adds complexity we avoid; but for the workloads that are associated with FlashBlade, scale-out is the right sort of approach. Maybe you could talk about why that is.

Well, realistically, I think that approach is better when we're starting to work with large unstructured data sets. FlashBlade is uniquely architected to allow customers to achieve superior resource utilization for compute and storage, while at the same time significantly reducing the complexity that has arisen around the bespoke or siloed nature of big data and analytics solutions. We really look at this from the standpoint that people have built, delivered, or created applications in the public cloud space that address object storage and unstructured data, and for some organizations the important thing is bringing that on prem. We do see repatriation coming for a lot of organizations as data egress charges continue to expand and grow, and organizations that want even higher performance than what they're able to get in the public cloud space are bringing that data back on prem. They're looking at it from the standpoint of: we still want to be able to scale the way we scale in the cloud, we still want to operate the way we operate in the cloud, but we want to do it within the control of our own borders. And so one of the bigger pieces of that is looking at how we address cloud characteristics and dynamics, consumption metrics and models, and the benefits and efficiencies of scale that the cloud affords, while allowing customers to do all of that inside their own data centers.
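To make the scale-out point concrete — growing from seven blades to a hundred and fifty in one system rather than forklift-upgrading a controller pair — here is a small back-of-the-envelope sketch. The per-blade capacity and bandwidth figures are illustrative assumptions, not published FlashBlade specifications; the only point is that capacity and aggregate throughput grow roughly with blade count.

```python
# Back-of-the-envelope scale-out model: capacity and aggregate throughput grow
# roughly linearly with blade count. The per-blade numbers are assumptions for
# illustration only, not vendor specifications.
PER_BLADE_TB = 17.0      # assumed usable capacity per blade, in terabytes
PER_BLADE_GBPS = 1.5     # assumed sustained read bandwidth per blade, in GB/s

def cluster_estimate(blades: int) -> tuple[float, float]:
    """Return (approx. usable capacity in TB, approx. aggregate read GB/s)."""
    return blades * PER_BLADE_TB, blades * PER_BLADE_GBPS

for blades in (7, 15, 75, 150):
    capacity_tb, bandwidth_gbps = cluster_estimate(blades)
    print(f"{blades:>3} blades ~ {capacity_tb:8.1f} TB usable, ~ {bandwidth_gbps:6.1f} GB/s aggregate reads")
```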
Yes. You were talking about the trends earlier: you had these cloud-native databases that allowed the scaling of compute and storage independently, and Vertica comes in with Eon. A lot of times we talk about these partnerships as Barney deals — you know, "I love you, you love me," here's a press release and then we move on — or they're just straight go-to-market. Are there other aspects of this partnership that are non-Barney-deal-like, in other words any specific engineering or other go-to-market programs? Can you talk about that a little bit?

Yeah, it's more than just what we'd consider a channel meet-in-the-middle or that Barney type of deal. Realistically, we've done some firsts with Vertica that I think are really important when you look at the architecture and how we've brought this to market together. We have solutions teams on the back end who are subject-matter experts in this space. If you talk to Joy and the people from Vertica, they're very high on, very excited about, the partnership, because it opens up a new set of opportunities for their customers to leverage Eon mode and get into some of the nuanced aspects of how they use the depot within each individual compute node, making adjustments there to reach additional performance gains on prem while still having the ability to go to that cloud model if they wish. So a lot of it is around how we partner as two companies: how we do joint selling motions, how we show up and do white papers and all of the traditional marketing we bring to the market, and then joint selling opportunities where they exist. Realistically, like any other organization going to market with a partner or an ISV they have a strong partnership with, you'll continue to see us talking about these mutually beneficial relationships and the solutions we're bringing to the market.

Okay. You know, of course you used to be a Gartner analyst, and you've gone over to the vendor side now, but as a Gartner analyst you're obviously objective; you see it all. There's a lot of ways to skin a cat; there are strengths, weaknesses, opportunities, threats, et cetera, for every vendor. So you have Vertica, who's got a very mature stack, and talking to a number of the customers out there using Eon mode, there are certain workloads where these cloud-native databases make sense. It's not just the economics of scaling compute and storage independently — I want to talk more about that — there are flexibility aspects as well. But Vertica really has to play its trump card, which is: look, we've got a big on-premises estate, and we're going to bring that Eon capability both on prem and in the cloud. Obviously they had to play catch-up in the cloud, but at the same time they've got a much more mature stack than a lot of these other cloud-native databases that might have just started a couple of years ago. So there are trade-offs that customers have to make.
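Gabe mentions tuning the depot inside each Eon mode compute node. As a rough illustration of what that looks like from the database side, here is a hedged sketch using the vertica-python client. The connection details are placeholders, and the system table and meta-function names reflect my understanding of Vertica's documented depot management; check them against the version you actually run.

```python
# Hedged sketch: inspect Eon mode depot storage locations and (optionally)
# resize one. Connection parameters are placeholders; the table and function
# names are assumptions based on documented depot management and may differ
# by Vertica version.
import vertica_python

conn_info = {
    "host": "vertica.example.com",      # hypothetical host
    "port": 5433,
    "user": "dbadmin",
    "password": "PASSWORD_PLACEHOLDER",
    "database": "analytics",
}

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()

    # Show depot locations per node (assumes a STORAGE_LOCATIONS system table
    # with a LOCATION_USAGE column, as documented for Eon mode).
    cur.execute(
        "SELECT node_name, location_path, location_usage "
        "FROM storage_locations WHERE location_usage = 'DEPOT'"
    )
    for node_name, path, usage in cur.fetchall():
        print(node_name, path, usage)

    # Example depot resize on one node (assumes the ALTER_LOCATION_SIZE
    # meta-function; the node name and size here are illustrative).
    # cur.execute("SELECT ALTER_LOCATION_SIZE('depot', 'v_analytics_node0001', '60%')")
finally:
    conn.close()
```

The reason this knob matters is the one Gabe gives: the depot is the local cache each compute node keeps in front of communal object storage, so sizing it to the working set is where much of the on-prem performance tuning happens.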
How do you sort through that? Where do you see the interest in this, and what's the sweet spot for this partnership?

You know, we've been really excited to build the partnership with Vertica, and we're really proud to provide pretty much the only on-prem storage platform that's validated with Vertica Eon mode, to deliver a modern data experience for our customers together. It's that partnership that allows us to go into customers in that on-prem space. Not to say that nobody wants to go to the cloud — there are aspects and solutions that work very well there — but for the vast majority, I still think your data center is not going away, and you do want to have control over many of the different facets inside your operational confines. So we look at how we can do the best of what cloud offers, but on prem, and that's realistically where we see the stronger push from customers who still want to manage their data locally, as well as maybe work around some of the restrictions they might have around cost, complexity, and hiring the different skill sets required to build applications purely cloud-native. It's still that larger part of the digital transformation that many organizations are going forward with, and realistically they're weighing the pros and cons. We've been doing cloud long enough for people to recognize that it's not perfect for everything and that there are certain things we still want to keep inside our own data center. So as we move forward, that's the better option when it comes to a modern architecture: we can deliver and address a diverse set of performance requirements and allow the organization to keep growing the model based on the data they're actually trying to leverage. That's really what FlashBlade was built for — a platform that can address small files or large files, high throughput, low latency, and scale to petabytes in a single namespace, in a single rack, as we like to put it. We see customers that have put 150 FlashBlades into production as a single namespace. It's significant for organizations making that drive toward a modern data experience with modern analytics platforms, and Pure and Vertica have delivered an experience that can address that for a wide range of customers implementing the Vertica technology.

I'm interested in exploring the use case a little bit further. You just gave some parameters, some examples, and some of the flexibility that you have, but take us through what the customer discussions are like. Obviously you've got a big customer base, you and Vertica, that's on prem; that's the unique advantage of this, but there are others. It's not just the economics of the granular scaling of compute and storage independently; there are other aspects. So take us through a primary use case or use cases.

Yeah, I can give you a couple of customer examples. We have a large SaaS analytics company which uses Vertica on FlashBlade to authenticate the quality of digital media in real time.
For them it makes a big difference, as they're doing streaming and whatnot, that they can fine-tune and granularly control that. So that's one aspect we get to address. We have a multinational car company which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous-vehicle decision-making trees; that's what these new modern analytics platforms were really built for. And there's another healthcare organization that uses Vertica on FlashBlade to enable healthcare providers to make decisions in real time. The impact is real, especially when we look at the current state of affairs with COVID and the coronavirus: those types of technologies are really going to help us get a handle on it and help lower and bend that curve downward. So there are all these different areas where we can address the goals and achievements we're trying to move forward with, using real-time analytic decision-making tools like Vertica.

And realistically, as we have these conversations with customers, they're looking to get beyond just a data scientist or a data architect trying to derive information. We were talking about Hadoop earlier; we're going well beyond that now. I guess what I'm saying is that in the first phase of cloud it was all about infrastructure: spinning up compute and storage, a little bit of networking in there. It seems like the next workload that's clearly emerging — it started with the cloud databases, but then bringing in AI and machine learning tooling on top of that and being able to drive these new types of insights — is really about taking this bog of data we've collected over the last 10 years, a lot of it driven by Hadoop, bringing machine intelligence into the equation, scaling it with either public cloud or by bringing that cloud experience on prem, and scaling it across your organization and across your partner network. That really is a new emerging workload. Do you see that? Maybe talk a little bit about what you're seeing with customers.

Yeah, it really is. We see several trends. One of those is the ability to take this approach and move it out of the lab and into production, especially when it comes to data science and machine learning projects that traditionally start out as small proofs of concept, easy to spin up in the cloud; but when a customer wants to scale and derive significant value from that, they do want to be able to control more characteristics, right? We know machine learning needs to learn from massive amounts of data to provide accuracy, and there's just too much data to retrieve from the cloud for every training job; at the same time, predictive analytics without accuracy is not going to deliver the business advantage everyone is seeking. The use of data analytics is traditionally viewed as a continuum, with the things we've been doing in the past — data warehousing, data lakes — on one end and AI on the other. But the way we're starting to see it manifest is that organizations are looking to get more utility and better elasticity out of the data they're working with. They're not looking to just build up silos of bespoke AI environments; they're looking to leverage
a platform that can allow them to do AI for one thing and machine learning for another, and to leverage multiple protocols to access that data, because the tools are so different. It's a growing diversity of use cases that you can put on a single platform, which I think organizations are looking for as they try to scale these environments, and I think it's going to be a big growth area in the coming years.

Gabe, I wish we were in Boston together; you would have painted your little corner of Boston orange, I know. But I really appreciate you coming on the Cube. Wall-to-wall coverage, two days, at the Vertica Virtual Big Data Conference. Keep it right there; we'll be right back after this short break. [Music]

Published Date : Mar 30 2020

**Summary and Sentiment Analysis are not shown because of an improper transcript**

ENTITIES

Entity | Category | Confidence
September 2019 | DATE | 0.99+
Gabriel Chapman | PERSON | 0.99+
Boston | LOCATION | 0.99+
two companies | QUANTITY | 0.99+
Barney | ORGANIZATION | 0.99+
Vertica | ORGANIZATION | 0.99+
Gabe | PERSON | 0.99+
Gartner | ORGANIZATION | 0.98+
two days | QUANTITY | 0.98+
Christopher | PERSON | 0.98+
last September | DATE | 0.98+
first phase | QUANTITY | 0.97+
a hundred and fifty | QUANTITY | 0.97+
one aspect | QUANTITY | 0.97+
over three years | QUANTITY | 0.97+
seven blades | QUANTITY | 0.97+
pure | ORGANIZATION | 0.96+
day 2 | QUANTITY | 0.96+
both | QUANTITY | 0.95+
one | QUANTITY | 0.95+
single rack | QUANTITY | 0.95+
firsts | QUANTITY | 0.94+
Boston Orange | LOCATION | 0.94+
coronavirus | OTHER | 0.93+
Encore Hotel | LOCATION | 0.93+
thousands of decisions per second | QUANTITY | 0.93+
single namespace | QUANTITY | 0.92+
each one | QUANTITY | 0.92+
single platform | QUANTITY | 0.92+
Hadoop | TITLE | 0.91+
day 1 | QUANTITY | 0.91+
150 flash blades | QUANTITY | 0.9+
single | QUANTITY | 0.89+
Big Data | EVENT | 0.88+
first | QUANTITY | 0.86+
Berta | ORGANIZATION | 0.86+
a couple of years ago | DATE | 0.85+
Kovac | ORGANIZATION | 0.84+
last 10 years | DATE | 0.82+
Prem | ORGANIZATION | 0.81+
each individual | QUANTITY | 0.8+
Ives | ORGANIZATION | 0.7+
big data | EVENT | 0.66+
one of the bigger pieces | QUANTITY | 0.66+
the sub shows | QUANTITY | 0.66+
every single | QUANTITY | 0.64+
Vertica | TITLE | 0.61+
Eon | TITLE | 0.57+
data | EVENT | 0.56+
egress | ORGANIZATION | 0.56+
times | QUANTITY | 0.54+
Eon | ORGANIZATION | 0.54+
petabytes | QUANTITY | 0.53+
s3 | TITLE | 0.49+