The Shortest Path to Vertica – Best Practices for Data Warehouse Migration and ETL
Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "The Shortest Path to Vertica – Best Practices for Data Warehouse Migration and ETL." I'm Jeff Healey, I lead Vertica marketing, and I'll be your host for this breakout session. Joining me today are Marco Gessner and Mauricio Felicia, both from Vertica and joining us from the EMEA region. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait: just type your question or comment in the question box below the slides and click Submit. As always, there will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions we don't address, we'll do our best to answer offline. Alternatively, visit the Vertica forums at forum.vertica.com to post your questions there after the session; our engineering team is planning to join the forums to keep the conversation going. Also a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week; we'll send you a notification as soon as it's ready. Now let's get started. Over to you, Marco.

Hello everybody, this is Marco speaking, a sales engineer from EMEA, so I'll just get going. This is the agenda: part one will be done by me, part two by Mauricio. The agenda, as you can see, is: big bang or piece by piece; migration of the DDL; migration of the physical data model; migration of ETL, SQL, and BI functionality; what to do with stored procedures; what to do with any existing user-defined functions; and the migration of the data itself, which will be covered by Mauricio. Mauricio, do you want to introduce yourself?

Yeah, hello everybody, my name is Mauricio Felicia and I'm a Vertica pre-sales engineer. Like Marco, I'm going to talk about how to optimize data warehouses using some specific Vertica techniques like table flattening and live aggregate projections. So let me start with a quick overview of the data warehouse migration process we are going to talk about today. Normally we suggest starting by migrating the current data warehouse as it is, with limited or minimal changes in the overall architecture. Clearly we will have to port the DDL and redirect the data access tools to the new platform, but we should minimize the amount of changes in this initial phase in order to go live as soon as possible. In a second phase we can start optimizing the data warehouse, again with no or minimal changes in the architecture as such. During this optimization phase we can create, for example, projections for some specific queries, optimize encoding, or change some of the resource pools; this is something we normally do if and when needed. And finally, again if and when needed, we go through an architectural redesign of these data warehouses using full Vertica techniques, in order to take advantage of all the features we have in Vertica. This is normally an iterative approach, so we might go back and fine-tune some specific feature before coming back to the architecture and design. We will go through this process in the next few slides.

OK. In order to encourage everyone to keep using their common sense when migrating to a new database management system, because people are often afraid of it, it's useful to use the analogy of moving house.
In your old home you might have developed solutions for your everyday life that make perfect sense there. For example, if your old Saint Bernard dog can't walk anymore, you might be using a forklift to heave him in through your window. Well, in the new home, consider the elevator, and don't complain that the window is too small to fit the dog through. It's very much the same with Vertica. To make the transition gentle, let me stay with my analogy of the house move: picture your new house as your new holiday home. Begin to install everything you miss and everything you like from your old home, and once you have everything you need in your new house, you can shut down the old one. So move piece by piece and go for quick wins to make your audience happy. You do a big bang only if they are going to retire the platform you are sitting on, or you are really on a sinking ship. Otherwise, again: identify quick wins, implement and publish them quickly in Vertica, reap the benefits, enjoy the applause, use the gained reputation for further funding, and if you find that nobody is using the old platform anymore, you can shut it down. If you really have to, you can still go big bang in one go, but only if you absolutely have to; otherwise migrate by subject area and group all similar or related areas together.

Having said that, you start off by migrating objects, the objects in the database; that's one of the very first steps. It consists of migrating first the places where you put the other objects into, that is, owners and locations, which are usually schemas. Then, what do you have in there? You extract tables and views, convert the object definitions, and deploy them to Vertica. And mind that you shouldn't do this manually: never type what you can generate, automate whatever you can. For users and roles, there are usually system tables in the old database that contain all the roles; you can export those to a file, reformat them, and then you have CREATE ROLE and CREATE USER scripts that you can apply to Vertica. If LDAP or Active Directory was used for authentication in the old database, Vertica supports anything within the LDAP standard. Catalogs and schemas should be relatively straightforward, with maybe one difference: Vertica does not restrict you by defining a schema as a collection of all objects owned by a user, but it supports it, it emulates it for old times' sake. Vertica does not need the catalog, or if you absolutely need the catalog for the old tools that you use, it is always set to the name of the database in the case of Vertica. Having now the schemas, the catalogs, the users, and the roles in place, move on to the data definition language, the DDL of the tables. If you are allowed to, it's best to use a tool that translates the data types in the generated DDL. You might have seen the odb tool mentioned several times in this presentation; we are very happy to have it. It can export the old database table definitions because it works over ODBC: it takes what the old database's ODBC driver translates to ODBC types, and then it has internal translation tables to several target DBMS flavors, the most important of which is obviously Vertica. If they force you to use something else, there are always tools like SQL*Plus in Oracle, the SHOW TABLE command in Teradata, and so on; each DBMS has a set of tools to extract the object definitions so they can be deployed in another instance of the same DBMS.
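To make the "never type what you can generate" advice concrete, here is a minimal sketch of what a generated Vertica-side script for owners and objects might look like; the role, user, and schema names are purely illustrative assumptions, not anything shown in the webinar.

    -- Roles and users exported from the old system tables, reformatted as Vertica DDL
    CREATE ROLE dw_readers;
    CREATE ROLE dw_writers;

    CREATE USER jsmith IDENTIFIED BY '********';
    GRANT dw_readers TO jsmith;
    ALTER USER jsmith DEFAULT ROLE dw_readers;

    -- Owners and locations first: schemas, then the objects that will live in them
    CREATE SCHEMA IF NOT EXISTS sales AUTHORIZATION jsmith;
    GRANT USAGE ON SCHEMA sales TO dw_readers;

In practice you would generate hundreds of such statements from the old database's system tables rather than writing any of them by hand.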
If I talk about views: usually there is a view definition in the old database catalog as well. One thing that may need a bit of special care is synonyms. Synonyms get emulated in different ways depending on the specific needs; a synonym is essentially an alias placed on a view or table so it can be referred to by another name. Something that is really neat, and that other databases don't have, is the search path in Vertica, which works very much like the PATH environment variable in Windows or Linux: you specify an object name without the schema name, and it is searched first in the first entry of the search path, then in the second, then in the third, which makes synonyms largely unneeded.

When you generate DDL, to stay in the analogy of moving house: dust and clean your stuff before placing it in the new house. If you see a table like the one at the bottom of this slide, it is usually the corpse of a bad migration in the past. An ID is usually an integer and not an approximate floating-point data type, a first name hardly ever has 256 characters, and if a column is called HIRE_DT it's not necessarily needed to store the second when somebody was hired. So take good care in dusting off your stuff while you are moving, and use better data types. The same applies especially to strings. How many bytes does a string of four Euro signs contain? It's not four; it's actually 12 bytes in UTF-8, which is how Vertica encodes strings: ASCII characters take one byte, but the Euro sign takes three. That means that when the source uses a single-byte character set, you very often have to pay attention and oversize the columns first, because otherwise data gets rejected or truncated, and then you have to check very carefully what the best sizes are. The most promising approach is to initially dimension strings in multiples of the original length; with the odb option shown on the slide, what would otherwise be a single-byte character column gets its length doubled, or multiplied by the width of wide-character columns from traditional databases. Then load a representative sample of your source data, profile it with the tools we use ourselves to find the actual longest values, and then make the columns shorter. Note that you pay for too long and too big data types in projection design.

We live and die with our projections. You might remember the rules on how default projections come to exist. The way we do it initially is, just like for the profiling, to load a representative sample of the data and a representative set of already known queries for the Vertica Database Designer. You don't have to decide immediately; you can always amend things later. Otherwise, follow the laws of physics: avoid moving data back and forth across nodes, and avoid heavy I/O. If you can, design your projections initially by hand. Encoding matters: the Database Designer is a very tight-fisted thing, it optimizes to use as little space as possible, but keep in mind that if you compress very well, you might end up spending more time reading the data back. This is a test I ran once using several encoding types, and you can see that RLE, run-length encoding, on sorted data is not even visible in the chart, while the others are considerably slower. You can get the slides and look at the details later.
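As an illustration of designing a projection by hand with encoding in mind, here is a hedged sketch; the table, columns, sort order, and segmentation key are illustrative assumptions, and RLE only pays off on low-cardinality columns that lead the sort order.

    -- A hand-designed projection: sort on the columns queries filter and group on,
    -- and run-length encode the low-cardinality leading columns
    CREATE PROJECTION sales.store_sales_p1
    (
        sale_date  ENCODING RLE,
        store_id   ENCODING RLE,
        product_id,
        amount
    )
    AS
    SELECT sale_date, store_id, product_id, amount
    FROM sales.store_sales
    ORDER BY sale_date, store_id
    SEGMENTED BY HASH(product_id) ALL NODES KSAFE 1;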
Now about BI migrations: usually you can expect 80 percent of everything to simply be lifted and shifted. You don't need most of the pre-aggregated tables, because we have live aggregate projections. Many BI tools have specialized query objects for the dimensions and the facts, and we have the possibility to use flattened tables, which will be talked about later; you might have to write those by hand. You will be able to switch off caching, because Vertica speeds up everything with live aggregate projections, and if you have worked with MOLAP cubes before, you very probably won't miss them at all. As for ETL tools: if you do it row by row in the old database, consider changing everything to very big transactions, and if you use insert statements with parameter markers, consider writing to named pipes and using Vertica's COPY command instead of mass inserts. That COPY command is what I have here on the slide.
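A minimal sketch of that named-pipe-plus-COPY pattern; the pipe path, schema, table name, and options are illustrative assumptions rather than anything shown in the session.

    -- The ETL tool writes delimited rows into a named pipe (created e.g. with mkfifo
    -- on the initiator node); a single COPY then streams them in as one big transaction
    COPY staging.sales_fact
    FROM '/data/pipes/sales_fact.fifo'
    DELIMITER '|'
    DIRECT;   -- load straight into ROS; newer Vertica versions choose this automatically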
As for custom functionality: as you can see on this slide, Vertica has by far the biggest number of built-in functions of any database; we compare them regularly. You might find that many of the functions you have written won't be needed on the new database, so look at the Vertica catalog instead of trying to migrate a function you don't need. Stored procedures are very often used in the old database to overcome shortcomings that Vertica doesn't have. Very rarely will you actually have to write a procedure that involves a loop, really very rarely in our experience; usually you can just switch to standard scripting. And this is basically repeating what Mauricio said, so in the interest of time I will skip this slide. Look at this one here: most of a data warehouse migration should be automatic. You can automate DDL migration using odb, which is crucial. Data profiling is not crucial, but game-changing, and the same goes for encoding. The physical data model optimization you can automate using the Database Designer, and in general it is game-changing. For provisioning, use the old platform's tools to generate the SQL. Having no objects without their owners is crucial. And as for functions and procedures, they are only crucial if they embody the company's intellectual property; otherwise you can almost always replace them with something else. That's it from me for now.

Thank you, Marco. We will now continue our presentation by talking about some of the Vertica optimization techniques we can implement in order to improve the overall efficiency of the data warehouse. Let me start with a few simple messages. The first one is that you are supposed to optimize only if and when it is needed: in most cases, just a lift and shift from the old data warehouse to Vertica will give you exactly the performance you were looking for, or even better, and in that case there is probably no real need to optimize anything. If you do want or need to optimize, then keep in mind some of the Vertica peculiarities: for example, implement deletes and updates the Vertica way, use live aggregate projections in order to avoid or limit GROUP BY execution at query time, use flattened tables in order to avoid or limit joins, and you can also implement some specific Vertica extensions, like time series analysis or machine learning, on top of your data. We will now start by reviewing the first of these points: optimize if and when needed.

If, when you migrate from the old data warehouse to Vertica without any optimization, the performance level is already okay, then you probably don't need to touch anything. But if this is not the case, one very easy technique is to ask Vertica itself to optimize the physical data model using the Vertica Database Designer. The DBD, which is the Vertica Database Designer, has several interfaces; here I'm going to use what we call the DBD programmatic API, so basically SQL functions. In other databases you might need to hire experts to look at your data, your data warehouse, your table definitions, creating indexes or whatever; in Vertica, all you need is to run something as simple as these six single SQL statements to get a very well optimized physical data model. You can see that we start by creating a new design, then we add to the design the tables and the queries we want to optimize, then we set our target. In this case we are tuning the physical data model in order to maximize query performance, which is why we are using the query objective; other possible choices would be to tune in order to reduce storage, or a mix between storage and queries. Finally, we ask Vertica to produce and deploy the optimized design. In a matter of literally minutes, what you get is a fully optimized physical data model. This is something very, very easy to implement.
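A hedged sketch of what that six-statement Database Designer sequence can look like through the programmatic API; the design name, table scope, and query file are illustrative assumptions, and the exact argument lists of the DESIGNER_* functions should be checked against the documentation for your Vertica version.

    SELECT DESIGNER_CREATE_DESIGN('dw_design');
    SELECT DESIGNER_ADD_DESIGN_TABLES('dw_design', 'sales.*', true);
    SELECT DESIGNER_ADD_DESIGN_QUERIES('dw_design', '/home/dbadmin/workload_queries.sql', true);
    SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('dw_design', 'QUERY');  -- or 'LOAD', or 'BALANCED'
    SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('dw_design',
           '/home/dbadmin/dw_design_projections.sql',   -- generated projection DDL
           '/home/dbadmin/dw_design_deployment.sql');   -- deployment script
    SELECT DESIGNER_DROP_DESIGN('dw_design');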
Now keep in mind some of the Vertica peculiarities. Vertica is very well tuned for load and query operations, and it writes ROS containers to disk. A ROS container is a group of files, and we never, ever change the content of these files. The fact that ROS container files are never modified is one of the Vertica peculiarities, and this approach lets us use minimal locks: we can run multiple load operations in parallel against the very same table, assuming we don't have a primary key or unique constraint enforced on the target, because the loads will end up in different ROS containers. A SELECT in READ COMMITTED requires no locks and can run concurrently with an INSERT...SELECT, because the SELECT works on a snapshot of the catalog taken when the transaction starts; this is what we call snapshot isolation. And recovery, because we never change the ROS files, is very simple and robust. So we get a huge number of advantages from never changing the content of the ROS files contained in the ROS containers. On the other side, deletes and updates require a little attention. What about deletes? When you delete in Vertica, you basically create a new object called a delete vector, either on disk in the ROS or in memory, and this vector points to the data being deleted, so that when a query is executed Vertica simply ignores the rows listed in the delete vectors. And it's not just about deletes: an update in Vertica consists of two operations, a delete and an insert, and a merge consists of either an insert or an update, which in turn is made of a delete and an insert. So if we tune how deletes work, we will also have tuned updates and merges. What should we do in order to optimize deletes? Remember what we said: every time we delete, we actually create a new object, a delete vector. So first, avoid committing deletes and updates too often, to reduce the work for the mergeout and cleanup activities that run afterwards. Second, be sure that all the projections involved contain the columns used in the delete predicate; this lets Vertica identify the rows to delete directly on each projection, without having to go through the super projection in order to build the delete vector, and the delete will be much, much faster. Finally, another very interesting optimization technique is to segregate update and delete operations from the rest of the workload in order to reduce lock contention, and this can be done using partition operations, which is exactly what I want to talk about now.

Here you have a typical data warehouse architecture: data arrives in a landing zone, where it is loaded as-is from the data sources; then a transformation layer writes into a staging area, which in turn feeds the partitioned blocks of data in the green data structures we have at the end. Those green data structures are the ones used by the data access tools when they run their queries. Sometimes we might need to change old data, for example because we have late-arriving records, or because we want to fix errors that originated upstream. What we do in this case is simply copy the partition we want to change or adjust from the green area at the end back to the staging area; that partition copy is a very fast operation. Then we run our updates, or our adjustment procedure, or whatever we need in order to fix the errors in the data, in the staging area, and at the very same time users continue to query the green data structures at the end, so we never have contention between the two operations. When the update in the staging area is complete, all we have to do is run a swap of partitions between the two tables, in order to swap the data we just finished adjusting in the staging zone into the query area, the green one at the end. This swap partition operation is very fast, it is atomic, and basically all that happens is that we exchange the pointers to the data. This is a very effective technique, and a lot of customers use it.

So why flattened tables and live aggregate projections? Basically, we use flattened tables and live aggregate projections to minimize or avoid joins, which is what flattened tables are used for, and GROUP BYs, which is what live aggregate projections are used for. Compared to traditional data warehouses, Vertica can store, process, aggregate, and join orders of magnitude more data: it is a true columnar database, and joins and GROUP BYs are normally not a problem at all, they run faster than in any traditional data warehouse. But there are still scenarios where the data sets are so big, and we are talking about petabytes of data, and growing so quickly, that we need to do something in order to boost GROUP BY and join performance. This is why you can use live aggregate projections to perform aggregations at loading time and limit the need for GROUP BYs at query time, and flattened tables to combine information from different entities at loading time and, again, avoid running joins at query time.

So, live aggregate projections. At this point in time we can use live aggregate projections with four built-in aggregate functions, which are SUM, MIN, MAX, and COUNT. Let's see how this works. Suppose you have a normal table, in this case a table UNIT_SOLD with three columns, PID, DATE_TIME, and QUANTITY, which has been segmented in a given way. On top of this base table, which we call the anchor table, we create a projection defined by a SELECT that aggregates the data: we take the PID, the date portion of DATE_TIME, and the sum of QUANTITY from the base table, grouping by the first two columns, so PID and the date portion of DATE_TIME.
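As a sketch of what such a live aggregate projection can look like in DDL; the exact column names, types, and segmentation are assumptions reconstructed from the description above, not the literal slide, and the date-portion expression in the GROUP BY requires a Vertica version that supports expressions in live aggregate projections.

    CREATE TABLE public.unit_sold (
        pid       INT,
        date_time TIMESTAMP,
        quantity  INT
    )
    SEGMENTED BY HASH(pid) ALL NODES;

    -- Live aggregate projection: rows are pre-aggregated as they are loaded
    CREATE PROJECTION public.unit_sold_lap AS
    SELECT pid,
           date_time::DATE AS sale_date,
           SUM(quantity)   AS total_qty
    FROM   public.unit_sold
    GROUP  BY pid, date_time::DATE;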
OK, what happens when we load data into the base table? All we have to do is load data into the base table. When we do that, we will of course fill the projections; assuming we are running with K-safety equal to one, we have two projections, and we load into those two projections all the detailed data we are loading into the table, so PID, DATE_TIME, and QUANTITY. But at the very same time, without having to do any particular operation or run any ETL procedure, we also automatically get, in the live aggregate projection, the data pre-aggregated: the PID, the date portion of DATE_TIME, and the sum of QUANTITY in the column TOTAL_QTY. This is something we get for free, without running any specific procedure, and it is very efficient. The key concept is that during the loading operation the DML is executed against the base table; we do not explicitly aggregate the data, and we don't have any ETL procedure. The aggregation is automatic and populates the live aggregate projection every time we load into the base table. Look at the two SELECTs we have on this slide: they produce exactly the same result. Running SELECT on PID, the date, and the sum of QUANTITY from the base table, or running SELECT * from the live aggregate projection, gives exactly the same data. This is of course very useful, but what is much more useful, and we can observe this if we run an EXPLAIN, is that if we run the SELECT against the base table asking for the grouped data, what happens behind the scenes is that Vertica sees there is a live aggregate projection with the data already aggregated during the loading phase and rewrites the query to use the live aggregate projection. This happens automatically: here is a query that ran a GROUP BY against UNIT_SOLD, and Vertica decided to execute it against the live aggregate projection, because this saves a huge amount of time and effort. And it is not just limited to the measures you want to aggregate: for example, another query like a SELECT COUNT DISTINCT, and you might note that a COUNT DISTINCT is basically a GROUP BY, will also take advantage of the live aggregate projection. Again, this is something that happens automatically; you don't have to do anything to get it.
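To make that rewrite visible, a small hedged illustration using the projection sketched earlier; the exact plan output will depend on your version and data.

    -- Both statements return the same result set
    SELECT pid, date_time::DATE AS sale_date, SUM(quantity) AS total_qty
    FROM   public.unit_sold
    GROUP  BY pid, date_time::DATE;

    SELECT * FROM public.unit_sold_lap;

    -- EXPLAIN on the GROUP BY against the base table typically shows the optimizer
    -- reading from the live aggregate projection instead of the superprojection
    EXPLAIN
    SELECT pid, date_time::DATE, SUM(quantity)
    FROM   public.unit_sold
    GROUP  BY pid, date_time::DATE;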
One thing we have to keep very clear in mind is that what we store in the live aggregate projection is partially aggregated data. In this example we have two inserts: the first insert adds four rows and the second adds five rows. For each of these inserts we get a partial aggregation. Vertica will never know that after the first insert there will be a second one, so it calculates the aggregation of the data every time an insert is run. This is a key concept, and it also means that you maximize the effectiveness of this technique by inserting large chunks of data. If you insert data row by row, the live aggregate projection technique is not very useful, because for every row that you insert you will have an aggregation, so the live aggregate projection will end up containing the same number of rows as the base table. But if you insert a large chunk of data every time, the number of aggregated rows in the live aggregate projection is much smaller than in the base table. You can see how this works by counting the rows in the live aggregate projection: if you run the SELECT COUNT(*) against the UNIT_SOLD live aggregate projection, the query on the left side, you get four rows, but if you EXPLAIN this query you see that it was reading six rows. That is because each of those two inserts actually added three rows to the live aggregate projection. So, again: live aggregate projections keep partially aggregated data, and the final aggregation always happens at runtime.

Another technique, very similar to live aggregate projections, is what we call Top-K projections. We do not actually aggregate anything in a Top-K projection; we just keep the last rows, or limit the amount of rows that we keep, using a LIMIT ... OVER (PARTITION BY ... ORDER BY ...) clause. In this case we create, on top of the base table, two Top-K projections: one to keep the last quantity that has been sold and the other to keep the maximum quantity. In both cases it's just a matter of ordering the data, in the first case using the DATE_TIME column and in the second case using QUANTITY, and in both cases we fill the projection with just the top row. Again, this is something we do when we insert data into the base table, and it happens automatically. If, after the insert, we run our SELECT against either the max quantity or the last quantity, we get just those top rows; you can see that we have far fewer rows in the Top-K projections.
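A hedged sketch of the two Top-K projections just described, reusing the assumed unit_sold columns.

    -- Keep only the most recent sale per product
    CREATE PROJECTION public.unit_sold_last_sale AS
    SELECT pid, date_time, quantity
    FROM   public.unit_sold
    LIMIT  1 OVER (PARTITION BY pid ORDER BY date_time DESC);

    -- Keep only the largest sale per product
    CREATE PROJECTION public.unit_sold_max_qty AS
    SELECT pid, date_time, quantity
    FROM   public.unit_sold
    LIMIT  1 OVER (PARTITION BY pid ORDER BY quantity DESC);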
We said at the beginning that we can use four built-in functions, you might remember: MIN, MAX, SUM, and COUNT. What if I want to create my own specific aggregation on top of the loaded data? Customers sometimes do, because they have very specific needs in terms of live aggregate projections. In that case you can code your own live aggregate projections with user-defined functions: you can create a user-defined transform function, a UDTF, to implement any sort of complex aggregation while loading data. After you have implemented these UDTFs, you can deploy them using a pre-pass approach, which basically means the data is aggregated at loading time, during the data ingestion, or a batch approach, which means the data is aggregated afterwards by a batch running on top. Things to remember about live aggregate projections: they are limited to the built-in functions, again SUM, MAX, MIN, and COUNT, but you can code your own UDTFs, so you can do whatever you want; they can reference only one table; and for Vertica versions before 9.3 it was impossible to update or delete on the anchor table, a limit that has been removed in 9.3, so you can now update and delete data from the anchor table. A live aggregate projection follows the segmentation of the GROUP BY expression, and in some cases the optimizer can decide to pick the live aggregate projection or not, depending on whether using the aggregation is convenient. And remember that if we insert and commit every single row to the anchor table, we end up with a live aggregate projection that contains exactly the same number of rows as the base table; in that case, using the live aggregate projection or the base table would be the same.

So this is one of the two fantastic techniques we can implement in Vertica: live aggregate projections, basically to avoid or limit GROUP BYs. The other one, which we are going to talk about now, is flattened tables, and they are used in order to avoid the need for joins. Remember that Vertica is very fast at running joins, but when we scale up to petabytes of data we need a boost, and this is what we have in order to fix that problem regardless of the amount of data we are dealing with. So, what about flattened tables? Let me start with normalized schemas. Everybody knows what a normalized schema is; there is nothing Vertica-specific in this slide. The main purpose of a normalized schema is to reduce data redundancy, and reducing redundancy is a good thing because we obtain fast writes: we only have to write small chunks of data into the right tables. The problem with normalized schemas is that when you run your queries, you have to put together the information that arrives from different tables, and that requires running joins. Again, Vertica is normally very good at running joins, but sometimes the amount of data makes joins not easy to deal with, and joins are sometimes not easy to tune. What happens in a traditional data warehouse is that we denormalize the schemas, normally either manually or using an ETL tool. So we have on one side, on the left side of this slide, the normalized schema, where we get very fast writes, and on the other side the wide table, where all the joins and pre-aggregations have already been run in order to prepare the data for the queries. We get fast writes on the left and fast reads on the right side of this slide; the problem is in the middle, because we push all the complexity into the middle, into the ETL that has to transform the normalized schema into the wide table. The way we normally implement this in a traditional data warehouse, either manually with procedures or using an ETL tool, is to code an ETL layer that runs the INSERT...SELECT reading from the normalized schema and writing into the wide table at the end, the one used by the data access tools to run our queries. This approach is costly, because someone has to code the ETL; it is slow, because someone has to execute those batches, normally overnight after loading the data, and maybe someone has to check the following morning that everything went well; it is resource-intensive and also people-intensive, because of the people who have to code it and check the results; it is error-prone, because it can fail; and it introduces latency, because there is a gap on the time axis between the time t0, when you load the data into the normalized schema, and the time t1, when the data is finally ready to be queried. What we do in Vertica to facilitate this process is to create flattened tables. With flattened tables, first, you avoid data redundancy, because you don't need a separate wide table alongside the normalized schema on the left side. Second, it is fully automatic: you just insert the data into the flattened table, and the ETL you would otherwise have coded is transformed into an INSERT...SELECT by Vertica automatically; you don't have to do anything.
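For contrast, this is a hedged sketch of the kind of hand-coded denormalization ETL described above, the nightly INSERT...SELECT that a flattened table makes unnecessary; the schema and column names are illustrative assumptions.

    -- Traditional approach: a batch job joins the staged fact rows to the
    -- dimension and writes the result into a wide, denormalized table
    INSERT /*+ DIRECT */ INTO dw.orders_wide (o_id, customer_id, total, o_name, o_city)
    SELECT s.o_id,
           s.customer_id,
           s.total,
           d.name,
           d.city
    FROM   stage.orders s
    JOIN   dw.customer_dimension d
           ON d.customer_id = s.customer_id;
    COMMIT;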
It's also robust, and the latency is zero: as soon as you load the data into the flattened table, you get all the joins executed for you. So let's have a look at how it works. In this case we have the table we are going to flatten, and we have to focus on two different clauses. You see that there is one column here, the dimension value, which can be defined either with DEFAULT followed by a SELECT, or with SET USING. The difference between DEFAULT and SET USING is when the data is populated: if we use DEFAULT, the data is populated as soon as we load data into the base table; if we use SET USING, we have to run a refresh. But everything is there: you don't need an ETL, you don't need to code any transformation, because everything is in the table definition itself. It's for free, and of course the latency is zero, so as soon as you load the other columns, you have the dimension value populated as well. Let's see an example. Suppose we have a dimension table, CUSTOMER_DIMENSION, on the left side, and a fact table on the right. You see that the fact table uses columns like O_NAME or O_CITY, which are basically the result of a SELECT on top of the customer dimension. So this is where the join is executed: as soon as we load data into the fact table, directly into the fact table, without of course loading the data that comes from the dimension, all the data from the dimension is populated automatically. Suppose we run this INSERT: as you can see, we insert directly into the fact table, loading O_ID, CUSTOMER_ID, and TOTAL; we are not loading the name or the city. Name and city are automatically populated by Vertica for you, because of the definition of the flattened table. That is all you need in order to have your wide table, your flattened table, built for you, and it means that at runtime you won't need any join between the base fact table and the customer dimension we used in order to calculate name and city, because the data is already there. This was using DEFAULT; the other option is using SET USING. The concept is absolutely the same: you see that in this case, on the right side, we have basically replaced O_NAME DEFAULT with O_NAME SET USING, and the same is true for the city. The concept is the same, but with SET USING we have to refresh: you see that we run this SELECT REFRESH_COLUMNS with the name of the table, and in this case all columns are refreshed, or you can specify only certain columns, and this brings in the values for name and city, reading from the customer dimension. This technique is extremely useful. Just to summarize the most important differences between DEFAULT and SET USING: DEFAULT populates your target column when you load, SET USING when you refresh, and in some cases you might need to use them both. In this example we define O_NAME using both DEFAULT and SET USING, and this means that we have the data populated either when we load the data into the base table or when we run the refresh.
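A hedged sketch of a flattened fact table along those lines; the column names and types are assumptions, and the REFRESH_COLUMNS argument details should be checked against the documentation for your Vertica version.

    CREATE TABLE public.orders_fact (
        o_id        INT,
        customer_id INT,
        total       NUMERIC(12,2),
        -- populated at load time
        o_name      VARCHAR(80) DEFAULT (
            SELECT name FROM public.customer_dimension cd
            WHERE cd.customer_id = orders_fact.customer_id),
        -- populated when REFRESH_COLUMNS is run
        o_city      VARCHAR(80) SET USING (
            SELECT city FROM public.customer_dimension cd
            WHERE cd.customer_id = orders_fact.customer_id)
    );

    INSERT INTO public.orders_fact (o_id, customer_id, total) VALUES (1, 42, 99.90);
    -- o_name is already filled by the DEFAULT; o_city gets filled by the refresh
    SELECT REFRESH_COLUMNS('public.orders_fact', 'o_city', 'REBUILD');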
This is a summary of the techniques we can implement in Vertica in order to make our data warehouses even more efficient, and, well, this is basically the end of our presentation. Thank you for listening, and now we are ready for the Q&A session.
Marc Crespi, Exagrid - VeeamOn 2017 - #VeeamOn - #theCUBE
>> Announcer: Live from New Orleans, it's theCube. Covering VeeamON 2017, brought to you by Veeam. >> We're back at VeeamON, Dave Vellante with Stu Miniman. Marc Crespi is here, he's the vice president of SEs at Exagrid Systems, big partner of Veeam's, big presence on the show floor here. Marc, thanks for coming on theCube. >> Thanks for having me. >> So what's going on with Exagrid, we were talking off camera, kind of know you guys a little bit, you guys are right around the corner from us in Massachusetts, but give us the update on the company and what's new? >> Yeah, be happy to. So, first I'd like to thank Veeam for putting on a terrific show, and it's great to be in the beautiful city of New Orleans with you guys. So, if you look at the Exagrid business, Exagrid is a leader in the disk-based backup with data deduplication business. And we've been a Veeam partner for a decade now, and right from the early days when we started talking and working with Veeam, we realized that our two architectures had a natural fit. So when we talk to joint Veeam customers, whether they're new customers or existing customers, they're experiencing an exponential benefit over just using Veeam with some other disk player as a result. If you look at how our business has evolved over the last decade or so, we were originally in the tape replacement business, you know, the dinosaur tape libraries that were still roaming the earth back then, and what we find now is, a lot of customers have moved on from tape; tape is a minority of the backup storage media that we see in the market today. And most of our business in fact is replacing other disk-based implementations, either with or without native data deduplication, about 80% of our business now. And it's all the names you hear in the disk-based backup with data deduplication market that we're replacing. We've also grown from a company that initially focused on the midsized enterprise to now an enterprise-class product and company. So if you look at our average sale, our average customer size, it has grown exponentially over the past several years. And our sales force has grown over 500% just in the last two to three years alone, so we're in a high growth mode, we're experiencing a lot of success, and much of our business, a significant portion of our business, is working with either existing or new Veeam customers.
So if you look at our technology and why we're able to replace incumbent vendors, we're typically finding a frustrated customer who's been through two or three forced refreshes, either 'cause they outgrew technology or the vendor forced them to outgrow technology by end-of-lifing et cetera, which we don't do, we don't end-of-life any of our products, and therefore they lift their head to say, well, before I just spend all these dollars again, plus expansion, why don't I go back into the market and see if anyone's figured out a better way to do this, and that's where we come in. We come in and show them that you can start with the footprint you need and then you can expand infinitely, and we're never going to force you to buy what you already own, so it marries up much more closely with the lifespan customers want for backup storage than the lifespan vendors want for backup storage. >> Marc, can you unpack that a little bit for us? I think about VMware, it was an example of how we avoided having to do certain upgrades. I think of operating systems, or servers that were end of life, stick it into a VM, I could grow and expand, but when I think about gear, there's all sorts of reasons why, just the exponential growth of, you know, different media types, different sizes that we need to take, how come you can do this while others, you know, force those upgrades? >> That's a great question, so I'd compare and contrast a little bit with virtualization. What virtualization brought to the table was, it allowed you to take a set of computing resources and make sure it was fully utilized, right, so if you had a server, you were running one application on it, maybe it was only 30% utilized, you had spare storage, you had spare compute, so what virtualization allowed you to do was add applications that were segmented, and therefore they could run without conflict and you could get that hardware fully utilized. This is a little bit different in that, if you think about what backup really is, on a nightly or weekly basis, even with some of the modern backup techniques that have come out, customers are moving large amounts of data, and it has to be within a certain window of time, because they don't want backups running during production hours, because that can impact network performance, server performance, et cetera. The other side of the equation is when they want something back they want it back fast. So in order to achieve that, we made two architectural choices. On the scalability side, we said that the legacy storage architectures that typically utilize a fixed amount of compute, and then expand by simply adding storage, missed the point that when you add workload to a system, but you don't add power, performance, to that system at the same time, everything that system does is going to take longer. So, if I have a certain amount of data, and I have a certain amount of compute, and then I double my data, but I don't double my compute, my memory, and my networking, naturally everything that system's doing is going to take twice as long. So we recognized that you needed a grid-based architecture, or a cluster-based architecture, that said, when my data doubles, I'll double the storage, but I'm also going to double the compute, the network, the memory, et cetera, at the same time. So if I have a very short backup window day one, with an Exagrid implementation, and my data doubles, I have that very same backup window, I have the very same recovery time.
I have the very same replication time; all the things that a disk-based backup appliance does grow linearly with Exagrid. >> And you're saying other architectures had to wait for Intel? >> That's a great point, yes, they rely very much on the compute. Now there are implementations where Flash is being added to try and speed up processes, et cetera, which sounds like a great idea, 'cause Flash is obviously a very useful technology in the storage industry, but when you look at the pricing of backup infrastructure, Flash breaks the model. It makes the products more expensive, and it's unnecessary if you implement things correctly. >> Because fat disk is still cheaper than cheap Flash, is that right? >> Spinning disk is still about a sixth to an eighth the cost of Flash. >> Now, I wonder if I can go back, I want to pick your technical brain for a minute. So you mentioned tape replacement, and then, as I recall, the ascendancy of, we can call them purpose-built backup appliances, I think it's an IDC term or whatever, but we'll use that. A big part of the value proposition was plugging directly in, looking like tape, so you didn't have to rip and replace your processes, and I remember Avamar was trying to convince the market that no, you have to change your processes, and people were like, conceptually that sounds good, but it's too disruptive for me. So where were you guys on that curve? Do you look like tape, are you easy to pop in, or? >> Proud to say we look nothing like tape. >> Okay, so that was a headwind for you early on, right? But it's really benefited you down the road, is that fair to say? >> If it was a headwind, it was a breeze, okay, and what I mean by that is, the technology we're referring to is VTL, Virtual Tape Library, and in the very early days of the market, there were some legacy environments, typically Fibre SAN-type environments, where you had to make your disk look like tape so that the customer could transition, especially larger customers where, you know, change is harder, radical change is harder to make quickly. So VTL provided a sort of bridge, or transition technology, over a period of time. We're through that phase of the market. >> Dave: But it was a band-aid? >> It was very much a band-aid. >> But you say it was a breeze, but Data Domain got two thirds of the market, so, I mean... >> Yeah, but it wasn't because of their VTL. >> Dave: It wasn't. >> No, that was a result of there still being some Fibre environments out there, and they decided to cover that part of the market. We looked at the percentage of the market that we thought would need that, both in the early days but even more forward-looking, you know, everything about our architecture is quite a bit more forward-looking than the people we're competing against. And we realized that the investment it would take to do that would eventually be wasted because it would go away, and here's why: if you look at what Veeam's software does with Instant VM Recovery, synthetic fulls, SureBackup, and Virtual Lab, et cetera, when you make a disk look like tape, you lock yourself into the Fred Flintstone era of backup. In other words, you can't take advantage of any of the advanced features in that software, because tape couldn't support those features. And as far as the software knows, it thinks it's talking to a tape library, so it's doing silly things like saying fast forward, rewind, eject with disk. If you think about it, you can almost do a stand-up set.
>> Dave: Hey, your sequential... >> You know, picking on this, right. So what we said is, that's going to go away, it's very clear with what the software folks are doing, especially Veeam, that that's going to go away. Now, I realize Veeam recently added tape capability, but the reason for that is, not because its a primary backup media, it's because for customers that have, you know, infinite retention, or seven, eight, 10 year retention... >> Dave: They need an offsite tape option. >> They need an economic option. It's not that they like it, because we actually have a lot of conversations with customers, even with that longer term retention where they at least want to explore the economics of disk, but in some instances, even though they hate it, and they grin and bare it, they go with tape just purely economically. >> Right, so early days was, hey don't change anything about your software, keep the Fred Flinstone software and all your processes associated with that, and then, of course VM Ware changed everything. >> Marc: Right, and then graduate to the modern... >> Okay, and then the other big, sort of intern scenario, they used to argue about Dedupe rates, and I presume it's the work load and the nature of the data that determines that, not necessarily the technology, but maybe not, maybe there's some nuance on. >> It's a little bit of both. So a responsible deduplication vendor's going to ask the customer a number of questions about the make up and the nature of their data, okay, however, there's also a lot of aspects to which algorithm you use that are going to drive that. So, if you don't implement a very strong aggressive deduplication algorithm, your result is going to be lower, and we find in many of the software based implementations, and some of the appliance vendors, that they took shortcuts on the algorithms itself. Either because they were compute bound or you might be running it on a standard Windows server which is not optimized to run a really strong algorithm, and therefore where, we may say at 12 weeks of retention, you can get about 20 to one, they're getting six or seven to one, and in some cases they're recommending just put straight disk behind the software, well you end up with disk sprawl, because you're keeping all of this retention but you're not reducing the data enough, so you've got disk everywhere. >> Okay, so the quality of the data reduction algorithms matter, okay, and then the other arguments used to be inline or post process, Frank Luptin used "Oh that crappy post process..." >> Marc: I don't remember when he said that. >> Yeah, and weigh in on that. >> So, we kind of agree. Not that inline is better but that parallelization is better so we actually invented a third way called Adaptive Deduplication. Which basically, what that does is, it allows the chunks of data to land into our box first, and then we begin deduplicating, and replicating and parallel, right. So, we're doing it at the same time, but we're not doing it inline. And we monitor utilization of the system and we favor the backup window, so if think our deduplication is going to slow the back window down, we throttle back a bit. If we have plenty of resources, we crank away at the deduplication and replication. 
So we eliminated the potential drawbacks of post process, we eliminated the potential drawbacks of inline, and the biggest drawback of inline is that, when you go to recover a system and you think about Veeam's instant VM recovery, if you boot a virtual machine, we have that virtual machine in its entirety in a high speed cache, so it's up in seconds. So I was talking to a customer of ours at our booth who recovered an exchange server recently by booting it off of a Exagrid in about five minutes, right. If you tried to do that out of a dedupe, a device that only has inline deduplicative data, you're looking at hours to maybe even a day. Now you're CEO's not going to be too happy when they can't do email for a day, so I would recommend a high speed cache. >> Marc, Exagrid's been a partner with Veeam for a lot of this journey that Veeam's been on for the last 10 years. Here at the show, they've been talking about where the next 10 years are going, everything cloud, and expanding what they're doing, as you look forward, any announcements this week or as you look forward as a partnership, where do you see things growing? >> We don't have any specific announcements this week, I would refer folks to our website, we just recently announced our 5.0 release, it includes some pretty important things. One of the things it includes is, integration with Veeam's scale out backup repository, which dramatically simplifies the use of multiple Veeam repository's with Veeam's software. We also announced an offering for AWS we think that's appropriate for some customers, not all necessarily, where we can put a virtual appliance on Amazon, and in the cloud realm, there's no question that customers are going to continue to explore the cloud model for both efficiency, operational, expense versus capital, but there's going to be multiple cloud models, for example we partnered with a company, who's here, who you may have spoken to Offsite Data Sync. So if the customer doesn't want to do Amazon for some reason, then Offsite Data Sync will offer them the very same service with Exagrid technology and an operational expense model. And they've been a very good partner of ours as well. >> And the virtual appliance in AWS how does that work? You pop it in a COLO facility or? >> No, you literally, you load it into Amazon like you would any other Amazon machine instance, and it behaves just like a second data center. So you replicate to it, and it can store all of your offsite data, and then when you need it back, you can recover it provided bandwidth is adequate. >> So, I access the instance from the AWS marketplace, or? >> No, we actually provide it directly. >> Oh, okay. >> Through reseller network. >> Yeah, yeah, yeah, yeah. Okay, so I appreciate you by the way taking me down memory lane and sort of educating us on... >> Marc: Love talking about this stuff. >> Now, so, a lot of things we talked about are old news, to sort of set the context. Where are we today, what is the state of the market and the competitive differentiators that customers really care about? >> I think that we're at the state of the market where people are frustrated with a lot of legacy approaches, whether it's on the backup software side or the backup storage side. The licensing models are expensive, the vendors are gouging them, because they're trying to keep revenue, and they're worried about, you know, the players that are becoming the replacement players like Veeam, like Exagrid. 
So we're at a point now where I see more activity of customers looking for alternatives to what they're running today than maybe ever in the history of backup. You know, people always used to say backup apps are very sticky, they're very hard to replace; well, look at what Veeam's been able to accomplish. Backup storage is very hard to replace once it's installed. Well, if you force a customer every three years to re-spend the money they already spent, plus more, you're creating an event where that customer's going to get frustrated and they're going to go out and look at alternatives. So I think we're at a point now where, more so than ever, customers are looking for alternatives that stop the madness of backup spending and stop the madness of backup performance degradation. >> Yeah, we had Dave Russell on yesterday, and in his last Magic Quadrant, you probably read it, I think one of his strategic planning assumptions was 50% of the customers out there are going to replace or sunset their existing backup architecture in the next two years. I mean, that's a massive number, so, and obviously a huge opportunity for you and for Veeam. >> Yeah, I'm honored to be talking to Dave later today. >> Well Marc, listen, thanks very much for coming on theCube, it was really a pleasure. >> Thank you guys, it's been fun. >> Thank you. >> Thank you. >> Alright, keep it right there everybody, we'll be back with our next guest after this short break. (techno music)
SUMMARY :
Covering VeeamON 2017, brought to you by Veeam. Marc Crespi is here, he's the vice president of the beautiful city of New Orleans with you guys. and the only way you can eliminate that problem, that how come you can do this, while others, missed the point that when you add workload to a system, but when you look at the pricing of backup infrastructure, the cost of Flash. for me, so where were you guys on that curve? and in the very early days of the market, But you say it was a breeze, but Data Domaine if you look at what Veeam's software does but the reason for that is, not because its a primary It's not that they like it, because we actually and then, of course VM Ware changed everything. that determines that, not necessarily the technology, disk behind the software, well you end up with Okay, so the quality of the data reduction algorithms and the biggest drawback of inline is that, and expanding what they're doing, as you look forward, and in the cloud realm, there's no question So you replicate to it, and it can store all of your Okay, so I appreciate you by the way taking me down and the competitive differentiators that customers and they're worried about, you know, the players that are and in his last magic quadrant, you probably read it, on theCube, it was really a pleasure. we'll be back with our next guest
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Frank Luptin | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Marc Crespi | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Exagrid | ORGANIZATION | 0.99+ |
Massachusetts | LOCATION | 0.99+ |
Marc | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
50% | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
Dave Russell | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
12 weeks | QUANTITY | 0.99+ |
Veeam | ORGANIZATION | 0.99+ |
Mark | PERSON | 0.99+ |
New Orleans | LOCATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
VeeamON | ORGANIZATION | 0.99+ |
seven | QUANTITY | 0.99+ |
eight | QUANTITY | 0.99+ |
Fred Flintstone | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
eighth | QUANTITY | 0.99+ |
two architectures | QUANTITY | 0.99+ |
over 500% | QUANTITY | 0.99+ |
both | QUANTITY | 0.98+ |
Exagrid Systems | ORGANIZATION | 0.98+ |
this week | DATE | 0.98+ |
10 year | QUANTITY | 0.98+ |
Data Domain | ORGANIZATION | 0.98+ |
yesterday | DATE | 0.98+ |
one | QUANTITY | 0.97+ |
two thirds | QUANTITY | 0.97+ |
a day | QUANTITY | 0.97+ |
Flash | TITLE | 0.96+ |
one application | QUANTITY | 0.96+ |
20, 30% a year | QUANTITY | 0.96+ |
about 80% | QUANTITY | 0.96+ |
One | QUANTITY | 0.95+ |
30% | QUANTITY | 0.95+ |
twice | QUANTITY | 0.94+ |
Windows | TITLE | 0.93+ |
about five minutes | QUANTITY | 0.93+ |
second data center | QUANTITY | 0.93+ |
first | QUANTITY | 0.93+ |
three years | QUANTITY | 0.93+ |
earth | LOCATION | 0.92+ |
third way | QUANTITY | 0.92+ |
today | DATE | 0.92+ |
Eric Bassier, Quantum - VeeamOn 2017 - #VeeamOn - #theCUBE
(bright music) >> Narrator: Live from New Orleans. It's The Cube! Covering VeeamON 2017. Brought to you by Veeam. >> Welcome back. Eric Bassier is here. He's the senior director of data center products at Quantum, a Veeam partner. Big announcement this week. Eric, good to see you again. Thanks for coming back on. >> Thank you guys for having me. >> So, big theme of this event is, of course, the ecosystem. Veeam sells exclusively through channel partners. Very partner-friendly. Obviously, you guys are the leader in the backup and data protection space. Give us the lowdown on what you guys have announced this week, and we'll get into the partnership. >> Yeah, absolutely. Really excited about what we've announced this week. We've announced new integration with Veeam, both with our DXi deduplication appliances, as well as with our Scalar tape products, and we can kind of talk about both individually. On the DXi side, we've integrated with Veeam's data mover service. And what that means is that some of the advanced features that Veeam has, like instant VM recovery and synthetic full backup creation, historically we haven't been able to support on the DXi. And with this latest integration, we've improved performance quite a bit, to where we can support those advanced features. And, you know, happy to talk more about that. We think this is a, it's a big step for us. It's been a bit of a gap we've had with our DXi and Veeam. And I think it's going to bring a lot more value to Veeam customers using that dedupe appliance. >> Eric, you know, there's always a point in the keynote where tape gets mentioned, and there's some people that are excited, and some people that look at it sideways and say, "Wait, we still use tape?" I saw tweets going out there, tape and VTL both alive and well out there. But, what are you seeing? Maybe help clear up any misconceptions. >> You know, I had a conversation today at VeeamON with a joint Quantum and Veeam customer, and it was an interaction that perfectly summed it up. And they said they were planning to move away from tape and get rid of it. And the events of this last weekend changed their mind. Verbatim. >> Ransomware. >> Ransomware. And Veeam has been good, actually, about promoting why they love tape and why it's important to their customers, and they talk not so much about low-cost, long-term retention, right? I think there's a really good place for tape as long-term storage for massive-scale unstructured data. That's more on kind of the other side of our business. But in the data protection realm, it's about that offline or air-gapped copy to protect against ransomware. And we're seeing, I would almost say, a resurgence in relevance, just from that perspective. It's changing how people use tape, but from that perspective, I think it's as relevant as ever. >> Are your customers actually thinking that way and actually deploying tape in that context? And how does that all work? I wonder if we could talk about that a little bit. >> Yeah, I think they are. I think many of them have been doing it for a number of years. At this show, and for a while with Veeam, we've been promoting the old rule or adage of 3-2-1 data protection best practices. I think a lot of our customers that use tape follow that practice. And... You know, they... They're probably not... We've certainly seen customers use less tape for backup. No doubt about it. They're consolidating it in the data centers, but they still create that offline copy.
And then they keep it either offsite, or even just on premise, and it's got that air gap. It's not on the network, so it's not susceptible to these ransomware viruses. >> So I want to unpack that a little bit. I had a conversation with Edward, our buddy Edward Helekiel, give him credit for this idea. And I was sort of making that argument that it has that air gap, and his point was, "Well, yeah, but you got to recycle the backups, the offline tape." And I said, "Okay." His point was, if you... 'Cause my understanding is with ransomware, everything starts to get encrypted. And then you got to pay for the keys. So if you're backing up encrypted data, eventually you're in trouble, unless you have a way to detect it. So, is that part of the... Again, we're sort of veering off into a tangent of ransomware. >> No, that's all right. >> But you would think that a backup supplier like Veeam would be able to detect anomalies, because you're doing incremental change data every day or multiple times per day, and if you're starting to see some uptick in anomalous activity, say, "Whoa, hold on!" Maybe that's a signal. Is that the right way to think about it? >> You know, I do think that Veeam, and I think that some of the other data protection applications, are starting to build a little bit of intelligence to try to detect it. I don't know... I'm not an expert on that. I can't speak to it. I would say that we would advocate as a best practice that customers should be making that offline copy on tape with adequate frequency so that they feel like they're protected. Because I wouldn't say that you need to rotate the tapes, but I would think about it as, if you create tapes once a day, and then you get hit with a ransomware attack, the data that's going to be susceptible is any new data that's been created since the last backup you made on tape a day ago. It's kind of that old backup rule a little bit. >> Dave: So your RPO is one day? >> That's right, and so... But once you've got that offline copy created on tape, it can be on premise, or it can be offsite at a vault or something, and keep it there for as long as you need to keep it there. It's offline, it's not on the network. >> And the backup software vendor is in a good position to provide visibility to those anomalies. Okay, let's go back to the appliance that you had asked about. >> Before I do, actually, just so we're on the segue, let's stick with tape for a second. >> Dave: Yeah, be happy to. >> And... We can come back to the dedup side. The cool thing we've done is, for Veeam customers, historically, it's been difficult to create tape in a Veeam environment, because they've required an external physical tape server. And, of course, their customers are largely virtualized, right? Well, we've solved that. So what we've done is, we just announced what we call our Scalar iBlade for our new Scalar tape libraries. It's an embedded Intel-based blade server that fits in the back of our library chassis. And it comes with a Windows operating system on it. And... What it does, we've designed it so it can actually host a Veeam tape server, a Veeam proxy server. Really easy to install, and I can talk more about that. Net for customers is, they can now create tape in a Veeam environment without this external dedicated physical server. >> Dave: You just utilize the resources on your appliance. >> So on the one hand, it's not anything super revolutionary.
On the other hand, there's nobody else in the market that has anything like this for tape. I joke that it's converged tape, or it's hyper-converged tape, because we built the compute in. But... It's more of a marketing thing. I think for customers, it is providing a really good value. Because they're able to create tapes in a Veeam environment now, in a really easy way, and if they're in a 100% virtualized environment, they can do that without having to install that separate physical server. So that's iBlade. That was one of the big things we announced, and certainly sort of a cornerstone of what we talk about for 3-2-1 data protection. >> So Eric, of course, one of the big announcements this morning was version 10 of the Veeam Availability Suite. What does that mean to your customers and kind of joint development? >> There's a few things. There's one minor thing that I'll put a plug in for, in that, in Veeam version 10, we'll actually have our DXi appliance added to the Veeam user interface. So kind of a usability enhancement. >> Simplifies things. >> Yeah, it simplifies things. I'm excited about the direction Veeam is taking in terms of... In fact, I just saw Jason talk about it a little bit. It's kind of this progression from backup to availability, and now to almost data management and getting more value out of that secondary storage. And when I think about Quantum, our focus is about secondary storage. It's about data protection and archive storage. And we've got some unique solutions there. I think we have a hardware and storage portfolio that complements Veeam really well. It'll be able to kind of bring that much more to the table for their customers. I'm excited about the direction that they talked about. I'm interested in learning more about it, but I'm excited about it. >> So, let's go back to the dedup appliance. You were saying that you've made some real enhancements to be able to exploit some of the features that Veeam has been introducing over the years. Can you explain that a little bit further? >> Yeah, we... We... So the DXi's an inline, variable-length dedup appliance. So the benefits of that, really good data reduction, et cetera, et cetera. One of the sort of gaps that we had was we just needed to make communication more efficient between a Veeam proxy server and our dedup appliance. And we've been working with the Veeam engineering team on this for about a year or something. We decided to go the route where we were going to use their data mover service. And so we've now announced that integration. The way it works from a customer perspective, pretty simple. Configure the DXi as a target. Once that backup job kicks off, Veeam actually installs a little data mover agent right on the DXi. And then we can use their data mover protocol to be able to communicate between the proxy and the dedup target. Net for a customer, it just makes operations like instant VM recovery or creating a synthetic full backup 10 to 20 times faster than where we were previously. >> Which was using a different data mover. >> Yeah, it was just using CIFS, NFS, or just standard kind of-- >> So not really a high-speed data mover designed to, okay. >> And we've done some things in our software through just our learnings and the work that we've collaborated on with the Veeam engineering team. We've done some things in our DXi software to try to optimize reads and kind of how we do that under the covers, just to speed up things like instant VM recovery.
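One rough way to see why a purpose-built data mover helps synthetic full creation: over a plain CIFS or NFS share, the proxy has to read the previous full and every incremental back across the network and then write an entirely new full, while an agent running on the appliance can synthesize the new full from blocks that are already stored there and only exchange control traffic with the proxy. The sketch below models that difference in bytes moved; the job sizes and the 1% metadata overhead are invented for illustration, and this is the general server-side synthesis idea rather than Quantum's or Veeam's actual protocol.

```python
# Rough model of network traffic for synthetic full creation.
# Sizes and the metadata overhead are invented for illustration; this shows
# the general read-back vs. server-side synthesis idea, not a real protocol.

FULL_GB = 2_000                                  # previous full backup
INCREMENTALS_GB = [60, 55, 70, 40, 65, 50, 45]   # a week of incrementals

# Plain file share: read the whole chain back to the proxy, write a new full.
share_traffic_gb = FULL_GB + sum(INCREMENTALS_GB) + FULL_GB

# Appliance-side data mover: only synthesis commands and metadata cross the
# wire; assume roughly 1% of the data volume for bookkeeping.
datamover_traffic_gb = 0.01 * (FULL_GB + sum(INCREMENTALS_GB))

print(f"File-share synthesis moves ~{share_traffic_gb:,.0f} GB over the network")
print(f"Appliance-side synthesis moves ~{datamover_traffic_gb:,.0f} GB over the network")
print(f"Reduction factor: ~{share_traffic_gb / datamover_traffic_gb:.0f}x")
```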
So we've done some things there that I think will have a good benefit in terms of improved performance. >> I'm hearing a lot of just really practical activities going on in the partnership ecosystem, which says, "Okay, we got this big TAM. How do we actually penetrate it? How do we increase our ability to capture that TAM?" A perfect example here. >> Eric: Yeah, that's right. >> So where do you guys go from here? >> You know, I think we've been partnered with Veeam for a number of years now. We've got a lot of joint customers. I think this integration is just kind of the next step in our partnership, and... I think that given Veeam's direction, I just think we have even more opportunity to integrate with them, and I think it's going to be in the areas of not just data protection, but archive and kind of managing data over its life. You know, and I mean, that's... We already talk about that in terms of some of the things we do for our customers in different industries, like broadcast or post-production. I'm excited to kind of bring that into the data protection realm and the data center. And I think we'll be able to do some really cool things with it. >> Last question I have for you is sort of customer interactions. What are you hearing from them these days? Beyond the digital transformation bromide. What are some of the hardcore, gnarly things that they want you to solve? >> You know, when I'm out talking to customers, I think it's... It seems to be all about Flash. It's all about the Cloud, and it's kind of all about convergence or hyper convergence. I think our customers, especially in IT, they're wrestling with this completely new infrastructure design. And what's the right roadmap for them to kind of go from here to there? And that's where, you know, that's where we're investing. That type of a transition doesn't happen overnight. And so, I think we just want to be there to help our customers kind of along that roadmap and along that journey. Embrace the Cloud and embrace these new technologies. Help 'em get to where they need to go. (chuckles) >> Excellent, well, Eric, thanks for sharing your announcements, and congratulations on all the hard work you're getting to market. We know how much goes into that, so we really appreciate your time. >> Yeah, thank you guys very much. Thank you. >> You're welcome, all right, so that's a wrap for us today. We'll be back tomorrow. We start at, what time do we start tomorrow, Stu? >> Stu: Right after the keynote. >> Right after the keynote. >> Stu: So, 11 o'clock. >> 11 a.m. local time. We're in New Orleans. >> Stu: Central. (chuckles) >> So that's Central. And check out siliconangle.tv for all the videos today. Check out siliconangle.com for all the news. And we'll see you tomorrow, everybody. Thanks for watching. (energetic music) (typing) (plane engine accelerating)
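Two ideas from the conversation above lend themselves to a small worked example: Dave's suggestion that an unusual jump in daily incremental change data could be a ransomware signal, and Eric's framing of the offline tape copy as an exposure window where only data written since the last tape is at risk. Neither is a description of a shipping Veeam or Quantum feature; the sketch below is a hypothetical illustration with invented job sizes and a simple three-sigma threshold.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Hypothetical daily incremental backup sizes in GB (invented data);
# the last value spikes the way a mass-encryption event might.
daily_incrementals_gb = [42, 38, 45, 40, 44, 39, 310]

# Simple change-rate check: flag a day whose incremental is far above the
# recent baseline. A real product would use something more sophisticated.
baseline = mean(daily_incrementals_gb[:-1])
spread = stdev(daily_incrementals_gb[:-1])
latest = daily_incrementals_gb[-1]
if latest > baseline + 3 * spread:
    print(f"Possible ransomware activity: {latest} GB changed vs ~{baseline:.0f} GB baseline")

# Exposure window for the offline tape copy: with one tape job per day,
# anything written after the last tape job is the data at risk.
last_tape_job = datetime.now() - timedelta(hours=20)
exposure_hours = (datetime.now() - last_tape_job).total_seconds() / 3600
print(f"Data written in the last {exposure_hours:.0f} hours is not yet on an offline tape")
```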
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Edward | PERSON | 0.99+ |
Eric | PERSON | 0.99+ |
Veeam | ORGANIZATION | 0.99+ |
Eric Bassier | PERSON | 0.99+ |
Edward Helekiel | PERSON | 0.99+ |
New Orleans | LOCATION | 0.99+ |
Dave | PERSON | 0.99+ |
tomorrow | DATE | 0.99+ |
VeeamON | ORGANIZATION | 0.99+ |
20 times | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
10 times | QUANTITY | 0.99+ |
11 o'clock | DATE | 0.99+ |
today | DATE | 0.99+ |
Jason | PERSON | 0.99+ |
11 a.m. | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
siliconangle.com | OTHER | 0.99+ |
Windows | TITLE | 0.98+ |
once a day | QUANTITY | 0.98+ |
this week | DATE | 0.98+ |
one | QUANTITY | 0.97+ |
Quantum | ORGANIZATION | 0.95+ |
Stu | PERSON | 0.95+ |
Scalar iBlade | COMMERCIAL_ITEM | 0.94+ |
about a year | QUANTITY | 0.93+ |
One | QUANTITY | 0.91+ |
a day ago | DATE | 0.91+ |
this morning | DATE | 0.9+ |
DXi | COMMERCIAL_ITEM | 0.9+ |
version 10 | OTHER | 0.9+ |
Suite | TITLE | 0.84+ |
last weekend | DATE | 0.81+ |
Veeam | TITLE | 0.78+ |
one day | QUANTITY | 0.78+ |
times | QUANTITY | 0.77+ |
VeeamOn | EVENT | 0.77+ |
TAM | ORGANIZATION | 0.73+ |
iBlade | ORGANIZATION | 0.73+ |
siliconangle.tv | OTHER | 0.68+ |
2017 | DATE | 0.68+ |
one minor thing | QUANTITY | 0.66+ |
VTL | ORGANIZATION | 0.65+ |
DXi | TITLE | 0.63+ |
a second | QUANTITY | 0.61+ |
Verbatim | ORGANIZATION | 0.58+ |
VeeamON 2017 | EVENT | 0.57+ |
people | QUANTITY | 0.57+ |