Brett Rudenstein - Hadoop Summit 2014 - theCUBE - #HadoopSummit


 

theCUBE's coverage of Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks ("we do Hadoop") and headline sponsor WANdisco ("we make Hadoop invincible").

Okay, welcome back, we're here live at Hadoop Summit in Silicon Valley. This is theCUBE, our flagship program; we go out to the events and extract the signal from the noise. I'm John Furrier, with Jeff Frick, drilling down on the topics. We're here with WANdisco. Welcome, Brett Rudenstein, senior director. Tell us what's going on for you guys. You have a big presence here; we saw the guys last night, you have a great booth. What's happening?

Yeah, the show is going very well. What's really interesting is that a lot of very technical individuals are approaching us, asking some of the tougher, more in-depth questions about how our consensus algorithm is able to do all this distributed replication. That's really great, because there's a little bit of disbelief at first; then of course we get to do the demonstration for them and suspend that disbelief, if you will. The attendance has been great for us.

We always have the geek conversations with you guys; you're a very technical company. Jeff and I always comment, and certainly Dave Vellante and Jeff Kelly as well, that WANdisco has its share of geeks, people who know what they're talking about. I'm sure you get that. But on the business side, when you talk to customers, I want to get into the outcomes, which seem to be the show's focus this year. What are some of the outcomes your customers are talking about when they bring you in? What business issues are they working to solve?

I think the first thing is to look at why they're looking at us, and then at the particular business issues we solve. The trend we're starting to see is that prospects and customers come to us because of the data they have: it's data that matters, important data. That's when people start to come to us. If you saw some of the UCI material, for example, the data there is live monitoring of patient activity, where it's not just about monitoring a life but potentially about saving a life. Systems that go down not only can't save lives, they can potentially lose them.

So you have a demo; you want to jump into it? What is this all about?

The demonstration I'm going to do for you today is of our new Non-Stop product. I'm going to show you how we can stand up a single HDFS, a single Hadoop cluster, across multiple data centers. That's one of the things people really have trouble getting their heads wrapped around, because when most people do multi-data-center Hadoop, they tend to run two different clusters and synchronize the data between them. To do that, they'll use Flume, or some form of parallel ingest, or technologies like DistCp to copy data between the data centers. Each of those approaches carries an administrative burden and has underlying architectural flaws that keep it from doing a really detailed job of ensuring that all blocks are replicated properly and that no mistakes are ever made; and again, somebody always has to keep eyes on the system. We alleviate all of those things.
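For context on the two-cluster pattern Brett is contrasting: the synchronization between clusters is typically scripted around periodic DistCp runs. A minimal sketch of that administrative glue, with the cluster endpoints and paths as hypothetical placeholders:

```python
import subprocess

# Hypothetical endpoints: a primary and a DR cluster kept in sync by
# periodically re-running DistCp (the pattern described above).
SRC = "hdfs://primary-nn.example.com:8020/data/events"
DST = "hdfs://dr-nn.example.com:8020/data/events"

def sync_clusters():
    # -update copies only files whose size or checksum differ; note this is
    # file-level, so a partially transferred large file restarts from zero.
    result = subprocess.run(
        ["hadoop", "distcp", "-update", SRC, DST],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Failures need human follow-up: the "eyes on the system" burden.
        raise RuntimeError(f"distcp failed:\n{result.stderr}")

if __name__ == "__main__":
    sync_clusters()
```

Because each run is file-oriented and scheduled from outside the clusters, verifying block-level completeness and recovering from partial failures is left to the operator, which is the burden being described.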
The first thing I want to start off with: we had somebody come to our booth, and we were talking about this consensus algorithm we perform and the way we synchronize multiple name nodes across multiple geographies. In that same spirit of disbelief, I explained that one of the key tenets of our application is that it doesn't change the behavior of the application when you go from LAN scope to WAN scope. For example, if you create a file in one data center, and 3,000 or 7,000 miles apart from that somebody hits the same create-file operation, you would expect that the right thing happens: one party gets the file created, and the other gets "file already exists", even if at 7,000 miles' distance they both hit the button at the exact same time.

I'm going to do a very quick demonstration of that for you here. I'm going to put a file into HDFS. My top right-hand window is in Northern Virginia, and 3,000 miles from that, my bottom right-hand window is in Oregon. I'm going to put the /etc/hosts file into a temp directory in Hadoop at the exact same time, 3,000 miles apart, and you'll see exactly that behavior. I've just launched them both: if you look at the top window, the file is created; if you look at the bottom window, it says the file already exists. It's exactly what you'd expect of a LAN-scope application, behaving the way you'd expect it to behave. That is how we ensure consistency, and that was the question this prospect had.

At that distance, even the speed of light takes a little time, right? So what are some of the tips and tricks you can share that enable you guys to do this?

Our consensus algorithm is a majority-quorum-based algorithm, based on the well-known consensus algorithm called Paxos. We have a number of significant enhancements and innovations beyond that, dynamic membership, automatic scaling, and things of that nature, but in this particular case, every transaction that goes into our system gets a global sequence number, and we're able to ensure that those sequence numbers are executed in the correct order. You can't put a delete before a create; everything has to happen in the order in which it actually occurred, regardless of the distance between data centers.
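To make the global-sequence-number idea concrete, here is a toy model; it is illustrative only, not WANdisco's implementation, which as Brett notes layers dynamic membership and other enhancements on Paxos. Every proposed operation receives a global sequence number, every site applies the log in that order, and so both sites deterministically agree on which of two simultaneous creates wins:

```python
import itertools

class Coordinator:
    """Toy stand-in for a Paxos-style service that totally orders operations."""
    def __init__(self):
        self._seq = itertools.count(1)
        self.log = []

    def propose(self, op):
        gsn = next(self._seq)          # global sequence number
        self.log.append((gsn, op))
        return gsn

class Site:
    """One data center's namespace replica; applies the log in GSN order."""
    def __init__(self, name):
        self.name, self.files, self.applied = name, set(), 0

    def apply(self, log):
        for gsn, (action, path, origin) in log[self.applied:]:
            if action == "create":
                status = "created" if path not in self.files else "file already exists"
                self.files.add(path)
                print(f"[{self.name}] GSN {gsn}: create {path} from {origin}: {status}")
            self.applied += 1

coord = Coordinator()
virginia, oregon = Site("virginia"), Site("oregon")
# Both sites hit "create /tmp/hosts" at the same instant, 3,000 miles apart.
coord.propose(("create", "/tmp/hosts", "virginia"))
coord.propose(("create", "/tmp/hosts", "oregon"))
for site in (virginia, oregon):
    site.apply(coord.log)
```

Run as-is, both replicas report the first create as "created" (GSN 1) and the second as "file already exists" (GSN 2), matching the demo's behavior: the ordering, not the geography, decides the winner.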
So what is the biggest aha moment you get from customers when you show them the demo? Is it the replication? The availability? What is the big feature they jump on?

I think the biggest ones are when we start crashing nodes while we're running jobs, or we sever the link between the two data centers. Maybe I'll just do that for you now; let's kick into the demonstration. What I have here is a single HDFS cluster spanning two geographic territories: part of the cluster is in Northern Virginia, and the other part is in Oregon. I'm going to drill down into the graphing application here, and inside you see all of the name nodes: I have three name nodes running in Virginia and three running in Oregon. The demonstration is as follows. I'm going to run TeraGen and TeraSort; in other words, I'm going to create some data in the cluster, then sort it into a total order, and then run TeraValidate in the alternate data center and prove that all the blocks replicated from one side to the other. Along the way, however, I'm going to create some failures: I'm going to kill some of the active name nodes during the replication process, and I'm going to shut down the WAN link between the two data centers during replication, and then show you how we heal from those kinds of conditions, because our algorithm treats failure as a first-class citizen, so there's really no "fail" state in the system, if you will.

So let's start unplugging: an active, local failure.

Let's go ahead and run the TeraGen and the TeraSort. I'm going to put it in a directory called cube1, so we're creating about 400 megabytes of data, a fairly small set that we're going to replicate between the two data centers. The first thing you see over here on the right-hand side is that all of these name nodes spring to life. That's because, in an active-active configuration with multiple name nodes, clients actually load-balance their requests across all of them. It's also a synchronous namespace, so any change I make to one immediately occurs on all of them. The next thing you might notice in the graphing application is these blue lines, only in the Oregon data center. The blue lines represent what we call a foreign block: a block that has not yet made its way across the wide area network from the site of ingest. We move these blocks asynchronously from the site of ingest, so I get LAN-speed performance; in fact, you can see I just finished the TeraGen part of the application, all while pushing data across the wide area network as fast as possible.

Now, as we get into the next phase, which is going to run TeraSort, I'm going to start creating some failures in the environment. The first thing I want to do is pick two name nodes: I'm going to fail a local name node, and then we're also going to fail a remote one. Let's pick one of these; I'll pick hdp2, that's the name of the machine, so I'll ssh into hdp2 and just reboot it. As I hit the reboot button, the next time the graphing application updates, what you'll notice in the monitor is a flat line: it's no longer taking any data in. But if you're watching the application on the right-hand side, there's no interruption of service; the application continues to run. You might expect that in a LAN-scope cluster, but remember, this is a single cluster at WAN scope with 3,000 miles between the two sides. So I've killed one of the six active name nodes; the next thing I'm going to do is kill one of the name nodes over in the Oregon data center. I'll ssh into, let's pick the bottom one, hdp9 in this case, and again another reboot operation. So I've just rebooted two of the six name nodes while running the job, but again, if you look in the upper right-hand corner, the job running in Oregon and the job running in North Virginia continue without any interruption; you can see we just went from 84 to eighty-eight percent on the MapReduce, and so forth. So again, uninterrupted: what I like to call continuous availability at WAN distances.
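A toy sketch of the client-side behavior just described, where requests load-balance across all active name nodes and skip past one that is mid-reboot; the node names and the request callable are hypothetical, not WANdisco's actual interfaces:

```python
import random

class NoNameNodeAvailable(Exception):
    pass

# Hypothetical pool of active name nodes, three per data center.
NAMENODES = [
    "hdp1.virginia", "hdp2.virginia", "hdp3.virginia",
    "hdp7.oregon", "hdp8.oregon", "hdp9.oregon",
]

def submit(request, namenodes=NAMENODES):
    """Load-balance a metadata request; fail over past dead nodes."""
    for node in random.sample(namenodes, len(namenodes)):
        try:
            return request(node)      # any active node can serve the request
        except ConnectionError:
            continue                   # e.g. hdp2 mid-reboot: try the next one
    raise NoNameNodeAvailable("all name nodes unreachable")

if __name__ == "__main__":
    def fake_request(node):
        if node == "hdp2.virginia":    # simulate the rebooted node
            raise ConnectionError(node)
        return f"served by {node}"
    print(submit(fake_request))
```

The point is that no single name node is special: losing hdp2 or hdp9 just shrinks the pool momentarily, which is why the job keeps running.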
You were just describing that. What does continuous availability at WAN distances mean? That's really important; drill down on that.

If you look at what people traditionally call high availability, it means that, generally speaking, the system is there, but there is a short time when the system will be unavailable, and then it becomes available again. A continuously available system ensures that, regardless of the failures that happen around it, the system is always up and running; something is able to take the request. And in a leaderless system like ours, where no single node occupies a leadership role, we're able to continue replication and we're also able to continue coordination.

So there are two distinct things: high availability, which everyone knows and loves, and which is expensive; and continuous availability, which is a bit of a son or cousin of it, I guess you could say. Can you put it in context on cost and implementation?

From the perspective of a WANdisco deployment, it's a continuously available system, even though people look at us as somewhat traditional disaster recovery, because we are replicating data to another data center. But remember, it's active-active: both data centers are able to write at the same time, so you get to maximize your cluster resources. And if we go back to one of the first questions you asked, what do customers and prospects want to do with this: they want to maximize their resource investment. If they have half a million dollars sitting in another data center that can only perform in an emergency recovery situation, then they either have to scale the primary data center, or, what they actually want, they utilize the existing resources in an active-active configuration, which is why I say continuous availability: they're able to work in both data centers, maximizing all their resources.

Versus the consequences of not having that, which would be?

The consequences of not being able to do that: you have a one-way synchronization; a disaster occurs; you then have to bring that data center online; you have to make sure all the appropriate resources are there; and you have an administrative burden that means a lot of people have to go into action very quickly.

Compared with the WANdisco system, what would that look like in time, effort, and cost? Do you have any kind of order-of-magnitude spec? Like a day, a week? Call some guy, he has to get to the office and log in?

You have to look at individual customers' service level agreements, but a number I hear thrown out very often is about 16 hours: we can be back online within 16 hours. The RTO for a WANdisco deployment is essentially zero, because both sites are active; you're able to simply continue without interruption.

Some would say that continuous availability is just high availability, because it's essentially zero versus 16. And 16 hours, I mean, any time down is bad, but 16 hours is huge.

That's the service level agreement, and then everyone says, "but we know we can do it in five hours." The other part, of course, is ensuring that once a year somebody actually runs through the emergency procedure to prove they truly can be back up within the service-level-agreement timeframe. So again, there's a tremendous amount of ongoing administrative effort that goes into that.
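As a rough, illustrative comparison of those two recovery profiles, here is the arithmetic; the one-incident-per-year assumption is ours, not a figure from the interview:

```python
HOURS_PER_YEAR = 365 * 24

def availability(downtime_hours_per_year: float) -> float:
    """Percentage of the year the system is up."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)

# One disaster per year with the oft-quoted 16-hour recovery SLA,
# versus an active-active deployment where RTO is essentially zero.
print(f"16h RTO : {availability(16):.3f}% available")   # ~99.817%
print(f"zero RTO: {availability(0):.3f}% available")    # 100.000%
```

The percentages look close, but the operational difference is the 16 hours of people scrambling per incident versus nothing at all.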
There are some great comments coming in on our CrowdChat at crowdchat.net/hadoopsummit; join the conversation. We have one that says "nice", talking about how the system handles latency; the demo is pretty cool, the map was excellent. Dave Vellante just weighed in and said he did a survey with Jeff Kelly: a large portion of respondents, twenty-seven percent, said lack of enterprise-grade availability was the biggest barrier to adoption. Is this what you're referring to?

Yes, this is exactly what we're seeing. People are not able to meet the uptime requirements, and therefore applications stay in proof-of-concept mode; or those that make it out of proof of concept are heavily burdened by administrators and a large team to ensure a level of uptime that could be handled, without error, through software configuration like WANdisco's.

Another comment, from Burt; thanks, Burt, for watching. There's availability; how about security?

Security is a good one. We run on standard Hadoop distributions, and as such, if you want to run your cluster with on-wire encryption, that's okay; if you want to run your cluster with Kerberos authentication, that's fine. We fully support those environments.

We've got a new use case from the CrowdChat questions, and more are coming in, so send them in; we're watching crowdchat.net/hadoopsummit, great questions. I think people have a hard time parsing HA versus continuous availability, because you can get confused between the two. Is it semantics, or is it an infrastructure concern? How do you differentiate between those two definitions?

Part of it is semantics, but from a WANdisco perspective we like to differentiate, because there really isn't that moment of downtime; there isn't that switchover moment where something has to fail over and then go somewhere else. That's why I use the phrase continuous availability: the system is able to simply continue operating, with clients load-balancing their requests to the available nodes. In a similar fashion, when you have multiple data centers, as I do here, I'm able to continue operations simply by running the jobs in the alternate data center. Remember that it's active-active, so any data ingested on one side immediately transfers to the other.

So maybe let me do the next part. I showed you one failure scenario, and you've seen that all the nodes have come back online and self-healed. For the next part I want to do a separation. I'm going to run the job again and create another directory structure, only this time I'm going to actually chop the network link between the two data centers, and after I do that I'll show you some of our new products in the works and give you a demonstration of those as well.

That's fair enough, Brett. What are some of the applications this enables people to use Hadoop for that they were afraid to before?

When we look at our customer base and the prospects evaluating our technology, it opens up all the regulated industries: pharmaceutical companies, financial services companies, healthcare companies, all the people who have strict regulations and auditing requirements. They now have a very clear, concise way to prove not only that they're replicating data, but that the data has actually made its way across: they can prove that it's in both locations, and not just that it's in both locations, but that it's the correct data.
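To illustrate the kind of cross-site audit being described, one crude check is to compare file checksums across the two clusters using standard HDFS tooling; this is generic tooling, not WANdisco's mechanism, and the endpoints here are hypothetical:

```python
import subprocess

def hdfs_checksum(namenode: str, path: str) -> str:
    # `hdfs dfs -checksum` prints a composite CRC-based checksum for a file;
    # the last whitespace-separated token is the checksum value.
    out = subprocess.run(
        ["hdfs", "dfs", "-checksum", f"{namenode}{path}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[-1]

# Hypothetical name node endpoints for the two sites.
virginia = hdfs_checksum("hdfs://nn-virginia.example.com:8020", "/data/file1")
oregon = hdfs_checksum("hdfs://nn-oregon.example.com:8020", "/data/file1")
print("identical" if virginia == oregon else "DIVERGED: manual repair needed")
```

Doing this at scale, file by file, is exactly the administrative burden that a coordinated, natively replicated namespace is meant to remove.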
Sometimes we see cases, with DistCp copying files between data centers, where a file isn't actually copied because the tool thinks it's the same, yet there is a slight difference between the two. When the clusters diverge like that, it's days of administration, depending on the size of the cluster, to figure out what went wrong and what is different, and then of course you have to involve multiple users to figure out which of the two files you have is the correct one to keep. With the WANdisco technology there's nothing to keep track of; you simply allow the system to do the HDFS replication, because it is essentially native HDFS.

So let me go ahead and stop the WAN link here. I've stopped the tunnel between the two data centers while running this job. One of the things you see on the left-hand side is that it looks like all the nodes no longer respond; of course, I simply have no visibility into those nodes. They're no longer replicating any data, because the tunnel between the two has been shut down. But if you look at the right-hand side of the application, the upper right-hand window, you see that the MapReduce job is still running, unaffected. And what's interesting is that once I start the tunnel up again between the two data centers, we immediately start replicating data again, and this happens at the block level. When we look at other copy technologies, they work at the file level: if you had a large file, say 10 gigabytes in size, and for some reason your transfer crashed when you were seventy percent of the way through, you'd be starting that whole transfer again. Because we do block replication, if seventy percent of your blocks had already gone through, as has perhaps happened here, then when I start the tunnel back up, which I'm going to do now, we simply continue from the blocks that haven't yet made their way across.

So I've started the tunnel back up. The monitor, you'll see, springs back to life. All the name nodes resync: they've been out of sync for some period of time, so they learn any transactions they missed, heal themselves back into the cluster, and we immediately start replicating blocks. Then, to show you the bi-directional nature of this, I'm going to run TeraValidate in the opposite data center, over in Oregon, on that first directory we created, and what you'll see is that we now wind up with foreign blocks on both sides: I'm running applications at the same time across data centers, in a fully active-active configuration, in a single Hadoop cluster.

Okay, so the question on that one: what's the net-net? Summarize that demo real quick; bottom-line it in two sentences.

Bottom line: if name nodes fail, or if the WAN fails, you are still continuously operational.
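To make the block-level resume point concrete, here is a toy model; the block size, counts, and transfer hook are illustrative, not WANdisco's actual interfaces:

```python
# A toy model of resuming replication at block granularity after a WAN
# partition, as opposed to restarting whole-file transfers.

def replicate(blocks: dict, already_at_remote: set, send) -> set:
    """Send only the blocks the remote site is missing; return the new set."""
    for block_id, data in blocks.items():
        if block_id not in already_at_remote:
            send(block_id, data)              # ships just the missing blocks
            already_at_remote.add(block_id)
    return already_at_remote

# A 10 GB file split into 128 MB blocks: if 70% made it across before the
# link dropped, only the remaining 30% is sent when the link comes back.
blocks = {f"blk_{i}": b"" for i in range(80)}   # 80 * 128 MB = 10 GB
survived = {f"blk_{i}" for i in range(56)}      # 70% of 80 blocks
sent = []
replicate(blocks, survived, lambda bid, data: sent.append(bid))
print(len(sent), "blocks resent")               # 24, not 80
```

A file-level copier in the same scenario would resend all 80 blocks; block-level bookkeeping is what turns a dropped WAN link into a catch-up rather than a restart.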
We have questions from the commentary here on the CrowdChat: does this eliminate the need for backup, and what is actually transferring? Certainly not petabytes of data?

You somewhat have to transfer what's important. I suppose if it were important for you to transfer a petabyte of data, then you would need the bandwidth to support the transfer of a petabyte of data (a rough sizing sketch follows the transcript).

We talk to a lot of Hollywood studios; we were at OpenStack Summit, and that was a big concern. A lot of people are moving to the cloud for workflow and for optimization. The Star Wars guys were telling us off the record that the new film is in remote locations; they set up data centers basically in the desert, with actually provisioned infrastructure. So, huge issues.

Yeah, absolutely. What we're replicating, of course, is HDFS. In this particular case I'm replicating all the data in this fairly small cluster between the two sites; this demo is only between two sites, but I could add a third site, and a failure between any two would still allow complete availability of all the other sites that still participate in the algorithm.

Brett, great to have you on. I want to get the perspective from you, in the trenches, out with customers: what's going on at WANdisco? Tell us about the culture there, what's going on at the company, what it's like to work there. We know some of the dudes there; we always drink some vodka with them, because they like to tip one back once in a while. Great guys, great geeks. But what's it like at WANdisco?

You touched on a little piece of it there. First, there are a lot of smart people at WANdisco; in fact, when I first came on board, I thought, wow, I'm probably the least smart person at this company. But culturally this is a great group of guys. They like to work very hard, but equally they like to play very hard, and as you said, I've been out with them several times myself; these are all great guys to be out with. The culture is great, it's a great place to work, and people who are interested should certainly take a look.

Great culture, and it fits in here. We were talking last night; it's a very social crowd, the Hortonworks guys, IBM just walked up; people are really sociable. This event has a real camaraderie feel to it, and yet it's serious business. In the early days it was all a bunch of geeks building an industry, and now it's got everyone's attention: Cisco's here, Intel's here, IBM's here. What's your take on the big guys coming in?

I think the big guys realize that Hadoop, the elephant, is as large as it appears; the elephant is in the room, it's exciting, and everybody wants a little piece of it, as well they should.

Brett, thanks for coming on theCUBE; really appreciate it. WANdisco, you guys are a great company; we love having your support. Thanks for supporting theCUBE. We'll be right back after this short break with our next guest. Thank you.
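On the bandwidth point above, a rough, illustrative sizing; the link speed and utilization figures are assumptions, not numbers from the interview:

```python
def transfer_days(data_petabytes: float, link_gbps: float,
                  efficiency: float = 0.8) -> float:
    """Days to move a dataset over a WAN link at an assumed utilization."""
    bits = data_petabytes * 8e15            # 1 PB = 8e15 bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400

# One petabyte over a dedicated 10 Gbps inter-DC link at 80% utilization.
print(f"{transfer_days(1, 10):.1f} days")   # ~11.6 days
```

Which is the answer's point in numbers: replicating a petabyte is a provisioning question about the pipe, not something the replication layer can conjure away.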

Published Date: Jun 4, 2014
