Steve Wooledge - HP Discover Las Vegas 2014 - theCUBE - #HPDiscover


 

>>Live from Las Vegas, Nevada, it's theCUBE at HP Discover 2014, brought to you by HP.

>>Welcome back, everyone. We're live here in Las Vegas for HP Discover 2014. This is theCUBE. We go out where the action is. We're on the ground here at HP Discover getting all the signals, sharing them with you, extracting the signal from the noise. I'm John Furrier, founder of SiliconANGLE. I'm joined by Steve Wooledge, VP of product marketing at MapR Technologies. Great to see you. Welcome to theCUBE.

>>Thank you.

>>I know you've got a plane to catch, but I really wanted to squeeze you in because you guys are a leader in the big data space. You guys are in the top three, the three big whales: MapR, Hortonworks, Cloudera. You're part of the original big data industry. When we first started theCUBE, the industry had like 30, 34 employees total, combined, with one company, Cloudera, and then MapR announced, and then Hortonworks. You guys have been part of that holy trinity of early pioneers. Give us the update. You guys are doing very, very well. We talked to you guys at Hadoop Summit last week, and saw Jack Norris at the party. Give us the update on what's going on with the momentum and the traction, and then I want to talk about some of the things with the product.

>>Yeah. So we've seen a tremendous uptick in sales at MapR. We tripled revenue; we announced that publicly about a month ago. So we went up 300% in sales over Q3, I'm sorry, Q1 of 2013. And I think it's really the maturity of the market. As people move more towards production, they appreciate the enterprise features we built into the MapR Distribution for Hadoop. So the stats I would share are that 80% of our customers triple the size of their cluster within the first 12 months and 50% of them doubled the size of the cluster, because they had that first production success use case and they find other applications and start rolling out more and more. So it's been great for us.

>>You know, I always joke with Jack Norris, who's the VP of marketing over there, and John Schroeder, the CEO, about MapR's humbleness. You don't have the fanfare of all the hype the press love about Cloudera. Now, see, they have done some pretty amazing things. They've had a liquidity event, essentially kind of an IPO if you will, that huge financing from Intel, and they're doing great, with a big sales force. Hortonworks has got their open source play. You guys have got your heads down as well. So talk about that. How many employees do you guys have, and what's going on with the product? How many products do you guys actually have?

>>Well, we have one product. We have the MapR Distribution for Hadoop, and it's got all the open source packages directly within it, but where we really innovate is in the core. That's where we spent our time early on: really innovating that data platform to give everything within the Hadoop ecosystem more reliability, better availability, performance, security, scale.

>>It's open source contributions to the core, and you guys put stuff on top of that?

>>And how it works, yeah. And even some projects, we lead the projects, like with Apache Mahout and Apache Drill, which is coming into beta shortly. Other projects we commit and contribute back. So we take in the distribution, we're distributing all those projects, but where we really innovate is at that data platform level.

>>So HP is a big data leader, obviously. They bought Autonomy. They have HP Vertica. You guys are here. What are you doing here? Obviously we covered on theCUBE the announcement with HP Vertica. Are you here for that reason? Is there other biz dev, other activity going on, other integration opportunities?

>>Yeah, a few things. Obviously the HP Vertica news was big. We went into general availability with that solution the first week of May. What we have is the HP Vertica database integrated directly on top of our data platform. So it's this hybrid solution where you have a full SQL database directly within your Hadoop distribution. So we had a couple of sessions on that. We had a nice panel discussion with our friends from Cloudera and Hortonworks. So, a really good discussion with HP about the ecosystem and how it's evolving. The other things we're doing with HP now: we've got reference architectures on their hardware lines, so people can deploy MapR on HP hardware. But then also we're talking with the Autonomy group about enterprise search and looking at a similar type of integration, where you could have the search integrated directly into your Hadoop distro. And we've got some joint accounts we're piloting that with now.

>>So you guys are integrating with HP pretty significantly, and that deal is working well?

>>Absolutely.

>>What's the coolest thing that you've seen with HP that you can share? I ask because in the big data landscape everyone's, you know, hunkering down, working on their features, but outside in the real world, big data is not on the top of mind of the CIO 24/7. It's probably an item that they're addressing. What have you seen, and what have you been most impressed with at HP here?

>>Yeah. You know, this is my first HP event like this. I think the strategy they have is really good. I think in certain areas, like the cloud in particular with Helion, they made a lot of early investments there and placed some bets, and I think that's going to pay off well for them. And that marries pretty nicely with our strategy as well, in terms of, you know, we have on-premise deployments, but we're also an OEM, if you will, within Amazon Web Services. So we have a lot of agility in the cloud, if you will. And I think as those products and the partnerships with HP evolve, we'll be playing a lot more with them in the cloud as well.

>>I have to ask you a question. I want you to share with the folks out there, in your own words: what is it about MapR that they may or may not understand or might not know about? Do a little humble brag out there and share some insight into MapR, for folks that don't know you guys as a company and for the folks that may have a misperception of what you guys do. Share with them what MapR is all about.

>>Yeah. I mean, for me, I was in this space with Aster Data, in kind of the whole Hadoop and MapReduce area, since 2008, and pretty familiar with everybody in the space. I really looked at MapR as the best technology, hands down. You look at the Forrester Wave, and they rank us as having the best technology today as well as product roadmap. I think the misperception is people think, oh, it's proprietary and closed. It's actually the opposite of that. We have an unbiased open source approach where we'll ship and support in our distribution the entire Apache Spark stack. We're not selective over which projects within Apache Spark we support. If you look at SQL on Hadoop, we support Impala as well as Hive and other SQL-on-Hadoop technologies, including the ability to integrate HP Vertica directly in the system. And it's because of the openness of our platform. I'd say it's actually more open, because of the standards we've integrated into the data platform to support a lot of third-party tools directly within it. So there is no lock-in: the storage formats are all the same, and the code that runs on top of the distribution from the projects is exactly the same. So you can build a project in Hive or some other system, and you can port it between any of the distributions. So there isn't a lock-in.

>>At the end of the day, what the customers want is ease of integration. They want reliability.

>>That's right.

>>And so what are you guys working on next? What's the big product marketing roadmap that you can share with us?

>>Yeah. I think for us, the innovations we did in the data platform allow us to support not only more applications but more types of operational systems. So, integrating things like fraud detection and recommendation engines directly with the analytical systems to really speed up that accuracy in targeting and detecting risk and things like that. So I think over time, you know, Hadoop has sort of been this batch analytic type of platform, but the ability to converge operations and analytics in one system is really going to be enabled by technology like MapR.

>>How many employees do you guys have now?

>>I'm not sure what our CFO would let me say there, but you can say we're over 200 at this point.

>>As well, and over 500 customers. [unclear] Congratulations. We covered your relationship with HP during our Big Data SV. That was exciting. Good to see John Schroeder; big, very impressive team. I'm impressed with MapR, I always have been. You guys have definitely stuck to your knitting, [unclear], and again, leading the big data space. And again, "not proprietary" is a very key word, and that's really cool. So thanks for coming on theCUBE. We really appreciate it, Steve. We'll be right back. This is theCUBE, live in Las Vegas, extracting the signal from the noise, with MapR here at HP Discover 2014. We'll be right back after this short break.

Published Date : Jun 12 2014

SENTIMENT ANALYSIS :

ENTITIES

Entity               Category      Confidence
John Schroeder       PERSON        0.99+
Steve Woolwich       PERSON        0.99+
Steve                PERSON        0.99+
Jack Norris          PERSON        0.99+
HP                   ORGANIZATION  0.99+
John Frodo           PERSON        0.99+
three                QUANTITY      0.99+
80%                  QUANTITY      0.99+
Steve Wooledge       PERSON        0.99+
50%                  QUANTITY      0.99+
John furrier         PERSON        0.99+
Las Vegas            LOCATION      0.99+
Matt BARR            PERSON        0.99+
Hortonworks          ORGANIZATION  0.99+
Amazon               ORGANIZATION  0.99+
Cloudera             ORGANIZATION  0.99+
Stephanie            PERSON        0.99+
30                   QUANTITY      0.99+
300%                 QUANTITY      0.99+
first                QUANTITY      0.99+
last week            DATE          0.99+
Aster                ORGANIZATION  0.99+
2008                 DATE          0.98+
Q1                   DATE          0.98+
Las Vegas, Nevada    LOCATION      0.98+
one product          QUANTITY      0.98+
34 employees         QUANTITY      0.98+
one system           QUANTITY      0.98+
evolvable            ORGANIZATION  0.98+
over five            QUANTITY      0.97+
SQL                  TITLE         0.97+
three big whales     QUANTITY      0.97+
MapReduce            ORGANIZATION  0.96+
SiliconANGLE         ORGANIZATION  0.96+
first 12 months      QUANTITY      0.95+
Apache Mahal         ORGANIZATION  0.95+
map map              ORGANIZATION  0.95+
over 200             QUANTITY      0.95+
24                   OTHER         0.94+
today                DATE          0.94+
Intel                ORGANIZATION  0.92+
Matt                 PERSON        0.92+
Salesforce           ORGANIZATION  0.91+
2014                 DATE          0.9+
Impala               TITLE         0.9+
Hadoop               ORGANIZATION  0.89+
HP Vertica           ORGANIZATION  0.89+
map bar              ORGANIZATION  0.89+
Hadoop               TITLE         0.86+
one company          QUANTITY      0.85+
dupe summit          EVENT         0.84+
about a month ago    DATE          0.83+
Bucher               PERSON        0.81+
Discover 2014        EVENT         0.78+
first week of may    DATE          0.77+
Apache drill         ORGANIZATION  0.74+
#HPDiscover          ORGANIZATION  0.73+
Mapbox               TITLE         0.73+
2013                 DATE          0.72+
SQL on               TITLE         0.7+
art technologies     ORGANIZATION  0.63+
Apache               ORGANIZATION  0.61+

Brett Rudenstein - Hadoop Summit 2014 - theCUBE - #HadoopSummit


 

theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks, "We do Hadoop," and headline sponsor WANdisco, "We make Hadoop invincible."

>>Okay, welcome back. We're here at Hadoop Summit, live in Silicon Valley. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with Jeff Frick, drilling down on the topics. We're here with WANdisco. Welcome, Brett Rudenstein, senior director. Tell us what's going on for you guys. Obviously you have a big presence here. Saw all the guys last night. You guys have a great booth, [unclear] and the crew. What's happening?

>>Yeah, I mean, the show is going very well. What's really interesting is we have a lot of very, very technical individuals approaching us. They're asking us some of the tougher, more technical, in-depth questions about how our consensus algorithm is able to do all this distributed replication, which is really great, because there's a little bit of disbelief, and then of course we get to do the demonstration for them and suspend disbelief, if you will. And I think the attendance has been great for our booth.

>>Okay. We always have the geek conversations. You guys are a very technical company. Jeff and I always comment, certainly Dave Vellante and Jeff Kelly too, that WANdisco has their fair share of geeks and dudes who know what they're talking about, so I'm sure you get that. But now on the business side, you talk to customers. I want to get more into the outcomes; that seems to be the show focus this year, that Hadoop is serious. What are some of the outcomes that your customers are talking about when they get you guys in there? What are their business issues? What are they working on to solve?

>>Yeah. I think the first thing is to look at why they're looking at us, and then at the particular business issues that we solve. The first thing, and sort of the trend that we're starting to see, is that the prospects and the customers that we have are looking at us because of the data that they have, and it's data that matters. So it's important data, and that's when people start to come to us: they have data that's very important to them. In some cases, if you saw some of the UCI stuff, you see that the data is doing live monitoring of various patient activity, where it's not just about monitoring a life but potentially about saving a life. And systems that go down not only can't save lives, they can potentially lose them.

>>So you have a demo. You want to jump into this demo here? What is this all about?

>>The demonstration that I'm going to do for you today: I want to show you our Non-Stop Hadoop product. I'm going to show you how we can basically stand up a single HDFS, a single Hadoop cluster, across multiple data centers. I think that's one of the tough things that people are really having trouble getting their heads wrapped around, because most people, when they do multi-data-center Hadoop, tend to do two different clusters and then synchronize the data between the two of them. The way they do that is they'll use Flume, or they'll use some form of parallel ingest, or they'll use technologies like DistCp to copy data between the data centers. Each one of those has an administrative burden, and then some various flaws in the underlying architecture that don't allow them to do a really detailed job of ensuring that all blocks are replicated properly, that no mistakes are ever made. And again, there's the administrative burden: somebody always has to have eyes on the system. We alleviate all those things.

So I think the first thing I want to start off with: we had somebody come to our booth, and we were talking about this consensus algorithm that we perform and the way we synchronize multiple name nodes across multiple geographies, and again there was that sort of spirit of disbelief. I said, one of the key tenets of our application is that it doesn't change the behavior of the application when you go from LAN scope to WAN scope. So I said, for example, if you create a file in one data center, and 3,000 miles or 7,000 miles away from that you were to hit the same create-file operation, you would expect that the right thing happens: somebody gets the file created and somebody gets "file already exists," even if at 7,000 miles' distance they both hit this button at the exact same time.

I'm going to do a very quick demonstration of that for you here. I'm going to put a file into HDFS. My top right-hand window is in Northern Virginia, and 3,000 miles from that, my bottom right-hand window is in Oregon. I'm going to put the /etc/hosts file into a temp directory in Hadoop at the exact same time, 3,000 miles apart, and you'll see that exact behavior. So I've just launched them both, and if you look at the top window, the file is created; if you look at the bottom window, it says "file already exists." It's exactly what you'd expect of a LAN-scope application and the way you'd expect it to behave. So that is how we ensure consistency, and that was the question that the prospect had.

>>At that distance even the speed of light takes a little time, right? So what are some of the tips and tricks you can share that enable you guys to do this?

>>Well, one of the things is that our consensus algorithm is a majority-quorum-based algorithm. It's based on a well-known consensus algorithm called Paxos. We have a number of significant enhancements and innovations beyond that: dynamic membership, automatic scale, and things of that nature. But in this particular case, every transaction that goes into our system gets a global sequence number, and what we're able to do is ensure that those sequence numbers are executed in the correct order. So you can't put a delete before a create; everything has to happen in the order that it actually occurred, regardless of the WAN distance between data centers.

>>So what is the biggest aha moment you get from customers when you show them the demo? Is it the replication, is it availability? What is the big feature focus that they jump on?

>>Yeah, I think the biggest ones are basically when we start crashing nodes while we're running jobs, or we sever the link between the WAN. Maybe I'll just do that for you now, so let's kick into the demonstration here. What I have here is a single HDFS cluster. It is spanning two geographic territories, so it's one cluster: part of it is in Northern Virginia and the other part is in Oregon. I'm going to drill down into the graphing application here, and inside you see all of the name nodes. You see I have three name nodes running in Virginia and three name nodes running in Oregon. The demonstration is as follows: I'm going to run TeraGen and TeraSort. In other words, I'm going to create some data in the cluster, I'm then going to sort it into a total order, and then I'm going to run TeraValidate in the alternate data center and prove that all the blocks replicated from one side to the other. However, along the way I'm going to create some failures. I am going to kill some of the active name nodes during this replication process, I am going to shut down the WAN link between the two data centers during the replication process, and then show you how we heal from those kinds of conditions, because our algorithm treats failure as a first-class citizen, so there's really no way to [unclear] in the system, if you will.

>>So let's start unplugging. [unclear]

>>So let's go ahead and run the TeraGen and the TeraSort. I'm going to put it in a directory called "cube one." We're creating about 400 megabytes of data, so a fairly small set that we're going to replicate between the two data centers. Now, the first thing that you see over here on the right-hand side is that all of these name nodes kind of sprung to life. That is because, in an active-active configuration with multiple name nodes, clients actually load balance their requests across all of them. Also, it's a synchronous namespace, so any change that I make to one immediately occurs on all of them. The next thing you might notice in the graphing application is these blue lines, only in the Oregon data center. The blue lines essentially represent what we call a foreign block: a block that has not yet made its way across the wide area network from the site of ingest. We move these blocks asynchronously from the site of ingest so that I have LAN-speed performance. In fact, you can see I just finished the TeraGen part of the application, all the while pushing data across the wide area network as fast as possible.

Now, as we start to get into the next phase of the application here, which is going to run TeraSort, I'm going to start creating some failures in the environment. The first thing I'm going to do is pick two name nodes: I'm going to fail a local name node, and then we're also going to fail a remote name node. So let's pick one of these. I'm going to pick hdp2, which is the name of the machine, so I'm going to SSH to hdp2 and I'm just going to reboot that machine. As I hit the reboot button, the next time the graphing application updates, what you'll notice here in the monitor is a flat line, so it's no longer taking any data in. But if you're watching the application on the right-hand side, there's no interruption of the service; the application is going to continue to run. You'd expect that to happen maybe in a LAN-scope cluster, but remember, this is a single cluster at WAN scope, with 3,000 miles between the two of them. So I've killed one of the six active name nodes.
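As a rough illustration of two ideas from this part of the conversation (a majority quorum, and a global sequence that every name node applies in the same order), here is a minimal Python sketch. It is not WANdisco's implementation. The names `majority`, `Replica`, and `apply_in_order`, and the path `/tmp/hosts`, are hypothetical, and the agreement step itself (Paxos) is assumed to have already assigned the sequence numbers.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n nodes."""
    return n // 2 + 1


class Replica:
    """Toy name node: applies agreed operations strictly in sequence order."""

    def __init__(self):
        self.files = set()   # paths that currently exist
        self.results = {}    # sequence number -> outcome of that operation

    def apply_in_order(self, log):
        # Whatever order the entries arrived in, apply them by sequence number.
        for seq in sorted(log):
            op, path = log[seq]
            if op == "create":
                if path in self.files:
                    self.results[seq] = "file already exists"
                else:
                    self.files.add(path)
                    self.results[seq] = "created"
            elif op == "delete":
                if path in self.files:
                    self.files.remove(path)
                    self.results[seq] = "deleted"
                else:
                    self.results[seq] = "no such file"


# Two clients race to create the same path. Assumption: the consensus layer
# gave the Virginia request sequence 1 and the Oregon request sequence 2.
log = {2: ("create", "/tmp/hosts"), 1: ("create", "/tmp/hosts")}

virginia, oregon = Replica(), Replica()
virginia.apply_in_order(log)
oregon.apply_in_order(log)

print(majority(6))
print(virginia.results == oregon.results)
```

Under this simplified majority rule, six name nodes need four to agree, so losing two would still leave a quorum. Both toy replicas reach the same outcome because they apply the same log in the same order: sequence 1 creates the file and sequence 2 gets "file already exists."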
the next thing I'm going to do is kill one of the name nodes over in the Oregon data center so I'm going to go ahead and ssh into i don't know let's pick the let's pick the bottom one HTTP nine in this case and then again another reboot operation so I've just rebooted two of the six name nose while running the job but if again if you look in the upper right-hand corner the job running in Oregon kajabi running in North Virginia continues without any interruption and see we just went from 84 to eighty eight percent MapReduce and so forth so again uninterruptedly like to call continuous availability at when distances you are playing that what does continuous availability and wins because that's really important drill down on yeah I mean I think if you look at the difference between what people traditionally call high availability that means that generally speaking the system is there there is a very short time that the system will be unavailable and then it will then we come available again a continuously available system ensures that regardless of the failures that happen around it the system is always up and running something is able to take the request and in a leaderless system like ours where no one single node actually it actually creates a leadership role we're able to continue replication we're and we're also able to continue the coordinator that's two distinct is high availability which everyone kind of know was in loves expensive and then continues availability which is a little bit kind of a the Sun or cousin I guess you know saying can you put in context and cost implementation you know from a from a from a from a perspective of a when disco deployment it's kind of a continuously available system even though people look at us as somewhat traditional disaster recovery because we are replicating data to another data center but remember it's active active that means both data centers are able to write at the same time you have you get to maximize your cluster 
resources and again if we go back to one of the first questions you asked what are what a customer's doing this with this what a prospects want to do they want to maximize their resource investment if they have half a million dollars sitting in another data center that only is able to perform an emergency recovery situation that means they either have to a scale the primary data center or be what they want to do is utilize existing resource in an active active configuration which is why i say continuous availability they're able to do that in both data centers maximizing all their resource so you versus the consequences of not having that would be the consequences of not being able to do that is you have a one-way synchronization a disaster occurs you then have to bring that data center online you have to make sure that all the appropriate resources are there you have to you have an administrative burden that means a lot of people have to go into action very quickly with the win disco systems right what that would look like I mean with time effort cost and you have any kind of order of magnitude spec like a gay week called some guy upside dude get in the office login you have to look at individual customer service level agreements a number that i hear thrown out very very often is about 16 hours we can be back online within 16 hours really RTO 44 when disco deployment is essentially zero because both sites are active you're able to essentially continue without without any doubt some would say some would say that's contingent availability is high available because essentially zero 16 that's 16 hours I mean any any time down bad but 16 hours is huge yeah that's the service of level agreement then everyone says but we know we can do it in five hours the other of course the other part of that is of course ensuring that once a year somebody runs through the emergency configure / it you know procedure to know that they truly can be back up in line in the service level 
agreement timeframe so again there's a tremendous amount of effort that goes into the ongoing administrating some great comments here on our crowd chatter out chat dot net / hadoop summit joined the conversation i'll see ya we have one says nice he's talking about how the system has latency a demo is pretty cool the map was excellent excellent visual dave vellante just weighed in and said he did a survey with Jeff Kelly said large portion twenty-seven percent of respondents said lack of enterprises great availability was the biggest barriers to adoption is this what you're referring to yeah this is this is exactly what we're seeing you know people are not able to meet the uptime requirements and therefore applications stay in proof-of-concept mode or those that make it out of proof of concept are heavily burdened by administrators and a large team to ensure that same level of uptime that can be handled without error through software configuration like Linda scope so another comment from Burt thanks Burt for watching there's availability how about security yeah so security is a good one of course we are you know we run on standard dupe distributions and as such you know if you want to run your cluster with on wire encryption that's okay if you want to run your cluster with kerberos authentication that's fine we we fully support those environments got a new use case for crowd chapel in the questions got more more coming in so send them in we're watching the crowd chat slep net / hadoop summit great questions and a lot of people aren't i think people have a hard time partial eh eh versus continues availability because you can get confused between the two is it semantics or is it infrastructure concerns what is what is the how do you differentiate between those two definitions me not I think you know part of it is semantics but but but also from a win disco perspective we like to differentiate because there really isn't that that moment of downtime there is there 
really isn't that switch over moment where something has to fail over and then go somewhere else that's why I use that word continuous availability the system is able to simply continue operating by clients load balancing their requests to available nodes in a similar fashion when you have multiple data centers as I do here I'm able to continue operations simply by running the jobs in the alternate data center remember that it's active active so any data ingest on one side immediately transfers to the other so maybe let me do the the next part I showed you one failure scenario you've seen all the nodes have actually come back online and self healed the next part of this I want to do an separation I want to run it again so let me kick up kick that off when I would create another directory structure here only this time I'm going to actually chop the the network link between the two data centers and then after I do that I'm going to show you some some of our new products in the works give you a demonstration of that as well well that's far enough Britain what are some of the applications that that this enables people to use the do for that they were afraid to before well I think it allows you know when we look at our you know our customer base and our prospects who are evaluating our technologies it opens up all the all the regulated industries you know things like pharmaceutical companies financial services companies healthcare companies all these people who have strict regulations auditing requirements and now have a very clear concise way to not only prove that they're replicating data that data has actually made its way it can prove that it's in both locations that it's not just in both locations that it's the correct data sometimes we see in the cases of like dis CP copying files between data centers where the file isn't actually copied because it thinks it's the same but there is a slight difference between the two when the cluster diverges like that it's days 
of administration hour depending on the size of the cluster to actually to put the cluster you know to figure out what went wrong what went different and then of course you have to involve multiple users to figure out which one of the two files that you have is the correct one to keep so let me go ahead and stop the van link here of course with LuAnn disco technology there's nothing to keep track of you simply allow the system to do HDFS replication because it is essentially native HDFS so I've stopped the tunnel between the two datacenters while running this job one of the things that you're going to see on the left-hand size it looks like all the notes no longer respond of course that's just I have no visibility to those nodes there's no longer replicating any data because the the tunnel between the two has been shut down but if you look on the right hand side of the application the upper right-hand window of course you see that the MapReduce job is still running it's unaffected and what's interesting is once I start replicating the data again or once i should say once i start the tunnel up again between the two data centers i'll immediately start replicating data this is at the block level so again when we look at other copy technologies they are doing things of the file level so if you had a large file and it was 10 gigabytes in size and for some reason you know your your file crash but in that in that time you and you were seventy percent through your starting that whole transfer again because we're doing block replication if you had seventy percent of your box that had already gone through like perhaps what I've done here when i start the tunnel backup which i'm going to do now what's going to happen of course is we just continue from those blocks that simply haven't made their way across the net so i've started the tunnel back up the monitor you'll see springs back to life all the name nodes will have to resync that they've been out of sync for some period 
of time. They'll learn any transactions that they missed, they'll heal themselves back into the cluster, and we immediately start replicating blocks. And then, to show you the bidirectional nature of this, I'm going to run TeraValidate in the opposite data center, over in Oregon, and I'll just do it on that first directory that we created. What you'll see is that we now wind up with foreign blocks on both sides. I'm running applications at the same time across datacenters, in a fully active-active configuration, in a single Hadoop cluster. >>Okay, so the question on that one is: what is the net-net? Summarize that demo real quick, bottom line, in two sentences: why is that important? >>Bottom line is, if NameNodes fail, if the WAN fails, you are still continuously operational. >>Okay, so we have questions from the commentary here, from the CrowdChat. Does this eliminate the need for backup? And what is actually transferring? Certainly not petabytes of data? >>I mean, you have to transfer what's important. I suppose if it was important for you to transfer a petabyte of data, then you would need the bandwidth to support a transfer of a petabyte of data. >>But we talked to a lot of Hollywood studios. We were at OpenStack Summit, and that was a big concern. A lot of people are moving to the cloud for workflow and for optimization. The Star Wars guys were telling us off the record that, no, the new film is in remote locations. They set up data centers basically in the desert, and they've actually got provisioned infrastructure. So, huge issues. >>Yeah, absolutely. So what we're replicating, of course, is HDFS. In this particular case I'm replicating all the data in this fairly small cluster between the two sites. This demo is only between two sites, but I could add a third site, and then a failure between any two would still allow complete availability of all the other sites that still participate in the algorithm. >>Brett, great to
have you on. I want to get your perspective from the trenches, out with customers. What's going on at WANdisco? Tell us about the culture there. What's going on at the company? What's it like to work there? What are the guys like? I mean, we know some of the dudes there. Cos, we always drink some vodka with him, because, you know, he likes to tip back a little bit once in a while. Great guys, great geeks. But what's it like at WANdisco? >>I think the first thing, and you touched on a little piece of it, is that there are a lot of smart people at WANdisco. In fact, when I first came on board I was like, wow, I'm probably the most unsmart person at this company. But culturally this is a great group of guys. They like to work very hard, but equally they like to play very hard. And as you said, I've been out with Cos several times myself. These are all great guys to be out with. The culture is great. It's a great place to work, and people who are interested should certainly ... >>Yeah, great culture, and it fits in. We were talking last night: very social crowd here. You know, something with a Hortonworks guy, so javi medicate fortress ada, just saw him walk up. IBM's here. People are really sociable. This event really has a camaraderie feel to it, and yet it's serious business. At the end of the day, they're all a bunch of geeks building an industry, and now it's got everyone's attention. Cisco's here, Intel's here, IBM's here. What's your take on the big guys coming in? >>I mean, I think the big guys realize that Hadoop, the elephant, is as large as it appears. The elephant is in the room, and it's exciting, and everybody wants a little piece of it, as well they should want a piece of it. >>Brett, thanks for coming on theCUBE. Really appreciate it. WANdisco, you guys are a great, great company. We love to have your support. Thanks for supporting theCUBE, we appreciate it. We'll be right back after this short break with our next guest.
>>Thank you.
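
The block-level resume described in the demo can be sketched roughly as follows. This is a toy model under stated assumptions, not WANdisco's implementation: the names `RemoteSite`, `replicate`, and `link_fails_after` are hypothetical, the block size is a few bytes (real HDFS blocks are tens to hundreds of megabytes), and a SHA-256 digest stands in for whatever integrity check a real system uses. It shows two ideas from the conversation: a block is skipped only if its content matches, not merely because something already exists at that position, and an interrupted transfer continues with the missing blocks instead of starting over.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(block: bytes) -> str:
    """Content fingerprint used to decide whether a block is already correct."""
    return hashlib.sha256(block).hexdigest()


@dataclass
class RemoteSite:
    """Hypothetical stand-in for the other data center: block index -> bytes."""
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def has(self, index: int, expected: str) -> bool:
        stored = self.blocks.get(index)
        return stored is not None and digest(stored) == expected


def replicate(data: bytes, remote: RemoteSite,
              link_fails_after: Optional[int] = None) -> int:
    """Send only blocks the remote lacks; return how many were sent.

    link_fails_after simulates the WAN link dropping after that many
    blocks have been sent in this attempt.
    """
    sent = 0
    for index, block in enumerate(split_blocks(data)):
        if remote.has(index, digest(block)):
            continue  # already there and verified, nothing to resend
        if link_fails_after is not None and sent >= link_fails_after:
            break  # link is down; the remaining blocks wait for the next attempt
        remote.blocks[index] = block
        sent += 1
    return sent


if __name__ == "__main__":
    payload = bytes(range(40))  # 10 blocks of 4 bytes
    oregon = RemoteSite()
    first = replicate(payload, oregon, link_fails_after=7)  # link cut at 70%
    second = replicate(payload, oregon)                     # link restored
    print(first, second)
```

In this sketch the second call sends only the three blocks that never arrived. A file-level copy in the same situation would resend all ten.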

Published Date : Jun 4 2014


ENTITIES

| Entity | Category | Confidence |
|---|---|---|
| two sites | QUANTITY | 0.99+ |
| Jeff Kelly | PERSON | 0.99+ |
| seventy percent | QUANTITY | 0.99+ |
| Oregon | LOCATION | 0.99+ |
| two sites | QUANTITY | 0.99+ |
| Jeff Kelly | PERSON | 0.99+ |
| 3,000 miles | QUANTITY | 0.99+ |
| Virginia | LOCATION | 0.99+ |
| Jeff Rick | PERSON | 0.99+ |
| Burt | PERSON | 0.99+ |
| 84 | QUANTITY | 0.99+ |
| Northern Virginia | LOCATION | 0.99+ |
| North Virginia | LOCATION | 0.99+ |
| two | QUANTITY | 0.99+ |
| five hours | QUANTITY | 0.99+ |
| 3,000 miles | QUANTITY | 0.99+ |
| 7,000 miles | QUANTITY | 0.99+ |
| two data centers | QUANTITY | 0.99+ |
| Brett | PERSON | 0.99+ |
| Star Wars | TITLE | 0.99+ |
| 10 gigabytes | QUANTITY | 0.99+ |
| half a million dollars | QUANTITY | 0.99+ |
| 16 hours | QUANTITY | 0.99+ |
| Brett Rudenstein | PERSON | 0.99+ |
| Jeff | PERSON | 0.99+ |
| both locations | QUANTITY | 0.99+ |
| two sentences | QUANTITY | 0.99+ |
| two files | QUANTITY | 0.99+ |
| IBM | ORGANIZATION | 0.99+ |
| two datacenters | QUANTITY | 0.99+ |
| two data centers | QUANTITY | 0.99+ |
| one | QUANTITY | 0.99+ |
| two different clusters | QUANTITY | 0.99+ |
| both sides | QUANTITY | 0.99+ |
| both sites | QUANTITY | 0.99+ |
| first directory | QUANTITY | 0.98+ |
| third site | QUANTITY | 0.98+ |
| first thing | QUANTITY | 0.98+ |
| first | QUANTITY | 0.98+ |
| Cisco | ORGANIZATION | 0.98+ |
| twenty-seven percent | QUANTITY | 0.98+ |
| John | PERSON | 0.98+ |
| first thing | QUANTITY | 0.98+ |
| one side | QUANTITY | 0.97+ |
| Britain | LOCATION | 0.97+ |
| today | DATE | 0.97+ |
| two definitions | QUANTITY | 0.97+ |
| OpenStack | EVENT | 0.96+ |
| Hortonworks | ORGANIZATION | 0.96+ |
| eighty eight percent | QUANTITY | 0.96+ |
| last night | DATE | 0.96+ |
| both data centers | QUANTITY | 0.94+ |
| each one | QUANTITY | 0.94+ |
| zero | QUANTITY | 0.94+ |
| once a year | QUANTITY | 0.94+ |
| one failure | QUANTITY | 0.93+ |
| the cube and hadoop summit 2014 | EVENT | 0.93+ |
| two geographic territory | QUANTITY | 0.93+ |
| Intel | ORGANIZATION | 0.92+ |
| both | QUANTITY | 0.92+ |
| single | QUANTITY | 0.92+ |
| this year | DATE | 0.91+ |
| one data center | QUANTITY | 0.91+ |
| dupe summit | EVENT | 0.9+ |
| Brett room Stein | PERSON | 0.9+ |