
John Hodgson, Optum Technology - Red Hat Summit 2017


 

>> (Narrator) Live, from Boston, Massachusetts, it's theCUBE, covering Red Hat Summit 2017, brought to you by Red Hat.

>> Welcome back to Boston everybody, this is Red Hat Summit, and this is theCUBE, the leader in live tech coverage. I'm Dave Vellante, with my cohost Stu Miniman, and John Hodgson is here, he's the Senior Director of IT Program Management at Optum Technology. John, good to see ya.

>> Good, it's good to be here.

>> Fresh off the keynote, we were just talking about the large audience, a very large audience here. And Optum, you described a little bit at the keynote what Optum is within healthcare, sort of the technology arm. Which is not super common, but not uncommon in your world. But describe Optum and where it fits.

>> So in the grand scheme of things within UnitedHealth Group, you know, we have the parent company, of course, you know, the Health Group, our insurance side, that does insurance, whether it's public sector, for large corporations, as well as community and state government type work, as UnitedHealthcare. They do all that, and then Optum is our technology side. We do really all the development, both for supporting UHC as our main customer, you know, they're truly our focus, but we also do a lot of commercial development as well for UnitedHealthcare's competitors. So big, big group, as I mentioned in the keynote. Over 10,000 developers in the company, lots of spend; I think in the last year our internal IT budget alone was like $1.2 billion in just IT development capital. So it's huge.

>> Dave: Mind-boggling.

>> John, you've got that internal Optum Cloud. Can you give us just kind of the breadth and depth? You said 1.2 billion, there. What does that make up, what geographies does that span, how many people support that kind of environment?

>> As far as numbers of people supporting it, I think we've got a few hundred in our Enterprise Technology Services Group that supports Optum Cloud. We started Optum Cloud probably half a dozen years ago, and it's gone through its different iterations. And part of my job right now is all about enterprise cloud adoption and migration. So, we started with our own environment, we call it UCI, United, it was supposed to be Converged Infrastructure, but I call it our Cloud Infrastructure, that's really what it is. And we've continued to enhance that. So over the last few years, I think about three and a half, four years ago, we brought in Red Hat and OpenShift. We're on our third iteration of OpenShift. Very, very stable platform for us now. But we also have Azure Stack in there as well. I think, even as Paul and those guys mentioned in the keynote, there's a lot of different things that you can kind of pull from each one of the technology providers to help support what we're doing, kind of take the best of breed from each one of them, and use them in each solution.

>> Organizations are always complaining that they spend all this money on keeping the lights on, and they're trying to make the shift, and obviously cloud helps them do that, and things like OpenShift, etc. What's that like in your world? How much of your effort is spent on maintenance and keeping the lights on? Sounds like you got a lot of cool, new development activity. Can you describe that dynamic for us?

>> Yeah, we've got a really good support staff. Our group, SSMO, when we build an application, they kind of take it back over and run everything. We've got a fabulous support team in the background. And to that end, and it's on both sides, right?
We have our UnitedHealthcare applications that we build that have kind of their own feature set, because of what it's doing internally for us, versus what we do on the OptumInsight side, where it's more commercial in nature. So they have some different needs. Some of the things that we're developing, even for Cloud Scaffolding that I mentioned in the keynote, we're kind of working on both sides of the fence, there, to hit the different technologies that each one of them really needs to be successful, but doing it in a way that it doesn't matter if you're on one side of the fence or the other; it's a capability that everybody will be able to use. So if there's a pattern on one side that you want to be able to use for a UHC application, by all means, go ahead and grab it, take it. And a lot of what we're doing now is even kind of crowdsourcing things, and utilizing the really super intelligent people that we have, over 10,000 developers. And so many of them, well, we've got a lot of legacy stuff. So there's some old-school guys that are still doing their thing, but we've got a lot of new people. And they want to get their hands on the new, fresh stuff, and experience that. So there's really a good vibe going on right now with how things are changing, all the TDP folks that we're bringing in, a lot of fresh college grads and things. And they love to see the new technologies, whether it's OpenShift or whatever. A lot are really getting into DevOps. Trying to make that change in a big organization is difficult; we've got a little ways to go with that. But that's kind of next up.

>> You're an interesting case study, because you've got a lot of the old and a lot of cool innovation going on. And how do you decide when to go? Because DevOps is not always the answer. Sometimes waterfall is okay, you know. So, how do you make that determination, and where do you see that going?

>> That's a great question, that's actually part of what my team does. So my specific team is all about cloud adoption and migration, so our charter is really to work across the enterprise. So whether it's OptumInsight, OptumRx, UnitedHealthcare, we are working with them to evaluate their portfolios of applications to figure out legacy applications that we have that are still strategic. They've got life in them, they've got business benefit. And we want to be able to take advantage of that, but at the same time there's some of these monolithic applications where we look at how we can take that application, decompose it down into microservices and APIs, things like that, to make it available to other applications that maybe are just greenfield, are coming out now, but still need that same technology and information. So that's really what my team is doing right now. So we sit down with those teams and go through an analysis, help them develop a road map. And sometimes that road map is two or three years long. Getting to fully cloud from where they're at right now in some of these legacy applications is a journey. And it costs money, right? There's a lot of budget concerns and things like that that go with it. So part of what we've helped develop is a business case for each one of those applications, so that we can help support them going back and getting the necessary capital to do the cloud migrations and the improvements, and really the modernization of their applications.
We started the program a couple of years ago and found that if you want to hang your hat on just going from old physical infrastructure, some of the original VMs that we had, and just moving over to cloud infrastructure, whether that's UCI, OpenShift, Azure, whatever... if you're going to do your business case on that, you're going to be writing a lot of business cases before you get one approved. It's all about modernizing the applications. So you fold in the move to new infrastructure, cloud infrastructure, along with the ability to modernize that application: get them doing agile development, getting down the DevOps path, looking at automated testing, automated deployment, zero-downtime deployments. All of those things, when you add them up together and say, okay, here's what your real benefit looks like, you're able to present that back to the business and show them speed to market; speed to value is a new metric that we have. Getting things out there quickly. We used to do quarterly releases, or even biannual releases. And now we're at monthly, weekly, for some of our applications that are relatively new. Health4Me, if you go to the App Store, that's kind of our big app on the App Store. There's updates on a very frequent basis.

>> So that's the operating model, really, that you're talking about, essentially, driving business value. We had a practitioner on a couple weeks ago, and he said, "If you just lift and shift to the cloud, and you don't change your operating model, you won't get a dime."

>> Stu: You're missing the boat.

>> Maybe there's something, some value there, a little faster, but you're talking about serious dollars if you can change the operating model. And that's what you've found?

>> Yeah, absolutely, and that's the... it's a shift, and you've got to be able to prove it to the business that there's benefit there, and sometimes that's hard. Some of these cloud concepts and things are a little nebulous, so--

>> It's hard 'cause it's soft.

>> It's soft, right, yeah. I mean, you're putting the business case together; the hard stuff is easy to document, but when you're talking about the soft benefits, you're trying to explain to them the value that they're going to get out of their team switching from waterfall development over to agile and DevOps, and automated testing and things like that. Where I can say, "Hey listen, you know your team over here that has been, you know, we took them out of the pocket from actually doing their day jobs for the last week, because they needed to test this new version? If I can take that out of the mix, and they don't have to do that anymore, and they can keep on doing what they're doing and not get a week behind, what value is that for you?" And all of a sudden they're like, "Oh really? We don't have to do that anymore?" I'm like, "No, we can create test scripts and stuff. We can automate your deployment. We can make it zero downtime." There's an application that we're working on now that has 19,000 individual desktop deployments. And we're going to automate that, turn it into a software-as-a-service application, host it on OpenShift, and completely knock that out. I mean, deployments out to 19,000 people take weeks to get done. We only do a couple thousand a week, because there's obviously going to be issues.
So now you've got helpdesk tickets, you've got desktop technicians that are going around, trying to fix things, or dialing in, remoting into somebody's desktop to try to help figure that all out. We can do the whole deployment in a day, and everybody logs in the next day, and they've got the new version. That kind of value in creating real cloud-based applications is what's driving the benefit for us. And they're finally starting to really see that. And as we're doing it, more application product owners are going, "Okay, now we're getting some traction. We heard what you did over here. Come talk to us, and let's talk about building a road map and figuring out what we can do."

>> John, one of the questions I got from the community after watching your keynote was, they want to understand how you handle security and enforce compliance in this new cloud development model. (laughs)

>> That's beyond me. All I can tell you is that we have one of the most secure clouds out there. Our private cloud is beyond secure. We're working right now to try to get into the public hybrid cloud space with both AWS and Azure, and working through contracts and stuff right now. But one of the sticking points is our security has to be absolutely top notch. If we're going to do anything that has HIPAA-related data, PHI, PII, PCI, any of that, it has got to be rock-solid secure. And we have a tremendous team led by Robert Booker, he's absolutely fabulous. I mean, our whole goal, security-wise, is don't be the next guy on the front page of the Wall Street Journal.

>> You mentioned public cloud; how do you make your decisions as to what application, what data can live in which public cloud? You said you've got Azure Stack, and you've got OpenShift. How do you make those platform decisions?

>> Well, right now, both OpenShift and Azure Stack are on our internal private cloud. So we're in the process of kind of making that shift to move over towards public and hybrid cloud. So I'm working with folks on our team to help develop some of those processes and determine what's actually going to be allowed. And I think in a lot of cases the PHI and protected data is going to stay internal. And we'll be able to take advantage of hosting certain parts of an application on public cloud while keeping other parts of the data really secure and protected behind our private cloud.

>> Red Hat made an announcement this morning with AWS, with OpenShift.

>> Sounds like that might be of interest to you; would that impact what you're doing?

>> Absolutely, yeah. In fact, I was talking with Jim and Paul back behind the screen this morning. And we were talking about that, and I was like, wow, that is a game changer. With what we're thinking about doing in the hybrid cloud space, having all of the AWS APIs and services and stuff available to us. Part of the objection that I get from some folks now is knowing that we have this move toward public and hybrid cloud internally, and the limitations of our cloud. Our private Optum Cloud is never going to be AWS or Azure, it's just not. I mean, they've spent billions of dollars getting those services and stuff in place. Why would we even bother to compete with that? So we do what we do well, and a big portion of that is security. But we want to be able to expand and take advantage of the things that they have. So this whole announcement of being able to take advantage of those services natively within OpenShift?
If we're able to expose that, even internally, on our own private cloud? That's going to take away a lot of the objections, I think, from even our own folks who are waiting to do the public hybrid cloud piece.

>> When the Affordable Care Act hit, did your volume spike? And as things... there's a tug of war now in Washington, it could change again. Does that drive changes in your application development, in terms of the volume of requests that come in, and compliance things that you have to adhere to? And if so, with a platform that's more agile, how does that affect your ability to respond?

>> Yeah, it does. I mean, when we first got into the ACA, there were a number of markets that we got into. And there was definitely a ramp-up in development, new things that we had to do on the exchanges, stuff like that. I mean, we even had groups from Optum that were participating directly with the federal government, because some of their exchanges were having issues, and they needed some help from us. So we had a whole team that was kind of embedded with the federal government, helping them out, just based on our experience doing it. And, yeah, having the flexibility, in our own cloud, to be able to spin up environments quickly, shut them down, all that, really, it's invaluable.

>> So the technology business moves so fast. I mean, it wasn't that long ago when people saw the first virtualized servers and went, oh my gosh, this is going to change the world. And now it's like, wow, we've got to do better, and containers. And so you've gone through this amazing transformation, I mean, I think it was 17 developers to 1,600, which is just mind-boggling. Okay, and you've got technologies that have helped you do that, but five years down the road there's going to be a what's next. So what is next for you? As you break out your telescope, what do you see?

>> God, I don't know. I mean, I never would have predicted containers.

>> Even though they've been around forever, we--

>> Yeah, I mean, when we first went to VMs, you know, back in the day I was a guy in the server room, racking and stacking servers and running cables, and doing all that. So I've seen it go from one extreme to the next. And going from VMs was a huge switch. Building our own private cloud was amazing to be a part of, and now getting into the container side of things, hybrid cloud... I think for us, really, the next big step is the hybrid cloud. So we're in the process of getting that; I assume by the end of this year, early next, we'll be a few steps into the hybrid cloud space. And then beyond that, gosh, I don't know.

>> So that's really extending the operating model into that hybrid cloud notion, bringing that security that you talked about, and that's... you've got a lot of work to do.

>> John: That's a big task in itself.

>> Let's not go too far beyond that, John. Alright, well listen, thanks for coming on theCUBE, it was really a pleasure having you.

>> Yeah, thanks for having me guys, appreciate it.

>> You're welcome. Alright, keep it right there everybody, Stu and I will be back with our next guest. This is theCUBE, we're live from Red Hat Summit in Boston. We'll be right back. (electronic music)

Published Date : May 3 2017



Brett Rudenstein - Hadoop Summit 2014 - theCUBE - #HadoopSummit


 

>> (Narrator) theCUBE, at Hadoop Summit 2014, is brought to you by anchor sponsor Hortonworks, we do Hadoop, and headline sponsor WANdisco, we make Hadoop invincible.

>> Okay, welcome back, everyone. We're here at the Hadoop Summit, live. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with Jeff Frick, drilling down on the topics. We're here with WANdisco. Welcome, Brett Rudenstein, senior director. Tell us what's going on for you guys. You've got a big presence here; saw all the guys last night, you guys have a great, great booth, Cos and the crew. What's happening?

>> Yeah, I mean, the show is going very well. What's really interesting is we have a lot of very, very technical individuals approaching us. They're asking us, you know, some of the tougher, more technical, in-depth questions about how our consensus algorithm is able to do all this distributed replication, which is really great, because there's a little bit of disbelief, and then of course we get to do the demonstration for them, and then suspend disbelief, if you will. And I think the attendance has been great for us.

>> I always get that; we always have the geek conversations. You guys are a very technical company. Jeff and I always comment, certainly Dave Vellante and Jeff Kelly, that, you know, WANdisco has their share of geeks, dudes who know what they're talking about. So I'm sure you get that. But now on the business side, when you talk to customers, I want to get into more of the outcomes; that seems to be the show focus this year, as Hadoop gets serious. What are some of the outcomes your customers are talking about when they get you guys in there? What are their business issues? What are they working on to solve?

>> Yeah, I mean, I think the first thing is to look at, you know, why they're looking at us, and then the particular business issues that we solve. The first thing, and sort of the trend that we're starting to see, is the prospects and the customers that we have are looking at us because of the data that they have, and it's data that matters, so it's important data. That's when people start to come to us, that's when they look to us: they have data that's very important to them. In some cases, if you saw some of the UCI stuff, you see that the data is, you know, doing live monitoring of various patient activity, where it's not just about monitoring a life, but potentially about saving the life. And systems that go down not only can't save lives, but they can potentially lose them.

>> So you have a demo; you want to jump into this demo here? What is this all about?

>> You know, the demonstration that I'm going to do for you today is, I want to show you our Non-Stop Hadoop product. I'm going to show you how we can basically stand up a single HDFS, a single Hadoop cluster, across multiple data centers. And I think that's one of the tough things that people are really having trouble getting their heads wrapped around, because most people, when they do multi-data-center Hadoop, they tend to do two different clusters and then synchronize the data between the two of them. The way they do that is they'll use, you know, Flume, or they'll use some form of parallel ingest, or they'll use technologies like DistCp to copy data between the data centers. And each one of those has sort of an administrative burden on it, and then some various flaws in the underlying architecture that don't allow them to do a really, really detailed job of ensuring that all blocks are replicated properly, that no mistakes are ever made. And again, there's the administrative burden, you know, somebody always has to have eyes on the system. We alleviate all of those things.
So I think the first thing I want to start off with: we had somebody come to our booth, and we were talking about this consensus algorithm that we perform, and the way we synchronize multiple name nodes across multiple geographies. And again, in that sort of spirit of disbelief, I said, you know, one of the key tenets of our application is that it doesn't change the behavior of the application when you go from LAN scope to WAN scope. And so I said, for example, if you create a file in one data center, and 3,000 miles apart, or 7,000 miles apart from that, you were to hit the same create-file operation, you would expect that the right thing happens: somebody gets the file created, and somebody gets "file already exists," even if at 7,000 miles distance they both hit this button at the exact same time. I'm going to do a very quick demonstration of that for you here. I'm going to put a file into HDFS. My top right-hand window is in Northern Virginia, and then, 3,000 miles distant from that, my bottom right-hand window is in Oregon. I'm going to put the /etc/hosts file into a temp directory in Hadoop at the exact same time, 3,000 miles apart, and you'll see that exact behavior. So I've just launched them both, and again, if you look at the top window, the file is created; if you look at the bottom window, it says "file already exists." It's exactly what you'd expect of a LAN-scope application, and the way you'd expect it to behave. So that is how we ensure consistency, and that was the question that the prospect had.

>> At that distance, even the speed of light takes a little time, right? So what are some of the tips and tricks you can share that enable you guys to do this?

>> Well, one of the things that we're doing is, our consensus algorithm is a majority-quorum-based algorithm. It's based off of a well-known consensus algorithm called Paxos. We have a number of significant enhancements and innovations beyond that: dynamic memberships, you know, automatic scale, and things of that nature. But in this particular case, every transaction that goes into our system gets a global sequence number, and what we're able to do is ensure that those sequence numbers are executed in the correct order. So you can't put a delete before a create; everything has to happen in the order that it actually occurred, regardless of the distance between data centers.
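To make that concrete, here is a minimal sketch, ours and not WANdisco's implementation, of the idea being described: if every metadata operation is applied in one global order, two replicas 3,000 miles apart reach the same verdict on a simultaneous create, exactly as a single LAN cluster would. The class and file names are illustrative assumptions only.

```python
# Toy replicated namespace: every replica applies the same globally
# ordered log of operations, so all replicas reach identical states.
# Illustrative sketch only, not WANdisco's code.

class FileAlreadyExists(Exception):
    pass

class NamespaceReplica:
    def __init__(self, site):
        self.site = site
        self.files = set()

    def apply(self, op, path):
        # Same deterministic rules at every site.
        if op == "create":
            if path in self.files:
                raise FileAlreadyExists(path)
            self.files.add(path)
        elif op == "delete":
            self.files.discard(path)

# Virginia and Oregon both request "create /tmp/hosts" at the same
# instant; the coordination layer (elided here) settles on ONE order.
global_log = [("create", "/tmp/hosts"),   # ordered first: succeeds
              ("create", "/tmp/hosts")]   # ordered second: must fail

for site in ("Northern Virginia", "Oregon"):
    replica = NamespaceReplica(site)
    for op, path in global_log:
        try:
            replica.apply(op, path)
            print(f"{site}: created {path}")
        except FileAlreadyExists:
            print(f"{site}: file already exists: {path}")
```

Because both replicas process the identical log, they agree on which request won; whichever client was ordered second gets "file already exists," no matter which data center it came from.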
>> So what is the biggest aha moment you get from customers when you show them the demo? Is it the replication, is it the availability? What is the big feature they focus on, that they jump on?

>> Yeah, I think the biggest ones are basically when we start crashing nodes while we're running jobs, or we sever the WAN link. And maybe I'll just do that for you now, so let's kick into the demonstration here. What I have here is a single HDFS cluster, and it is spanning two geographic territories, so it's one cluster: part of it in Northern Virginia, and the other part in Oregon. I'm going to drill down into the graphing application here, and inside you see all of the name nodes. So you see I have three name nodes running in Virginia, three name nodes running in Oregon. And the demonstration is as follows: I'm going to run TeraGen and TeraSort, so in other words, I'm going to create some data in the cluster, I'm then going to sort it into a total order, and then I'm going to run TeraValidate in the alternate data center and prove that all the blocks replicated from one side to the other. However, along the way, I'm going to create some failures. I am going to kill some of the active name nodes during this replication process, I am going to shut down the WAN link between the two data centers during the replication process, and then show you how we heal from those kinds of conditions, because our algorithm treats failure as a first-class citizen, so there's really no way to kill the system, if you will. So let's start with the local failures. Let's go ahead and run the TeraGen and the TeraSort; I'm going to put it in a directory called cube1. So we're creating about 400 megabytes of data, a fairly small set that we're going to replicate between the two data centers. Now, the first thing that you see over here on the right-hand side is that all of these name nodes kind of sprung to life. That is because, in an active-active configuration with multiple name nodes, clients actually load-balance their requests across all of them. Also, it's a synchronous namespace, so any change that I make to one immediately occurs on all of them. The next thing you might notice in the graphing application is these blue lines, over in, and only in, the Oregon data center. The blue lines essentially represent what we call a foreign block: a block that has not yet made its way across the wide area network from the site of ingest. Now, we move these blocks asynchronously from the site of ingest, so that I have LAN-speed performance. In fact, you can see I just finished the TeraGen part of the application, all at the same time pushing data across the wide area network as fast as possible. Now, as we start to get into the next phase of the application here, which is going to run TeraSort, I'm going to start creating some failures in the environment. So the first thing I'm going to do is pick two name nodes: I'm going to fail a local name node, and then we're also going to fail a remote name node. So let's pick one of these. I'm going to pick hdp2, that's the name of the machine, so I'll do ssh hdp2, and I'm just going to reboot that machine. So as I hit the reboot button, the next time the graphing application updates, what you'll notice here in the monitor is a flat line: it's no longer taking any data in. But if you're watching the application on the right-hand side, there's no interruption of the service; the application is going to continue to run. And you'd expect that to happen in a LAN-scope cluster, but remember, this is a single cluster at WAN scope, with 3,000 miles between the two of them. So I've killed one of the six active name nodes. The next thing I'm going to do is kill one of the name nodes over in the Oregon data center. So I'm going to go ahead and ssh into, I don't know, let's pick the bottom one, hdp9 in this case, and then again, another reboot operation. So I've just rebooted two of the six name nodes while running the job, but again, if you look in the upper right-hand corner, the job running in Oregon and North Virginia continues without any interruption. And see, we just went from 84 to 88 percent on the MapReduce, and so forth. So again, uninterrupted, what I like to call continuous availability at WAN distances.
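The reason rebooting two of six name nodes doesn't pause the job is the majority-quorum rule mentioned earlier: agreement only needs more than half of the nodes. A toy sketch of that arithmetic follows, with hypothetical node names; real Paxos also handles competing proposers, recovery, and membership changes, all elided here.

```python
# Majority-quorum sketch: an operation commits once more than half of
# the name nodes accept it, so a minority of failures can't block it.

class NameNode:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive
        self.log = {}                      # global sequence number -> op

    def accept(self, gsn, op):
        if self.alive:
            self.log[gsn] = op
        return self.alive

def propose(nodes, gsn, op):
    acks = sum(n.accept(gsn, op) for n in nodes)
    return acks > len(nodes) // 2          # strict majority required

# Three name nodes per data center, six in all; reboot hdp2 and hdp9.
nodes = [NameNode("hdp1"), NameNode("hdp2", alive=False),
         NameNode("hdp3"), NameNode("hdp7"),
         NameNode("hdp8"), NameNode("hdp9", alive=False)]

print(propose(nodes, gsn=1, op=("mkdir", "/cube1")))
# True: 4 of 6 acks is still a majority, so the job never notices.
```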
>> You were explaining that: what does continuous availability at WAN distances mean? Because that's really important; drill down on that.

>> Yeah, I mean, I think if you look at the difference: what people traditionally call high availability means that, generally speaking, the system is there, but there is a very short time that the system will be unavailable, and then it will become available again. A continuously available system ensures that, regardless of the failures that happen around it, the system is always up and running; something is able to take the request. And in a leaderless system like ours, where no one single node actually takes a leadership role, we're able to continue the replication, and we're also able to continue the coordination.

>> So those are two distinct things: high availability, which everyone kind of knows and loves, expensive, and then continuous availability, which is a little bit kind of the son or cousin, I guess, you know what I'm saying. Can you put it in context, and the cost implications?

>> You know, from the perspective of a WANdisco deployment, it's kind of a continuously available system, even though people look at us as somewhat traditional disaster recovery, because we are replicating data to another data center. But remember, it's active-active. That means both data centers are able to write at the same time; you get to maximize your cluster resources. And again, if we go back to one of the first questions you asked, what are customers doing with this, what do prospects want to do? They want to maximize their resource investment. If they have half a million dollars sitting in another data center that is only able to perform in an emergency recovery situation, that means they either have to (a) scale the primary data center, or (b), what they want to do is utilize the existing resource in an active-active configuration, which is why I say continuous availability. They're able to do that in both data centers, maximizing all their resource.

>> So, versus the consequences of not having that, which would be?

>> The consequence of not being able to do that is you have a one-way synchronization. A disaster occurs, you then have to bring that data center online, you have to make sure that all the appropriate resources are there. You have an administrative burden; that means a lot of people have to go into action very quickly.

>> With the WANdisco systems, what would that look like? I mean, with time, effort, cost, do you have any kind of order-of-magnitude spec? Like, okay, a week, call some guy, "Hey dude, get in the office, log in"?

>> You have to look at individual customer service-level agreements. A number that I hear thrown out very, very often is about 16 hours: "we can be back online within 16 hours." The real RTO for a WANdisco deployment is essentially zero, because both sites are active; you're able to essentially continue without any downtime.

>> Some would say that continuous availability is high availability, because essentially zero... but 16, that's 16 hours. I mean, any time down is bad, but 16 hours is huge.

>> Yeah, that's the service-level agreement, and then everyone says, "but we know we can do it in five hours." The other part of that, of course, is ensuring that once a year somebody runs through the emergency procedure to know that they truly can be back up online in the service-level-agreement timeframe. So again, there's a tremendous amount of effort that goes into the ongoing administration.
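Here is a small sketch of the client-side behavior that makes "continuous" possible in the leaderless, active-active layout described above: because every name node can serve any request, a client simply skips dead nodes instead of waiting for a failover. The node names and the helper are hypothetical, not WANdisco's client library.

```python
# Leaderless client sketch: load-balance across active name nodes and
# skip any that are down; no failover pause, no leader-election wait.
import random

class NameNode:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive

def submit(nodes, op):
    # Any live node can serve the request, because the namespace is
    # kept synchronous across all of them.
    for node in random.sample(nodes, len(nodes)):
        if node.alive:
            return f"{node.name} handled {op!r}"
    raise RuntimeError("no name node reachable")

nodes = [NameNode("hdp1"), NameNode("hdp2", alive=False),
         NameNode("hdp3"), NameNode("hdp7"),
         NameNode("hdp8"), NameNode("hdp9", alive=False)]
print(submit(nodes, ("put", "/tmp/hosts")))   # served despite reboots
```

The design point is that "recovery time" never appears as a step: a rebooting node just drops out of rotation and rejoins later.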
>> Some great comments here on our CrowdChat at crowdchat.net/hadoopsummit, join the conversation. We have one that says, nice, he's talking about how the system handles latency; the demo is pretty cool, the map was excellent. Dave Vellante just weighed in, and said he did a survey with Jeff Kelly: a large portion, twenty-seven percent of respondents, said lack of enterprise-grade availability was the biggest barrier to adoption. Is this what you're referring to?

>> Yeah, this is exactly what we're seeing. You know, people are not able to meet the uptime requirements, and therefore applications stay in proof-of-concept mode; or those that make it out of proof of concept are heavily burdened by administrators and a large team to ensure that same level of uptime that can be handled, without error, through software configuration like WANdisco's.

>> So another comment from Burt, thanks Burt for watching: there's availability; how about security?

>> So, security is a good one. Of course, we run on standard Hadoop distributions, and as such, you know, if you want to run your cluster with on-wire encryption, that's okay; if you want to run your cluster with Kerberos authentication, that's fine. We fully support those environments.

>> Got a new use case from the CrowdChat in the questions, and we've got more coming in, so send them in; we're watching the CrowdChat at crowdchat.net/hadoopsummit. Great questions, and I think a lot of people have a hard time parsing HA versus continuous availability, because you can get confused between the two. Is it semantics, or is it infrastructure concerns? How do you differentiate between those two definitions?

>> I think, you know, part of it is semantics, but also, from a WANdisco perspective, we like to differentiate because there really isn't that moment of downtime. There really isn't that switch-over moment where something has to fail over and then go somewhere else. That's why I use that phrase, continuous availability: the system is able to simply continue operating, by clients load-balancing their requests to available nodes. In a similar fashion, when you have multiple data centers, as I do here, I'm able to continue operations simply by running the jobs in the alternate data center. Remember that it's active-active, so any data ingested on one side immediately transfers to the other. So maybe let me do the next part. I showed you one failure scenario; you've seen all the nodes have actually come back online and self-healed. For the next part of this, I want to do a separation. I want to run it again, so let me kick that off. I'm going to create another directory structure here, only this time I'm going to actually chop the network link between the two data centers, and then after I do that, I'm going to show you some of our new products in the works, give you a demonstration of that as well.

>> While that's running, Brett, what are some of the applications that this enables people to use Hadoop for that they were afraid to before?

>> Well, I think, when we look at our customer base and the prospects who are evaluating our technologies, it opens up all the regulated industries: you know, things like pharmaceutical companies, financial services companies, healthcare companies, all these people who have strict regulations and auditing requirements, and who now have a very clear, concise way to not only prove that they're replicating data, but that the data has actually made its way across. They can prove that it's in both locations, and not just that it's in both locations, but that it's the correct data. Sometimes we see, in the case of DistCp copying files between data centers, that a file isn't actually copied because it thinks it's the same, but there is a slight difference between the two. When the clusters diverge like that, it's days of administration, depending on the size of the cluster, to figure out what went wrong, what went different; and then of course you have to involve multiple users to figure out which one of the two files that you have is the correct one to keep. So let me go ahead and stop the WAN link here. Of course, with the WANdisco technology, there's nothing to keep track of; you simply allow the system to do HDFS replication, because it is essentially native HDFS. So I've stopped the tunnel between the two data centers while running this job. One of the things that you're going to see on the left-hand side is that it looks like all the nodes no longer respond; of course, I just have no visibility to those nodes. They're no longer replicating any data, because the tunnel between the two has been shut down. But if you look on the right-hand side of the application, the upper right-hand window, of course, you see that the MapReduce job is still running; it's unaffected. And what's interesting is, once I start replicating the data again, or, I should say, once I start the tunnel up again between the two data centers, I'll immediately start replicating data, and this is at the block level. So again, when we look at other copy technologies, they are doing things at the file level. So if you had a large file, and it was 10 gigabytes in size, and for some reason your file transfer crashed when you were seventy percent of the way through, you are starting that whole transfer again. Because we're doing block replication, if you had seventy percent of your blocks that had already gone through, like perhaps what I've done here, when I start the tunnel back up, which I'm going to do now, what's going to happen, of course, is we just continue from those blocks that simply haven't made their way across the network.
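The difference being described is easy to quantify. A back-of-the-envelope sketch using the numbers from the conversation; the 128 MB block size is our assumption, since HDFS defaults vary by version and configuration:

```python
# Resume cost after a mid-transfer WAN outage: a file-level copier
# (DistCp-style) restarts the interrupted file from byte zero, while a
# block-level replicator only ships the blocks that never arrived.

FILE_SIZE = 10 * 1024**3              # the 10 GB file in the example
BLOCK_SIZE = 128 * 1024**2            # assumed HDFS block size
blocks = FILE_SIZE // BLOCK_SIZE      # 80 blocks
arrived = int(blocks * 0.70)          # 70% had crossed before the cut

file_level_resend = FILE_SIZE                         # start over
block_level_resend = (blocks - arrived) * BLOCK_SIZE  # missing 30%

print(f"file-level restart : {file_level_resend / 1024**3:.1f} GB")
print(f"block-level resume : {block_level_resend / 1024**3:.1f} GB")
```

On these assumptions the block-level resume ships 3 GB instead of 10 GB, and the gap widens with file size and with how late in the transfer the link fails.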
So I've started the tunnel back up. The monitor, you'll see, springs back to life. All the name nodes will have to resync: they've been out of sync for some period of time, so they'll learn any transactions that they missed, they'll heal themselves into the cluster, and we immediately start replicating blocks.
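A sketch of that self-heal step, reduced to its simplest form: a node that was cut off asks a peer for everything after the last global sequence number it saw, and replays it in order. This is our illustration of the idea; the actual recovery protocol is certainly more involved.

```python
# Self-heal sketch: a returning name node replays, in global sequence
# order, every transaction it missed while partitioned away.

def catch_up(stale_log, peer_log):
    last_seen = max(stale_log, default=0)
    for gsn in sorted(g for g in peer_log if g > last_seen):
        stale_log[gsn] = peer_log[gsn]      # replay missed operation
    return stale_log

peer = {1: ("create", "/cube1"), 2: ("create", "/cube2"),
        3: ("block", "blk_001"), 4: ("block", "blk_002")}
stale = {1: ("create", "/cube1")}           # cut off after GSN 1

catch_up(stale, peer)
print("replicas converge:", stale == peer)  # True
```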
And then, to kind of show you the bi-directional nature of this, I'm going to run TeraValidate in the opposite data center, over in Oregon, and I'll just do it on that first directory that we created. And what you'll see is that we now wind up with foreign blocks on both sides: I'm running applications at the same time across data centers, a fully active-active configuration in a single Hadoop cluster.

>> Okay, so the question on that one is, what is the net-net? Summarize that demo real quick, bottom line, in two sentences.

>> Bottom line is: if name nodes fail, if the WAN fails, you are still continuously operational.

>> Okay, so we have questions from the commentary here, from the CrowdChat. Does this eliminate the need for backup? And what is actually transferring, certainly not petabytes of data?

>> I mean, you somewhat have to transfer what's important. So if it's important for you to transfer a petabyte of data, then you would need the bandwidth that supports the transfer of a petabyte of data.

>> We talk to a lot of Hollywood studios; we were at the OpenStack Summit, and that was a big concern. A lot of people are moving to the cloud, you know, for workflow and for optimization. The Star Wars guys were telling us, off the record, that the new film is in remote locations; they set up data centers basically in the desert, and they actually provisioned infrastructure. So, huge issues.

>> Yeah, absolutely. So what we're replicating, of course, is HDFS. In this particular case, I'm replicating all the data in this fairly small cluster between the two sites; in this case, this demo is only between two sites, but I could add a third site, and then a failure between any two would actually still allow complete availability across all the other sites that still participate in the algorithm.

>> Brett, great to have you on. I want to get the perspective from you, in the trenches, out at customers, what's going on. And WANdisco, tell us, what's the culture there? What's going on in the company? What's it like to work there? What are the guys like? I mean, we know some of the dudes there, we always drink some vodka with them, because, you know, they like to tip back a little bit once in a while, but great guys, great geeks. What's it like at WANdisco?

>> I think, you know, you touched on a little piece of it at first: there are a lot of smart people at WANdisco. In fact, I know when I first came on board, I was like, wow, I'm probably the most un-smart person at this company. But culturally, this is a great group of guys. They like to work very hard, but equally they like to play very hard, and as you said, you know, I've been out with Cos several times myself. These are all great guys to be out with. The culture is great; it's a great place to work, and people who are interested should certainly apply.

>> Yeah, great culture, and it fits in. We were talking last night; very social crowd here, you know, the Hortonworks guys, just saw them walk up, IBM's here. People are really sociable. This event really has a camaraderie feel to it, but yet it's serious business. Back in the day it was all a bunch of geeks building an industry, and now it's got everyone's attention. Cisco's here, Intel's here, IBM's here. I mean, what's your take on the big guys coming in?

>> I mean, I think the big guys realize that Hadoop is, well, the elephant is as large as it appears. The elephant is in the room, it's exciting, and everybody wants a little piece of it, as well they should.

>> Brett, thanks for coming on theCUBE, really appreciate it. WANdisco, you guys are a great company; we love to have them, and their support. Thanks for supporting theCUBE, we appreciate it. We'll be right back after this short break with our next guest. Thank you.

Published Date : Jun 4 2014

