Jagane Sundar, WANdisco - BigDataNYC - #BigDataNYC - #theCUBE


 

>> Announcer: Live from New York, it's theCUBE, covering BigData New York City 2016, brought to you by headline sponsors Cisco, IBM, Nvidia, and our ecosystem sponsors. Now here are your hosts, Dave Vellante and Peter Burris.

>> Welcome back to theCUBE everybody. This is BigData NYC and we are covering it wall to wall, we've been here since Monday evening. We were with Nvidia talking about deep learning and machine learning. Yesterday we had a full slate, we had eight data scientists up on stage, and then we covered the IBM event last night, the rooftop party. Saw David Richards there, hanging out with him, and wall to wall today and tomorrow. Jagane Sundar is here, he is the CTO of WANdisco. Great to see you again, Jagane.

>> Thanks for having me, Dave.

>> You're welcome. It's been a while since you and I sat down, and I know you were on theCUBE recently at Oracle Headquarters. I was happy to see you there and see the deals that are going on. You've got good stuff going on with IBM, good stuff going on with Oracle. The Cloud is eating the world, as we sort of predicted and knew even when everybody wanted to put their head in the sand, but you guys had to accommodate that, didn't you?

>> We did, and if you remember us from a few years ago, we were very, very interested in the Hadoop space, but along the journey we realized that our replication platform is actually much bigger than Hadoop. And the Cloud is just a manifestation of that vision. We had this ability to replicate data, strongly consistent, across wide area networks in different data centers and across storage systems, so you can go from HDFS to a Cloud storage system like S3 or Azure WASB and we will do it with strong consistency. And that turned out to be a bigger deal than providing replication just for the Hadoop platform. So we expanded beyond our initial Hadoop focus and now we're big in the Cloud. We replicate data to many Cloud providers, and customers use us for many use cases like disaster recovery, migration, active/active, Cloud bursting, all of those interesting use cases.

>> So any time I get you on theCUBE I like to refresh the 101 for me and for the audience that may not be familiar with it. You say strongly consistent, versus the term you hear, eventual consistency.

>> Jagane: Correct.

>> What's the difference, and why is the latter inadequate for the applications that you're serving?

>> Right, so when people say eventually consistent, what they don't remember is that eventually consistent systems often have different data in the different replicas, and once in a while, once every five minutes or 15 minutes, they have to run an anti-entropy process to reconcile the differences. And entropy is total randomness, right, if you go back to your high school physics. What you're really talking about is having random data and once every 10 minutes making it reconcile, and the reconciliation process is very messy. It's usually last write wins, and the notion of time becomes important: how do you keep time accurate between those replicas? Companies like Google have wonderful infrastructure, they have GPS and atomic clocks, so they can do a better job, but for the regular enterprise user that's a hard problem, so often you get wrongly reconciled data. So asking the same query you may get different answers from your different replicas. That's a bad sign; you want it consistent enough that you can guarantee results.

>> Dave: And you've done this with math, right?
>> Exactly. Our basis is an algorithm called Paxos, which was invented by a gentleman called Leslie Lamport back in '89, but it took many years for that algorithm to be widely understood. Our own chief scientist spent over a decade developing it, adding enhancements to make it run over the wide area network. The end result is a strongly consistent system, mathematically proven, that runs over the wide area network, and it's completely resistant to failures of all sorts.

>> That allows you to create the same type of availability and data consistency as, you mentioned Google with the atomic clocks, Spanner I presume. It's fascinating, I mean, when the paper came out my eyes were bleeding reading it. But that's the type of capability that you're able to bring to enterprises, right?

>> That's exactly right, we can bring similar capabilities across diverse networks. You can have regular networking gear, time synchronized by NTP. Out in the Cloud, things are running in virtual machines where time drifts most of the time; people don't realize that VMs are pretty bad at keeping time, and all you get up in the Cloud is VMs. Across all those environments we can give you strongly consistent replication at the same quality that Google gets with their hardware. So that's the value that we bring to the Fortune 500.

>> So increasingly enterprises are recognizing that data has an, I don't want to say intrinsic value, but data is a source of value in context all by itself, independent of any hardware, independent of any software. It's something that needs to be taken care of, and you guys have an approach for ensuring that important aspects of it are better taken care of. Not the least of which is that you can provide an option to a customer who may make a bad technology choice one day to make a better technology choice the next day, and not be too worried about dead-ending themselves. I'm reminded of the old days when somebody who was negotiating an IBM mainframe deal would put an Amdahl coffee cup in front of IBM, or put an Oracle coffee cup in front of SAP. Do you find customers metaphorically putting a WANdisco coffee cup in front of those different options and saying, these guys are ensuring that our data remains ours?

>> Customers are a lot more sophisticated now. The scenarios that you pointed out are very, very funny, but what customers come to us for is the exact same thing. The way they ask it is: I want to move to Cloud X, but I want to make sure that I can also run on Cloud Y, and I want to do it seamlessly, without any downtime for my on-prem applications that are running. We can give them that. Not only are they building a disaster recovery environment, often they're experimenting with multiple Clouds at the same time, and may the better Cloud win. That puts a lot of competition and pressure on the actual Cloud applications they're trying. That's the modern Cloud manifestation of the coffee cup in front of the competitor that you just pointed out. Very funny, but this is how customers are doing it these days.

>> So obviously you are able to replicate large volumes of data with strong fidelity, strong consistency. Are you starting to see customers, based on that capability, actually starting to redesign how they set up their technology plant?
>> Absolutely. When customers were talking about hybrid Cloud, which was pretty well hyped a year or so ago, they basically had some data on-prem and some other data in the Cloud and they were doing stuff, but what we brought to them was the ability to have the same data both on-prem and in the Cloud. Maybe you had a weekly analytics job that took a lot of resources. You'd burst that out into the Cloud, run it up there, and move the result of that analytics job back on-prem, with strong consistency. The result is that true hybrid Cloud is enabled only when you have the same exact data available in all of your Cloud locations. We're the only company that can provide that, so we've got customers that are expanding their Cloud options because of the data consistency we offer.

>> And those Cloud options obviously are increasing.

>> Jagane: They are.

>> But there's also a recognition, as we gain more experience with the Cloud, that some workloads are better suited than others as we move up there. Now, Oracle with some of their announcements last week may start to push the envelope on that a little bit, but as you think about where the need is for moving large volumes of data with strong consistency, what types of applications do you think people are focusing on? Is it mainly big data, or are there other application styles or job types that you think are going to become increasingly important?

>> So we've got much more than big data. One of the big sources of leads for us now is our capability to migrate NetApp filers up into the Cloud, and that has suddenly become very important. An example I like to give is a big financial firm that has all of its binaries, applications, and user data in NetApp filers, while the actual data is in HDFS on-prem. They're moving their binaries from the NetApp up into a specific Cloud vendor's equivalent of the filer, and the big data part of it from HDFS up into Cloud object store. We are the only platform that can deal with both in the strongly consistent manner that I've talked about, and we're a single replication platform, so that gives them the ability to make that sort of migration with very low risk. One of the attributes of our migration is that we do it with no downtime. You don't have to take your on-prem environment offline in order to do the migration. So we see a lot of business from that sort of migration effort, where people have data in NAS filers or in other non-HDFS storage systems. We're happy to migrate all of those. Our replication platform approach, which we've taken in the last year and a half or so, is really paying off in that respect.

>> And you couldn't do that with conventional migration techniques because it would take too long, you'd have to freeze the applications?

>> A couple of things: one, you'd probably have to take the applications offline; second, you'd be using tools of the periodic synchronization variety such as rsync, and anybody in devops or operations who has ever used rsync across the wide area network will tell you how bad that experience is. It really is a very bad experience. We've got the capability to migrate NetApp filer data without imposing a load on the on-prem NetApp, so we can do it without pounding the crap out of the NetApp server such that it can't offer service to its existing clients. Very low impact on the network configuration and application configuration.
We can go in and start the migration without downtime. Maybe it takes two or three days for the data to get up there because of the bandwidth of the link. After that is done, you can start playing with it up in the Cloud, and you can cut over seamlessly, so there's no real downtime. That's the capability we offer.

>> But you've also mentioned one data type, binaries; they can't withstand error propagation.

>> Jagane: Absolutely.

>> And so being able to go to a customer and say, you're going to have to move these a couple of times over the course of the next n months or years as a consequence of the new technology that's now available, and we can do so without error propagation, is going to have a big impact on how well their IT infrastructure, their IT asset base, runs in five years.

>> Indeed, indeed. That's very important. Having the ability to actually start the application, having the data in a consistent and true form so you can start, for example, the database and have it mount the actual data so you can use it up in the Cloud, those are capabilities that are very important to customers.

>> So there's another application. You tend to be more bulk-oriented, so the question I'm going to ask is, at what point is the lower threshold in terms of specific types of data movement? Here's why I'm asking. IOT data is a use case that often has the most stringent physical constraints possible. Time, the speed of light, has an implication, but also, very importantly, this notion of error propagation really matters. If you go from a sensor to a gateway to another gateway to another gateway, you will lose bits along the way if you're not very careful.

>> Correct.

>> And in a nuclear power plant, it doesn't work that way.

>> Jagane: Yeah.

>> Now, we don't have to just look at a nuclear power plant as an example, but industrial IOT is increasingly starting to dramatically impact not just life-and-death circumstances but business success or failure. What types of smaller-batch use cases do you guys find yourselves operating in, in places like IOT, where this notion of error control and strong consistency is so critical?

>> So one of the most popular applications that uses our replication is Spark, and Spark Streaming, which as you can imagine is a big part of most IOT infrastructure. We can do replication such that you ingest into the closest data center, you go from your sensor or your car or whatever to the closest data center, you don't have to go multiple hops. We will take care of consistency from there on. What that gives you is the ability to say, I have 12 data centers with my IOT infrastructure running; if one data center goes down, you don't have downtime at all. It's only the data that was generated inside that data center that's lost. All client machines connecting to that data center will simply connect to another data center, strong replication continues, and this gives you the ability to ingest at very large volumes while still maintaining consistency. So IOT is a big deal for us, yes.

>> We're out of time, but I've got a couple of last-minute questions if I may. When you integrate with IBM, Oracle, what kind of technical issues do you encounter, what kind of integration do you have to do? Is it lightweight, heavyweight, middleweight?

>> It's middleweight, I would say. IBM is a great example: they have a deep integration with our product, and some of the authentication technology they use was more advanced than what was available in open source at that time.
We did a little bit of work, and they did a little bit of work, to make that work, but other than that it's a pretty straightforward process. The end result is that they have a number of applications where this is a critical part of their infrastructure.

>> Right, and then roadmap. What can you tell us about what we should look for in the future, what kind of problems you are going to be solving?

>> So we look at our platform as the best replication engine in the world. We're building an SDK, and we expect custom plugins for different other applications. We expect more high-speed streaming data such as IOT data; we want to be the choice for replication. As for the plugins themselves, they're getting easier and easier to build, so you'll see wide coverage from us.

>> Jagane, thanks so much for coming to theCUBE, always a pleasure to have you.

>> Thank you for having me.

>> You're welcome. Alright, keep it right there everybody, we'll be back to wrap. This is theCUBE, we're live from NYC. We'll be right back.

(upbeat electronic music)
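A technical footnote on the consistency discussion above. The failure mode Sundar describes for eventually consistent systems, replicas that disagree until a periodic last-write-wins reconciliation, is easy to demonstrate. Below is a minimal Python sketch of the idea; it illustrates the general technique, not WANdisco's implementation, and all names and numbers in it are made up. The point to notice is that a drifting VM clock can make an older write beat a newer one during reconciliation, which is exactly why Sundar stresses accurate time.

```python
# Minimal sketch (illustrative, not WANdisco's implementation) of why
# eventually consistent replicas can disagree. Two replicas accept writes
# independently; an anti-entropy pass later reconciles them with
# last-write-wins, so the "winner" depends on timestamps from local clocks.

import time

class Replica:
    def __init__(self, name, clock_skew=0.0):
        self.name = name
        self.clock_skew = clock_skew   # simulates a drifting VM clock
        self.store = {}                # key -> (timestamp, value)

    def write(self, key, value):
        # Each replica stamps writes with its own (possibly skewed) clock.
        self.store[key] = (time.time() + self.clock_skew, value)

    def read(self, key):
        ts, value = self.store[key]
        return value

def anti_entropy(a, b):
    # Periodic reconciliation: for each key, the write with the larger
    # timestamp wins ("last write wins"). A skewed clock can make the
    # *older* update win, which is the messiness described above.
    for key in set(a.store) | set(b.store):
        entries = [r.store[key] for r in (a, b) if key in r.store]
        winner = max(entries)          # compares timestamps first
        a.store[key] = b.store[key] = winner

east = Replica("us-east")
west = Replica("us-west", clock_skew=-30.0)  # clock 30 seconds behind

east.write("balance", 100)   # happens first in real time
west.write("balance", 250)   # happens second, but stamped "earlier"

# Before reconciliation, the same query gives different answers:
print(east.read("balance"), west.read("balance"))   # 100 250

anti_entropy(east, west)
print(east.read("balance"), west.read("balance"))   # 100 100 -- newer write lost
```

A strongly consistent system avoids this by agreeing on an order for the writes before acknowledging them, rather than reconciling after the fact.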

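That agreement is what Paxos provides. For readers curious about the algorithm Sundar credits to Leslie Lamport, here is a toy single-decree Paxos in Python: a teaching sketch of the published protocol, not WANdisco's WAN-optimized variant. Message passing is simulated with direct method calls, and the proposal values are illustrative.

```python
# Toy single-decree Paxos (Lamport's "Synod" protocol), for illustration only.

class Acceptor:
    def __init__(self):
        self.promised = 0           # highest proposal number promised
        self.accepted = (0, None)   # (proposal number, value) accepted so far

    def prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return self.accepted    # report any previously accepted value
        return None                 # reject

    def accept(self, n, value):
        # Phase 2b: accept unless we promised a higher-numbered proposal.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Phase 1a/2a: returns the chosen value, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1

    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, we must re-propose the one
    # with the highest proposal number -- this is what makes consensus safe.
    prev_n, prev_value = max(promises)
    if prev_value is not None:
        value = prev_value

    accepts = sum(a.accept(n, value) for a in acceptors)
    return value if accepts >= quorum else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, value="replicate-to-s3"))     # replicate-to-s3
print(propose(acceptors, n=2, value="replicate-to-azure"))  # still replicate-to-s3
```

The safety property on display: once a quorum has accepted a value, every later proposal is forced to re-propose that same value, so all replicas converge on one answer no matter how many proposers retry or fail.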
Published Date: Sep 29 2016
