David Richards, WANdisco | theCUBE NYC 2018

Live from New York, it's theCUBE. Covering theCUBE, New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone. This is theCUBE live in New York City for our CUBE NYC event, #cubenyc. This is our ninth year covering the big data ecosystem going back to the original Hadoop World; now it's evolved to essentially all things AI, future of AI. Peter Burris is my cohost. He gave a talk two nights ago on the future of AI presented in his research. So it's all about data, it's all about the cloud, it's all about live action here in theCUBE. Our next guest is David Richards, who's been in the industry for a long time, seen the evolution of Hadoop, been involved in it, has been a key enabler of the technology, certainly enabling cloud recovery replication for cloud, welcome back to theCUBE. It's good to see you. >> It's really good to be here. >> I got to say, you've been on theCUBE pretty much every year, I think every year, we've done nine years now. You made some predictions and calls that actually happened. Like five years ago you said the cloud's going to kill Hadoop. Yeah, I think you didn't say that off camera, but it might (laughing) maybe you said it on camera. >> I probably did, yeah. >> [John] But we were kind of pontificating but also speculating, okay, where does this go? You've been right on a lot of calls. You also were involved in the Hadoop distribution business back in the day. >> Oh god. >> You got out of that quickly. (laughing) You saw that early, good call. But you guys have essentially a core enabler that's been just consistently performing well in the market both on the Hadoop side, cloud, and as data becomes the conversation, which has always been your perspective, you guys have had a key part of the infrastructure for a long time. What's going on? Are you still doing deals?
>> Yes, I mean, the history of WANdisco's play in big data and Hadoop has been, as you know because you've been with us for a long time, kind of an interesting one. So back in sort of 2013, 2014, 2015 we built a Hadoop-specific product called Non-Stop NameNode and we had a Hadoop distribution. But we could see this transition, this change in the market happening. And the change wasn't driven necessarily by the advent of new technology. It was driven by overcomplexity associated with deploying and managing Hadoop clusters at scale, because lots of people, and we were talking about this off-camera before, can deploy Hadoop in a fairly small way, but not many companies are equipped or built to deploy massive-scale Hadoop distributions. >> Sustain it. >> They can't sustain it, and so the call that I made, you know, actions speak louder than words. The company rebuilt the product, built a general purpose data replication platform called WANdisco Fusion that, yes, supported Hadoop but also supported object store and cloud technologies. And we're now seeing use cases in cloud certainly begin to overtake Hadoop for us for the first time. >> And you guys have a patent that's pretty critical in all this, right? >> Yeah. So there's some real IP. >> Yes, so people often make the mistake of calling us a data replication business, which we are, but data replication happens post-consensus or post-agreement, so at the very heart of WANdisco, our 35 patents are all based around a Paxos-based consensus algorithm, which wasn't a very cool thing to talk about, but now, with the advent of blockchain and decentralized computing, consensus is at the core of pretty much that movement. So what WANdisco does is a consensus algorithm that enables things like hybrid cloud, multi cloud, poly cloud as Microsoft call it, as well as disaster recovery for Hadoop and other things.
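
For readers unfamiliar with the consensus step Richards describes, the following is a minimal single-decree Paxos sketch in Python. It is an illustrative toy and not WANdisco's patented implementation: the names `Acceptor` and `propose` are assumptions made for this example, everything runs in one process, and networking, failures, and retries are left out. It only shows the core idea that a majority agrees on a value before anything is replicated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One voting node in a toy, in-process Paxos (hypothetical names)."""
    promised: int = -1                 # highest proposal number promised so far
    accepted_n: int = -1               # number of the proposal accepted, if any
    accepted_v: Optional[str] = None   # value of the proposal accepted, if any

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        # Phase 1: promise to ignore proposals numbered below n,
        # and report any value this acceptor has already accepted.
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_v)
        return None  # rejected: a higher-numbered proposal was already seen

    def accept(self, n: int, v: str) -> bool:
        # Phase 2: accept unless a higher-numbered promise has been made.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_v = v
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Try to get `value` chosen with proposal number n; return the chosen value."""
    majority = len(acceptors) // 2 + 1

    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None  # could not gather a majority of promises

    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own value.
    prev_n, prev_v = max(promises, key=lambda p: p[0])
    chosen = prev_v if prev_n >= 0 and prev_v is not None else value

    acks = sum(1 for a in acceptors if a.accept(n, chosen))
    return chosen if acks >= majority else None


if __name__ == "__main__":
    nodes = [Acceptor() for _ in range(3)]
    print(propose(nodes, 1, "write-A"))  # first proposal is chosen
    print(propose(nodes, 2, "write-B"))  # later proposer must adopt "write-A"
    print(propose(nodes, 1, "write-C"))  # stale proposal number is rejected
```

Once a value has been chosen, a later proposer with a different value ends up re-proposing the already chosen one, which is the property that lets replication safely happen "post-agreement".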
>> Yeah, as you have more disparate parts working together, say multi cloud, I mean, you're really perfectly positioned for multi cloud. I mean, hybrid cloud is hybrid cloud, but also multi cloud, they're two different things. Peter has been on the record describing the difference between hybrid cloud and multi cloud, but multi cloud is essentially connecting clouds. >> We're on a mission at the moment to define what those things actually are because I can tell you what it isn't. A multi cloud strategy doesn't mean you have disparate data and processes running in two different clouds; that just means that you've got two different clouds. That's not a multi cloud strategy. >> [Peter] Two cloud silos. >> Yeah, correct. That's kind of creating problems that are really going to be bad further down the road. And hybrid cloud doesn't mean that you run some operations and processes and data on premise and a different siloed approach to cloud. What this means is that you have a data layer that's clustered and stretched, the same data that's stretched across different clouds, different on-premise systems, whether it's Hadoop on-premise and maybe I want to build a huge data lake in cloud and start running complex AI and analytics processes over there because, let's face it, banks et cetera ain't going to be able to manage and run AI themselves. It's already being done by Amazon, Google, Microsoft, Alibaba, and others in the cloud. So the ability to run this simultaneously in different locations is really important. That's what we do. >> [John] All right, let me just ask this directly since we're filming and we'll get a clip out of this. What is the definition of hybrid cloud? And what is the definition of multi cloud? Take, explain both of those. >> The ability to manage and run the same data set against different applications simultaneously. And achieve exactly the same result. >> [John] That's hybrid cloud or multi cloud? >> Both. >> So they're the same. >> The same.
>> You consider hybrid cloud and multi cloud the same? >> For us it's just a different end point. Hybrid kind of implies that you're running something on-premise. A multi cloud or poly cloud implies that you're running between different cloud venues. >> So hybrid is location, multi is source. >> Correct. >> So but let's-- >> [David] That's a good definition. >> Yes, but let's unpack this a little bit because at the end of the day, what a business is going to want to do is they're going to want to be able to apply their data to the best service. >> [David] Correct. >> And increasingly that's what we're advising our clients to think about. >> [David] Yeah. >> Don't think about being an AWS customer, per se, think about being a customer of AWS services that serve your business. Or IBM services that serve your business. But you want to ensure that your dependency on that service is not absolute, and that's why you want to be able to at least have the option of being able to run your data in all of these different places. >> And I think the market now realizes that there is not going to be a single, dominant vendor for cloud infrastructure. That's not going to happen. Yes, it happened, Oracle dominated in relational data. SAP dominated for ERP systems. For cloud, it's democratized. That's not going to happen. So everybody knows that Amazon probably have the best serverless compute, Lambda functions, available. They've got millions of those things already written or in the process of being written. Everybody knows that Microsoft are going to extend the wonderful technology that they have on desktop and move that into cloud for analytics-based technologies and so on. Google have been working on artificial intelligence for an elongated period of time, so vendors are going to arbitrage between different cloud vendors. They're going to choose the best-of-breed approach.
>> [John] They're going to go to Google for AI and scale, they're going to go to Amazon for robustness of services, and they're going to go to Microsoft for the Suite. >> [Peter] They're going to go for the services. They're looking at the services, that's what they need to do. >> And the thing that people forget, that we don't at WANdisco, is that that requires guaranteed consistent data sets underneath the whole thing. >> So where does Fusion fit in here? How is that getting traction? Give us some update. Are you working with Microsoft? I know we've been talking about Amazon, what about Microsoft? >> So we've been working with Microsoft, we announced a strategic partnership with them in March where we became a tier zero vendor, which basically means that we're partnered with them in lockstep in the field. We executed extremely well since that point and we've done a number of fairly large, high-profile deals. A retailer, for example, that was based in Amazon didn't really like being based in Amazon so had to build a poly cloud implementation to move petabyte-scale data from AWS into Azure, and that went seamlessly. It was an overnight success. >> [John] And they're using your technology? >> They're using our technology. There's no other way to do that. I think what the world has now realized, what Microsoft and others have realized, is that CDC technology, change data capture, doesn't work at this kind of scale, where you batch up a bunch of changes and then you ship them, block shipping or whatever, every 15 minutes or so. We're talking about petabyte-scale ingest processes. We're talking about huge data lakes; that technology simply doesn't work at this kind of scale.
>> [John] We've got a couple minutes left, I want to just make sure we get your views on blockchain, you mentioned consensus, I want to get your thoughts on that because we're seeing blockchain is certainly experimental, it's got, it's certainly powering money, Bitcoin and the international markets, it's certainly becoming a money backbone for countries to move billions of dollars out. It's certainly in the tank right now, about 600 million below its mark in January, but blockchain is fundamentally supply chain, you're seeing consensus, you're seeing some of these things that are in your realm, what's your view? >> So first of all, at WANdisco, we separate the notion of cryptocurrency and blockchain. We see blockchain as something that's been around for a long time. Basically, the world is moving to decentralization. We're seeing this with airlines, with supermarkets, and so on. People actually want to decentralize rather than centralize now. And the same thing is going to happen in the financial industry where we don't actually need a central transaction coordinator anymore, we don't need a clearinghouse, in other words. Now, how do you do that? At the very heart of blockchain is an incorrect assumption. So most people think that Satoshi's invention, whoever that may be, was based around the blockchain itself. Blockchain is pieced-together technologies that don't actually scale, right? So it takes a game-theoretic approach to consensus. And I won't get, we don't have enough time for me to delve into exactly what that means, but our consensus algorithm has already proven to scale, right? So what does that mean? Well, it means that if you want to go and buy a cup of coffee at the Starbucks next door, and you want to use a Bitcoin, you're going to be waiting maybe half an hour for that transaction to settle, right? Because the-- >> [John] The buyer's got to create a block, you know, all that step's in one.
>> The game-theoretic approach basically-- >> Bitcoin's running 500,000 transactions a day. >> Yeah. That's eight. >> There's two transactions per second, right? Between two and eight transactions per second. We've already proven that we can achieve hundreds of thousands, potentially millions of agreements per second. Now the argument against using Paxos, which is what our technology's based on, is it's too complicated. Well, no shit, of course it's too complicated. We've solved that problem. That's what WANdisco does. So we've filed a patent-- >> So you've abstracted the complexity, that's your job. >> We've abstracted the complexity. >> So you solve the complexity problem by being a complex solution, but you're making and abstracting it even easier. >> We have an algorithmic not a game-theoretic approach. >> Solving the scale problem. >> Correct. >> Using Paxos in a way that allows real developers to be able to build consensus algorithm-based applications. >> Yes, and 90% of blockchain is consensus. We've solved the consensus problem. We'll be launching a product based around Hyperledger very soon, we're already in tests and we're already showing tens of thousands of transactions per second. Not two, not 2,000, two transactions. >> [Peter] The game theory side of it is still going to be important because when we start talking about machines and humans working together, programs don't require incentives. Human beings do, and so there will be very, very important applications for this stuff. But you're right, from the standpoint of the machine-to-machine when there is no need for incentive, you just want consensus, you want scale. >> Yeah and there are two approaches to this world of blockchains.
There's public, which is where the Bitcoin guys are and the anarchists who firmly believe that there should be no oversight or control, then there's the real world, which is permissioned blockchains, and permissioned blockchains are where the banks, where the regulators, where NASDAQ will be when we're trading shares in the future. That will be a permissioned blockchain that will be overseen by a regulator like the SEC, NASDAQ, or London Stock Exchange, et cetera. >> David, always great to chat with you. Thanks for coming on, again, always on the cutting edge, always having a great vision while knocking down some good technology and moving your IP on the right waves every time, congratulations. >> Thank you. >> Always on the next wave, David Richards here inside theCUBE. Every year, doesn't disappoint, theCUBE bringing you all the action here. Cube NYC, we'll be back with more coverage. Stay with us; a lot more action for the rest of the day. We'll be right back; stay with us for more after this short break. (upbeat music)
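
The throughput figures quoted in the conversation can be sanity-checked with a little arithmetic. The short Python sketch below converts the 500,000 transactions a day mentioned on air into transactions per second. The function and variable names are illustrative, and the daily figure is the speakers' number rather than an independently verified one.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(transactions_per_day: float) -> float:
    """Convert a daily transaction count into an average per-second rate."""
    return transactions_per_day / SECONDS_PER_DAY


# Figure quoted in the interview, taken at face value.
bitcoin_tps = per_second(500_000)
print(f"{bitcoin_tps:.1f} transactions per second")  # prints "5.8 transactions per second"
```

An average of roughly 5.8 per second sits inside the "between two and eight transactions per second" range given in the discussion.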

Published Date : Sep 13 2018


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
John	PERSON	0.99+
Peter	PERSON	0.99+
Google	ORGANIZATION	0.99+
David Richards	PERSON	0.99+
SEC	ORGANIZATION	0.99+
NASDAQ	ORGANIZATION	0.99+
March	DATE	0.99+
two	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
January	DATE	0.99+
AWS	ORGANIZATION	0.99+
2014	DATE	0.99+
millions	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
90%	QUANTITY	0.99+
2013	DATE	0.99+
WANdisco	ORGANIZATION	0.99+
London Stock Exchange	ORGANIZATION	0.99+
2015	DATE	0.99+
New York City	LOCATION	0.99+
nine years	QUANTITY	0.99+
both	QUANTITY	0.99+
two transactions	QUANTITY	0.99+
eight	QUANTITY	0.99+
five years ago	DATE	0.99+
New York	LOCATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
half an hour	QUANTITY	0.99+
35 patents	QUANTITY	0.99+
hundreds of thousands	QUANTITY	0.99+
2,000	QUANTITY	0.99+
Both	QUANTITY	0.99+
ninth year	QUANTITY	0.98+
first time	QUANTITY	0.98+
billions of dollars	QUANTITY	0.98+
Hadoop	TITLE	0.98+
SAP	ORGANIZATION	0.98+
Starbucks	ORGANIZATION	0.98+
Paxos	ORGANIZATION	0.98+
two nights ago	DATE	0.97+
single	QUANTITY	0.97+
two approaches	QUANTITY	0.97+
500,000 transactions a day	QUANTITY	0.97+
about 600 million	QUANTITY	0.96+
theCUBE	ORGANIZATION	0.96+
Satoshi	PERSON	0.92+
two different clouds	QUANTITY	0.91+
NYC	LOCATION	0.89+
one	QUANTITY	0.88+
theCUBE	EVENT	0.87+

Jagane Sundar, WANdisco | AWS Summit SF 2018

>> Voiceover: Live from the Moscone Center, it's theCUBE. Covering AWS Summit San Francisco 2018. Brought to you by Amazon Web Services. >> Welcome back, I'm Stu Miniman and this is theCUBE's exclusive coverage of AWS Summit here in San Francisco. Happy to welcome back to the program Jagane Sundar, who is the CTO of WANdisco. Jagane, great to see you, how have you been? >> Well, been great Stu, thanks for having me. >> All right so, every show we go to now, data really is at the center of it, you know. I'm an infrastructure guy, you know, data is so much of the discussion here, here in the cloud in the keynotes, they were talking about it. IOT of course, data is so much involved in it. We've watched WANdisco from the days that we were talking about big data. Now it's you know, there's AI, there's ML. Data's involved, but tell us what is WANdisco's position in the marketplace today, and the updated role on data? >> So, we have this notion, this brand new industry segment called live data. Now this is more than just itty-bitty data or big data, in fact this is cloud-scale data located in multiple regions around the world and changing all the time. So you have East Coast data centers with data, West Coast data centers with data, European data centers with data, all of this is changing at the same time. Yet, your need for analytics and business intelligence based on that is across the board. You want your analytics to be consistent with the data from all these locations. That, in a sense, is the live data problem. >> Okay, I think I understand it but, you know, we're not talking about like, in the storage world there was like hot data, what's hot and cold data. And we talked about real-time data for streaming data and everything like that. But how do you compare and contrast, you know, you said global in scope, talked about multi-region, really talking distributed. From an architectural standpoint, what's enabling that to be kind of the discussion today? 
Is it the likes of Amazon and their global reach? And where does WANdisco fit into the picture? >> So Amazon's clearly a factor in this. The fact that you can start up a virtual machine in any part of the world in a matter of minutes and have data accessible to that VM in an instant changes the business of globally accessible data. You're not simply talking about a primary data center and a disaster recovery data center anymore. You have multiple data centers, the data's changing in all those places, and you want analytics on all of the data, not part of the data, not on the primary data center, how do you accomplish that, that's the challenge. >> Yeah, so drill into it a little bit for us. Is this a replication technology? Is this just a service that I can spin up? When you say live, can I turn it off? How do those kind of, when I think about all the cloud dynamics and levers? >> So it is indeed based on active-active replication, using a mathematically strong algorithm called Paxos. In a minute, I'll contrast that with other replication technologies, but the essence of this is that by using this replication technology as a service, so if you are going up to Amazon's web services and you're purchasing some analytics engine, be it Hive or Redshift or any analytics engine, and you want to have that be accessible from multiple data centers, be available in the face of data center or entire region failure, and the data should be accessible, then you go with our live data platform. >> Yeah so, we want you to compare and contrast. What I think about, you know, I hear active-active, speed of light's always a challenge. You know globally, you have inconsistency it's challenging, there's things like Google Spanner out there to look at those. You know, how does this fit compared to the way we've thought of things like replication and globally distributed systems in the past? >> Interesting question. 
So, ours is great for analytics applications, but something like Google Spanner is more like a MySQL database replacement that runs in multiple data centers. We don't cater to that database-transaction type of application. We cater to analytics applications: batch, very fast streaming applications, enterprise data warehouse-type analytics applications, all of those. Now if you take a look inside and see what kind of replication technology will be used, you'll find that we're better than the other two different types. There are two different types of existing replication technologies. One is log shipping. The traditional Oracle GoldenGate type: ship the log once the change is made to the primary. The second is, take a snapshot and copy differences between snapshots. Both have their deficiencies. Snapshot of course is time-based, and it happens once in a while. You'll be lucky if you can get one day RTO with those sorts of things. Also, there's an interesting anecdote that comes to mind when I say that because the Hadoop folks, in their HDFS, implemented a version of snapshot and snapdiff. The unfortunate truth is that it was engineered such that, if you have a lot of changes happening, the snapshot and snapdiff code might consume too much memory and bring down your NameNode. That's undesirable, now your backup facility just brought down your main data capability. So snapshot has its deficiencies. Log shipping is always active/passive. Contrast that with our technology of live data, where you can have multiple data centers filled with data. You can write your data to any of these data centers. It makes for a much more capable system. >> Okay, can you explain, how does this fit with AWS and can it live in multi-clouds, what about on-premises, the whole you know, multi and hybrid cloud discussion? >> Interesting, so the answer is yes. It can live in multiple regions within the same cloud, multiple regions within different clouds.
It'll also bridge data that exists on your on-prem, Hadoop or other big data systems, or object store systems within Cloud, S3 or Azure, or any of the BLOB stores available in the cloud. And when I say this, I mean in a live data fashion. That means you can write to your on-prem storage, you can also write to your cloud buckets at the same time. We'll keep it consistent and replicated. >> Yeah, what are you hearing from customers when it comes to where their data lives? I know last time I interviewed David Richards, your CEO, he said the data lakes really used to be on premises, now there's a massive shift moving to the public clouds. Is that continuing, what's kind of the breakdown, what are you hearing from customers? >> So I cannot name a single customer of ours who is not thinking about the cloud. Every one of them has a presence on premise. They're looking to grow in the cloud. On-prem does not appear to be on a growth path for them. They're looking at growing in the cloud, they're looking at bursting into the cloud, and they're almost all looking at multi-cloud as well. That's been our experience. >> At the beginning of the conversation we talked about data. How are customers doing you know, exploiting and leveraging or making sure that they aren't having data become a liability for them? >> So there are so many interesting use cases I'd love to talk about, but the one that jumps out at me is a major auto manufacturer. Telematics data coming in from a huge number, hundreds of thousands, of cars on the road. They chose to use our technology because they can feed their West Coast car telematics into their West Coast data center, while simultaneously writing East Coast car data into the East Coast data center. We do the replication, we build the live data platform for them, they run their standard analytics applications, be it Hadoop-sourced or some other analytics applications, they get consistent answers. 
Whether you run the analytics application on the East Coast or the West Coast, you will get the same exact answer. That is very valuable because if you are doing things like fault detection, you really don't want spurious detection because the data on the West Coast was not quite consistent and your analytics application was led astray. That's a great example. We also have another example with a top three bank that has a regulatory concern where they need to operate out of their backup data centers, so-called backup data center, once every three months or so. Now with live data, there is no notion of active data center and backup data center. All data centers are active, so this particular regulatory requirement is extremely simple for them to implement. They just run their queries on one of the other data centers and prove to the regulators that their data is indeed live. I could go on and on about a number of these. We also have a top two retailer who has got such a volume of data that they cannot manage it in one Hadoop cluster. They use our technology to create the live data data lake. >> One of the challenges always, customers love the idea of global but governance, compliance, things like GDPR pop up. Does that play into your world? Or is that a bit outside of what WANdisco sees? >> It actually turns out to be an important consideration for us because if you think about it, when we replicate, the data flows through us. So we can be very careful about not replicating data that is not supposed to be replicated. We can also be very careful about making sure that the data is available in multiple regions within the same country if that is the requirement. So GDPR does play a big role in the reason why many of our customers, particularly in the financial industry, end up purchasing our software. >> Okay, so this new term live data, are there any other partners of yours that are involved in this? As always, you want like a bit of an ecosystem to help build out a wave.
>> So our most important partners are the cloud vendors. And they're multi-region by nature. There is no idea of a single data center or a single region cloud, so Microsoft, Amazon with AWS, these are all important partners of ours, and they're promoting our live data platform as part of their strategy of building huge hybrid data lakes. >> All right, Jagane give us a little view looking forward. What should we expect to see with live data and WANdisco through the rest of 2018? >> Looking forward, we expect to see our footprint grow in terms of dealing with a variety of applications, all the way from batch, Pig scripts that used to run once a day, to Hive that's maybe once every 15 minutes, to data warehouses that are almost instant and queryable by human beings, to streaming data that pours things into Kafka. We see the whole footprint of analytics databases growing. We see cross-capability meaning perhaps an Amazon Redshift to an Azure or SQL EDW replication. Those things are very interesting to us, to our customers, because some of them have strengths in certain areas and others have strengths in other areas. Customers want to exploit both of those. So we see us as being the glue for all world-scale analytics applications. >> All right well, Jagane, I appreciate you sharing with us everything that's happening at WANdisco. This new idea of live data, we look forward to catching up with you and the team in the future and hearing more about the customers and everything on there. We'll be back with lots more coverage here from AWS Summit here in San Francisco. I'm Stu Miniman, you're watching theCUBE. (electronic music)
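
Sundar's point that analytics should return "the same exact answer" on either coast can be illustrated with a small Python sketch. This is a hedged toy, not WANdisco's design: the key name `car-42/status`, the function `apply_in_order`, and the last-write-wins store are all assumptions for illustration. It shows only why two active sites diverge when each applies concurrent writes in its own arrival order, and converge when both apply one agreed order.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (key, value); hypothetical record shape


def apply_in_order(writes: List[Write]) -> Dict[str, str]:
    """Apply writes sequentially to an empty store; the last write to a key wins."""
    state: Dict[str, str] = {}
    for key, value in writes:
        state[key] = value
    return state


# Two concurrent writes to the same key, one arriving at each data center.
east_write: Write = ("car-42/status", "fault")
west_write: Write = ("car-42/status", "ok")

# Uncoordinated: each site applies its local write first, then the remote one.
east_uncoordinated = apply_in_order([east_write, west_write])
west_uncoordinated = apply_in_order([west_write, east_write])

# Coordinated: both sites first agree on a single order, then apply it.
agreed_order = [east_write, west_write]
east_coordinated = apply_in_order(agreed_order)
west_coordinated = apply_in_order(agreed_order)

print(east_uncoordinated == west_uncoordinated)  # False: the replicas diverged
print(east_coordinated == west_coordinated)      # True: the replicas match
```

In a real system the agreed order would come from a consensus protocol such as Paxos running across the sites; here it is simply hard-coded to keep the example short.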

Published Date : Apr 4 2018


ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Amazon Web Services	ORGANIZATION	0.99+
David Richards	PERSON	0.99+
Jagane	PERSON	0.99+
San Francisco	LOCATION	0.99+
Jagane Sundar	PERSON	0.99+
Stu Miniman	PERSON	0.99+
WANdisco	ORGANIZATION	0.99+
GDPR	TITLE	0.99+
Stu	PERSON	0.99+
One	QUANTITY	0.99+
East Coast	LOCATION	0.99+
Both	QUANTITY	0.99+
second	QUANTITY	0.99+
two	QUANTITY	0.98+
MySQL	TITLE	0.98+
West Coast	LOCATION	0.98+
two different types	QUANTITY	0.98+
one	QUANTITY	0.98+
both	QUANTITY	0.98+
one day	QUANTITY	0.98+
Kafka	TITLE	0.98+
S3	TITLE	0.97+
Moscone Center	LOCATION	0.97+
Oracle	ORGANIZATION	0.96+
once a day	QUANTITY	0.95+
Google Spanner	TITLE	0.95+
single data center	QUANTITY	0.95+
NameNode	TITLE	0.94+
hundreds of thousands	QUANTITY	0.94+
today	DATE	0.93+
theCUBE	ORGANIZATION	0.92+
Azure	TITLE	0.91+
WANdisco	TITLE	0.9+
snapdiff	TITLE	0.89+
SQL EDW	TITLE	0.89+
Redshift	TITLE	0.88+
single customer	QUANTITY	0.87+
AWS Summit	EVENT	0.87+
AWS Summit San Francisco 2018	EVENT	0.86+
single region	QUANTITY	0.85+
2018	DATE	0.84+
snapshot	TITLE	0.81+
Jagane	ORGANIZATION	0.76+
three bank	QUANTITY	0.74+
once every 15 minutes	QUANTITY	0.73+
European	LOCATION	0.73+
AWS Summit SF 2018	EVENT	0.71+
once	QUANTITY	0.7+
Cloud	TITLE	0.65+
every three months	QUANTITY	0.64+
GoldenGate	ORGANIZATION	0.57+
of cars	QUANTITY	0.55+
minute	QUANTITY	0.53+
Paxos	ORGANIZATION	0.53+
HTFS	TITLE	0.53+
Hive	TITLE	0.49+
Hadoop	ORGANIZATION	0.41+
BLOB	TITLE	0.4+

David Richards, WANdisco - BigDataNYC - #BigDataNYC - #theCUBE

(silence) (upbeat techno music) >> Narrator: Live from New York, it's theCUBE, covering Big Data NYC 2016, brought to you by headline sponsors: Cisco... IBM... Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. >> Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. David Richards is here. He's the CEO of WANdisco, a long time CUBE alum. Great to see you again. >> Great to be back. >> It was good fun hanging out with you last night and a good surprise at the IBM event. There was good action across the street. >> Yeah, you're both looking surprisingly well, actually. >> (Dave laughs) Yes. >> Well, we also heard about the WANdisco versus theCUBE golf tournament, that apparently theCUBE just did really, really well in it and WANdisco went running away with their tail between their legs. >> Well, I talked to Furrier last night. I said, "David Richards was telling me "that he kicked your butt on the golf course." He goes, "Yeah, that's true, actually." (laughter) >> I think I've got some video proof that he actually gave me $20 live on air because, of course, his wallet was empty. (laughter) He was blowing the dust off it, you know? >> Of course, yeah, the body swerve. >> Alligator arms. >> So David, it's, again, great to see you again. You guys have been in this business since day one, and things are evolving. How are things changing for WANdisco? >> So, when we first came into this market, back in the mid-2006, 2007, and then we obviously made a bunch of acquisitions around 2011 and 2012 that took us headlong into the big data marketplace. We pretty much had a completely different business model to our business model now. Then, we had a product called Non-Stop NameNode... My God, can you imagine that? 
(Dave laughs) That was very focused on the Hadoop marketplace because, at that time, we believed, like everybody else, that Hadoop was going to take over the world, people were going to move to commoditized servers, open-source software, and solve the huge storage problems that they were going to have from both a cost and efficiency perspective. What I think has happened, or is happening right now, is this evolution, and it really is more of a revolution than an evolution, is taking place, where workloads, and we were discussing this last night, are moving at massive scale to cloud, and people are really skipping that step, where we thought they were going to have 5, 10,000 sort of clusters on-premise, but now they have some clusters on-prem, but the bulk of the workloads are actually moving into cloud. I was just discussing with George, off-camera a few minutes ago, why that is happening, and there's a lot of applications that are very efficient. The cloud apps are up there ready to use, off the shelf, and it becomes very simplistic, and to be quite frank, do we really care anymore about all these different open-source components? Is the CIO waking up in the middle of the night thinking, oh, my God, am I going to use Ignite, am I going to use Spark, am I going to use Pig, am I going to use Hive, et cetera, et cetera, et cetera? Of course they're not. They really just want to-- Let's inverse the question to ourselves. If you were going to start a competitor to Uber tomorrow, would you go and build a data center (Dave laughs) or would you just throw up a thousand servers up in the cloud and have done with it, and use all the apps that are up there? Of course, the answer's simple, so that's really what's happening. >> Well, one of the things that I... I wrote a piece of research a million years ago in which I prognosticated, the Dictionary Word of the Day, that the value of middleware was inversely proportional to the degree to which anybody knew anything about it. 
(Dave laughs) CIOs are waking up and asking those questions today, which is an indication that they're creating a problem. >> Yep. >> Infrastructure has to do no harm in the organization. I had a CIO friend for years who still asks his chief CTO, "To what degree is infrastructure creating a problem for me today?" >> Yeah. >> And if it's creating a problem, it's a problem. >> Mm-hmm. >> You don't want to have to know about this stuff, and so to what degree are you helping companies mask some of those... that visibility, so that people can spend less time worrying about the infrastructure? >> So, what we're focused on is a business model that has gone from direct, where we were hiring out a very large direct sales force enterprise, the classic enterprise sales guys that would go knock on doors, knock deals down, go and sell to the Global 1000s, to an indirect model, and we announced that OEM, recently with IBM, IBM Big Replicate, that, under the covers, is WANdisco Fusion, which is a great deal for us. So, our focus very much is on data movement, and data movement between data centers, for companies that want to stay on-prem, and between data centers and in and out of cloud seamlessly, and the word there is seamlessly. So, we worked very hard for the past 18 months on our product such that anybody can go to, if you want to go to the AWS Marketplace, you can, in a few clicks, begin to replicate petabyte-scale in and out of cloud, and we think, and we were discussing this last night, that the hybrid-cloud model is really fascinating, so the ability to take data on-premise, query it in cloud, get complete consistency between on-prem and cloud, but also have all the efficiency in the cloud economics, the elasticity, all the applications that exist in cloud, and I think that model is really interesting, and what's interesting is, I'm not sure that the little guys can execute in that model other than, like we're doing, via an OEM, an indirect model. 
So, I'm not sure whether or not, just to go back to the conversation, CIOs are as concerned as they used to be about which Hadoop distribution, for example, they're using. I never hear that question anymore. That question was a 2012, 2013 question. What the CIOs are now concerned about is the economics of cloud, and how do I get that less than $5 per terabyte of data economics that I get in a cloud environment. >> Well, but also increasingly, they're talking about the use cases. >> David: Yeah. >> They want to get their people... They don't want to replicate the Linux or Unix versus NT wars of the 1990s, which was made possible because they were focused on what accounting package am I going to run? Am I going to run it-- >> Yeah. >> on this or that? You know, it was known process, unknown technology. In today's universe, it's unknown process, and they don't want to know as much about the technology, so they're focused on how do I get my men and women focused on use cases that are delivering value for their business. >> Exactly, and the economics question is really simple. Am I going to build a massive, partially used, elastic infrastructure on-premise or am I just going to go and use the elastic infrastructure that already exists in the cloud? That's a no-brainer. That's already happening, and the good news for us, the good news for WANdisco, is it's precisely what we do. It's a data movement problem. Now, I'm bound to say that, but it is actually a data movement problem. In this idea that you have data that changes, active transactional data, as we call it, so the active transactional data movement is a really hard problem. You can't just take a snapshot, right? A file scan and then a snapshot and then move the data, and that's the problem that all the other data replication guys have got. 
That's what IBM, OEM, that's why we've got strategic partnerships with companies like Oracle, like Amazon, and why I'm sure we'll be announcing things in due course with the other cloud vendors, like Google, for example, and Microsoft with their Azure products. They all have that problem, so data movement, in and out of cloud, if it's batch, if it's static, if it's archival data, easy problem to solve. There's a million and one different replication products. >> Dave: Right. >> You can use rsync if you really wanted to do that, but active transactional data, data that changes, data that moves, you know, at petabyte scale, hard problem. That's the problem that we solve. >> Because you've got speed of light problems and you're exposing yourself to data loss-- >> Yep. >> if something goes wrong. >> Peter: Fidelity is a problem. >> An eventual consistency replication model-- >> Yeah, it... >> doesn't work. You can't... If I'm query... We've got a customer that's trying to look at cardiographs, right, in and out of cloud. I mean, would you really feel comfortable in your cardiograph eventually getting into the cloud and being analyzed? You know, would you? You've got to be absolutely crystal clear that the data is completely consistent from the stuff that I'm generating on-premise versus the models that I'm building in cloud. It's vitally important. >> Well, I would imagine there's regulations, in certain industries anyway, that-- >> Oh, yeah, absolutely. >> require that eventual consistency doesn't fit, right? >> Yeah. Well, I mean, at the moment, without us, that's all you got, I'm afraid... >> Okay. >> Well, so, I'm on a mission, and I want to get your take on it, that we always talk about elastic infrastructure, which is a given workload, being able to scale up and scale down. >> David: Yeah. >> I think it's time to start talking about plastic infrastructure-- >> David: Oh, yeah, I like it. 
>> where a given workload, but a reconfiguration of how that workload is applied because of the value of data, because of integration, because of the need to be able to move in response to business needs. So we talk about plastic infrastructure, where we are reconfiguring based on policy and rules and some other things. What do you think about that? >> I love it, and the reason I love it is because, just to take a step back, the definition of hybrid cloud is... You would imagine it would be relatively simple, but to me, a hybrid means that you have... You know, it's a bit like a hybrid golf club. It's neither a driver nor an iron. It's somewhere in between. So, you have the same workload that can exist both on-premise and in the cloud. I can use both the cloud and on-premise interchangeably. What hybrid cloud actually means, for all the vendors, and this is their dirty little secret, it means that you have some workloads running against some data in the cloud and others that will run against some data on-premise. Now, why do they do that? Because they have to. Because they can't guarantee complete consistency between on-premise and cloud. Our definition of hybrid cloud is exactly the same data, if you want, between on-premise and cloud, and I love this plastic phrase, the idea of repurposing all of those applications, and they can live anywhere. It doesn't matter 'cause it's the same data. >> Yeah, so we have two terms we have to copyright here, plastic infrastructure. >> Plastic... >> What was the other one we heard? >> Data portfolio. >> Data portfolio, yeah. We'll run the tape back >> Plastic infrastructure. (laughter) >> Plastic infrastructure. >> I'm going to steal it (laughs). >> Please do, you know? But the key thing is, as these technologies get more deeply embedded within business and how the business runs, it's incumbent upon the technology leadership to be able to rapidly be able to reconfigure the infrastructure in response to what the business needs. 
That's not elasticity. >> Yeah. >> That's plasticity. >> I love it, absolutely. (Peter laughs) And I think you're touching on something that's changing, and what we discussed earlier, which is that CIOs aren't waking up in the middle of the night thinking, am I going to use Pig or Hive or any of those other open source components. They're thinking about the applications that they're going to build. How am I actually going to start using this data? And I think the agenda's kind of moved on, and walking around the whole... There's still a little bit of confusion. You still have people talking about infrastructure like it really still matters. I'm not absolutely sure it does. >> Well, so let's talk about that. We got a few minutes or something like that. >> Dave: It matters when it breaks, you know? >> What's that? >> It matters when it breaks. >> It sure does matter when it breaks. >> You know, but otherwise, nobody wants to think about it. >> No, yeah, because like I said earlier, it's the degree to which-- >> We have time, but I want to explore the new distribution model as well. >> Yeah, go ahead. >> Let me do that, get that out, tick that box, if I can. Help me understand, David, how it all works. So you, the partnership with IBM and others, you mentioned Amazon, how does it work? You are in the IBM cloud offering? IBM is actually selling that offering? Is it a branded IBM product? >> So, it's in the big data analytics and cloud offerings. So, at the moment, IBM are very focused, as you know, on owning the platform. IBM, as a company, have to own the platform. >> Dave: Yeah, absolutely. >> So, I'm delighted to say that we're embedded into their platform. Now, they had a big launch of some products last night. >> Yeah. >> I know that they were talking about IBM Big Replicate, which is 100% white label OEM of WANdisco Fusion to solve some very specific problems, primarily around data movement. 
So, at the hybrid cloud, how do I punch data out into clouds, run the analytics against it, and be sure that I'm going to get the right results? That's what Big Replicate solves, and also, they're moving into mixed environments, whether they're NetApp, just kind of Teradata environment, SAS-based environments, or whether a customer already has an existing distribution of, say, Cloudera or Hortonworks, so they can live alongside that, so we can replicate data between existing deployments, where they may have already made a strategic decision to go with one of those distributions, and also be able to migrate not just into IBM Big Insights, but also into their cloud offering, so that's a great deal for us. We're not... They're selling it themselves. I mean, obviously we've done a lot of field enablement, trained 5,000 or so IBM sales reps, and, you know, for a small company like WANdisco, or a small company like virtually any of the vendors in there that are not in the Global 1000 list, the go-to-market has to be indirect. >> And so you're... Totally agree, and so you're basically, if I understand it correctly, you're moving what are conventional filers into the cloud. Customers are doing that. >> Oh. >> How fast is that happening and why are they doing that? >> My, God. I mean, we have not announced this product yet, but we're in the middle of launching it. It's, at scale, moving petabyte-scale data from, and this is transactional data, so it's a hard problem to solve, right, so it's an active data... It's an active transactional data replication problem. So, a lot of... The dirty little secret in the cloud is that a lot of those NFS filers have not moved yet-- >> Right. >> And why haven't they moved? 'Cause they can't. Because you can't just... 
You know, customers like banks and travel companies can't press pause in their organization, do a file scan that's going to take six months, and then turn it back on again, and hey, presto, it's in the cloud. You can't do that. So, you kind of have to... Every single migration of those filers, of any sort of data, is a hybrid model, so you have to be able to run both on-prem and cloud while that migration is happening, and there, I can tell you, are a lot, a hell of a lot of NetApp filers that are going to move very soon here, in time. >> Dave: Oh, 'cause that's the problem that you solve. Otherwise, you'd have to freeze everything, which would kill your business, so you can't do it. >> Yeah, so when human beings imagine things, we're always imagining small use cases, small sets, like moving a few files into Dropbox or something, and that's okay that I can't edit those files for the few seconds it takes to move. I took a look at a deal the other day that was 3 billion files. (Dave laughs) Right, 3 billion. You can't even... My brain can't even calculate that, right? That's a three to six month data movement, and Amazon, for example, thought of this product called Snowball, which-- >> Yeah. >> You know, no techy ever believes this story, but, of course, they FedEx a box, a ruggedized hard drive to you essentially, a ruggedized server that you pour your data into it and then you mail it back to them and they can put it there. That doesn't work, of course, for transactional data, for data that changes all the time. >> These are hard problems to solve, and our go-to-market, getting back to your question, it is all about indirect, you know? So, AWS, a strategic partnership, that, Oracle, a strategic partnership, that, IBM... And as I said, I'm sure that we'll be doing things with Google and Microsoft soon, and they're the five partnerships that I really care about, to be quite frank with you. >> Mm-hmm. 
And this comes back to this notion of infrastructure, the value of infrastructure, and just to touch on it for a second, so many years ago, when we were doing client-server, >> David: Mm-hmm. >> We would test it on a local area network and deploy it on a WAN (David laughs) and wonder why it blew up. >> David: Yeah. >> The realities of the speed of light and the practical limitations have a real impact on design, and so where infrastructure still matters is we still have to worry about design, we still have to worry about legacy financial assets, how we're deploying those assets, and I want to come back to this because we were talking earlier about data as an asset, the value of data within the business, and you don't want to be limited by the legacy as you try to find new ways of generating value out of your data, and what you guys are trying to allow is that the data can be moved in response to the use case as opposed to the use case not being made possible because of the legacy decisions about where to put your data. >> David: That's precisely it, and I don't think that any CIO, in their right mind, wants to continue with the huge maintenance costs, maintenance payments they have to make to some of those vendors, some of those NFS-based vendors. They need to shut them down. They have to figure out a way to move them into cloud so you get cloud economics, and also be able to query the data in a massively efficient way. You simply cannot do that at the moment. They simply cannot do that at the moment, so, as I said, as we continue to launch these products in the marketplace, I'm sure you'll see, at scale, some pretty large companies surprising-- You know, the two that spring to my mind are that the regulators in the US and the UK, FINRA and the FCA, are both in the process of moving everything into cloud, 100% into cloud, and I would expect to see that trend continue. I mean, the re:Invent... 
I don't want to talk about another-- and we're here at Strata, but the AWS re:Invent, I would expect to see several major financial service companies announcing cloud strategy. >> Yeah, and FINRA's a big user of the AWS cloud. They talk about it pretty aggressively, and really interesting use case there. So, yeah, so we got to end. What's next for you guys? You've mentioned you're going to be at re:Invent, you're going to be at World of Watson (laughs)? Where are we going to find you next? >> Both of those. Obviously, the white label with IBM is a really interesting deal for us. I can't talk about deal flow yet 'cause it's our end of quarter at the moment, but I can tell you that they're doing a pretty damn good job of selling, so we're in execution mode at the moment, where we've already announced some key partnerships. There'll be more key partnerships to come, I'm sure. We're obviously chasing deals down with some of the other cloud vendors, and I'd expect to see us announcing some interesting new customer wins in the coming days and weeks. >> Dave: Great. Well, congratulations on the momentum and the renewed strategy. I love it, and I appreciate you coming to theCUBE. >> Always a pleasure. >> All right, keep it right there, buddy. We'll be back with our next guest. This is theCUBE. We're live at Big Data NYC, Strata and Hadoop World. Be right back. (spacey electronica music)
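
The point made in the interview about active transactional data, that a file scan followed by a copy does not work when the data keeps changing, can be illustrated with a small Python sketch. This is not from the interview and is not WANdisco's method; the function `scan_and_copy`, the account names, and the write schedule are all hypothetical, chosen only to show how a copy taken during live writes can match no real state of the source.

```python
from copy import deepcopy


def scan_and_copy(source, writes_during_scan):
    """Copy `source` one key at a time while scheduled writes keep landing.

    `writes_during_scan` maps a scan step to the (key, value) writes that
    hit the source just before that step's copy (a hypothetical schedule).
    """
    target = {}
    for step, key in enumerate(sorted(source)):
        for changed_key, new_value in writes_during_scan.get(step, []):
            source[changed_key] = new_value  # the live system never pauses
        target[key] = source[key]
    return target


before = {"account_a": 100, "account_b": 50, "account_c": 0}
live = deepcopy(before)

# One transaction moves 100 from account_a to account_c while the scan
# has already copied account_a but has not yet reached account_c.
transfer = {1: [("account_a", 0), ("account_c", 100)]}

copied = scan_and_copy(live, transfer)
after = live

print(copied)
print(copied == before, copied == after)
```

In this toy run the copy holds the old value of account_a and the new value of account_c, so it equals neither the state before the transfer nor the state after it. Pausing writes for the length of the scan would avoid that, which is the "press pause" option the interview says large customers cannot take.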

Published Date : Sep 29 2016

SUMMARY :

David Richards, CEO of WANdisco, joins Dave Vellante and Peter Burris at Big Data NYC 2016. He describes WANdisco's shift from the Hadoop-focused Non-Stop NameNode product and a direct enterprise sales force to an indirect model, including the OEM agreement under which IBM sells WANdisco Fusion as IBM Big Replicate. Richards argues that workloads are moving to cloud at massive scale, that CIOs now care more about cloud economics and use cases than about which Hadoop distribution they run, and that moving active transactional data at petabyte scale, with complete consistency between on-premise and cloud, is the hard problem WANdisco solves. Burris introduces the idea of plastic infrastructure, meaning infrastructure reconfigured in response to business needs, and the group discusses migrating NFS filers into cloud without pausing the business.

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
FCA	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
FINRA	ORGANIZATION	0.99+
David Richards	PERSON	0.99+
$20	QUANTITY	0.99+
Cisco	ORGANIZATION	0.99+
George	PERSON	0.99+
2012	DATE	0.99+
three	QUANTITY	0.99+
100%	QUANTITY	0.99+
New York City	LOCATION	0.99+
Peter	PERSON	0.99+
WANdisco	ORGANIZATION	0.99+
OEM	ORGANIZATION	0.99+
six months	QUANTITY	0.99+
3 billion	QUANTITY	0.99+
two terms	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
5,000	QUANTITY	0.99+
US	LOCATION	0.99+
two	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
FedEx	ORGANIZATION	0.99+
Linux	TITLE	0.99+
mid-2006	DATE	0.99+
Both	QUANTITY	0.99+
both	QUANTITY	0.99+
five partnerships	QUANTITY	0.99+
2013	DATE	0.99+
tomorrow	DATE	0.99+
Unix	TITLE	0.98+
2011	DATE	0.98+
one	QUANTITY	0.98+
six month	QUANTITY	0.98+
5, 10,000	QUANTITY	0.98+
last night	DATE	0.97+
less than $5 per terabyte	QUANTITY	0.97+
Hadoop World	LOCATION	0.97+
1990s	DATE	0.96+
Dropbox	ORGANIZATION	0.96+

