
Search Results for Big Data SV 2016:

Jaspreet Singh, Druva & Jake Burns, Live Nation | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting: Big Data Silicon Valley. Brought to you by SiliconANGLE Media, and its ecosystem partners. >> Welcome back, everyone, we're here live at San Jose for Big Data SV, Big Data Silicon Valley. I'm John Furrier, cohost of theCUBE. We're here with two great guests, Jaspreet Singh, founder and CEO of Druva, and Jake Burns, VP of Cloud Services of Live Nation Entertainment. Welcome to theCUBE, so what's going on with Cloud? Apps are out there, backup, recovery, what's going on? >> So, we went all in with AWS, and in late 2015 and through 2016 we moved all of our corporate infrastructure into AWS, and I think we're a little bit unique in that situation, so in terms of our posture, we're 100% Cloud. >> John: Jaspreet, what's going on with you guys in the Cloud, because we've talked about this before, with a lot of the apps in the cloud, backup is really important. What's the key thing that you guys are doing together with Live Nation? >> Sure, so I think the notion of data is now pretty much everywhere. Data used to be captured and controlled in the data center; now it's getting decentralized into apps and ecosystems, and software and services deployed either at the edge or in the Cloud. As the data gets more and more decentralized, the notion of data management, be it backup, be it discovery, anything, has to get more and more centralized. And we strongly believe the epicenter of this whole data management has to move to the Cloud. So, Druva is a SaaS-based provider for data management. And we work with Live Nation to protect the apps not just in the data center, but also at the edge and also in the Cloud data center, the applications deployed in the Cloud, be it Live Nation or Ticketmaster. >> And what are some of the workloads you guys are backing up? That's with Druva. >> Yeah so, it's pretty much all corporate IT applications. You know, typical things you'd find in any IT shop really. So, you know, we have our financial systems and we have some of our smaller ticketing systems and you know, corporate websites. Things of that nature. So, it's like we have 120 applications that are running and it's just really kind of one of everything. >> We were talking before we came on camera about the history of computing and the Cloud has obviously changed the game. How would you compare the Cloud as a trend relative to operationalizing the role of data and obviously GDPR, ransomware. These are things that, now with the perimeter gone, there's worries about. So now, how do you guys look at the Cloud? So Jake, I will start with you. If you can compare and contrast where we have come from and where we are going. Role of the Cloud. Significant, primary, expanding. How would you compare that? And how would you talk to someone who says, hey, I'm still in the data center world? What's going on with Cloud? >> Well, yeah, it's significant and it's expanding, both. And you know, it's really transforming the way we do business. So you know, just from a high level, things like shortening the time to market for applications, going from three to six months just to get a proof of concept started to today, you know, in the Cloud. Being able to innovate really by trying things... we try 20 different things, decide what works, what doesn't work. And at very low cost. So, it allows us to really do things that just weren't possible before. So, also, we move more quickly because, you know, we're not afraid of making mistakes. 
If we provision infrastructure and we don't get it right the first time, we just change it. You know, that's something that we would just never be able to do previously in the data center. So to answer your question, everything is different. >> And the as-a-service model's been kind of key. Is the consumption on your end different, like, I mean radically different? Like give an example of, like, how much time would be saved or taken versus the traditional approaches. >> Oh for sure. You know, the role of IT has completely changed because, you know, instead of worrying about nuts and bolts and servers and storage arrays and data centers, you know, we can really focus on the things that are important to the business. You know, those things delivering results for the business. So, bringing value, bringing applications online and trying things that are going to help, you know, us do business rather than focusing on all the minutiae. All that stuff's now been outsourced to Cloud providers. So, really, we kind of have a similar head count and staff. But, we are focused on things that bring value rather than things that are just kind of frivolous. >> Jaspreet, you guys have been a very successful startup, growing rapidly. The Cloud's been a good friend; that trend is your friend with the Cloud. >> What's different operationally that you guys are tapping into? What's that tail wind for Druva that's making you guys successful? And is it the ease of use? Is it the ease of consumption? Is it the tech? What's the secret to success with Druva? >> Sure, so, we believe cloud is a very big business transformation trend more than a technology trend. It's how you consume a service with a fixed SLA, with a fixed service agreement across the globe. So, it's ease of consumption. It's simplicity of use. It's orchestration. It's cost control. All those things. So, our promise to our customers is the complexity of data management, backups, archives, data protection, which is a risk mitigation project, you know, can be completely abstracted by a simple service. For example, you know, Live Nation consumed Druva as a service through Amazon Marketplace. So, think about consuming a critical service like data management through the simplicity of a marketplace, pay as you go, as you consume the service. Across the globe. In the US, in Australia, and Europe. And it also helps the vendors like us to innovate better. Because we have a controlled environment to understand how different customers are using the service and be able to orchestrate better security posture, better threat prevention, better cost control. DevOps. So, it improves the posture of the service being offered and helps the customer consume it. >> You both are industry veterans by today's standards, unless you're like 24 doing some of the cryptocurrency stuff that, you know, doesn't know the old IT baggage. How would you guys view the multi-Cloud conversation? Because we hear that all the time. Multi-Cloud has come up so many times. What does it mean? Jake, what does multi-Cloud actually mean? Is it the same workload across multiple Clouds? Is it the fact that there is multiple Clouds? Certainly, there will be multiple Clouds. But, so, help us digest what that even means these days. >> Yeah, that's a great question and it's a really interesting topic. Multi-Cloud is one of those things where, you know, there's so many benefits to using more than one Cloud provider. But, there are also a lot of pitfalls. 
So, people really underestimate the difference in the technology and the complexity of managing the technology when you change Cloud providers. I'm talking primarily about infrastructure-as-a-service providers like Amazon Web Services. So, you know, I think there's a lot of good reasons to be multi-Cloud: to get the best features out of different providers, to not have, you know, the risk of having all your data in one place with one vendor. But, you know, it needs to be done in such a way where you don't take that hit in overhead and complexity, and you know, I think that's kind of a prohibitive barrier for most enterprises. >> And what are the big pitfalls that you see? Is it mainly underestimating the stack complexity between them or is it more of just operational questions? I mean what are the pitfalls that you've observed? >> Yeah, so, moving from like a typical IT data center environment to a public Cloud provider like AWS, you're essentially asking all your technical staff to start speaking in a new language. Now if you were to introduce a second Cloud provider to that environment, now you're asking them to learn a third language as well. And that's a lot to ask. So, you really have two scenarios where you can make that work today without using a third party. One is to ask all of your staff to know both, and that's just not feasible. Or have two tech teams, one for each Cloud platform. That's really not something businesses want to do. So, I think the real answer is to rely on a third party that can come in and abstract one of those Cloud complexities, well, one of those Cloud providers out. So, you don't have to directly manage it. And in that way, you can get the benefit of being multi-Cloud, that data protection of being multi-Cloud, but not have to introduce that complexity to your environment. >> To provide some abstraction layer. Some sort of software approach. >> Yeah, like for example, if you have your primary systems in AWS, and you use a software like Druva Phoenix to back up your data and you put that data into a second Cloud provider, you don't have to have an account with that second Cloud provider. You don't have the risk or the complexity associated with that, which I think is a very-- >> And that's where you're looking for differentiation. We look at vendors, say hey, don't make me work harder. >> Right. >> And add new staff. Solve the problem. >> Yeah, it's all about solving problems, right? And that's why we're doing this. >> So, Druva, talk about this thing. Because we talked about it earlier. To me it could be, oh, we're on Azure. Well, they have Office 365, of course they're going to have Microsoft. A lot of people have a lot going on in AWS. So, maybe we're not yet at the world where you can actually provision the same workload across Clouds. It would be nice to have that someday if it was seamless. But, I think that might be the nirvana. But at the end of the day, an enterprise might have Office 365 and some Azure. But, I've got mostly Amazon over here where I'm doing a lot of development and DevOps, and I'm on-prem. How do you talk to that? Because that's like, you got to back up Office 365, you got to do the on-prem thing, you got to do the Amazon thing. How do you guys solve that problem? What's the conversation? >> Absolutely. I think over time we believe best of breed will win. So, people will deploy different types of cloud for different workloads, be it SaaS, hosted IaaS, or platforms like PaaS. 
When they do that, when they host multiple services, software-as-a-service or deployed services, I think it's hard to control where the data will go. What we can orchestrate, or anybody can orchestrate, is centralizing the data management part of it. So, Druva has the best posture, has the best coverage across multiple heterogeneous Cloud breeds, you know. Services like Office 365, Box, or Salesforce, or PaaS platforms like S3 or DynamoDB through our product called Apollo, or hosted platforms like what Live Nation is using through our Phoenix product line. So getting the breadth of coverage, consistency of policies on a single platform is what will make enterprises adopt what's best out there without worrying about how you build abstraction for data management. >> Jake, what's the biggest thing you see for people who are moving to the Cloud for the first time? What are they struggling with? Is it the idea that there's no perimeter? Is it staff training? I mean what are some of the, as people move from test/dev and start to put production in the Cloud, what are some of the critical things they should think about? >> Yeah, there are so many of them. But first, really, it's just getting buy-in, you know, from your technical staff because, you know, in an enterprise environment you bring in a Cloud provider, it's very easily framed to folks as if we're just being outsourced, right? So, I think getting past that barrier first and really getting through to folks and letting them know that really this is good for you. This is not bad for you. You're going to be learning a new skill, a very valuable skill, and you're going to be more effective at your job. So, I think that's the first thing. After that, once you start moving to the Cloud, then the thing that becomes apparent very quickly is cost control. So, you know, the thing with public Cloud is, you know, before you had this really kind of narrow range of what IT could cost with the traditional data center; now we have this huge range. And yes, it can be cheaper than it was before. But, it can also be far more expensive than it was before. >> So, is it service sprawl or just not paying attention? Both? >> Well, essentially you're giving your engineers a blank check. So, you need to have some governance and, you know, you really need to think about things that you didn't have to think about before. You're paying for consumption. So, you really have to watch your consumption. >> So, take me through the mental model of deduplication in the Cloud. Because I'm trying to, like, visualize it or grok it a little bit. Okay, so, the Cloud is out there, data's everywhere. And do I move the compute to the data? How does the backup and recovery and data management work? And does dedupe change with Cloud? Because some people think, I've got my dedupe already and I'm on premise. I've been doing these old solutions. How does dedupe specifically change in the Cloud, or does it? >> I know, scale changes. You're looking at, you know, the best dedupe systems, if you look historically, you know, were 100 terabyte, 200 terabyte dedupe indexes, Data Domain. The scale changes, you know, customers expect massive scale in Cloud. Our largest customer had 10 petabytes in a single dedupe index. It's a 100x scale difference compared to what traditional systems could do. Number two, you could create a quality of service which is not really bound by a fixed, you know, algorithm like variable length or whatever. So, you can optimize dedupe very clearly for the right workload. 
The right dedupe for the right workload. So, you may dedupe Office 365 differently than your VMware instances, compared to your Oracle databases or your endpoint workload. So, that as-a-service business model helps you create a custom, tailored solution for the right data, and brings the scale. You don't have the complexity of scale, but you get the benefit of scale, all, you know, simply managed in the cloud. >> Jake, what's it like working with Druva? What's the benefit that they bring to you guys? >> Yeah, so, specifically around backups for our enterprise systems, you know, that's a difficult challenge to solve natively in the Cloud. Especially if you're going to be limited to using Cloud native tools. So, it's really a perfect use case for a third party provider. You know, people don't think about this much, but in the old days, in the data center, you know, our backups went offsite into a vault. They were on tapes. It was very difficult for us to lose those or for them to be erased accidentally or even intentionally. Once you go into the Cloud, especially if you're all in with the Cloud, like we are, everything is easier. And so, accidents are easier also. You know, deleting your data is easier. So, you know, what we really want and what a lot of enterprises want-- >> And security too is a potential-- >> Absolutely, yeah. And so, what we want is to get some of that benefit, you know, back that we had from that inefficiency that we had beforehand. We love all the benefits of the Cloud. But, we want to have our data protected also. So, this is a great role for a company like Druva to come in and offer a product like Phoenix and say, you know, we're going to handle your backups for you essentially. So, you're going to put it in a safe place. We're going to secure it for you. And we're going to make sure it's secure for you. And doing it software-as-a-service, like Druva does with Phoenix, I think is the absolute right way to go. It's exactly what you need. >> Well, congratulations Jake Burns, Vice President of Cloud Services. >> Thank you. >> At Live Nation Entertainment. Jaspreet Singh, CEO of Druva, great to have you on. Congratulations on your success. >> Thank you. >> Inside the tornado called Cloud computing. A lot more stuff coming. More CUBE coverage coming up after this short break. Be right back. (electronic music)
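To make the deduplication discussion above more concrete, here is a minimal sketch of the idea behind a dedupe index: each chunk of data is stored once, keyed by its content hash, and every backup just references chunks by hash. This is a conceptual illustration, not Druva's implementation; the fixed chunk size, SHA-256 hashing, and in-memory dictionaries are assumptions chosen for brevity (systems at the petabyte scale described above typically use variable-length, content-defined chunking and persistent, distributed indexes).

```python
# Conceptual sketch of a deduplication index: store each unique chunk once,
# keyed by its content hash, and keep a per-file list of chunk hashes for restore.
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # fixed-size chunking for simplicity (an assumption)

chunk_store = {}   # content hash -> chunk bytes, stored only once
file_index = {}    # file name -> ordered list of chunk hashes

def backup(name, data):
    hashes = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # duplicate chunks cost nothing extra
        hashes.append(digest)
    file_index[name] = hashes

def restore(name):
    return b"".join(chunk_store[h] for h in file_index[name])
```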

Published Date : Mar 9 2018


Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Announcer: Coverage DataWorks Summit Europe 2017 brought to you by Hortonworks. >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit, now called DataWorks 2017. I'm John Furrier, my co-host Dave Vellante, our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop, and Hadoop is dead, long live Hadoop, but rumors of its demise are greatly exaggerated, but the reality is that there are no major shifts in the trends other than the fact that the amplification with AI and machine learning has upleveled the narrative to mainstream around data; big data's gen one was written on Hadoop, DevOps, culture, open-source. Starting with Hadoop you guys certainly have been way out in front of all the trends. How you guys have been rolling out the products. But it's now with IoT and AI as that sizzle, the future self driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, said one. Two, open-source continues to dominate: MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source, haven't seen that since Red Hat. >> Exactly. 99 is when Red Hat went public. >> Data-centric, big megatrend with open-source powering it, you couldn't be happier for the stars lining up. >> Yeah, well we definitely placed our bets on that. We went public in 2014 and it's nice to see that graduating class of Talend and MuleSoft, Cloudera coming out. That just, I think, helps socialize the movement that enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT. That's the wave. We had a panel earlier today where Dahl Jeppe from Centrica, British Gas, was talking about his ... The digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is just kind of ... For me personally, being my age in this generation of the computer industry since I was 19, to see open-source go mainstream the way it is, it just gets better every time, but it really is the thousand flowers bloom strategy. Throwing the seeds out there of innovation. I want to ask you as a strategy question, you guys from a performance standpoint, I would say kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut, looks like. The public market is based on the multiples from Dave and I's intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing with use cases that are creating value. The practitioners in the field creating value, real value for customers. >> So you covered some of the trends, but I'll translate them into how the customers are deploying. Cloud computing and IoT are somewhat related. One is a centralization, the other is decentralization, so it actually calls for a connected data architecture as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics. 
Getting vending machine analytics from vending machines even on Mount Fuji. And optimizing their flow-through of inventory in just-in-time delivery. That's an IoT-related use case that runs on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business and actually better revenues cuz they're getting the inventory where it needs to be so people can buy it. Those are really interesting use cases that we're seeing being deployed and it's at this convergence of IoT, cloud and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain. You've got the edge, you got the cloud, you need something in-between, you're calling it connected data platform. How do you guys participate in that value chain? >> When we went public our primary workhorse platform was Hortonworks Data Platform. We had first class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services pay-as-you-go, and Hortonworks DataFlow, which I call our connective tissue; it manages all of your data motion, it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called MiNiFi, mini NiFi, which does secure intelligent analytics at the edge and transmission. These smart manufacturing lines, you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center where you can do historical analytics across manufacturing lines. Those are the use cases that are connected data architectures-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data, it may not be full fidelity-- >> 10%, half, 90%? >> It depends. If you have operational events that you want to store, sometimes you may want to bring full fidelity of that data so you can do ... As you manufacture stuff and when it got deployed and you're seeing issues in the field, like Western Digital hard drives that fail in the field, they want that data full fidelity, to connect the data architecture and analytics around that data. You need to ... One of the terms I use is, in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity," I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry, you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community yeah, but you provided contributions and co-leadership let's say. You're there at the front of the pack. How do we project it forward without making forward-looking statements, but how does this industry become a cashflow positive industry? >> For public companies since the end of 2014, the markets turned beginning in 2016: prior to that, high growth with some losses was palatable; after, losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets. 
As more public open-source, data-driven companies come in, I think it will better educate the market on the value. There's only so much I can do to control the stock price. What I can do from a business perspective is hit key measures on a path to profitability. At the end of Q4 2016, we hit what we call break-even, which is a stepping stone. On our earnings call at the end of 2016 we ended with 185 million in revenue for the year. Only five years into this journey, so that's a hard revenue growth pace, and we basically stated in Q3 or Q4 of 17, we will hit operating cashflow neutrality. So we are operating the business-- >> John: But you guys also hit 100 million at record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but operating margins, like if you look at our margins on our subscription business for instance, we've got 84% margin on that. It's a really nice margin business. We can make those margins better, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting all cylinders, three billion dollars in bookings, one would think, okay hey I can maybe project forth some of these open-source companies. Maybe the flip side of this, oh wow we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way Red Hat went public in 99, so it was a different trajectory, like you know I charted their trajectory out. Oracle's trajectory was different. Even in inflation adjusted dollars they didn't hit 100 million in four years, I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models and these subscription models and the cloud services, which is an area that's near and dear to my heart. >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-product cloud service company. Not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well I want to get your thoughts on this while we're just kind of ripping live here because Dave and I were talking on our intro segment about the business model and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of pointing at and trying to poke at and want to get your reaction to is in the classic enterprise go-to-market, you have sales force expense, you guys pay handsomely for that today. Incubating that market, getting the profitability for it is a good thing, but there's also channels, VARs, ISVs, and so on. You guys have an open-source channel that's kind of not a VAR or an ISV; these are entrepreneurs and/or businesses themselves. There's got to be a monetization shift there for you guys in the subscription business certainly. When you look at these partners, they're co-developing, they're in open-source, you can almost see the dots connecting. Is this new ecosystem, there's always been an ecosystem, but now that you have kind of a monetization inherently in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our system certified on the Power Systems. Many may look at IBM as competitive, we view them as a partner. Amazon, some may view them as a competitor with us, they've been a great partner in our Data Cloud for AWS. 
So it forces you to think about how do you collaborate around deeply engineered systems and value, and we get great revenue streams that are pulled through, that they can sell into the market to their ecosystems. >> How do you envision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with your orchestrator, called the Data Platform you have, and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? I mean would it be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> You know, in my days at JBoss and Red Hat it was somewhere between 15 to 20x. That was the value that was created on top of the platforms. >> What about the ... I want to ask you about the forking of the Hadoop distros. I mean there was a time when everybody was announcing Hadoop distros. John Furrier announced SiliconANGLE was announcing a Hadoop distro. So we saw consolidation, and then you guys announced the ODP, then the ODPI initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out, you have clearly Red Hat, you had Canonical Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytic data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market, Cloudera is running sort of a different play where they're curating different elements and investing in different elements. Doesn't make either one bad or good, we are just going after the markets slightly differently. The other point I'll make there is in 2014 if you looked at the Venn diagrams, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places and other new vendors are going after others. With the market dynamics of IoT, cloud and AI, you're going to see folks chase the market opportunities. >> Is that dispersity not a problem for customers now or is it challenging? >> There has to be a core level of interoperability and that's one of the reasons why we're collaborating with folks in the ODPI, as an example. Still, when it comes to some of the core components, there has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certification and choices. So ultimately it has to come down to just a much more sane approach to what you can rely on. >> When you guys announced ODP, then ODPI, the extension, Mike Olson wrote a blog saying it's not necessary, people came out against it. Now we're three years in looking back. Was he right or not? 
>> I think the ODPI takeaway this year is there's more we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on this spec, but frankly, you talk to John Mertic at ODPI, you talk to SAS and others, I think we want to be a bit more aggressive in the areas that we go after and try and drive there from a standardization perspective. >> We had Wei Wang on earlier-- >> Shaun: There's more we can do and there's more we should do. >> We had Wei on with Microsoft at our Big Data SV event a couple weeks ago. Talk about the Microsoft relationship with you guys. It seems to be doing very well. Comments on that. >> Microsoft was one of the two companies we chose to partner with early on, so in 2011, 2012 Microsoft and Teradata were the two. Microsoft was how do I democratize and make this technology easy for people. That's manifested itself as an Azure cloud service, Azure HDInsight-- >> Which is growing like crazy. >> Which is globally deployed and we just had another update. It's fundamentally changed our engineering and delivery model. This latest release was a cloud first delivery model, so one of the things that we're proud of is the interactive SQL and the LLAP technology that's in HDP, that went out through Azure HDInsight and Hortonworks Data Cloud first. Then it was certified in HDP 2.6 and it went out on Power at the same time. It's that cadence of delivery and cloud first delivery model. We couldn't do it without a partnership with Microsoft. I think we've really learned what it takes-- >> If you look at Microsoft at that time. I remember interviewing you on theCUBE. Microsoft was trading something like $26 a share at that time, around their low point. Now the stock is performing really well. Satya Nadella, very cloud oriented-- >> Shaun: They're very open-source. >> They're very open-source and friendly, they've been donating a lot to the OCP, to the data center piece. Extremely different Microsoft, so you slipped into that beautiful spot, reacted on that growth. >> I think as one of the stalwarts of enterprise software providers, I think they've done a really great job of bending the curve towards cloud and still having a mixed portfolio, but incenting a field, and incenting a channel, and selling cloud and growing that revenue stream, that's nontrivial, that's hard. >> They know the enterprise sales motions too. I want to ask you how that's going over all within Hortonworks. What are some of the conversations that you're involved in with customers today? Again we were saying in our opening segment, it's on YouTube if you're not watching, but the customers is the forcing function right now. They're really putting the pressure on the suppliers, you're one of them, to get tight, reduce friction, lower costs of ownership, get into the cloud, flywheel. And so you see a lot-- >> I'll throw in another aspect: some of the more late-majority adopters traditionally, over and over I hear, by 2025 they want to power down the data center and have more things running in the public cloud, if not most everything. That's another eight years or what have you, so it's still a journey, but this journey to making that an imperative because of the operational, because of the agility, because of better predictability, ease of use. That's fundamental. 
>> As you get into the connective tissue, I love that example, with Kubernetes containers, you've got developers, a big open-source participant, and you got all the stuff you have, you just start to see some coalescing around the cloud native. How do you guys look at that conversation? >> I view container platforms, whether they're container services that are running on cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that, I think that's going to be the defacto way. Particularly if you go to cloud first models, particularly for delivery. You need that packaging notion and you need the agility of updates that that's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There's others in the ecosystem as well as the cloud providers. All three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction of deployment ... And I ride atop that new rail. It can't get here soon enough from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift, personally I think Microsoft realized holy cow, we could actually make a lot of money if we're selling hardware services. We can make more money if we're selling the full stack. It was sort of an epiphany and so Amazon seems to be doing the same thing. You mentioned earlier, you know, Amazon is a great partner, even though a lot of people look at them as a competitor; it seems like Amazon, Azure etc., they're building out their own big data stack and offering it as a service. People say that's a threat to you guys, is it a threat or is it a tailwind, is it it is what it is? >> This is why I bring up industry-wide we always have waves of centralization, decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data and data at the edge. That's the problem I am looking to facilitate and solve. I don't view them as competitors, I view them as partners because we need to collaborate because there's a value chain of the flow of the data and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go. I think that's why we talk about this notion of connected data. I don't talk hybrid cloud computing, that's for compute. I talk about how do you connect to your data, how do you know where your data is and are you getting the right value out of the data by playing it where it lies. >> I think IoT has been a great sweet trend for the big data industry. It really accelerates the value proposition of the cloud too because now you have a connected network, you can have your cake and eat it too. Central and distributed. >> There's different dynamics in the US versus Europe, as an example. US definitely we're seeing a cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud and that's opening up the cloud opportunity here. >> Interesting. So on the prospects for Hortonworks, cashflow positive in Q4, you guys have made a public statement, any other thoughts you want to share? 
>> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then the more the merrier, the more data-oriented open-source driven companies that can graduate in the public markets, I think is awesome. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. More great coverage coming after this short break. (upbeat music)
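The edge-to-data-center pattern Shaun describes, where a MiNiFi-style agent does intelligent analytics at the edge and only a subset of key events travels back to the central platform, can be sketched roughly as below. This is a conceptual illustration rather than Hortonworks DataFlow code; the record format, the deviation-based threshold, and the sample readings are assumptions made for the example.

```python
# Conceptual sketch of edge-side filtering: analyze readings locally and forward
# only the key events upstream, so just a subset of the data comes back.
from statistics import mean

def select_key_events(readings, threshold=2.0):
    """Return the subset of sensor readings worth sending to the data center."""
    values = [r["value"] for r in readings]
    mu = mean(values)
    # crude spread estimate; a real agent would keep running statistics instead
    spread = max(1e-9, mean(abs(v - mu) for v in values))
    return [r for r in readings if abs(r["value"] - mu) / spread > threshold]

readings = [{"sensor": "line-7", "value": v} for v in (20.1, 19.8, 20.3, 57.2, 20.0)]
print(select_key_events(readings))  # only the anomalous 57.2 reading goes upstream
```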

Published Date : Apr 5 2017


Frederick Reiss, IBM STC - Big Data SV 2017 - #BigDataSV - #theCUBE


 

>> Narrator: Live from San Jose, California it's the Cube, covering Big Data Silicon Valley 2017. (upbeat music) >> Big Data SV 2017, day two of our wall to wall coverage of the Strata Hadoop Conference, Big Data SV, really what we call Big Data Week because this is where all the action is going on down in San Jose. We're at the historic Pagoda Lounge in the back of the Fairmont, come on by and say hello, we've got a really cool space and we're excited and never been in this space before, so we're excited to be here. So we got George Gilbert here from Wikibon, we're really excited to have our next guest, he's Fred Reiss, he's the chief architect at IBM Spark Technology Center in San Francisco. Fred, great to see you. >> Thank you, Jeff. >> So I remember when Rob Thomas, we went up and met with him in San Francisco when you guys first opened the Spark Technology Center a couple of years ago now. Give us an update on what's going on there, I know IBM's putting a lot of investment in this Spark Technology Center in the San Francisco office specifically. Give us kind of an update of what's going on. >> That's right, Jeff. Now we're in the new Watson West building in San Francisco on 505 Howard Street, colocated, we have about a 50 person development organization. Right next to us we have about 25 designers and on the same floor a lot of developers from Watson doing a lot of data science, from the Weather Underground, doing weather and data analysis, so it's a really exciting place to be, lots of interesting work in data science going on there. >> And it's really great to see how IBM is taking the core Watson, obviously enabled by Spark and other core open source technology and now applying it, we're seeing Watson for Health, Watson for Autonomous Vehicles, Watson for Marketing, Watson for this, and really bringing that type of machine learning power to all the various verticals in which you guys play. >> Absolutely, that's been what Watson has been about from the very beginning, bringing the power of machine learning, the power of artificial intelligence to real world applications. >> Jeff: Excellent. >> So let's tie it back to the Spark community. Most folks understand how Databricks builds out the core or does most of the core work for, like, the SQL workload, the streaming and machine learning, and I guess graph is still immature. We were talking earlier about IBM's contributions in helping to build up the machine learning side. Help us understand what the Databricks core technology for machine learning is and how IBM is building beyond that. >> So the core technology for machine learning in Apache Spark comes out, actually, of the machine learning department at UC Berkeley as well as a lot of different members from the community. Some of those community members also work for Databricks. We actually at the IBM Spark Technology Center have made a number of contributions to the core Apache Spark and the libraries, for example recent contributions in neural nets. In addition to that, we also work on a project called Apache System ML, which used to be proprietary IBM technology, but the IBM Spark Technology Center has turned System ML into Apache System ML, it's now an open Apache incubating project that's been moving forward out in the open. You can now download the latest release online and that provides a piece that we saw was missing from Spark and a lot of other similar environments: an optimizer for machine learning algorithms. 
So in Spark, you have the Catalyst optimizer for data analysis, data frames, SQL; you write your queries in terms of those high level APIs and Catalyst figures out how to make them go fast. In System ML, we have an optimizer for high level languages like R and Python where you can write algorithms in terms of linear algebra, in terms of high level operations on matrices and vectors, and have the optimizer take care of making those algorithms run in parallel, run at scale, taking account of the data characteristics. Does the data fit in memory, and if so, keep it in memory. Does the data not fit in memory? Stream it from disk. >> Okay, so there was a ton of stuff in there. >> Fred: Yep. >> And if I were to refer to that as so densely packed as to be a black hole, that might come across wrong, so I won't refer to that as a black hole. But let's unpack that, so the, and I meant that in a good way, like high bandwidth, you know. >> Fred: Thanks, George. >> Um, so the traditional Spark, the machine learning that comes with Spark's MLlib, one of its distinguishing characteristics is that the models, the algorithms that are in there, have been built to run on a cluster. >> Fred: That's right. >> And very few have, very few others have built machine learning algorithms to run on a cluster, but as you were saying, you don't really have an optimizer for finding something where a couple of the algorithms would be fit optimally to solve a problem. Help us understand, then, how System ML solves a more general problem for, say, ensemble models and for scale out, I guess I'm, help us understand how System ML fits relative to Spark's MLlib and the more general problems it can solve. >> So, MLlib and a lot of other packages such as Sparkling Water from H2O, for example, provide you with a toolbox of algorithms and each of those algorithms has been hand tuned for a particular range of problem sizes and problem characteristics. This works great as long as the particular problem you're facing as a data scientist is a good match to that implementation that you have in your toolbox. What System ML provides is less like having a toolbox and more like having a machine shop. You have a lot more flexibility, you have a lot more power, you can write down an algorithm as you would write it down if you were implementing it just to run on your laptop and then let the System ML optimizer take care of producing a parallel version of that algorithm that is customized to the characteristics of your cluster, customized to the characteristics of your data. >> So let me stop you right there, because I want to use an analogy that others might find easy to relate to for all the people who understand SQL and scale out SQL. So, the way you were describing it, it sounds like oh, if I were a SQL developer and I wanted to get at some data on my laptop, I would find it pretty easy to write the SQL to do that. Now, let's say I had a bunch of servers, each with its own database, and I wanted to get data from each database. If I didn't have a scale out database, I would have to figure out physically how to go to each server in the cluster to get it. What I'm hearing for System ML is it will take that query that I might have written on my one server and it will transparently figure out how to scale that out, although in this case not queries, machine learning algorithms. 
Just like SQL and query optimization, by allowing you to separate that logical description of what you're looking for from the physical description of how to get at it, it lets you have a parallel database with the exact same language as a single machine database. In System ML, because we have an optimizer that separates that logical description of the machine learning algorithm from the physical implementation, we can target a lot of parallel systems, and we can also target a large server, and the code, the code that implements the algorithm, stays the same. >> Okay, now let's take that a step further. You refer to matrix math and I think linear algebra and a whole lot of other things that I never quite made it to since I was a humanities major, but when we're talking about those things, my understanding is that those are primitives that Spark doesn't really implement, so that if you wanted to do neural nets, which relies on some of those constructs for high performance, >> Fred: Yes. >> Then, um, that's not built into Spark. Can you get to that capability using System ML? >> Yes. System ML at its core provides you, as a user, with a library of machine, rather, linear algebra primitives, just like a language like R or a library like NumPy gives you matrices and vectors and all of the operations you can do on top of those primitives. And just to be clear, linear algebra really is the language of machine learning. If you pick up a paper about an advanced machine learning algorithm, chances are the specification for what that algorithm does and how that algorithm works is going to be written in the paper literally in linear algebra, and the implementation that was used in that paper is probably written in a language where linear algebra is built in, like R, like NumPy. >> So it sounds to me like Spark has done the work of sort of the blocking and tackling of machine learning to run in parallel. And that's, I mean, to be clear, since we haven't really talked about it, that's important when you're handling data at scale and you want to train, you know, models on very, very large data sets. But it sounds like when we want to go to some of the more advanced machine learning capabilities, the ones that today are making all the noise with, you know, speech to text, text to speech, natural language, understanding those neural network based capabilities are not built into the core Spark MLlib, that, would it be fair to say you could start getting at them through System ML? >> Yes, System ML is a much better way to do scalable linear algebra on top of Spark than the very limited linear algebra that's built into Spark. >> So alright, let's take the next step. Can System ML be grafted onto Spark in some way or would it have to be an entirely new API that doesn't, take, integrate with all the other Spark APIs? In a way, that has differentiated Spark, where each API is sort of accessible from every other. Can you tie System ML in or do the Spark guys have to build more primitives into their own sort of engine first? >> A lot of the work that we've done with the Spark Technology Center as part of bringing System ML into the Apache ecosystem has been to build a nice, tight integration with Apache Spark, so you can pass Spark data frames directly into System ML and you can get data frames back. Your System ML algorithm, once you've written it in terms of one of System ML's domain-specific languages, just plugs into Spark like all the algorithms that are built into Spark. 
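For readers who want to see what that DataFrame hand-off can look like from Python, here is a minimal sketch assuming Apache SystemML's MLContext API; the toy DataFrames, the column names, and the one-line DML script (linear regression via the normal equations) are illustrative, and the exact API surface may vary across SystemML releases.

```python
# Minimal sketch: pass Spark DataFrames into a System ML script and get a result back.
# Assumes the Apache SystemML Python package (MLContext / dml); names are illustrative.
from pyspark.sql import SparkSession
from systemml import MLContext, dml

spark = SparkSession.builder.appName("systemml-sketch").getOrCreate()
ml = MLContext(spark)

# Toy feature and label DataFrames; numeric columns only.
X_df = spark.createDataFrame([(1.0, 2.0), (2.0, 1.0), (3.0, 4.0)], ["x1", "x2"])
y_df = spark.createDataFrame([(1.0,), (0.0,), (1.0,)], ["y"])

# Linear regression written as linear algebra; the optimizer decides how to
# distribute (or not distribute) the matrix operations.
script = dml("w = solve(t(X) %*% X, t(X) %*% y)").input(X=X_df, y=y_df).output("w")
w = ml.execute(script).get("w").toNumPy()
print(w)
```

Once the result comes back as a DataFrame or NumPy array, it drops straight into the rest of a Spark or Python workflow, which is the point of the tight integration described above.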
>> Okay, so that's, that would keep Spark competitive with more advanced machine learning frameworks for a longer period of time; in other words, it wouldn't hit the wall the way it would if it encountered TensorFlow from Google, for Google's way of doing deep learning. Spark wouldn't hit the wall once it needed, like, a TensorFlow, as long as it had System ML so deeply integrated the way you're doing it. >> Right, with a system like System ML, you can quickly move into new domains of machine learning. So for example, this afternoon I'm going to give a talk with one of our machine learning developers, Mike Dusenberry, about our recent efforts to implement deep learning in System ML, like full scale, convolutional neural nets running on a cluster, in parallel, processing many gigabytes of images, and we implemented that with very little effort because we have this optimizer underneath that takes care of a lot of the details of how you get that data into the processing, how you get the data spread across the cluster, how you get the processing moved to the data or vice versa. All those decisions are taken care of in the optimizer, you just write down the linear algebra parts and let the system take care of it. That let us implement deep learning much more quickly than we would have if we had done it from scratch. >> So it's just this ongoing cadence of basically removing the infrastructure management from the data scientists and enabling them to concentrate really where their value is, on the algorithms themselves, so they don't have to worry about how many clusters it's running on, and that configuration, kind of typical dev ops that we see on the regular development side, but now you're really bringing that into the machine learning space. >> That's right, Jeff. Personally, I find all the minutiae of making a parallel algorithm work really fascinating, but a lot of people working in data science really see parallelism as a tool. They want to solve the data science problem and System ML lets you focus on solving the data science problem because the system takes care of the parallelism. >> You guys could go on in the weeds for probably three hours but we don't have enough coffee and we're going to set up a follow up time because you're both in San Francisco. But before we let you go, Fred, as you look forward into 2017, kind of the advances that you guys have done there at the IBM Spark Center in the city, what's kind of the next couple great hurdles that you're looking to cross, new challenges that are getting you up every morning that you're excited to come back a year from now and be able to say wow, these are the one or two things that we were able to take down in 2017? >> We're moving forward on several different fronts this year. On one front, we're helping to get the notebook experience with Spark notebooks consistent across the entire IBM product portfolio. We helped a lot with the rollout of notebooks on Data Science Experience on z, for example, and we're working actively with the Data Science Experience and with the Watson Data Platform. On the other hand, we're contributing to Spark 2.2. There are some exciting features, particularly in SQL, that we're hoping to get into that release, as well as some new improvements to MLlib. We're moving forward with Apache System ML, we just cut Version 0.13 of that. We're talking right now on the mailing list about getting System ML out of incubation, making it a full, top level project. 
And we're also continuing to help with the adoption of Apache Spark technology in the enterprise. Our latest focus has been on deep learning on Spark. >> Well, I think we found him! Smartest guy in the room. (laughter) Thanks for stopping by and good luck on your talk this afternoon. >> Thank you, Jeff. >> Absolutely. Alright, he's Fred Reiss, he's George Gilbert, and I'm Jeff Frick. You're watching the Cube from Big Data SV, part of Big Data Week in San Jose, California. (upbeat music) (mellow music) >> Hi, I'm John Furrier, the cofounder of SiliconANGLE Media and cohost of the Cube. I've been in the tech business since I was 19, first programming on minicomputers.

Published Date : Mar 15 2017


Wikibon Big Data Market Update pt. 2 - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

(lively music) >> [Announcer] Live from Boston, Massachusetts, this is the Cube, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Spark Summit in Boston, everybody. This is the Cube, the worldwide leader in live tech coverage. We've been here two days, wall-to-wall coverage of Spark Summit. George Gilbert, my cohost this week, and I are going to review part two of the Wikibon Big Data Forecast. Now, it's very preliminary. We're only going to show you a small subset of what we're doing here. And so, well, let me just set it up. So, these are preliminary estimates, and we're going to look at different ways to triangulate the market. So, at Wikibon, what we try to do is focus on disruptive markets, and try to forecast those over the long term. What we try to do is identify where the traditional market research estimates really, we feel, might be missing some of the big trends. So, we're trying to figure out, what's the impact, for example, of real time. And, what's the impact of this new workload that we've been talking about around continuous streaming. So, we're beginning to put together ways to triangulate that, and we're going to show you, give you a glimpse today of what we're doing. So, if you bring up the first slide, we showed this yesterday in part one. This is our last year's big data forecast. And, what we're going to do today is we're going to focus in on that line, that S-curve. That really represents the real-time component of the market. Spark would be in there. Streaming analytics would be in there. Add some color to that, George, if you would. >> [George] Okay, for 60 years, since the dawn of computing, we've had two ways of interacting with computers. You put your punch cards in, or whatever else, and you come back and you get your answer later. That's batch. Then, starting in the early 60's, we had interactive, where you're at a terminal. And then, the big revolution in the 80's was you had a PC, but you still were either interactive with a terminal or batch, typically for reporting and things like that. What's happening is the rise of a new interaction mode, which is continuous processing. Streaming is one way of looking at it, but it might be more effective to call it continuous processing because you're not going to get rid of batch or interactive, but your apps are going to have a little of each. So, what we're trying to do, since this is early, early in its life cycle, is try and look at that streaming component from a couple of different angles. >> Okay, as I say, that's represented by this ogive curve, or the S-curve. On the next slide, we're at the beginning when you think about these continuous workloads. We're at the early part of that S-curve, and of course, most of you or many of you know how the S-curve works. It's slow, slow, slow. For a lot of effort, you don't get much in return. Then you hit the steep part of that S-curve. And that's really when things start to take off. So, the challenge is, things are complex right now. That's really what this slide shows. And Spark is designed, really, to reduce some of that complexity. We've heard a lot about that, but take us through this. Look at this data flow from ingest, to explore, to process, to serve. We talked a lot about that yesterday, but this underscores the complexity in the marketplace.
>> [George] Right, and while we're just looking mostly at numbers today, the point of the forecast is to estimate when the barriers, representing complexities, start to fall. And then, when we can put all these pieces together: ingest, explore, process, serve. When that becomes an end-to-end pipeline. When you can start taking the data in on one end, get a scientist to turn it into a model, inject it into an application, and that process becomes automated. That's when it's mature enough for the knee in the curve to start. >> And that's when we think the market's going to explode. But now, how do you bound this? Okay, when we do forecasts, we always try to bound things. Because if they're not bounded, then you get no foundation. So, if you look at the next slide, we're trying to get a sense of real-time analytics. How big can it actually get? That's what this slide is really trying to-- >> [George] So this one was one firm's take on real-time analytics, where by 2027, they see it peaking just under-- >> [Dave] When you say one firm, you mean somebody from the technology district? >> [George] Publicly available data. And we take it, since they didn't have a lot of assumptions published, as, okay, one data point. And then, we're going to come at it with some bottoms-up and top-down data points, and compare. >> [Dave] Okay, so on the next slide we want to drill into the DBMS market, and when you think about DBMS, you think about the traditional RDBMS and what we know, the Oracle, SQL Server, IBM DB2s, etc. And then, you have these emergent NewSQL and NoSQL entrants, which are, obviously, we talked today to a number of folks. The number of suppliers is exploding. The revenue's still relatively small. Certainly small relative to the RDBMS marketplace. But, take us through what your expectations are here, and what some of the assumptions are behind this. >> [George] Okay, so the first thing to understand is the DBMS market, overall, is about $40 billion, of which 30 billion goes to online transaction processing supporting real operational apps. 10 billion goes to OLAP or business intelligence type stuff. The OLAP one is shrinking materially. The online transaction processing one, new sales is shrinking materially, but there's a huge maintenance stream. >> [Dave] Yeah, which companies like Oracle and IBM and Microsoft are living off of that, trying to fund new development. >> We modeled that declining gently and beginning to accelerate more going out into the latter years of the ten-year period. >> What's driving that decline? Obviously, you've got the big sucking sound of Hadoop, in part, driving that. But really, increasingly it's people shifting their resources to some of these new emergent applications and workloads and new types of databases to support them, right? But these are still, those new databases, you can see here, the NewSQL and NoSQL, still relatively small. A lot of it's open source. But then it starts to take off. What's your assumption there? >> So here, what's going on is, if you look at dollars today, it's, actually, interesting. If you take the NoSQL databases, you take DynamoDB, you take Cassandra, Hadoop, HBase, Couchbase, Mongo, Kudu and you add all those up, it's about, with DynamoDB, it's, probably, about 1.55 billion out of a $40 billion market today. >> [Dave] Okay, but it's starting to get meaningful. We were approaching two billion. >> But where it's meaningful is the unit share. If that were translated into Oracle pricing.
The market would be much, much bigger. So the point is-- >> Ten X? >> At least, at least. >> Okay, so in terms of work being done. If there's a measure of work being done. >> [George] We're looking at dollars here. >> Operations per second or etcetera, it would be enormous. >> Yes, but that's reflective of the fact that the data volumes are exploding but the prices are dropping precipitously. >> So do you have a metric to demonstrate that? We're, obviously, not going to show it today, but. >> [George] Yes. >> Okay great, so-- >> On the business intelligence side, without naming names, the data warehouse appliance vendors are charging anywhere from 25,000 per terabyte up to, when you include running costs, as high as 100,000 a terabyte. That's what their customers are estimating. That's not the selling cost, but that's the cost of ownership per terabyte. Whereas, if you look at, let's say, Hadoop, which is comparable for offloading some of the data warehouse workloads, that's down to the 5K per terabyte range. >> Okay great, so you expect that these platforms will have a bigger and bigger impact? What's your pricing assumption? Are prices going to go up, or is it just volume's going to go through the roof? >> On pricing, actually, it's difficult, because we're going to add more and more functionality. Volumes go up, and if you add sufficient functionality, you can maintain pricing. But as volumes go up, typically, prices go down. So it's a matter of how much do these NoSQL and NewSQL databases add in terms of functionality, and I distinguish between them because NewSQL databases are scaled-out versions of Oracle or Teradata, but they are based on the more open-source pricing model. >> Okay, and NoSQL, don't forget, stands for "not only SQL," not "no SQL." >> If you look at the slides, big existing markets never fall off a cliff when they're in decline. They just slowly fade. And, eventually, that accelerates. But what's interesting here is, the data volumes could explode, but the revenue associated with the NoSQL, which is the dark gray, and the NewSQL, which is the blue, those don't explode. You could take, what's the DBMS cost of supporting YouTube? It would be in the many, many, many billions of dollars. It would support half of an Oracle itself, probably. But it's all open source there, so. >> Right, so that's minimizing the opportunity is what you're saying? >> Right. >> You can see the database market is flat, certainly flattish and even declining, but you do expect some growth in the out years as part of that equation, that volume, presumably-- >> And that's the next slide, which is where we've seen that growth come from. >> Okay, so let's talk about that. So the next slide, again, I should have set this up better. The vertical axis is worldwide dollars and the horizontal axis is time. And we're talking here about these continuous application workloads. This new workload that you talked about earlier. So take us through the three. >> [George] There's three types of workloads that, in large part, are going to be driving most of this revenue. Now, these aren't completely comparable to the DBMS market because some of these don't use traditional databases. Or if they do, they're Torry databases and I'll explain that. >> [Dave] Sure, but if I look at the IoT Edge, the Cloud and the microservices and streaming, that's a tail wind to the database forecast in the previous slide, is that right?
>> [George] It's, actually, interesting, but the application and infrastructure telemetry, this is what Splunk pioneered. Which is all the torrents of data coming out of your data center and your applications, and you're trying to manage what's going on. That is a database application. And we know Splunk, for 2016, was 400 million in software revenue; Hadoop was 750 million. And the various other management vendors, New Relic, AppDynamics, startups, and 5% of Azure and AWS revenue. If you add all that up, it comes out to $1.7 billion for 2016. And so, we can put a growth rate on that. And we talked to several vendors to say, okay, how much will that workload be compared to IoT Edge Cloud. And the IoT Edge Cloud is the smart devices at the Edge, and the analytics are in the fog, but not counting the database revenue up in the Cloud. So it's everything surrounding the Cloud. And that, actually, if you look out five years, that's, maybe, 20% larger than the app and infrastructure telemetry, but growing much, much faster. Then the third one, where you were asking whether this is a tail wind to the database: microservices and streaming are very different ways of building applications from what we do now. Now, people build their logic for the application, and everyone then stores their data in this centralized external database. In microservices, you build a little piece of the app, and whatever data you need, you store within that little piece of the app. And so the database requirements are, rather, primitive. And so that piece will not drive a lot of database revenue. >> So if you could go back to the previous slide, Patrick. What's driving database growth in the out years? Why wouldn't database continue to get eaten away and decline? >> [George] In broad terms, the overall database market is staying flat. Because prices collapse but the data volumes go up. >> [Dave] But there's an assumption in here that the NoSQL space, actually, grows in the out years. What's driving that growth? >> [George] Both the NoSQL and the NewSQL. The NoSQL, probably, is best serving in capturing the IoT data because you don't need lots of fancy query capabilities for concurrency. >> [Dave] So it is a tail wind in a sense in that-- >> [George] IoT, but that's different. >> [Dave] Yeah, sure, but you've got the overall market growing. And that's because the new stuff, NewSQL and NoSQL, is growing faster than the decline of the old stuff. But in the 2020 to 2022 time frame, it's not enough to offset that decline. And then you have it start growing again. You're saying that's going to be driven by IoT and other Edge use cases? >> Yes, IoT Edge and the NewSQL, actually, is where, when they mature, you start to substitute them for the traditional operational apps. For people who want to write database apps, not those who want to write microservice-based apps. >> Okay, alright, good. Thank you, George, for setting it up for us. Now, we're going to be at Big Data SV in mid March? Is that right? Middle of March. And George is going to be releasing the actual final forecast there. We do it every year. We use Spark Summit to look at our preliminary numbers, some of the Spark-related forecasts like continuous workloads. And then we harden those forecasts going into Big Data SV. We publish our big data report like we've done for the past five, six, seven years. So check us out at Big Data SV. We do that in conjunction with the Strata events. So we'll be there again this year at the Fairmont Hotel.
We got a bunch of stuff going on all week there. Some really good programs going on. So check out siliconangle.tv for all that action. Check out Wikibon.com. Look for new research coming out. You're going to be publishing this quarter, correct? And of course, check out siliconangle.com for all the news. And, really, we appreciate everybody watching. George, been a pleasure co-hosting with you. As always, really enjoyable. >> Alright, thanks Dave. >> Alright, so that's a wrap from Spark Summit. We're going to try to get out of here, hit the snowstorm and work our way home. Thanks everybody for watching. A great job everyone here. Seth, Ava, Patrick and Alex. And thanks to our audience. This is the Cube. We're out, see you next time. (lively music)

Published Date : Feb 9 2017


Rob Bearden, Hortonworks - Executive On-the-Ground #theCUBE


 

>> Voiceover: On the Ground, presented by The Cube. Here's your host John Furrier. (techno music) >> Hello, everyone. Welcome to a special On the Ground executive interview with Rob Bearden, the CEO of Hortonworks. I'm John Furrier with The Cube. Rob, welcome to this On the Ground. >> Thank you. >> So I got to ask you, you're five years old this year, your company Hortonworks, in June, have Hadoop Summit coming up, what a magical run. You guys went public. Give us a quick update on Hortonworks and what's going on. The five-year birthday, any special plans? >> Well, we're going to actually host the 10-year birthday party of Hadoop, which, you know, started at Yahoo! and in the open-source community. So everyone's invited. Hopefully you'll be able to make it as well. We've accomplished a lot in the last five years. We've grown to over 1000 employees, over 900 customers. This year is our first full year of being a public company, and the street has us at $265 million in billings. So tremendous progress has happened, and we've seen the entire data architecture begin to re-platform around Hadoop now. >> CEOs across the globe are facing profound challenges, data, cloud, mobile, obviously this digital transformation. What are you seeing out there as you talk to your customers? >> Well, they view that the digital transformation is a massive opportunity for value creation for that enterprise. And they realize that they can really shift their business models from being very reactive post-transaction to actually being able to consolidate all of the new paradigm data with the existing transaction data and actually get to a very proactive model pre-transaction. And so they understand their customers' patterns. They understand the kinds of things that their customers want to buy before they ever engage in the procurement process. And they can make better and more compelling offers at better price points and be able to serve their customers better, and that's really the transformation that's happening, and they realize the value of that creation between them and their customer. >> And one of the exciting things about The Cube is we go to all these different industry events, and you were speaking last week at an event where data is at the center of the value proposition around digital transformation, and that's really been the key trend that we've been seeing consistently, that buzzword digital transformation. What does that mean to you? Because this is coming up over and over again around this digital platform, digital whatever, digital media or digital engagement. It's all around data. What's your thoughts, and what is, from your perspective, digital transformation? >> Well, it's about being able to derive value from your data and be able to take that value back to your customers and your supply chain, and to be able to create a completely new engagement with how you're managing your interaction with your customers and your supply chain from the data that they're generating and the data that you have about them. >> When you talk to CEOs and people in the business out in the field, how much of this digital transformation do you see as real in terms of progress, real progress? In terms of total transitions, or is it just being talked about now? What's your progress bar meter? How would you peg this trend? >> I would say we're at four, and I believe we'll be at six by the end of 2016.
And it's one of the biggest movements I've seen since the '90s and ERP, because it's so transformational to the business model, by being able to transform the data that we have about our collective entity and our collective customer and collective supply chain, and be able to apply predictive and real-time interactions against that data as events and occurrences are happening, and to be able to quickly offer products and services. And the velocity that that creates to modernization and the value creation back is at a pace that's never been able to happen. And they've really understood the importance of doing that or being disintermediated in their existing spaces. >> You mention ERP, it kind of shows our age, but I'll ask the question. Back in the '90s ERP, CRM, these were processes that were well known, that people automated with technology which was at that time unknown. You got the rise of client-server technology, local area networking, TCP/IP was emerging, so you got some unknown technology stuff happening, but known processes that were being automated, and hence saw that boom. Now you mention today, it's interesting because Peter Burris at Wikibon's thesis says today the processes are unknown and the technology's known, so there's now a new dynamic. It's almost flipped upside-down, where this digital transformation is the exact opposite. IoT is a great use case where all these unknown things are coming into the enterprise that are value opportunities. The technology's known, so now the challenge is how to use technology, to deploy it, and be agile to capture and automate these future and/or real-time unknown processes. Your thoughts on that premise. >> The answers are buried in the data, is the great news, and so the technology, as you said, is there, and you have these new, unknown processes through Internet of Things, the new paradigm data sets with sensors and clickstream and mobile data. And the good news is they generate the data, and we can apply technology to the data through AI and machine learning to really make sure that we understand how to transform the value out of that, out of those data sets. >> So how does IT deal with this? 'Cause going back 30 years, IT was a clear line of sight, again, automating those known processes. Now you have unknown opportunities, but you have to be in a position for that. Call that cloud, call that DevOps, call that data driven, whatever the metaphor is. People are being agile, be ready for it. How is that different now, and what is the future of data in that paradigm? And how does a customer come to grips and rationalize this notion of, I need a clear line of sight of the value, not knowing what the processes are around data. What should they be doing? >> Well, we don't know the processes necessarily, per se, but we do know what the data is telling us because we can bring all that data under management. We can apply the right kind of algorithms, the right kind of tools on it, to give us the outcomes that we want and have the ability to monetize and unlock that value very quickly. >> Hortonworks' architecture was kind of laid out at the last Hadoop Summit in Dublin. We heard about the platform. Your architecture's going beyond Hadoop, even though it says Hadoop Summit and Hadoop was the key to big data. Going beyond Hadoop means other things. What does that mean for the customer? Because now they're seeing these challenges. How does Hortonworks describe that, and what value do you bring to those customers?
>> Big data was about data at rest and being able to drive the transformation that it has, being able to consolidate all the transactional platforms into a central data architecture. Being able to bring in all the new paradigm data sets, the mobile, the clickstream, the IoT data, and bring that together and be able to really transition from being reactive post-transaction to being predictive and interactive pre-transaction. And that's a very, very powerful value proposition, and you create a lot of value doing that, but what's really been learned through that process, in the digital transformation journey, is that actually the further upstream that we can get to engaging with the data, even if we can get to it at the point of origination at the furthest edge, at the point of the sensor, at the actual time of clickstream, and we can engage with that data as those events and occurrences are happening and we can process against those events as they're happening, it creates higher levels of value. So from the Hortonworks platform we have the ability to manage data at rest with Hadoop, as well as data in motion with the Hortonworks DataFlow platform. And our view is that we must be able to engage with all the data all the time. And so we bring the platforms to bring data under management from the point of origination all the way through as it's in motion, and to the point it comes at rest, and be able to aggregate those interactions through the entire process. >> It's interesting, you mention real-time, and one of the ideas of Hadoop was it was always going to be a data warehouse killer, 'cause it makes a lot of sense. You can store the data. It's unstructured data, and you can blend in structured on top of that and build on top of that. Has that happened? And does real-time kind of change that equation? Because there's still a role for a data warehouse. If someone has an investment, are they being modernized? Clear that up for me, because I just can't kind of rationalize that yet. Data warehouses are old, the older ones, but they're not going away any time soon from what we're hearing. Your thoughts on Hadoop as the data warehouse killer. >> Yeah, well, our strategy from day one has never been to go in and disintermediate any of the existing platforms or any of the existing applications or services. In fact, to the contrary. What we wanted to do and have done from day one is be able to leverage Hadoop as an extension of those data platforms. The DW architecture has limitations to it in terms of how much data pragmatically and economically is really viable to go into the data warehouse. And so our model says let's bring more data under management as an extension to the existing data warehouses and give the existing data warehouses the ability to have a more holistic view of data. Now I think the next generation of evolution is happening right now, and the enterprise is saying that's great. We're able to get more value longer from our existing data warehouse and tools investment by bringing more data under management, leveraging a combined architecture of Hadoop and data warehouse. But now they're trying to redefine really what does the data warehouse of the future look like, and it's really about how we make decisions, right? And at what point do we make decisions, because in the world of DW today it assumes that data's aggregated post-transaction, right?
In the new world of data architecture that's across the IT landscape, it says we want to engage with data from the point it's originated, and we want to be able to process and make decisions as events and as occurrences and as opportunities arise, before that transaction potentially ever happens. And so the data warehouse of the future is much different in terms of how and when a decision's made and when that data's processed. And in many cases it's pre-transaction versus post-transaction. >> Well, also, I would just add, and I want to get your thoughts on this, real-time, 'cause now in the moment at the transaction we now have cloud resources and potentially other resources that could become available. Why even go to the data warehouses? So how has real-time changed the game? 'Cause data in motion kind of implies real-time, whether it's IoT or some sort of bank transaction or something else. How has real-time changed the game? >> Well, it's at what point can we engage with the customer, but what it really has established is the data has to be able to be processed whether it be on-prem, in the cloud, or in a hybrid architecture. And we can't be constrained by where the data's processed. We need to be able to take the processing to the data versus having to wait for the data to come to the processing. And I think that's the very powerful part of cloud, on-prem, and software-defined networking, and when you bring all of those platforms together, you get the ability to have a very powerful and elastic processing capability at any point in the life cycle of the data. And we've never been able to put all those pieces together in an economically viable model. >> So I got to ask you, you guys are five years old in June, Hadoop's only 10 years old. Still young, still kind of in the early days, but yet you guys are a public company. How are you guys looking at the growth strategy for you guys? 'Cause the trend is for people to go private. You guys went public. You're out in the open. Certainly your competitor Cloudera is private, but people get that they're kind of behind the curtain. Some say public with a $3 billion valuation, but for the most part you're public. So the question is, how are you guys going to sustain the growth? What is the growth strategy? What's your innovation strategy? >> Well, if you look at the companies that are going private, those are the companies that are the older platforms, the older technologies, in a very mature market, that have not been able to innovate those core platforms, and they've sort of reached their maturity cycle, and I think going private gives them the ability to do that innovation, maybe change their licensing model to subscription, and make some of the transformations they need to make. I have no doubt they'll be very successful doing that. Our situation's much different. The modern IT landscape is re-architecting itself almost across every layer. If you look at what's happening in the networking layer going to SDN. Certainly in our space with data, it's moving away from just transactional siloed environments to central data architectures and next generation data platforms. And being able to go all the way out to the edge and bring data under management through the entire movement cycle. We're in a market where we're able to innovate rapidly.
Not only in terms of the architecture of the data platform, being able to bring batch and real-time applications together simultaneously on a central data set and consolidate all of the data, but also then be able to move out and do the data in motion and be able to control an entire life cycle. There's a tremendous amount of innovation that's going to happen there, and these are significant growth markets. Both the data in motion and the data at rest market. The data at rest market's a $50 billion marketplace. The data in motion market is a $1 trillion TAM. So when you look at the massive opportunity to create value in these high growth markets, in the ability to innovate and create the next generation data platforms, there's a lot of room for growth and a lot of room for scale. And that's exactly why you should be public when you're going through these large growth markets in a space that's re-platforming, because the CIO wants to understand and have transparent visibility into their platform partners. They want to know how you're doing. Are you executing the plan? Or are you hiding behind a facade of one perception or another? >> Or pivoting or some sort of re-architecture. >> Right, so I think it's very appropriate in a high growth, high innovation market where the IT platforms are going through a re-architecture that you actually are public going through that growth phase. Now it forces discipline around how you operationalize the business and how you run the business, but I think that's very healthy for both the tech and the company. >> Michael Dell told me he wanted to go private mainly because he had to do some work essentially behind the curtain. Didn't want the 90-day shot clock, the demands of Wall Street. Other companies do it because they can't stand alone. They don't have a platform and they're constantly pivoting internally to try to grope and find that groove swing, if you will. You're saying that you guys have your groove swing, and as Dave Vellante always says, always get behind a growing total addressable market, or TAM, you're saying that. Okay, I buy that. So the TAM's growing. What are you guys doing on the platform side that's enabling your customers to re-platform and take advantage of their current data situation as well as the upcoming IoT boom that's being forecasted? >> Well, the first thing is the genesis of which we started the company around, which is we transformed Hadoop from being a batch architecture, single data set, single application, to being able to actually manage a central data architecture where all data comes under management and be able to drive and evolve from batch to batch, interactive, and real-time simultaneously over that central data set. And then making sure that it's truly an enterprise-viable, enterprise-ready platform to manage mission critical workloads at scale. And those are the areas where we're continuing to innovate, around security, around data governance, around life cycle management, the operations and the management consoles. But then we want to expand the markets that we operate in and be world class and best tech on planet Earth for that data at rest and our core Hadoop business. But as we then see the opportunities to go out to the edge and from the point of origination truly manage and bring that data under management through its entire life cycle, through the movement process, and create value.
And so we want to continue to extend the reach of when we have data under management and the value we bring to the data through its entire life cycle. And then what's next is, you have that data in its life cycle, you then move into the modern data applications, and if you look at what we've done with cyber security and some of the offerings that we've engaged in the cyber security space, that was our first entry. And that's proven to be a significant game changer for us and our customers both. >> Cyber security certainly a big data problem. Also a cloud opportunity with the horsepower you can get with computing. Give us the update. What are you seeing there from a traction standpoint? What's the level of engagement you're having with enterprises outside of the NSA and the big government stuff, which I'm sure are customers they don't have to disclose, but for the most part normal enterprises are constantly planning as if they are already attacked, and they're having different schemes that they're deploying. How are they using your platform for that right now? >> Well, the nature of attacks has changed. And it's evolved from just trying to find the hole in the firewall or where we get into the gateway, to how we find a way through a back door and just hang out in your network and watch for patterns and watch for the ability to aggregate relationships and then pose as a known entity that you can then cascade in. And in the world of cyber security you have to be able to understand those anomalies and be able to detect those anomalies that sit there and watch for their patterns to change. And as you go through a whole life cycle of data management between a cloud, on-prem, and a hybrid architecture, it opens up many, many opportunities for the bad guys to get in and have very new schemes. And our cyber security models give the ability to really track how those anomalies are attaching, where the patterns are emerging, and to be able to detect that in real-time, and we're seeing the major enterprises shift to these new models, and it's become a very big part of our growth. >> So I got to change gears and ask you about open-source. You've been open-source really from the beginning, I would call it first generation commercial. But it was not a tier one citizen at that time. It was an alternative to other proprietary platforms, whether you look at the network stack or certainly from software. Now today it's tier one. Still we hear business people kind of like, well, open-source. Why should a business executive care about open-source now? And what would you say to that person who's watching about the benefits of open-source and some of the new models that could help them? >> Well, open-source in general's going to give a number of things. One, it's going to probably provide the best tech, the most innovation in a space, whether that be at the network layer or whether that be at the middleware layer, the tools layer or certainly the data layer. And you're going to see more innovation typically happen on those platforms much faster, and you've got transparent visibility into it. And it brings an ecosystem with it, and I think that's really one of the fundamental issues that someone should be concerned with: what does the ecosystem around my tech look like? An open-source really draws forward a very big ecosystem in terms of innovators of the tech, but also enablers of the tech and adopters of the tech in terms of incremental applications, incremental tool sets.
And what it does, and the benefit to the end customer, is the best tech, the most innovation, and typically operating models that don't generate lock-in for 'em, and it gives them optionality to use the tech in the most appropriate architecture in the best economic model without being locked into a proprietary path where they end up with no optionality. >> So talk about the do-it-yourself mentality. In IT that's always been frowned upon because it's been expensive, time-consuming, yet now with organic open-source and now with cloud, you saw that first generation do-it-yourself, standing up stuff on Amazon, whatnot, is being very viable. It fueled shadow IT and a variety of other great things around virtualization, visualization, and so on. Today we're seeing that same pattern swing back to do-it-yourself, is good for organic innovation but causes some complexities. So I want to get your thoughts on this because this seems to be a common thread on our Cube interviews and at Hadoop Summit and at Big Data SV as part of Big Data Week when we were in town. We heard from customers and we heard the following: it's still complex and the total cost of ownership's still too high. That seems to be the common theme for slowing down the rapid acceleration of Hadoop and its ecosystem in general. One, do you agree with that? And two, if so, what would be the answer to make that go faster? >> Well, I think you're seeing it accelerate. I think you're seeing the complexities dwindle away through both innovation in the tech and the maturing of the tech, as well as just new tool sets and applications that are leveraging it, that take away any complexity that was there. But what I think has been acknowledged is the value that it creates, and that it's worth the do-it-yourself and bringing together the disparate tech, because of the innovation that it brings, the new architectures and the value that it creates as these platforms move into the different use cases that they're enabling.
We felt like we needed to evolve Hadoop in terms of the architecture, and we didn't want to adopt the batch-oriented architecture. Instead we took the core Hadoop platform and through YARN enabled it to bring a central data architecture together, as well as be able to be generating batch, interactive, and real-time applications, leveraging YARN as the data operating system for Hadoop. And then the real strategy behind that was to open up the data sets, open up the different types of use cases, be able to do it on a central data architecture. But then as other processing engines emerged, whether it be Spark, as you brought up, or some of the other ones that we see coming down the pipe, we can then integrate those engines through YARN onto the central data platform. And we open up the number of opportunities, and that's the core basis. I think that's different than some of the other competitors' technology architecture. >> Looking back now five years, are there moves that you were going to make that others have made, that you look back and say I'm glad we didn't do that, given today's landscape? >> What I'm glad we did do is open up to the most use cases and workloads and data sets as possible through YARN, and that's proven to be a very, very fundamental differentiation of our model and strategy for anybody in the Hadoop space, certainly. And I'm also very happy that we saw the opportunity, about a year ago, that it needed to be more than just about data at rest on Hadoop, and that actually, to truly be the next generation data architecture, you've got to be able to provide the platforms for data at rest and data in motion, and our acquisition of Onyara, to be able to get the NiFi technology, so that we're truly capturing the data from the point of origination all the way through the movement cycle until it comes at rest, has given us now the ability to do complete life cycle management for an entire data supply chain. And those decisions have proven to be very, very differentiating between us and any of our other competitors, and it's opened up some very, very big markets. More importantly, it's accelerated the time to value that our customers get in the use cases that they're enabling through us. >> How would you talk about the scenario that people are saying about Hadoop not being the end-all be-all of the industry? At the same time, 'cause big data, as Arun Murthy said on theCUBE in Dublin, is bigger than Hadoop now, but Hadoop has become synonymous with big data generally. Where's the leadership coming from in your mind? Because we're certainly not seeing it on the data warehouse side, 'cause those guys still have the old technology, trying to co-exist and re-platform for the future. So the question is, is Hortonworks viewing Hadoop as still leading generically as a big data industry, or has it become a sidebar of the big data industry? >> Of Hadoop? Hadoop is the platform, and we believe ground zero for big data. But we believe it's bigger than that. It's about all data and being able to manage the entire life cycle of all data, and that starts from the point of origination, until it comes at rest, and being able to continue to drive that entire life cycle. Hadoop certainly is the underpinning of the platform for big data, but it's really got to be about all data. Data at rest, data in motion, and what you'll see is the next leg in this is the modern data applications that then emerge from that.
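As a concrete illustration of the "data in motion plus data at rest" life cycle described here, the following is a hedged sketch. It uses Spark Structured Streaming purely as a stand-in for the general pattern, not the Hortonworks DataFlow/NiFi APIs the interview is actually about, and the broker address, topic name, and file paths are placeholders rather than anything from the conversation.

```python
# A hedged sketch of one pipeline handling data in motion and data at rest.
# Spark Structured Streaming is an illustrative stand-in; broker, topic, and
# paths are placeholder values.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("motion-and-rest-sketch").getOrCreate()

# Data in motion: engage with events at (or near) the point of origination.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
          .option("subscribe", "clickstream")                 # placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS raw", "timestamp"))

# Act pre-transaction: rolling one-minute activity per user, updated live.
per_user = (events
            .withColumn("user", F.get_json_object("raw", "$.user"))
            .groupBy(F.window("timestamp", "1 minute"), "user")
            .count())

# Data at rest: land the raw events for later batch and interactive work.
archive = (events.writeStream
           .format("parquet")
           .option("path", "/data/clickstream/raw")            # placeholder path
           .option("checkpointLocation", "/chk/clickstream")   # placeholder path
           .start())

# Serve the in-motion aggregates; a console sink stands in for a real serving layer.
live = per_user.writeStream.outputMode("update").format("console").start()

spark.streams.awaitAnyTermination()
```

The point mirrors the framing in the interview: the same events are processed while they are still moving and then land at rest, so the life cycle is managed end to end instead of waiting for data to arrive in a warehouse.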
>> How has the ecosystem in the Hadoop industry, and I would agree, by the way, that the Hadoop players are leading big data in general in terms of innovation. The ecosystem's been a big part of it. You guys have invested in it. Certainly a lot of developers and open-source. How has the ecosystem changed given the current situation from where it was? And where do you see the ecosystem going? With the re-platforming, not everyone can have a platform. There's a ton of guys out there that have tools, that are looking for a home, they're trying to figure out the chessboard on what's going on with the ecosystem. What's your thoughts of the current situation and how it will evolve in your view? >> Well, I think one of the strongest statements from day one is, whether it's EDW or BI or relational, none of the traditional platform players say the way you solve your big data problem is with my platform. They, to a company, have a Hadoop platform strategy of some form to bring all of that huge volume of big data under management, and it fits our model very well in that we're not trying to disintermediate, but extend those platforms by leveraging HDP as an extension of their platform. And what that's done is it's created pull markets. It's brought Hadoop into the enterprise with a very specific value proposition and use case, bringing more data under management for that tool, that application, or that platform. And then the enterprises have realized there's other opportunities beyond that. And new use cases and new data sets we can also gain more leverage from. And that's what's really accelerated-- >> So you see growth in the ecosystem? >> We're actually seeing exponential acceleration of the growth around the ecosystem. Not only in terms of the existing platforms and tools and applications either adopting Hadoop, but now new start-up companies building completely from scratch applications just for the big data sets. >> Let's talk about startups. We were talking before we sat down about the challenges of being an entrepreneur. You mentioned the exponential acceleration of entrepreneurs coming into the ecosystem. That's a safe harbor right now. It seems to be across the board. And a lot of the big platforms have robust, growing ecosystems. What's the current landscape of startups? I know you're an active investor yourself and you're involved in a lot of different start-up conversations and advisor. What's your view of the current landscape right now? Series A, B, C, growth. Stalling. What needs to be in place for these companies to be successful? What are some of the things that you're seeing? >> You have to be surgically focused right now on a very particular problem set, maybe even by industry. And understand how to solve the problem and have an absolute correlation to a value proposition and a very well defined and clear model of how you're going to go solve that problem, monetize it, and scale. Or you have to have an incredibly well-financed and deep war chest to go after a platform play that's going after a very large TAM that is enabling a re-platforming at one of the levels in the new IT landscape.
>> What strategies would you advise entrepreneurs in terms of either white spaces to attack and/or their orientation to this new data layer? Because if this plays out as we were talking about, you're going to have a horizontal data layer where you need eye dropper ability. Need to have data in motion, but data aware. Smart data you integrate into disparate systems. Breaking down the siloed concept. How should an entrepreneur develop or look at that? Is there a certain model you've seen work successfully? Is there a certain open-source group they can jump into? What thoughts would you share? 'Cause this seems to be the toughest nut to crack for entrepreneurs. >> Right now you're seeing a massive shift in the IT data architecture, is one example. You're seeing another massive shift in the network architecture. For example, the SDN, right? You're seeing I think a big shift in the kinds of applications getting away from application functionality to data enabled applications. And I think it's important for the entrepreneur to understand where in the landscape do they really want to position? Where do they bring intellectual capital that can be monetized? Some of the areas that I think you'll see emerge very quickly in the next four, six, eight quarters are the new optimization engines, and so things around AI and machine learning. And now that we have all of the data under management through its entire life cycle, how do I now optimize both where that data's processed, in the cloud or on Prim, or as it's in motion. And there's a massive opportunity through software defined networking to actually come in and now optimize at the purest price point and/or efficiency where that data's managed, where that data's stored, and let it continue to reap the benefits. Just as Amazon's done in retail, if you like this, you should look at that. Just as Yahoo! did, I'll point out with Hadoop, it's advertising models and strategies of being able to put specific content in front of you. Those kinds of opportunities are now available for the processing and storage of data through the entire life cycle across any architectural strategy. >> Are you seeing data from a developer's standpoint being instrumental in their use cases? Meaning as I'm developing on top a data platforms like Hortonworks or others, where there's disparate data, what's their interaction? What's their relationship to the data? How are they using it? What do they need to know? Where's the line in terms of their involvement in the data? >> Well, what we're seeing is very big movement with the developed community that they now want to be able to just let the data tell them where the application service needs to be. Because in the new world of data they understand what the entity relationships are with their customers and the patterns that their customers happening. They now can highly optimize when their customers are about to cross over into from one event to the other, and what that typically means and therefore what the inverted action should be to create the best experience with their customer, to create a higher level of service, to be able to create a better packaged price point at a better margin. They also have the ability to understand it in real-time based on what the data trend is flowing, how well their product's performing. Any obstacles or issues that are happening with their product. So they don't want to have to have application logic that then they run a report on three days, three weeks after some events happened. 
They now are taking the data and as that data and events are happening in the data and it's telling them what to do and they're able to prescriptively act on whatever event or circumstance unfold from that. >> So they want the data now. They want real-time data embedded in the apps as on the front line developer. >> And they want to optimize what that data is doing as it's unfolding through its natural life cycle. >> Let's talk with your customer base and what their expectations are. What questions should a customer or potential customer ask to their big data vendor as they look at the future? What are the key questions they should ask? >> They should really be comparing what is your architectural strategy, first and foremost. For managing data. And what kinds of data can I manage? What are the limitations in your architecture? What workloads and data sets can't I manage? What are the latency issues that your architecture would create for me? What's your business model that's associated with us engaging together? How much of the life cycle can you enable of my data? How secure are you making my data? What kind of long tail of visibility and chain of custody can I have around the governance? What kind of governance standards are you applying to the data? How much of my governance standards can you help me automate? How easy is it to operate and how intuitive is it? How big is your ecosystem? What's your road map and your strategy? What's next in your application stack? >> So enterprises are looking at simplicity. They're looking for total cost of ownership. How is big data innovation going to solve that problem? Because with IoT, again, a lot of new stuff's happening really, really fast. How do they get their arms around this simplicity question in this total cost of ownership? How should they be thinking about it? >> Well, what the Hadoop platforms have to do and the data in motion platforms have to do is to be able to bring the data under management and bring all of the enterprise services that they have in their existing data platforms, in the areas of security, in the areas of management, in the areas of data governance, so they can truly run mission critical workloads at scale with all the same levels of predictability that they have in isolation, in their existing proprietary platforms. And be able to do it in a way that's very intuitive for their existing platforms to be able to access it, very intuitive for their operations teams to be able to manage it, and very clean and easy for their existing tools and platforms investments to leverage it. >> On the industry landscape right now what are you seeing if a consolidation? Some are saying we're seeing some consolidation. Lot of companies going private. You're seeing people buckle down. It's almost a line. If you weren't born before a certain date for the company, you might have the wrong architecture. Certainly enterprises re-platform, I would agree with that, but as a supplier to customers, you're one of the young guys. You were born in the cloud. You were born in open-source, Hortonworks. Not everyone else is like that, and certainly Oracle's like one of the big guys that keep on doing well. IBM's been around. But they're all changing, as well. And certainly a lot of these growth companies pre-IPO are kind of being sold off. What's your take on the current situation with the bubble, the softening, whatever people calling it. What's your thoughts? 
>> I think you see some companies who got caught up, and if we unpack that, the ones who are going private now are the companies that have operated in a very mature market space. They weren't able to innovate as much as they would probably have liked to; they're probably locked into a proprietary technology and a non-subscription model of some sort, maybe a perpetual license model. Those are very different models than the enterprise wants to adopt today, and because the market shrank, their ability to innovate and grow forced them into very constrained environments. And ultimately, they can be great companies. They have great value propositions, but they need to go through transformations that don't include a 90-day shot clock in the public market. Then there are the markets where, say, I was in the B round or the C round and I was focused on providing a niche offering into one of those mature spaces that's being disintermediated or evolving quickly, because an open-source company has come into the space, or that section of the IT stack has morphed into more of a cloud-centric, SaaS-centric, or open-source-centric environment. They got cut short. Their market's gone away. Their market shrunk. They can't innovate their way out of it. And then they ultimately have to find a different approach, and they may or may not be able to get the financing to do that. We're in a much different position. >> Certainly the down round. We're seeing down rounds from the high valuations. That's the first sign of trouble. >> That's the first sign. I've gotten three calls this week from companies that are liquidating and have two weeks to find a new home. >> Great, we'll look for some furniture for our new, growing SiliconANGLE office. >> I think you'll have some good values. >> You personally, looking back over five years now in this journey, what an incredible run you guys have had, and it's been fun to watch you guys. What's the biggest thing that surprised you and what's the biggest thing that's happened? If you can talk about those two things, 'cause again, a lot's happened. The market's changed significantly. You guys went public. You got a big office here. What surprised you, and what was the biggest thing that you think was the catalyst of the current trajectory? >> How quickly the market grew. We saw from day one, when we started the company, that this was a billion-dollar opportunity, and that was the bar for starting whatever we did. We were looking for new opportunities, and we had to see a billion-dollar opportunity. How quickly we have seen the growth and the formation of the market in general. And then how quickly some of the new opportunities have opened up, in particular around streaming, Internet of Things, and the new paradigm data sets, and how quickly the enterprises have seen the ability to create a next-generation data architecture and the aggressiveness with which they're moving to do that with Hadoop. And then how quickly, in the last year, it swung to also wanting to bring data in motion under management as well. >> If you could talk to a customer right here, right now, and they asked you the following question: Rob, look around the corner five years out. Tell me something that someone else can't see that you see, that I should be aware of in my business. And why should I go with Hortonworks?
>> It's going to be a table-stakes requirement to be able to understand, whether it be your customer or your supply chain, what they're trying to accomplish from the point they begin to engage, from the first step towards engaging with your product or your service, and to be able to interact with them from that first inception point. It's also going to be table stakes to be able to monitor your product in real time and understand how well it's performing, down to the component level, so that you can make real-time corrections and improvements and be able to do that on the fly. The other thing that you're going to see is that it's going to be a table-stakes requirement to be able to aggregate the data that's accumulated in that life cycle and give your customer the ability to monetize the data about them. You as the enterprise will be responsible for ensuring the anonymity, confidentiality, and security of the data, but you're going to have to be able to provide the data about your customers and give them the ability, if they choose, to monetize the data about them. >> So if I get that correct, you're basically saying 100% digital. >> Oh, by far, within the next five years, absolutely. If you do not have a full digital model, in most industries you'll be disintermediated. >> Final question. What's the big bet that you're making right now at Hortonworks? That you say, we're pinning the company on blank, fill in the blank. >> It's not about big data. It's about all data under management. >> Rob, thanks so much for spending the time here On the Ground. Rob Bearden, CEO of Hortonworks, here for an executive On the Ground. I'm John, for The Cube. Thanks for watching. (techno music)

Published Date : Jun 24 2016



Ritika Gunnar & David Richards - #BigDataSV 2016 - #theCUBE


 

>> Narrator: From San Jose, in the heart of Silicon Valley, it's The Cube, covering Big Data SV 2016. Now your hosts, John Furrier and Peter Burris. >> Okay, welcome back everyone. We are here live in Silicon Valley for Big Data Week, Big Data SV Strata Hadoop. This is The Cube, SiliconANGLE's flagship program. We go out to the events and extract the signals from the noise. I'm John Furrier, my co-host is Peter Burris. Our next guests are Ritika Gunnar, VP of Data and Analytics at IBM, and David Richards, CEO of WANdisco. Welcome to The Cube, welcome back. >> Thank you. >> It's a pleasure to be here. >> So, okay, IBM and WANdisco, why are you guys here? What are you guys talking about? Obviously, partnership. What's the story? >> So, you know what WANdisco does, right? Data replication, active-active replication of data. For the past twelve months, we've been realigning our products to a market that we could see rapidly evolving. So if you had asked me twelve months ago what we did, we were talking about replicating just Hadoop, but we think the market is going to be a lot more than that. I think Mike Olson famously said that Hadoop was going to disappear, and he was kind of right, because the ecosystem is evolving into a much greater stack that involves applications, cloud, and a completely heterogeneous storage environment, and as that happens, the partnerships that we need have to move on from just the sort of Hadoop-specific distribution vendors to something that can deliver a complete solution to the marketplace. And very clearly, IBM has a massive advantage in the number of people, the services, ecosystem, and infrastructure to deliver a complete solution to customers, so that's really why we're here. >> If you could talk about the stack comment, because this is something that we're seeing. Mike Olson's kind of being political when he says make it invisible, but the reality is there is more to big data than Hadoop. There's a lot of other stuff going on. Call it stack, call it ecosystem. A lot of great things are growing. We just had Gaurav on from SnapLogic, who said, "everyone's winning." I mean, I just love that, it's totally true, but it's not just Hadoop. >> It's about Alldata and it's about all insight on that data. So when you think about Alldata, Alldata is a very powerful thing. If you look at what clients have been trying to do thus far, they've actually been confined to the data that may be in their operational systems. With the advent of Hadoop, they're starting to bring in some structured and unstructured data, but with the advent of IoT systems, systems of engagement, systems of record, and trying to make sense of all of that, Alldata is a pretty powerful thing. When I think of Alldata, I think of three things. I think of data that is not only on premises, which is where a lot of data resides today, but data that's in the cloud, where data is being generated today and where a majority of the growth is. When I think of Alldata, I think of structured data that is in your traditional operational systems, and unstructured and semi-structured data from IoT systems, et cetera. And when I think of Alldata, I think of not just data that's on premises for a lot of our clients, but actually external data.
Data where we can correlate data with, for example, an acquisition that we just did within IBM with The Weather Company, or augmenting with partnerships like Twitter, et cetera, to be able to extract insight from not just the data that resides within the walls of your organization, but external data as well. >> The old expression is, if you want to go fast, do it alone; if you want to go deeper and broader and more comprehensive, do it as a team. >> That's right. >> That expression can be applied to data. And you look at The Weather Company data, you think, hmmm, that's an outlier type of acquisition, but when you think about the diversity of data, that becomes a really big deal. And the question I want to ask you guys is, and Ritika, we'll start with you, there are always a few pressure points we've seen in big data. When that pressure is relieved, you've seen growth, and one was big data analytics kind of stalled a little bit, the winds kind of shifted, eye of the storm, whatever you want to call it, then cloud comes in. Cloud is kind of enabling that to go faster. Now, a new pressure point that we're seeing is go faster with digital transformation. So Alldata kind of brings us to all digital. And I know IBM is all about digitizing everything, and that's kind of the vision. So you now have the pressure of I want all digital, I need data-driven at the center of it, and I've got the cloud resource, so kind of the perfect storm. What are your thoughts on that? Do you see that similar picture? And then does that put the pressure on, say, WANdisco, say hey, I need replication, so now you're under the hood? Is that kind of where this is coming together? >> Absolutely. When I think about it, it's about giving trusted data and insights to everyone within the organization, at the speed at which they need it. So when you think about that last comment, "at the speed at which they need it," that is the pressure point of what it means to have a digitally transformed business. That means being able to make insights and decisions immediately, and when we look at what our objective is from an IBM perspective, it's to enable our clients to generate those immediate insights, to transform their business models, and to provide the tooling and the skills necessary, whether we have it organically, inorganically, or through partnerships, like with WANdisco. And so with WANdisco, we really wanted to be able to activate that data where it resides. When I talk about Alldata and activation of that data, WANdisco provides us complementary capabilities to activate that data where it resides, through their Fusion product. So, being able to enable our end users to have that digitally infused set of reactive applications is absolutely something... >> It's like, David, we talk about, and maybe I'm oversimplifying your value proposition, but I always look at WANdisco as kind of the five nines of data, right? You guys make stuff work, and that's the theme here this year, people just want it to work, right? They don't want to have it down, right? >> Yeah, we're certainly seeing an uptick in understanding about what high availability, what continuous availability means in the context of Hadoop, and I'm sure we'll be announcing some pretty big deals moving forward. But we've only just got going with IBM.
I would say the market should expect a number of announcements moving forward as we get going with this, but here's the very interesting question associated with cloud. And just to give you a couple of quick examples, we are seeing an increasing number of Global 1,000 companies, Fortune 100 companies, move to cloud. And that's really important. If you had asked me 12 months ago how the market was going to shape up, I'd have said, well, most CIOs want to move to cloud. It's already happening. So, FINRA, the major financial regulator in the United States, is moving to cloud, publicly announced it. The FCA in the UK publicly announced they are moving 100% to cloud. So this creates kind of a microcosm of a problem that we solve, which is how do you move transactional data from on-premise to cloud and create a sort of hybrid environment? Because with the migration, you have to build a hybrid cloud in order to do that anyway. So, if it's just archive systems, you can package it on a disk drive and post it, right? If we're talking about transactional data, i.e., stuff that you want to use, so for example, a big travel company can't stop booking flights while they move their data into the cloud, right? It would take them six months to move petabyte-scale data into cloud. We solve that problem. We enable companies to move transactional data from on-premise into cloud, without any interruption to services. >> So not six months? >> No, not six months. >> Six hours? >> And you can keep on using the data while it is in transit. So we've been looking for a really simplistic problem, right, to explain this really complex algorithm that we've got that, you know, does this active-active replication stuff. That's it, right? It's so simple, and nobody else can do it. >> So no downtime, no disruption to their business? >> No, and you can use the cloud or you can use the on-prem applications while the data is in transit. >> So when you say all cloud, now we're on a theme, Alldata, all digital, all cloud, there's a nuance there, because most, and we had Gaurav from SnapLogic talk about it, there's always going to be an on-prem component. I mean, we're probably not going to see 100% of everyone move to the cloud, public cloud, but by cloud you mean hybrid cloud essentially, with some on-prem component. I'm sure you guys see that with Bluemix as well, that you've got some dabbling in the public cloud, but ultimately, it's one resource pool. That's essentially what you're saying. >> Yeah, exactly. >> And I think it's really important. One of the things that's very attractive about the WANdisco solution is that it does provide that hybridness from on-premises to cloud, and that ability to activate that data where it resides, but being able to do that in a heterogeneous fashion. Architectures are very different in the cloud than they are on premises. When you look at it, your data lake may be as simple as Swift object store or S3, and you may be using elements of Hadoop in there, but the architectures are changing. So the notion of being able to handle hybrid solutions, both on-premises and cloud, with that heterogeneous capability, in a non-invasive way that provides continuous data, is something that is not easily achieved, but it's something that every enterprise needs to take into account. >> So Ritika, talk about why the WANdisco partnership, and specifically, what are some of the conversations you have with customers?
Because, obviously, it sounds like there's the need to go faster and have some of this active-active replication, and kind of the five nines, if you will, of making stuff not go down, or non-disruptive operations, or whatever the buzzword is, but you know, what's the motivation from your standpoint? Because IBM is very customer-centric. What are some of the conversations, and then how does WANdisco fit into those conversations? >> So when you look at the top three use cases that most clients use for even Hadoop environments, or just what's going on in the market today, the top three use cases are, you know, can I build a logical data warehouse? Can I build areas for discovery or analytical discovery? Can I build areas to be able to have data archiving? And for those top three solutions in a hybrid, heterogeneous environment, you need to be able to have active-active access to the data where that data resides. And therefore, we believe, from an IBM perspective, that we want to be able to provide the best of breed regardless of where that data resides. And so we believe, from a WANdisco perspective, that WANdisco has those capabilities that are very complementary to what we need for that broader skills and tooling ecosystem, and hence why we have formed this partnership. >> Unbelievably, in the market we're also seeing, and it feels like the Hadoop market's just got going, but we're seeing migrations from distributions like Cloudera into cloud. So you know, those sort of lab environments, the small clusters that were being set up. I know this is slightly controversial, and I'll probably get darts thrown at me by Mike Olson, but we are seeing pretty large-scale migration from those sorts of labs that were set up initially. And as they progress, and as it becomes mission-critical, they're going to go to companies like IBM, really, aren't they, in order to scale up their infrastructure? They're going to move the data into cloud to get hyperscale, for some of these cases that Ritika was just talking about, so we are seeing a lot of those migrations. >> So basically, with Hadoop, there are some siloed deployments of POCs that need to be integrated in. Is that what you're referring to? I mean, why would someone do that? They would say okay, probably integration costs, probably other solutions, data. >> If you do a roll-your-own approach, where you go and get some open-source software, you've got to go and buy servers, you've got to go and train staff. We've just seen one of our customers, a big bank, two years later get servers. Two years to get servers, to get server infrastructure. That's a pretty big barrier, a practical barrier to entry. Versus, you know, I can throw something up in Bluemix in 30 minutes. >> David, you bring up a good point, and I want to just expand on that because you have a unique history. We know each other, we go way back. You were on The Cube when, I think, we first started seven years ago at Hadoop World. You've seen the evolution and heck, you had your own distribution at one point. So you know, you've successfully navigated the waters of this ecosystem, and you had great IP and then you kind of found your swim lanes and you guys are doing great, but I want to get your perspective on this because you mentioned Cloudera. You've seen how it's evolving as it goes mainstream, as, you know, Peter says, "The big guys are coming in with power."
I mean, IBM's got a huge Spark investment, and it's not just, you know, lip service; they're actually donating a ton of code and actually building stuff, so you've got an evolutionary change happening within the industry. What's your take on the upstarts like Cloudera and Hortonworks and the distro game? Because that now becomes an interesting dynamic, because it has to integrate well. >> I think there will always be a market for the distribution of open-source software, as that layer in the stack. Certainly Cloudera, Hortonworks, et cetera, are doing a pretty decent job of providing a distribution. The Hadoop marketplace, and Ritika laid this on pretty thick as well, is not Hadoop. Hadoop is a component of it, but in cloud we talk about object store technology, we talk about Swift, we talk about S3. We talk about Spark, which can be run stand-alone, you don't necessarily need Hadoop underneath it. So the marketplace is being stretched to such a point that if you were to look at the percentage of the revenue that's generated from Hadoop, it's probably less than one percent. I talked 12 months ago with you about the whale season, the whales are coming. >> Yeah, they're here. >> And they're here right now, I mean... >> (laughs) They're mating out in the water, deals are getting done. >> I'm not going to deal with that visual right now, but you're quite right. And I love the Peter Drucker quote, which is, "Strategy is a commodity, execution is an art." We're now moving into the execution phase. You need a big company in order to do that. You can't be a five hundred or a thousand person... >> Is Cloudera holding onto dogma with Hadoop, or do they realize that the ecosystem is building around them? >> I think they do, because they're focused on the application layer, but there's a lot of competition in the application layer. There's a little company called IBM, there's a little company called Microsoft, and a little company called Amazon that are kind of focused on that as well, so that's a pretty competitive environment, and your ability to execute is really determined by the size of the organization, to be quite frank. >> Awesome, well, so we have Hadoop Summit coming up in Dublin. We're going to be in Ireland next month for Hadoop Summit, with more and more coverage there. Guys, thanks for the insight. Congratulations on the relationship, and again, WANdisco, we know you guys and know what you guys have done. This seems like a prime time for you right now. And IBM, we just covered you guys at InterConnect. Great event. Love The Weather Company data, as a weather geek, but also the Apple announcement was really significant. Having Apple up on stage with IBM, I think that is really, really compelling. And that was just not a Barney deal, that was real. And the fact that Apple was on stage was a real testament to the direction you guys are going, so congratulations. This is The Cube, bringing you all the action, here live in Silicon Valley for Big Data Week, BigData SV, and Strata Hadoop. We'll be right back with more after this short break.

Published Date : Mar 30 2016

