Dustin Kirkland, Google | CUBEConversation, June 2019

>> From our studios in the heart of Silicon Valley, Palo Alto, California, it is a CUBE Conversation. >> Welcome to this special CUBE Conversation here in Palo Alto, California, at theCUBE studios, theCUBE headquarters. I'm John Furrier, the host. I'm here with Dustin Kirkland, product manager at Google, friend of theCUBE, part of the community with Kubernetes, and a CUBE alumni. Dustin, welcome to the CUBE Conversation. >> Thanks, John. It's a beautiful studio. I've never been in the studio; I've been on the show floor a few times, but this is fun. >> Great to have you on. A great opportunity to chat about Kubernetes and about what you do as a product manager working at Google. But more importantly, this conversation is about the fifth anniversary, the birthday of Kubernetes. Today we're celebrating the fifth birthday of Kubernetes. It's still a toddler. >> Absolutely, still growing. You think about how Linux has been around for a long time. OpenStack has been around. These other big projects have been around for, you know, going on decades in Linux's case. And Kubernetes, it's going so fast, but it's only five years old, you know. >> You know, I remember an OpenStack event in Seattle many, many years ago. That was six years ago. theCUBE's in its 10th year, so there are many of these look-back moments. This is one of them. I was having a beer with Lew Tucker. Kismatic was, like, one of the first companies at the time, didn't make it. But we were talking about OpenStack, and we were like, this Kubernetes thing, this is really hot. This paper, this initiative, this could really be the abstraction layer to kind of bring all this together. Cloud native wasn't part of it at the time; it was more of an OpenStack try to move up the stack. And it turned out it ended up happening. Kubernetes then went on to change the landscape of what containers did. Docker got a lot of credit for pioneering that, got the big VC funding, became a unicorn, and then containers kind of went in a different direction because of Kubernetes. >> Very much so. I mean, the modernization of software infrastructure has been coming for a long time, and Kubernetes sort of brings it all together at this point. Putting software into a container, we've been doing that in different forms for a long time. But once you have a lot of containers, what do you do with them? Right? And that was the problem that Kubernetes solved so eloquently, and has for a couple of years now, and it just keeps getting better. >> You know, you mentioned modernization. Let's talk about that, because I think the modernization theme is now pretty much prevalent in every vertical. I'll be in D.C. next week for the Amazon Web Services Public Sector Summit, where modernization of governments and nations is being discussed. Education, modernization of IT. We've seen it here in the media business that we're participating in. It's not about where you store the code; it's how you code. How you build is a mindset shift. This has been the real revelation around the DevOps movement, infrastructure as code, now called cloud native. Share your thoughts on this modernization mindset, because it really is how you build. >> Yeah, I think there's cross-pollination actually across industries, and we see that even just in the word "containers," right, and all the imagery around shipping and shipping containers. We've applied these age-old concepts that have been, I won't say perfected, but certainly optimized over decades, actually centuries or millennia, of moving things across water in containers. Right. But we apply that to software and, boom, we have a step-function difference in the way that we manage and orchestrate and administer code.
That's one example of that cross-pollination, and now you're talking about, like, optimizing governments or economies, being able to maybe then apply other concepts where we've come a long way in computer science. DevOps is a good example. You know, applying DevOps principles to non-computer fields. Just think about that for a second. >> It's mind-blowing. And think also about the step function you mentioned, because I think this actually changed a lot of the entrepreneurial landscape as well, and it has also shaped open source. You know, big news this quarter is MapR going to shut down, one of the biggest Hadoop players. Cloudera merged with Hortonworks, fired their CEO; the founder, Mike Olson, has retired. Some say forced out. I don't think so; I think it's more that it was his time. Amr Awadallah is still there. Open source as a business model, you know: can we be the Red Hat for Hadoop? Not really viable, but it's evolving. So open source has been impacted by this step function. There's a business impact. Talk about the dynamics of the step function, both on the business side and on how software is built, specifically open source. >> You know, you and I have been around open source for a long, long time. I think it started when I was in college in the late nineties, and then through my career at IBM. And it's interesting how on the fringe open source was for so long, for so much of my IBM career, and then some early time spent at Red Hat. It was something that was different, was weird. It was very much fringe. But now it's mainstream and it's everywhere, and it's so mainstream that it's almost the de facto standard to just start with open source. But you know, there's some other news that's been happening lately that you didn't bring up.
But it's a really touchy aspect of open source right now, and that's some of the licenses and how those licenses get applied to software, especially databases, when offered as a service in the cloud. That's one of the big problems, I think, that we're working on within open source. >> Summarize the news and what it means. What's happening? What's the news, and what's the real business or technical impact of the licensing? What's the core issue? >> Yeah, so without passing judgment in any way, shape, or form on this, the TL;DR is that a number of open source databases, most recently CockroachDB, have adopted a different licensing model that is nonstandard from an open source perspective. And from one perspective, they're adopting these different licensing models because other vendors can take that software and offer it as a service, in some cases, like Amazon, like you said, and maybe contribute, maybe pay money to the smaller startup or the open source community behind it, but not necessarily. In some ways it's quite threatening to open source communities and open source companies; in other cases, quite empowering. And it's going to be interesting to see how that plays out. The tension between open sourcing software and eventually making money off of it is something that we've seen for, you know, at least 25 years, and it continues to go on today. >> And this is, to me, a really fascinating area that I think is going to be super important to keep an eye on, because you want to encourage contribution and openness. At the same time, look at the scale of just the Linux Foundation's numbers. It's pretty massive in terms of open source contribution now. When you factor in even China and other nations, it's on exponential growth, right? So is open source just the model, not necessarily a business? Yeah.
So this is the big question. No one knows. >> I think we've crossed that, and open source is the model. And this is where I, as a product manager who's worked around open source, have spent a lot of time thinking about how to create commercial offerings around open source. I spent 10 years at Canonical, the first half as an engineer, the second half as a product manager, building commercial services around Ubuntu. And I learned quite a few things that now apply absolutely to Kubernetes, as well as to a number of open source startups that I've advised and given some perspective on successful and unsuccessful ways to monetize open source. >> Okay, so Dustin, let's get back to Kubernetes. I think the next-level talk track is this: Kubernetes has established itself, landed in the industry, and has adoption. Land, adopt, expand: we've seen adoption, and now it's in expansion mode. Where does it go from here? Because you look at the telltale signs, things like service meshes and serverless, and you get some interesting trends that are going to support this expansionary stage of Kubernetes. What is your view about the next expansion wave? What comes next? >> Yeah, I think the next stage is really about democratizing Kubernetes for workloads. It's quite obvious when Kubernetes is the right answer: at the scale of a Google or a Twitter or Netflix or, you know, some of these massive services, it is obviously and clearly the best answer to orchestrating containers. Now I think the next question is, how does that same thing that works at that massive scale also work for me as a developer at a very small scale, and help me develop my software with my small team of five or 10 people? Do I need Kubernetes if I'm a five- or 10-person startup? Well, not the original sort of Borg vision of Kubernetes.
That's probably overkill, but actually the tooling has really advanced, and we now have Kubernetes that makes sense at very small scales. You've got things like K3s from Rancher. You've got MicroK8s from my colleagues at Canonical, and other ways of shrinking Kubernetes down to something that fits, perhaps on devices, perhaps at the edge, beyond just the traditional data center and into remote locations that need to deploy and manage applications. >> On the Kubernetes clustering, on some of the tech side: we've seen some great tech trends, as mentioned, with Cloudera, Hortonworks, and MapR. Let's take Cloudera and Hortonworks. Remember back in the old days when it was booming? Oh, they were so proud to talk about their clusters: "I stood up all these clusters." And then I would ask them, "Well, what are you doing with it?" "Well, we're storing data." So that became kind of this use case where standing up the cluster was the use case, and they're like, "OK, now let's put some data in it." So a question for you: is Kubernetes a little bit different? We're seeing real use cases. What are people standing up Kubernetes clusters for, specifically, besides saying "I've done it"? What's the main use case that you're seeing that has real value? >> Yeah, actually, you just jogged my mind of a really funny memory. You know, back in those big data days, I was CEO of a startup. We were encrypting data, helping encrypt healthcare data for healthcare companies. And there were a number of healthcare companies I worked with at that time who said they had a big data problem, and they had all of, I don't know, 33 terabytes worth of data that they needed to encrypt. It was kind of humorous sometimes, like, is that really a big data problem? This fits on a single disk, you know. But yeah, I mean, it's interesting how... >> The hype of the tech was preceding the reality, the needs. So, Kubernetes: "I have a Kubernetes cluster for blank." Fill in the blank. What are people saying?
>> Yeah, it's largely about modernization. So: I need to modernize my infrastructure. I'm going to adopt a platform that's probably not the older Java, WebSphere-type platform or something like that. I'm investing in hardware, investing in software middleware, I'm investing in people, and I want all of those things to line up with where the industry is going from a software perspective. That's where Kubernetes is sort of the cornerstone piece. Linux, of course, is pretty well established. >> Continuous delivery and integration, is that the pipeline? Was that the fit, the low-hanging-fruit use case of Kubernetes, just the development process? >> Or the operations. It's the operations of: now I've got software that I need to deploy across multiple versions, perhaps multiple sites. I need to handle that upgrade, ideally without downtime, in a way that, as you said, service mesh, in a way that meshes together and makes sense. I've got to roll out new certificates. I need to address security vulnerabilities. These are all the things that Kubernetes does such a better job at than what people were doing previously, which was a whole lot of for loops, shell scripts, and SSH pushing tarballs around, maybe debs or RPMs around. That is what Kubernetes actually really solves, and does an elegant job of solving, as just a starting point. And that's just the beginning. And, without getting too vendor-y here, Anthos is the thing that we at Google have built around Kubernetes that brings it to the enterprise. >> I did a tweet the other day and called it Anthem; I was just typing too fast. I got a lot of crap on Twitter for that. And multi-cloud has been a big part of where Kubernetes seems to fit. You mentioned some of the licensing changes.
Cloud has been a great resource for a lot of the new web-scale applications from all kinds of companies. Now, with serverless, you're seeing a lot more capabilities. How do you see the next shift with data, with state, coming in? Because you've got stateless data and you've got stateful data. This has become a conversation point. >> Yeah, I think Kelsey Hightower has said it pretty eloquently, as he usually does, around the serverless movement: it lets developers focus on just their code, literally just their code, perhaps even just their function, just their piece of code, without having to be an expert on all of the turtles all the way down. That's the big difference with serverless. Having written a couple of those functions, I can really invest my time in the couple of hundred lines of code that matter, and not in choosing a distro, choosing a Kubernetes, choosing, you know, all the stack underneath. I simply choose the platform where I'm gonna drop that function, compile it, upload it, and then run. >> On the fifth anniversary of Kubernetes, we're riffing on Kubernetes. Dustin Kirkland here inside theCUBE, CUBE alumni. You were recently at KubeCon overseas in Europe, Barcelona. Barcelona, great city; theCUBE's been there many times. Stu was there covering it for us. I couldn't make this trip, unfortunately; I had a couple of daughters graduating. Sorry, guys. What was the summary? What was the takeaway, the big walkaway from that event? What were the main stories, the most important stories being told? Big news, big observations? >> It was a huge event, to start with. It was at the Fira de Barcelona. It didn't take over the whole space, but I've been there a number of times for Mobile World Congress. You know, this is KubeCon in the same building that hosts all of Mobile World Congress. So I think 8,000 attendees was what we saw. It was quite celebratory.
You know, I think we were doing some pre-fifth-birthday bash celebrations. Key takeaways: hybrid cloud, multi-cloud. I think that's the world that we've evolved into. You know, there was a lot of tension, I think, in the early days about "must stay on-prem" versus "must go to the cloud," that there's gonna be a winner and a loser and everything's gonna go one direction or another. I think the chips have fallen, and it's pretty obvious now that the world will exist in a very hybrid, multi-cloud state. Ultimately, there's gonna be some stuff on-prem that doesn't move. There's going to be some stuff better hosted in one or more public clouds; that's the multi-cloud aspect. And there will be some stuff at the edge, in remote locations, in vehicles, on oil rigs, at restaurants and stores, and so forth. >> What's most exciting from a trend standpoint? What's getting you excited from what you see on the landscape out there? >> So, tying all of that to Kubernetes: Kubernetes is the thing that basically normalizes all of that. You write your application, put it in a container, and expect Kubernetes to be there to scale it, to operate it, to upgrade it, to migrate it over time. From that perspective, Kubernetes has really ticked all the boxes, and you've got a lot of choices now about which Kubernetes you're going to use and where. >> Beyond Kubernetes, there's a lot of variety of projects: Kubeflow, you've got service meshes out there, a lot of different projects. What's a dark horse? What's something out there that people should be paying attention to, that you see emerging, that's notable? >> I think it's a combination of two things. One is pretty obvious, and that's AI/ML. It's coming like a freight train and is sort of the next layer of excitement.
I think after Kubernetes becomes boring, which it hopefully will if we've done our jobs well, that Kubernetes layer gets settled and will keep evolving, but the hockey stick hopefully settles down and it becomes something super stable. The application of machine learning to create artificial intelligence, conclusions, trends from things, that is sort of the next big trend. And then I would say another one, if you really want the dark horse: I think it's around communications. I think it's around the difference in the way that we communicate with one another across all forms of media: voice, video, chat, writing. How we interact with people, how we interact with our tools, with our software, and in fact, how our software interacts with us and how our software interacts with other software. That communications industry is ripe for some pretty radical disruption. And, you know, there are some organizations doing that. It's early days on those changes. >> Final point: you mentioned earlier in our conversation how DevOps is influencing and impacting fields outside of tech and computer science. What did you mean by that? >> Well, I think you brought it up, unexpectedly, that you were looking at the way some other industries are changing. And I think that cross-pollination is actually quite powerful: when you take a skill and expertise you have outside of your industry and apply it, it adds something new and interesting to your professional environment. That's where you get these provocative, really creative, innovative things that, you know, no one really saw coming. >> DevOps principles applied to other disciplines. Yeah, agility. That's pointing down waterfall-based processes. >> That's one phenomenal example. Imagine that for governments, right? To remove some of the pain that you and I know. I've got to go and renew my license; my birthday's coming up.
I gotta go renew my driver's license. You know how much I'm dreading going to the DMV. >> Root canal, driver's license, one and the same. >> Exactly. How waterfall is that experience? And could we be more agile, more DevOps-y, in some of our government? >> The U.S. government's procurement practices are based upon 1990 standards. They still request a manual, a physical manual, for every product. Who does that? >> I know that there are organizations trying to apply some open source principles to government. But I mean, think about, you know, just democracy, and how being a little bit more open and transparent in the way that we are in open source code, with the ability to accept patches. I have a side project, a passion for brewing beer, and I love applying open source practices to the industry of brewing. And that's an example of where you use professional work to complement a hobby. >> All right, we've got to brew some CUBE private label, some CUBE beer. >> If you like sour beer. I'm into sour beer. >> That's okay. Final question for you. Five years from now, Kubernetes will be 10 years old. What's the world gonna look like when we wake up five years from now with Kubernetes at 10? >> Yeah, I don't think we'll be struggling with the Kubernetes layer at that point. I think that's settled science, inasmuch as Linux is pretty settled science. Yes, there's a release, and it comes out with incremental features and bug fixes. I think Kubernetes is settled science; management of those containers is pretty well settled. Five years from now, I think we end up with some software that's writing software.
And I don't quite mean that in the way that sounds scary, that we're eliminating developers, but I think we're creating more powerful, more robust software that actually creates that software, and that's all built on top of the really strong, robust systems we have underneath. >> Automation to take on the heavy lifting, but the human creation is still key. >> Humans are in the loop. We're many decades away from humans being out of the loop on creative processes. >> Dustin Kirkland, product manager at Google, Kubernetes guru, also a CUBE alumni, here in the studio talking about the Kubernetes fifth-year anniversary. Of course, theCUBE was present at the creation, at the beginning of the wave of Kubernetes. We love the trend, we love cloud, we love all the tech. I'm John Furrier here in Palo Alto. Thanks for watching.

Published Date : Jun 6 2019


Jeff Eckard, IBM | Cisco Live US 2018

>> Live from Orlando, Florida, it's theCUBE. Covering Cisco Live 2018, brought to you by Cisco, NetApp, and theCUBE's ecosystem partners. (electronic music flourish) >> Welcome back, I'm Stu Miniman and this is theCUBE's exclusive coverage of Cisco Live 2018 in Orlando, Florida. Joining me, my co-host for this segment Dave Vellante sitting in for John Furrier, and happy to welcome to the program Jeff Eckard, who's the Vice President of Storage Solutions at IBM. Jeff, thanks so much for joining us. >> Thank you, good to see you guys. >> All right, and 26,000 people here. It's been many years since I'd been to Cisco Live. Some things are the same, many of the same faces, but a lot of new jobs, a lot of buzz going on. What's your impression been of the show this week? >> Yeah, it's been an interesting, great show for IBM and our presence, but it's a very large ecosystem of Cisco partners, a lot of their, our joint end users, and a lot of focus on multi-cloud. You've consistently heard that as a theme from Cisco as well as IBM since last fall at their partner forum, and they've continued it here with a lot of focus on being able to take tools and capabilities and enabling enterprises to manage data where they want to manage it. And it's really interesting, from traditional systems vendors like Cisco, to see that focus particularly around developers. >> It's been fascinating for me to watch. Jeff, you and I have some background in the storage and storage networking piece, specifically, where it was like, OK, where I sit in the stack and I've got a couple of integrations, and we work on our standards here. It's much broader. >> Oh, absolutely. The things that we're working on. We're talking about cloud. There's a lot of software that flows. Data and applications are critically important. Talk a little bit about some of that transformation and how you're seeing the expansion, and-- >> Yeah, no, it's an interesting time.
If you think about the opportunities and challenges facing all enterprises, data is at the core of digital transformation, digital enhancement, whatever term you wanna use with it. Typically, it's focused in on wanting to provide realtime insights so that you make better decisions against threats or opportunities. Being able to deliver personalized services to your clients, and then also improving your internal processes and business outcomes. And so data is core for digital transformation, and you kinda see, kind of this web of what we're talking about here and then what we're doing with clients as well. >> You know, Jeff, you talk about multi-cloud, you've been in the business for a while, and throughout your career you've tried to help customers simplify their lives, and everybody felt, I thought, OK, I'm gonna put stuff in the cloud, it's gonna get simpler, and now you see this spate of clouds, whether it's infrastructure as a service, private clouds, SaaS, and complexity has, in some regards, never been higher, particularly as it relates to the data. >> That's right. >> You've gotta figure out, where do you put this stuff? How do you protect it, what about governance? Even if you think security's better in the cloud, it might be different for every cloud. So how is IBM generally, and your team specifically, approaching simplifying the complexity of this multi-cloud world? >> Sure, so from an IBM perspective, at the top level we approached it with innovative technology and a lot of industry expertise, whether it's in financial services or healthcare, cloud and what we do with the public IBM cloud is really important around the services we provide there, data and AI, and then as you come down from that, modern infrastructure is key because modern infrastructure supports the data. So when you look at it, 80% of enterprises are intending to be multi-cloud. Something like 70% already are, right? Because of what you referenced with the consumption of SaaS.
So, multi-cloud is the de facto operating model for applications and then, therefore, for the data. So from an IBM storage and SDI perspective, we kind of view... There are three primary adoption patterns that we're seeing with our clients. The first is around modernizing traditional applications or workloads, which also drags modern infrastructure, flash-based systems, leveraging more of storage efficiency technologies, like compression and dedupe, being able to protect that data, whether it's in a traditional VMware environment or the emerging containers environment. So, yeah, data's at the core. The partnership that we have with Cisco around VersaStack enables us to support traditional private clouds, whether those are built on the VMware set of tools or now, as last week we announced, the VersaStack for IBM Cloud Private. IBM Cloud Private is an enterprise platform for developers to leverage microservices and containerized IBM Middleware Services, whether that's WebSphere or MQ or Microservices Builder, as well as a whole catalog of open source technologies and tools to get agility out of the DevOps process and then also layer on analytics on top of that. >> So customers, they're gonna want consistency across all those clouds. So what role do you guys bring? Are you trying to be a platform of platforms, or is that too aspirational? Obviously, you can't have 100% market share, so that's not practical. But to the extent that people adopt your technologies, is that how we should be thinking about it? >> Well, so IBM Cloud Private is an open platform. It's built on Docker runtimes and Kubernetes orchestration. It's open to where you can leverage things like Red Hat OpenShift if you've chosen them for your containers platform, and then we also support the traditional Private Clouds with VMware. So, there's a whole set of tools in there.
What we're trying to do from a data management perspective is protect it, whether that's backup and recovery, morphing into this new category of secondary data reuse. So, for instance, from a traditional workflow of just doing backup and recovery, we can now take native format copies of the data, whether that's in an Oracle or SQL Server database, et cetera, and take that data to the public cloud, where different personas and use cases can act on that data. So you can spin up a VM from that native format within our tools in the IBM Cloud. So that's from a data protection standpoint. On data management, later this year we'll talk more formally about programs that we have around metadata management. That's where you can index and classify, for instance, unstructured or structured data, and act on that in terms of: Where was it last accessed? Who should be accessing it? Is it personally identifiable information? Do I wanna run analytics on it? So metadata management is an opportunity to plug in to broader IBM capabilities, whether it's Watson Data Platform or Information Governance Catalog, to provide that kind of uber, cross-cloud infrastructure management. >> And that's a machine intelligence, automation component at scale, right? >> It could absolutely be used for augmented intelligence, artificial intelligence, some of the machine learning pieces as well. >> Jeff, Jeff, I'm wondering if you could give us a little insight into some of the places that customers are falling down. We were just talking to a systems integrator before you came on and he said, "Well, sometimes I take a virtualized environment "and I move it and it's not really geared "for this modern platform."
Containerization can help in a lot of these environments, so when you talk about the pattern we've seen that works many times, it's that you modernize the platform, and then I can modernize the application, start pulling things apart, start refactoring, start playing with some of these environments, because I can't just... Lift and shift can help, but it can't be the only move. There's a lot of work that needs to get done, and a lot of time that's underestimated. >> Right, well, it's not a panacea, but there is a key tool called Transformation Advisor that is part of the IBM Cloud platform. It's intended to assist with the challenge that you just stated, which is, OK, how do I take a traditional workload, determine if it's ready to be containerized, and then start the process of containerization? You can go back to some of the VM migration pieces, too. There's a whole set of tools that enterprises have used. Transformation Advisor is one tooling example of what we can do in the platform. And then we obviously have services through Global Services that can help enterprises at a large scale to kinda make that step. >> You bring up a good point there, 'cause we always struggle with some of these tool transformations, but if you go back to virtualization, it was really some of the organizational things that had to shift. I wonder if you can talk about some of the things that are changing here. This show, we've spent a lot of time talking about Cisco moving up the stack; network people are much more closely tied to some of that new application development, especially with things like intent-based networking. >> Well, it's an interesting reminder that we get often from clients, 'cause you're really touching on some of the operational steps. Things like containerization are interesting new technologies, and there are a lot of advantages to them.
But just going back a minute to the heritage of what we've been doing with Cisco around VersaStack, leveraging it on a VMware environment, we hear a lot from customers that their operational practices really are set around VMware and the VMware tooling. So one of the things that we did with IBM Cloud Private is, it can run on top of VMware. So as customers want to take a kind of transitive step towards microservices, they can continue to leverage their operational practices around VMware. So it's important, because it sometimes takes enterprises a little bit longer than you may guess, right, to embrace the new set of things. Our product portfolio and our directions are set so that they can leverage some of the operational pieces they already have. >> Well, just for our viewers who may not know, I mean, the recent history of IBM and Cisco is quite interesting. IBM at one point purchased a company called BNT, which got sold as part of the x86 sale to Lenovo. That opened up a huge opportunity for IBM and Cisco to partner because there were very clear swim lanes. And that sorta catalyzed a relationship, and from your standpoint, VersaStack was sort of the first instantiation of that relationship. So, take us through, sort of, where you guys are in the partnership and where you see it going. >> Sure, yeah, so VersaStack, for folks who may not be familiar, it's a converged system, right? So it's IBM storage, flash or otherwise, it leverages Cisco UCS servers, and then their Nexus and MDS switching. So it's integrated, validated as a single solution to, as the name implies, be very versatile and provide agility and flexibility. And so, through our routes to market, either with distribution or resellers or system integrators, it is a way that we can address platforms that matter to our joint customers. We've talked about IBM Cloud Private. A lot of heritage around VMware and SQL Server and Oracle and a lot of focus around SAP HANA.
So, we typically will partner around which enterprise platforms we're going after, and then we also partner, in general, around MDS switching with Cisco, and we'll talk more about that in months to come as we enhance that relationship. >> So, the solutions part of your title: you just mentioned VMware, Oracle, SAP HANA, there may be others. How do you guys approach solutions? Maybe you can talk about that a little bit. >> Yeah, so a solution, at a pedagogical level, is a successful repeatable outcome. And what we focus on, then, are the integrations that matter. Those could be integrations with IBM tools, like we talked about with IBM Cloud Private. Could be the integrations that we do jointly with Cisco through the validated design process for some of these applications or databases. And so we have teams that do the validation work and figure out how we marry IBM capabilities with ecosystem capabilities, whether we're automating private clouds or accelerating workloads, including the partnership that IBM and Cisco have with Hortonworks. And then in an industry context as well, particularly in healthcare and financial services. We'll pick the platforms that really matter and then do the integrations that enable us to take, whether it's our systems or our software or IBM-level capabilities, to market. >> I wanna come back to this simplicity theme, specifically in the context of data protection. With all this multi-cloud, data protection has become a really hot topic. You guys have dramatically simplified your data protection offering with Spectrum Protect Plus. Talk about data protection, how it's changing from where it used to be just, OK, it's a virtualized world. We kind of understand the challenges of virtual data protection. That has played itself out, and now there's a whole new wave coming. What's your perspective on this?
>> Well, I don't know if virtualization is played out; I mean, the virtualized environment is still kind of paying the freight, if you will. >> Yeah, played out in terms of-- >> Yes, no, no, yeah, right. >> We understand what had to change. >> Right. And customers have made that change. >> Yeah, and your simplicity point on that is really key. So one of the enhancements that we announced last year at VMworld was Spectrum Protect Plus. So that's an agentless, OVA-based, VM-based backup and recovery tool. And it's very simple to use. The trick is that we've focused its capabilities around secondary data reuse. So as I mentioned earlier, that whole workflow has evolved to where the data has increasing value beyond its primary use, right? So backup and recovery, but then we can leverage those native format copies. Spectrum Protect Plus is available either on a bring-your-own-license basis or as a monthly subscription in the IBM Cloud, and other clouds over time. And so we enable enterprises to not only do the traditional backup and protection, but very simply move that data to either a secondary or tertiary data center, if that's still a part of their backup architecture, or into the public cloud. And so the simplicity factor comes in, again, in that it's agentless. There's a catalog of where all your copies are, and you can reuse that data, whether it's for DevOps or DevTest or analytics purposes. >> OK, so that's helpful. So what I was trying to get to was sort of the enablers, maybe from a technology standpoint, because in the virtualization world, it was all about efficiency, because you didn't have the underutilized physical resources anymore. >> Yep, right. >> All the servers were utilized at 10%. (chuckles) Well, I got rid of a lot of those physical servers, and the one job that needed that power was backup, so I needed a new way to approach it. What I'm hearing is, in this multi-cloud world, it's a focus on simplicity.
I'm inferring from that a cloud-like experience, maybe some other capabilities that you guys are-- >> Yeah, so. >> Doing away with. >> Containers are a progression. I mean, VMware came around to maximize your CPU and storage utilization. Containers provide yet another level of efficiency on top of that. They bring with them the need for changes in your data protection. And so at Think in March, we talked about our directions around container-aware data protection and container-aware snapshots. Most vendors will use snapshots and then volume-level controls, which is how we've traditionally done backup. We have a progression, and we'll talk more about it later in the year, of how we do snapshots, again, that are container aware. They leverage our tools, such as Spectrum Copy Data Management and Spectrum Protect Plus, and integrate with our arrays. But they'll bring the same level of capability that we've had traditionally in a virtualized environment to also support data protection in a container world. >> Well, it's an interesting landscape right now in data protection. >> Oh, it's awesome! There are so many new tools, and it's great to be able, (Dave chuckling) like we talked about earlier, to partner with Cisco around some of this as well. >> Great, Jeff, I wanna give you the final word. For those that couldn't make it to the show, share either a key conversation you're having, what you're hearing from customers, or a big takeaway from the show that you'd like to share. >> Sure, yeah, we've had a lot of customers come up and wanna know, OK, well, how do you start, right? And as we talked about, there are three primary adoption patterns, and typically it will start with modernizing traditional workloads. 70% of private cloud usage is for that particular use case. Well, you can pretty quickly show them, then, the progression to, OK, they wanna be more agile. They wanna go cloud-native.
From that private cloud infrastructure, you can do that, and then you can have a consistent way that you interact around services in the public cloud. And so that's what we've been talking to clients about. They want to know, how do I start with what I have, and then how do I get to this better future? And how do I leverage your tools and capabilities? And so whether that's with IBM systems components or what we do with our partnership with Cisco, we're showing them how we, collectively, can help them on that journey. >> All right, Jeff, I really appreciate all the updates. Dave, thanks so much for joining me for this segment. >> Yeah, thank you. >> We still have a full day here, three days of wall-to-wall coverage on theCUBE at Cisco Live 2018. Thanks so much for watching. (techno musical flourish)

Published Date : Jun 13 2018

SUMMARY :

Covering Cisco Live 2018, brought to you by Cisco, and happy to welcome to the program but a lot of new jobs, a lot of buzz going on. and a lot of focus on multi-cloud. and I've got a couple of integrations, There's a lot of software that flows. and then what we're doing with clients as well. and now you see this spate of clouds, You've gotta figure out, where do you put this stuff? and then as you come down from that, So what role do you guys bring? and take that data to the Public Cloud, some of the machine learning pieces as well. a little insight of some of the places that is part of the IBM cloud platform. that had to shift. So one of the things that we did with IBM Cloud Private is, in the partnership and where you see it going. and then we also partner, in general, So, the solutions part of your title, Could be the integrations that we do jointly and now there's a whole new wave coming. kind of paying the freight, if you will. what had to change. And customers have made that change and you can reuse that data for whether it's DevOps because in the virtualization world, and the one job that needed that power was backup, and then volume level controls Well, it's an interesting landscape right now and it's great to be able, (Dave chuckling) or a big takeaway from the show that you'd like to share. and then you can have a consistent way All right, Jeff, I really appreciate all the updates. Thanks so much for watching.

ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Jeff Eckard	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Lenovo	ORGANIZATION	0.99+
10%	QUANTITY	0.99+
100%	QUANTITY	0.99+
Stu Miniman	PERSON	0.99+
BNT	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
Orlando Florida	LOCATION	0.99+
70%	QUANTITY	0.99+
Horton Works	ORGANIZATION	0.99+
last year	DATE	0.99+
John Furrier	PERSON	0.99+
March	DATE	0.99+
first	QUANTITY	0.99+
three days	QUANTITY	0.99+
26,000 people	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
theCUBE	ORGANIZATION	0.98+
three	QUANTITY	0.98+
last week	DATE	0.98+
NetApp	ORGANIZATION	0.98+
later this year	DATE	0.97+
last fall	DATE	0.97+
this week	DATE	0.97+
one	QUANTITY	0.96+
WebSphere	TITLE	0.96+
uber	ORGANIZATION	0.96+
VersaStack	ORGANIZATION	0.96+
SQL	TITLE	0.96+
SAP HANA	TITLE	0.96+
VMware	TITLE	0.95+
Think	ORGANIZATION	0.94+
Transformation Advisor	TITLE	0.94+
single solution	QUANTITY	0.94+
Red Hat OpenShift	TITLE	0.93+
Cisco Live 2018	EVENT	0.93+
DevTest	TITLE	0.93+
X86	COMMERCIAL_ITEM	0.92+
first instantiation	QUANTITY	0.92+
VMWorld	ORGANIZATION	0.92+
Vice President	PERSON	0.92+
DevOps	TITLE	0.91+
Kubernetes	TITLE	0.91+

Pankaj Sodhi, Accenture | Dataworks Summit EU 2018

>> Narrator: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what DataWorks Summit is all about. We are at DataWorks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with big data. So, hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be here. >> Great, great, so what are you seeing in terms of customers' adoption of Hadoop and so forth, big data platforms? For what kind of use cases? GDPR is coming down very quickly, and we saw the poll this morning that John Kreisa of Hortonworks did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, for the enterprise, period, because it sounds like a sizeable portion of this audience is not entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing in terms of customer readiness for this new regulation? >> So Jim, I'll answer the question in two ways. One is just in terms of, you know, the adoption of Hadoop, and then, you know, get into GDPR. So in regards to Hadoop adoption, I think I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use-case-driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach, deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often sort of on premise.
Hybrid being, for example, with an EDW and product type AA. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now is to leverage the best of cloud computing and open source technology, while getting the investment protection that they have from their on-premise deployments as well. So they're in a fairly good position. Another set of customers have done successful pilots looking at optimization use cases. >> Jim: How so, Hadoop? >> Yes, leveraging Hadoop, either again from a cost optimization play or potentially for advanced analytics capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on-prem, potentially with Hadoop, or on cloud with Hadoop. >> When you say the pipeline in this context, what are you referring to? When I think of pipeline, in fact in our coverage of pipeline, it refers to an end-to-end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely. >> And analytics, so that's what you're saying? >> So all the way from ingestion to curation to consuming the data, through multiple different access points, so that's the full pipeline. And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. So I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of let's ingest all data in a data lake, it's been driven by a use case mapped to a set of valuable data sets that can be ingested. But what's interesting then is the delivery approach has been to bring together diverse skill sets.
For example, data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge architecture and delivery approach. I think this is the key ingredient for success, which is, for me, that modern Hadoop pipelines need to be iteratively built and deployed, rather than linear and monolithic. So this notion of, I have raw data, let me come up with a minimally curated data set. And then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance it, get additional data attributes, I then enhance the pipeline. So this is already starting to challenge organizations' architecture approaches, and how they also deploy into production. And I think that's been one of the key differences from organizations that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world... or are they challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' Data Steward Studio. Are you seeing the data governance folks, the data stewards of the world, sitting around the virtual table, challenging this architecture to evolve further? >> I think. >> To enable privacy by default and so forth? >> I think again, you know, the organizations that have been successful have already been looking at privacy by design before GDPR came along. Now one of the reasons a lot of the data lake implementations haven't been as successful is the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So therefore, what we see with business glossaries and sort of data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. So to actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace.
So in the data marketplace, what you need to have is well-curated data sets, with the proper definitions, for example in a business glossary or a data catalog, underpinned by the right user access model, and available, for example, through search or APIs. So, GDPR actually is. >> This is not a public marketplace, this is an architectural concept. >> Yes. >> It could be inside, completely inside, the private data center, but it's reusable data, it's both through APIs and standard glossaries and metadata and so forth, is that correct? >> Correct, so the data marketplace is reusable, both internally, for example, to unlock access to data scientists who might want to use the data set and then put that into a data lab. It can also be extended, from an API perspective, to a third-party data marketplace for exchanging data with consumers or third parties as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit. Rather than looking at it from a compliance perspective, it's about how we can make data usable to the analysts and the data scientists. So actually focusing on getting the right definitions upfront, and as we curate and publish data, and as we enrich it, what's the next definition that comes of that? And actually have that available before we publish the data. >> That's a fascinating concept. So, the notion of a data steward or a data curator. It sort of sounds like you're blending them. Where the data curator's job, part of it, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses, and possibly being a player in the ongoing identification of the monetizability of data elements, both internally and externally in the (mumbles). Am I describing it correctly? >> Pankaj: I think you are, yes. >> Jim: Okay.
>> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy. >> Jim: The chief data officer. >> Yes, chief data officer functions. So rather than imposition of policies and standards, it's about actually trying to unlock business value. So rather than look at it from a compliance perspective, which is very important, actually flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So for example, if you're able to tag and classify data, and then apply the right kind of protection against it, it actually helps the data scientists to use that data for their models, while actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance, which is to discover and inventory and essentially tag all of your data on a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value to your business of the data. So it needn't be an overhead burden, it can be the foundation for a new business model. >> Absolutely, because I think if you talk about organizations becoming data driven, you have to look at what the data asset actually means. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authority of privacy and ability to use the data. So I think GDPR is going to be a very good enabler. So again, the small minority of organizations that have been successful have done this. They've had business glossaries and data catalogs, but now with GDPR, that's almost, I think, going to force the issue, which I think is a very positive outcome.
>> Now Pankaj, do you see any of your customers taking this concept of curation and so forth the next step, in terms of... there's data assets, but then there's data-derived assets, like machine learning models and so forth. Data scientists build and train and deploy these models and algorithms; that's the core of their job. >> Man: Mhmm. >> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers so that they'll also increasingly want to essentially catalog the models, and identify and curate them for reusability, possibly monetization opportunities? Is that something that any of your customers are doing or exploring? >> Some of our customers are looking at that as well. So again, initially, exactly, it's an extension of the marketplace. So while one aspect of the marketplace is data sets that you can then combine to run the models, the other aspect is models that you can also search for and prescribe data. >> Jim: Yeah, like pre-trained models. >> Correct. >> They can be golden if they're pre-trained and the core domain for which they're trained doesn't change all that often; they can have a great aftermarket value, conceivably, if you want to resell that. >> Absolutely, and I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, collaborative notebooks and so forth, and being able to sort of share the outputs of models, and to be able to share that with other folks in the team who can then maybe tweak it for a different algorithm, is a huge, I think, productivity enabler, and we've seen. >> Jim: Yes. >> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop to actually then deploying the (mumbles).
How can you do that very quickly, and reduce the time from an idea or hypothesis to production? >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well, thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion, and we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.

Published Date : Apr 19 2018

SUMMARY :

Brought to you by, Horton Works. He's based in London, and he's here to discuss really what is not entirely ready to comply with GDRP on day one, So in regards to Hadoop adoption, I think I would place In that scenario, they've taken the time to actually and starting to work out, from a footprint perspective, it refers to an end to end life cycle for development So this is already starting to challenge organizations haven't had the ability to actually curate the data sets, this is an architectural concept. the right definitions upfront, and as we curate and possibly being a player in the ongoing identification for the CDO function, because, rather than So rather than look at it from a compliance perspective, it actually helps the data scientists that ever happened to data monetization. Absolutely, Because I think if you talk So I think GDPR is going to be a very good enabler, and algorithms, that's the core of their job. so that they'll also increasingly be required to want to of the marketplace. if you want to resell that. And to be able to share that with other folks in the team to move very quickly from a model And we thank you very much for coming on theCUBE.

ENTITIES

Entity	Category	Confidence
Pankaj	PERSON	0.99+
James Kobielus	PERSON	0.99+
Jim	PERSON	0.99+
London	LOCATION	0.99+
Pankaj Sodhi	PERSON	0.99+
May 25th	DATE	0.99+
Accenture	ORGANIZATION	0.99+
John Chrysler	PERSON	0.99+
Horton Works	ORGANIZATION	0.99+
Silicon Angled Media	ORGANIZATION	0.99+
GDPR	TITLE	0.99+
Berlin, Germany	LOCATION	0.99+
One	QUANTITY	0.98+
both	QUANTITY	0.98+
one aspect	QUANTITY	0.97+
one	QUANTITY	0.97+
Data Works Summit	EVENT	0.96+
two ways	QUANTITY	0.96+
Data Works Summit 2018	EVENT	0.95+
Dataworks Summit EU 2018	EVENT	0.93+
Europe	LOCATION	0.93+
Hadoop	TITLE	0.92+
day two	QUANTITY	0.9+
Hadoob	PERSON	0.87+
2018	EVENT	0.84+
day one	QUANTITY	0.82+
three	QUANTITY	0.79+
first ones	QUANTITY	0.77+
theCUBE	ORGANIZATION	0.76+
Wikbon Team	ORGANIZATION	0.72+
this morning	DATE	0.7+
Hadoob	TITLE	0.7+
GDRP	TITLE	0.55+
categories	QUANTITY	0.54+
Big DSO	ORGANIZATION	0.52+
Hadoob	ORGANIZATION	0.46+

Action Item | The Role of Open Source

>> Hi, I'm Peter Burris. Welcome to Wikibon's Action Item. (slow techno music) Once again Wikibon's research team is assembled, centered here in theCUBE Studios in lovely Palo Alto, California. I've got David Floyer and George Gilbert with me here in the studio; on the line we have Neil Raden and Jim Kobielus. Thank you once again for joining us, guys. This week we are going to talk about an issue that has been a dominant consideration in the industry, but it's unclear exactly what direction it's going to take, and that is the role that open source is going to play in the next generation of solving problems with technology, or we could say the role that open source will play in future digital transformations. No one can argue that open source hasn't been hugely consequential. As I said, it's been one of the major drivers not only of new approaches to creating value, but also of new types of solutions that are actually leading to many of the most successful technology implementations that we've ever seen, and that is unlikely to change. But the question is what form open source will take as we move into an era where there are new classes of individuals creating value, like data scientists; where there are new problems that we're trying to solve, like problems that are mainly driven by the role that data plays, as opposed to code; and where there are new classes of providers, namely service providers as opposed to product or software providers. These issues are going to come together and drive some pretty important changes in how open source behaves over the next few years, what types of challenges it's going to successfully take on, and ultimately how users are going to be able to get value out of it. So to start the conversation off, George, let's start by making a quick observation: what has the history of open source been? Take us through it kind of quickly.
>> The definition has changed. In its first incarnation it was to fix UNIX fragmentation and the high price of UNIX system servers, meaning the proprietary UNIXes and the proprietary servers they were built on. That actually rather quickly morphed into a second incarnation, where it was, let's take the Linux stack, Linux, Apache, MySQL, PHP, Python, and substitute that for the old incumbents, which were UNIX, BEA WebLogic, the J2EE server, and Oracle Database on an EMC storage device. So that was the collapse of the price of infrastructure. Then it really quickly morphed into something very, very different, which was the growth of the giant Internet-scale vendors, and neither on pricing nor on capacity could traditional software serve their needs. So Google didn't quite do open source, but they published papers about what they did, and those papers then were implemented. >> Like MapReduce. >> Yeah, MapReduce, BigTable, Google File System; those became the basis of Hadoop, which Yahoo open sourced. There is another incarnation going, that's probably getting near its end of life right now, which is sort of a hybrid, where you might take Kafka, which is open source, and put sort of proprietary bits around it for management and things like that, same with Cloudera. This is called the open core model. It's not clear if you can build a big company around it, but the principle for most of these is that the value of the software is declining, partly because it's open source, and partly because it's so easy to build new software systems now, and the hard part is helping the customer run the stuff, and that's where some of these vendors are capturing value. >> So, David, let's turn our attention to how that's going to turn into actual money.
So in this first generation of open source, I think up until now, certainly Red Hat and Canonical have made money by packaging and putting forward distributions, and they have made a lot of money. IBM has been one of the leaders in contributing open source and then turning that into a services business. Cloudera, Hortonworks, MapR, and some of these other companies have not generated the same type of market presence that a Red Hat or Canonical have put forward, but that doesn't mean there aren't companies out there that have been very successful at appropriating significant returns out of open source software, mainly, however, doing it as George said, as a service. Give us some examples. >> I think the key part of open source is providing a win-win environment, so that people are paid to do stuff, and what is happening now a lot is that people are putting stuff into open source in order that it becomes a standard, and also in order that it is maintained by the community as a whole. So those two functions, those two capabilities of being paid, often by a company, by IBM or by whoever it is, to do something on behalf of that company so that it becomes a standard, so that it becomes accepted, that is a good business model, in the sense that it's win-win: the developer gets recognition, and the person paying for it achieves their business objective of, for example, getting a standard recognized-- >> A volume. >> Volume, yes. >> So it's a way to get to volume for the technology that you want to build your business around.
>> Yes. What I think is far more difficult in this area is application-type software. Where open source has been successful, as George said, is in the stacks themselves, the lower end of the stacks. There are a few exceptions, and they usually come from very, very successful applications like Microsoft Word, or things like that, where they can be copied and put into open source, but even there they have around them software from a company, Red Hat or whoever it is, that will make it successful. >> Yes, but OpenOffice wasn't that successful. Get to the kind of, today we have Amazon, we have some of the hyperscalers that are using that open core model and putting forward some pretty powerful services; is that the new Red Hat, is that the new Canonical? >> The one who's made the most money is clearly Amazon. They took open source code and made it robust, and made it in volume, and those are the two key things you have to have for success: it's got to be robust, and it's got to be in volume. It's very difficult for the open source community to achieve that on its own; it needs the support of a large company to do that, and it needs the value that that large company is going to get from it for them to put those resources in. So that has been a very successful model. A lot of people decry it because they're not giving back, and there's an argument-- >> They being Amazon, have not given back quite as much. >> Yes, they have relatively very few committers. I think that's more of a problem in the T&Cs of the open source contract, so those should probably be changed to put more onus on people to give back into the pool. >> So let me stop you. We have identified one thing that is likely going to have to evolve as we move forward to prevent problems: some of the terms and conditions, to try to ensure that there is that quid pro quo, that that win-win exists.
So Jim Kobielus, let me ask you a question. As David mentioned, open source has been more successful where there is a clear model, a clear target of what the community is trying to build; it hasn't been quite as successful where it is in fact expected that the open source community is going to start with some of the original designs. So for example, there's an enormous plethora of big data tools, and yet people are starting to ask why isn't big data more successful, and partly it's because putting these tools together is so difficult. So are the types of artifacts and assets and technologies associated with machine learning, AI, deep learning, et cetera, going to easily lend themselves to an open source treatment? What do you think? >> I think we're going to see open source very much take off in the niches of the deep learning and machine learning AI space, where the target capabilities we've built are fairly well understood by our broad community. Machine learning clearly; we have a fair number of frameworks that are already well established with respect to the core capabilities that need to be performed, from modeling and training to deployment of statistical models into applications. That's where we see a fair amount of takeoff for TensorFlow, which Google built and open sourced, because the core of deep learning, in terms of the algorithms, in terms of the kinds of functions you perform to take data and do feature engineering and algorithm selection, is fairly well understood. So those are the kinds of very discrete capabilities for which open source code is becoming standard, but there are many different alternative frameworks for doing that, TensorFlow being one of them, that are jostling for presence in the market. The term is commoditized: more of those core capabilities are being commoditized by the fact that they're well understood and agreed to by a broad community.
So those are the discrete areas where we're seeing the open source alternatives become predominant. But when you take a TensorFlow and combine it with a Spark, and with a Hadoop and a Kafka and broader collections of capabilities that are needed for robust infrastructure, those are disparate communities that each have their own participants, committers and so forth; nobody owns that overall stack. There's no equivalent of a LAMP stack where all things to do with deep learning, machine learning, and AI on an open source basis come to the fore. If some group of companies is going to own that broadening stack, that would indicate some degree of maturation for this overall ecosystem, but that's not happening yet; we don't see that happening right now. >> So Jim, I want to, my bias, I hate the term commoditization, but I want to unify what you said with something that David said. Essentially what we're talking about is agreement, in a collaborative, open way, around the conventions of how we perform work, that compute model, which then turns into products and technologies that can in fact be distributed and regarded as a standard, and regarded as a commodity around which trading can take place. But what about the data side of things, George? Jim has articulated, I think, a pretty good case that we're going to start seeing some tools in the marketplace. It's going to be interesting to see whether that is just further layering on top of all this craziness that is happening in the big data world, and just adding to it in the ML world. But how does the data fit into this? Are we going to see something that looks like open source data in the marketplace? >> Yes, yes, and a modified yes. Let me take those in two pieces.
Just to be slightly technical, hopefully not being too pedantic, software used to mean algorithms and data structures, so in other words the recipe for what to do, and the buckets for where to put the data. That has changed in the machine learning, analytic world, where the algorithms and data, the instances of the data, not the buckets, are so tied together that the data changes the algorithms and the algorithms change the data. The significance of that is, when we build applications now, it's never done. The construct we've been focusing on is the digital twin, more broadly defined than a smart device. When you start with one vendor and you sort of partially build it, it's an evergreen thing, it's never done; then you go to the next vendor, but you need to be able to backport some core of that to the original vendor. So for all intents and purposes that's open source, but it boils down to actually the original Berkeley license for open source, not the Apache one everyone is using now. And remind me of the other question? >> The other issue is, are we going to see datasets become open source like we see code bases and code fragments and algorithms becoming open source? >> Yes, this is also happening. Just the way Amazon made infrastructure commoditized and rentable, there are going to be many datasets where they used to be proprietary, like a Google web crawl, and the Google Knowledge Graph for disambiguating people, places and things; some of these things are either becoming open source, or openly accessible by API. So when you put those resources together, you're seeing a massive deflation, or a massive shrinkage, in the capital intensity of building these sorts of apps.
>> So Neil, if we take a look at where we are thus far, we can see that, even though we're moving to a services-oriented model, Amazon for example is a company that is able to generate commercial rents out of open source software. Jim has made a pretty compelling case that open source software can, or will, emerge out of the tooling world for some of these new applications. There are going to be some examples of datasets, or at least APIs to datasets, that will look more open-source-like, so it's not inconceivable that we'll see some actual open source data, although with GDPR and some other regulations, we're still early in the process of figuring out how we're going to turn data into a commodity, to use Jim's words. But what about the personnel, what about the people? There were reasons why developers moved to open source, some of the soft reasons that motivated them to do things: who they work with, getting recognition, working on relevant projects, working with relevant technologies. Are we going to see a similar set of soft motivators diffuse into the data scientist world, so that these individuals, the ones who are creating the real value, are going to have some degree of motivation to participate with each other, collaborate with each other, in an open source way? What do you think? >> Good question. I think the answer is absolutely yes, but it's not unique to data scientists. Academics, scientists in molecular biology, civil engineers, they all want to be recognized by their peers on some level beyond just what they're doing in their organization. But there is another segment of data scientists that are just guys working for a paycheck, generating predictive analysis and helping the company along and so forth, and that's what they're going to do.
The whole open source thing, you remember object programming, you remember JavaBeans, you remember Web Services: we tried to turn developers into librarians. When they want to develop something, they go to GitHub. I go to GitHub right now and say I'm looking for a utility that can figure out why my face is so pink on this camera, I get 1000 listings of programs, and I have no idea which ones work and which ones don't. So I think the whole open source thing is about to explode; it already has, in terms of piece parts. But I think managing it in an organization is different, and when I say an organization, there's the Googles and the Amazons and so forth of the world, and then there's everybody else. >> Alright, so we've identified an area where we can anticipate some change will be required to modernize the open source model, the licensing model, and we see another one where the open source community is going to have to understand how to move from a product and code orientation to a data and service orientation. Can we think of any others? >> There is one other that I'd like to add to that, and that is compliance. You addressed it to some extent, but compliance brings some real-world requirements onto code and data. You were saying earlier that one of the options is bringing code and data together so that they intermingle and change each other. I wonder whether that, when you look at it from a compliance point of view, will actually pass muster, because from a compliance point of view you need to prove, for example in the health service, that it works, and that it works the same way every time. If you've got a set of code and data that doesn't work the same every time, you are probably going to get pushback from the people who regulate health care: this is not, you can't do it that way, you'll have to find another way to do it.
But that again is, it's the same each time, so the point I'm making-- >> This is a bigger issue than just open source; this is an issue where the idea of continuous refinement of the code and the data-- >> Automatic refinement. >> Automatic refinement could, in fact, mean we're going to have to change some compliance laws. Is it possible the open source community might actually help us understand that problem? >> Absolutely, yes. >> I think that's a good point, I think that's a really interesting point, because you're right, George: the idea of continuous development is not something where, for example, Sarbanes-Oxley says "Oh yeah, I get this." Sarbanes-Oxley is like, yes, I acknowledge that this data is right, and I acknowledge the process by which it was created was right. Now this is another subject, let's bring this up later, but I think it's relevant here, because in many respects it's the difference between an income statement and a balance sheet, right? Saying it's good now is kind of like the income statement. But let's come back to this, because I think it's a bigger issue. You're asserting the open source community in fact may help solve this problem by coming up with new ways of conceiving, say, versioning of things, and stamping things, and what is a distribution and what isn't a distribution, with some of these more tightly bound sets of-- >> What we find normally is that-- >> Jim: I think that we are going to-- >> Peter: Go on, Jim.
>> Just to elaborate on what Peter was talking about, that whole theme, I think what we're going to see is more open source governance of models and data within distributed development environments, using technologies like blockchain as a core enabler for these workflows, for these, as it were, general distributed hyperledgers that indicate the latest and greatest version of a given dataset, or a given model being developed somewhere around some common solution domain. I think those kinds of environments for governance will become critically important as this pipeline for development and training and deployment of these assets gets ever more distributed and virtual. >> By the way, Jim, I actually had a conversation with a very large open source distribution company a few months ago about this very point, and I agree. I think blockchain in fact could become a mechanism by which we track intellectual property, track intellectual contributions, and find ways to then monetize those contributions, going back to what you were saying, David, and perhaps that becomes something that looks like the basis of a new business model for how we think about how open source goes after these looser, goosier problems. >> But also to guarantee integrity without necessarily going through a central-- >> Very important, very important, because at the end of the day, George-- >> It's always hard to find somebody to maintain. >> Right. With big companies, one of the big challenges that companies today are having as they do open source is that they want to be able to keep track of their intellectual property, both from a contribution standpoint and also inside their own business, because they're very, very concerned that the stuff that they're creating that's proprietary to their business in a digital sense might leave the building, and that's not something a lot of banks, for example, want to see happen.
>> I want to add one step to this logic that I think we haven't yet discussed. We're talking now about how end customers will consume this, but there's still a disconnect in terms of how the open source software vendors, or even hybrid ones, can get to market with this stuff, because between open source pricing models and pricing levels, we've seen a slow-motion price collapse. And the problem is that the new go-to-market motion is actually made up of many motions, which are discover, learn, try, buy, recommend, and within each of those the motion is different. And you hear it almost like a reflex, like when your doctor hits you on the knee and your leg kind of bounces: everybody says yeah, we do land and expand. Land was discover, learn, try, augmented with inside sales; the recommend and standardize is still traditional enterprise software, where someone's got to talk to IT and procurement about fitting into the broader architecture and infrastructure of the firm, and to do that you still need what has always been called the most expensive migratory workforce in the world, which is an enterprise sales force. >> But I would suggest there's a big move towards standardization of stacks. True private cloud is about having a stack which is well established, along with the relationships between all the different piece parts, and the stack itself is the person who is responsible for putting that stack together and maintaining that stack. >> So for a moment pretend that you are a CIO: are you going to buy OpenStack or are you going to buy the VMware stack? >> I'm going to buy the VMware stack. >> Because that's about open source?
>> No, the point I'm making is that those open source communities or pieces would then be absorbed into the stack as an OEM supplier as opposed to a direct supplier, and I think that's true for all of these stacks. If you look at the stack, for example, and you have code from NetApp or whoever it is in that stack, and they're contributing it, you need an OEM agreement with that provider, and it doesn't necessarily have to be open source. >> Bottom line is this stuff is still really, really complicated. >> But this model of being an OEM provider is very different from growing an enterprise sales force. You're selling something that goes into the cost of goods sold of your customer, and that cost of goods sold had better be less than 15 percent, and preferably less than five percent. >> Your point is if you can't afford a sales force, an OEM agreement is a much better way of doing it. >> You have to get somebody else's sales force to do it for you. So look, I'm going to do the Action Item on this. I think that this has been a great conversation again; David, George, Neil, Jim, thanks a lot. So here's the Action Item. Nobody argues that open source hasn't been important, and nobody suggests that open source is not going to remain important. What we think, based on our conversation today, is that open source is going to go through some changes, and those changes will occur as a consequence of new folks who are going to be important to this, like data scientists, who may not have the same motivations that the old developer world had; new streams of value in the industry; new types of problems that are inherently more data-oriented as opposed to process-oriented; and the fact that it's not as clear that the whole concept of data as an artifact, data as a convention, data as standards and commodities, is going to be as easy to define as it was in the old world.
As well as, ultimately, IT organizations increasingly moving towards an approach that focuses more on the consumption of services as opposed to the consumption of product. So for these and many other reasons, our expectation is that the open source community is going to go through its own transformation as it tries to support current and future digital transformations. Now, some of the areas that we think are going to be transformed: we expect that there's going to be some pressure on licensing, we think there's going to be some pressure in how compliance is handled, and we think the open source community may in fact be able to help in that regard. And we think, very importantly, that there will be some pressure on the open source community as it tries to rationalize how it conceives of the new compute models, the new design models, because where open source has always been very successful is when we have a target we can collaborate to replicate and replace, or provide a substitute for. I think we can all agree that in 10 years we will be talking about how open source took some time to in fact put forward that TPC stack, as opposed to define the true private cloud stack. So our expectation is that open source is going to remain relevant, we think it's going to go through some consequential changes, and we look forward to working with our clients to help them navigate what some of those changes are, both as committers and also as consumers. Once again guys, thank you very much for this week's Action Item. This is Peter Burris, and until next week, thank you very much for participating in Wikibon's Action Item. (slow techno music)

Published Date : Jan 12 2018

SUMMARY :

and that is the role that open source is going to play and substitute that for the old incumbents, and partly because it's so easy to build IBM has been one of the leaders in contributing open source, so that people are paid to do stuff, that you want to build your business around. the lower end of the stacks, it needs the support of a large company to do that, of the open source contract, going to have to be evolved as we move forward, that are jostling for presence in the market. and just adding to it in the ML world, and the buckets for where to put the data, there are going to be many datasets were they used some of the soft reasons that motivated them to do things, and so forth of the world, There is one other that I'd like to add to that, and I acknowledge the process by which Just to elaborate on what Peter was talking about, going back to what you were saying David, are having is that they do open source is that they want and to do that you still need what has always and the stack itself is the person who is responsible and it doesn't necessarily have to be open source. Bottom line is this stuff is still and that the cost of goods sold better an OEM agreement is a much better way of doing it. and it's not as clear that the whole concept

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Jim Kobielus	PERSON	0.99+
Neil Raden	PERSON	0.99+
David Floyer	PERSON	0.99+
George Gilbert	PERSON	0.99+
George	PERSON	0.99+
Peter Burris	PERSON	0.99+
Jim	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Peter	PERSON	0.99+
Neil	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Canonical	ORGANIZATION	0.99+
Peter Barris	PERSON	0.99+
Amazons	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Wikibon	ORGANIZATION	0.99+
two pieces	QUANTITY	0.99+
less than five percent	QUANTITY	0.99+
Googles	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Red Hat	TITLE	0.99+
Yahoo	ORGANIZATION	0.99+
MapR	ORGANIZATION	0.99+
Word	TITLE	0.99+
less than 15 percent	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
two functions	QUANTITY	0.99+
two capabilities	QUANTITY	0.99+
next week	DATE	0.99+
PHP	TITLE	0.99+
Python	TITLE	0.99+
MySQL	TITLE	0.99+
second incarnation	QUANTITY	0.99+
first incarnation	QUANTITY	0.99+
10 years	QUANTITY	0.98+
Palo Alto, California	LOCATION	0.98+
This week	DATE	0.98+
GDPR	TITLE	0.98+
two key	QUANTITY	0.98+
Linux	TITLE	0.98+
today	DATE	0.97+
1000 listings	QUANTITY	0.97+
one	QUANTITY	0.97+
UNIX	TITLE	0.97+
this week	DATE	0.96+
GitHub	ORGANIZATION	0.96+
first generation	QUANTITY	0.96+
VMware	ORGANIZATION	0.96+
each	QUANTITY	0.95+
Kafka	TITLE	0.95+
one step	QUANTITY	0.94+
each time	QUANTITY	0.93+
JavaBeans	TITLE	0.92+
both	QUANTITY	0.91+
BEA WebLogic	ORGANIZATION	0.91+

Seth Dobrin, IBM Analytics - IBM Fast Track Your Data 2017

>> Announcer: Live from Munich, Germany, it's The Cube, covering IBM Fast Track Your Data. Brought to you by IBM. (upbeat techno music) >> For you here at the show, generally, and specifically, what are you doing here today? >> There's really three things going on at the show, three high-level things. One is we're talking about our new... how we're repositioning our hybrid data management portfolio, specifically some announcements around DB2 in a hybrid environment, and some highly transactional offerings around DB2. We're talking about our unified governance portfolio, so actually delivering a platform for unified governance that allows our clients to interact with governance and data management kinds of products in a more streamlined way, and helps them actually solve a problem instead of just offering products. The third is really around data science and machine learning. Specifically, we're talking about our machine learning hub that we're launching here in Germany. Prior to this we had machine learning hubs in San Francisco, Toronto, and Asia, and now we're launching one here in Europe. >> Seth, can you describe what this hub is all about? Is this a data center where you're hosting machine learning services, or is it something else? >> Yeah, so this is where clients can come and learn how to do data science. They can bring their problems, bring their data to our facilities, and learn how to solve a data science problem in a more team-oriented way, interacting with data scientists, machine learning engineers, data engineers, and developers, to solve a problem for their business around data science. These previous hubs have been completely booked, so we wanted to launch them in other areas to try and expand their capacity. >> You're hosting a round table today, right, in the main tent? >> Yep. >> And you've got a customer on; you guys are going to be talking about applying practices in financial and other areas. Maybe describe that a little bit.
>> We have a customer on from ING, Heinrich, who's the chief architect for ING. ING, IBM, and Hortonworks have a consortium, if you will, or a framework that we're doing around Apache Atlas and Ranger, as the kind of open-source operating system for our unified governance platform. So, much as IBM has positioned Spark as a unified, kind of open-source operating system for analytics, for a unified governance platform... For a governance platform to be truly unified, you need to be able to integrate metadata. The biggest challenge in connecting your data environments, if you're an enterprise that was not internet-born or cloud-born, is that you have proprietary metadata platforms that all want to be the master. When everyone wants to be the master, you can't really get anything done. So what we're doing around Apache Atlas is setting up Apache Atlas as kind of a virtual translator, if you will, or a dictionary between all the different proprietary metadata platforms, so that you can get a single unified view of your data environment across hybrid clouds, on premises, in the cloud, and across different proprietary vendor platforms. Because it's open source, there are these connectors that can go in and out of the proprietary platforms. >> So Seth, you seem like you're pretty tuned in to the portfolio within the analytics group. How are you spending your time as the Chief Data Officer? How do you balance it between customer visits, maybe talking about some of the products, and then your sort of day job? >> I actually have three day jobs. My job's actually split into kind of three pieces. The first, my primary mission, is really around transforming IBM's internal business unit, internal business workings, to use data and analytics to run our business. So kind of internal business unit transformation. Part of that business unit transformation is also making sure that we're compliant with regulations like GDPR and other regulations.
Another third is really around kind of rethinking our offerings from a CDO perspective. As a CDO, and as you know, Dave, I've only been with IBM for seven months. As a former client, recently, and as a CDO, what is it that I want to see from IBM's offerings? We kind of hit on it a little bit with the unified governance platform, where I think IBM makes fantastic products. But as a client, if a salesperson shows up to me, I don't want them selling me a product, 'cause if I want an MDM solution, I'll call you up and say, "Hey, I need an MDM solution. Give me a quote." What I want is them showing up and saying, "I have a solution that's going to solve your governance problem across your portfolio." Or, "I'm going to solve your data science problem." Or, "I'm going to help you master your data, and manage your data across all these different environments." So really working with offering management and the Dev teams to define what are these three or four kind of business platforms that we want to settle on? We know three of them at least, right? We know that we have hybrid data management. We have unified governance. We have data science and machine learning, and you could think of the Z franchise as a fourth platform. >> Seth, can you net out how governance relates to data science? 'Cause there is governance of the statistical models, machine learning, and so forth, version control. I mean, in an end-to-end machine learning pipeline, there are various versions of various artifacts that have to be managed in a structured way. Does your unified governance bundle, or portfolio, address those requirements? Or just the data governance? >> Yeah, so the unified governance platform really kind of focuses today on data governance and how good data governance can be an enabler of rapid data science.
So if you have your data all pre-governed, it makes it much quicker to get access to data and understand what you can and can't do with data, especially being here in Europe, in the context of the EU GDPR. You need to make sure that your data scientists are doing things that are approved by the user, because basically, with your data, you have to give explicit consent to allow things to be done with it. But the long-term vision is that... essentially the output of models is data, right? And how you use and deploy those models also needs to be governed. So the long-term vision is that we will have a governance platform for all those things as well. I think it makes more sense for those things to be governed in the data science platform, if you will. And we... >> We often hear, separate from GDPR and all that, about something called algorithmic accountability, which is increasingly being discussed in policy circles, in government circles around the world, as strongly related to everything you're describing: being able to trace the lineage of any algorithmic decision back to the data, the metadata, and so forth, and the machine learning models that might have driven it. Is that where IBM's going with this portfolio? >> I think that's the natural extension of it. We're thinking about them really as two different pieces, but if you solve them both and you connect them together, then you've solved that problem. But I think you're absolutely right. As we're leveraging machine learning and artificial intelligence in general, we need to be able to understand how we got to a decision, and that includes the model, the data, how the data was gathered, and how the data was used and processed. So it is that entire pipeline, 'cause it is a pipeline. You're not doing machine learning or AI in a vacuum. You're doing it in the context of the data, and you're doing it in the context of the individuals or the organizations that you're trying to influence with the output of those models.
>> I call it Dev ops for data science. >> Seth, in the early Hadoop days, the real headwind was complexity. It still is, by the way. We know that. Companies like IBM are trying to reduce that complexity. Spark helps a little bit. So the technology will evolve, we get that. It seems like one of the other big headwinds right now is that most companies don't have a great understanding of how they can take data and monetize it, turn it into value. Most companies, many anyway, make the mistake of thinking, "Well, I don't really want to sell my data," or, "I'm not really a data supplier." And they're thinking about it maybe not in the right way. But we seem to be entering a next wave here, where people are beginning to understand I can cut costs, I can do predictive maintenance, I can maybe not sell the data, but I can enhance what I'm doing and increase my revenue, maybe my customer retention. They seem to be tuning in more, largely, I think, because of the chief data officer roles helping them think that through. I wonder if you would give us your point of view on that narrative. >> I think what you're describing is kind of the digital transformation journey. I think the end game, as enterprises go through a digital transformation, is how do I sell services, outcomes, those types of things. How do I sell an outcome to my end user? That's really the end game of a digital transformation, in my mind. But before you can get to that, before you transform your business's objectives, there are a couple of intermediary steps that are required. The first is what you're describing, those kinds of data transformations. Enterprises need to really get a handle on their data and become data-driven, and then start transforming their current business model; so how do I accelerate my current business leveraging data and analytics? I kind of frame that as the data science transformation aspect of the digital journey.
Then the next aspect of it is how do I transform my business and change my business objectives? Part of that first step is, in fact, how do I optimize my supply chain? How do I optimize my workforce? How do I optimize my goals? How do I get to my current, you know, the things that Wall Street cares about for the business; how do I accelerate those, make those faster, make those better, and really put my company out in front? 'Cause really, in the grand scheme of things, there are two types of companies today: the companies that are going to be the disruptors, and the companies that are going to get disrupted. Most companies want to be the disruptors, and it's a process to do that. >> So the accounting industry doesn't have standards around valuing data as an asset, and many of us feel as though waiting for that is a mistake. You can't wait for that. You've got to figure it out on your own. But again, it seems to be somewhat of a headwind because it puts data and data value in this fuzzy category. But there are clearly the data haves and the data have-nots. What are you seeing in that regard? >> I think the first... When I was in my former role, my former company went through an exercise of valuing our data and our decisions. I'm actually doing that same exercise at IBM right now. We're going through IBM, at least in the analytics business unit, the part I'm responsible for, going to all the leaders and saying, "What decisions are you making? Help me understand the decisions that you're making. Help me understand the data you need to make those decisions." And that does two things. Number one, it gets to the point of, how can we value the decisions? 'Cause each one of those decisions has a specific value to the company. You can assign a dollar amount to it. But it also helps you change how people in the enterprise think.
Because the first time you go through and ask these questions, they talk about the dashboards they want to help them make their preconceived decisions, validated by data. They have a preconceived notion of the decision they want to make. They want the data to back it up. So they want a dashboard to help them do that. So when you come in and start having this conversation, you kind of stop them and say, "Okay, what you're describing is a dashboard. That's not a decision. Let's talk about the decision that you want to make, and let's understand the real value of that decision." So you're doing two things. You're building a portfolio of decisions that then becomes, to your point, Jim, about DevOps for data science, the backlog for your data scientists in the long run. You then connect those decisions to the data that's required to make them, and for each decision you can extrapolate the share that each piece of data contributes to it. So you can group your data logically within an enterprise (customer, product, talent, location, things like that), and you can assign a value to those groups based on the decisions they support. >> Jim: So... >> Dave: Go ahead, please. >> As a CDO, following on that, are you also, as part of that exercise, trying to assess the value of not just the data, but of data science as a capability? Or particular data science assets, like machine learning models? In the overall scheme of things, that kind of valuation can then drive IBM's decision to ramp up its internal data science initiatives, or redeploy it, or, give me a... >> That's exactly what happened. As you build this portfolio of decisions, each decision has a value. So I am now assigning a value to the data science models that my team will build. CDOs are a relatively new role in many organizations. When money gets tight, they say, "What's this guy doing?" (Dave laughing) Having a portfolio of decisions that says, "Here's real value I'm adding..."
So, number one, "Here's the value I can add in the future," and as you check off those boxes, you can kind of go and say, "Here's value I've added. Here's where I've changed how the company's operating. Here's where I've generated X billions of dollars of new revenue, or cost savings, or cost avoidance, for the enterprise." >> When you went through these exercises at your previous company, and now at IBM, are you using standardized valuation methodologies? Did you kind of develop your own, or come up with a scoring system? How'd you do that? >> I think there are some things, like net promoter score, where there are pretty good standards on how to assign value to increases or decreases in net promoter score for certain aspects of your business. In other ways, you need to kind of decide as an enterprise, how do we value our assets? Do we use a three-year, five-year, ten-year NPV? Do we use some other metric? You need to kind of frame it in the terms your CFO is used to talking about, so that it's in the context the company is used to. For most companies, it's net present value. >> Okay, and you're measuring that on an ongoing basis. >> Seth: Yep. >> And fine tuning as you go along. Seth, we're out of time. Thanks so much for coming back on The Cube. It was great to see you. >> Seth: Yeah, thanks for having me. >> You're welcome, good luck this afternoon. >> Seth: Alright. >> Keep it right there, everybody. We'll be back. Actually, let me run down the day here for you; just take a second to do that. We're going to end our Cube interviews for the morning, and then we're going to cut over to the main tent. So in about an hour, Rob Thomas is going to kick off the main tent here with a keynote, talking about where data goes next. Hilary Mason's going to be on. There's a session with Dez Blanchfield on data science as a team sport. Then the big session on changing regulations, GDPR.
Seth, you've got some customers that you're going to bring on to talk about these issues. And then, the balancing act of hybrid data. Then we're going to come back to The Cube and finish up our Cube interviews for the afternoon. There are also going to be two breakout sessions: one with Hilary Mason, and one on GDPR. You've got to go to IBMgo.com and log in and register. It's all free to see those breakout sessions. Everything else is open; you don't even have to register or log in to see that. So keep it right here, everybody. Check out the main tent. Check out siliconangle.com, and of course IBMgo.com, for all the action here. Fast track your data. We're live from Munich, Germany, and we'll see you a little later. (upbeat techno music)

Published Date : Jun 24 2017

SUMMARY :

Brought to you by IBM. that allows our clients to interact with governance and expand the capacity of them. And you got a customer on, you guys going to be talking about and Ranger, as the kind of open-source operating system How are you spending your time as the Chief Data Officer? and the Dev teams to define what are these three or four, I mean, in an end to end machine learning pipeline, in the data science platform, if you would. and the machine learning models that might have driven it. and you connect them together, then you have that problem. I can maybe not sell the data, How do I get to my current, you know, But again, it seems to be somewhat of a headwind of decisions that then becomes to your point, Jim, of not just the data, but of data science as a capability? and as you check off those boxes, you can kind of go and say, You need to kind of frame it in the reference that your CFO And fine tuning as you go along. and we'll see you a little later.

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
ING	ORGANIZATION	0.99+
Seth	PERSON	0.99+
Europe	LOCATION	0.99+
Seth Dobrin	PERSON	0.99+
Germany	LOCATION	0.99+
Jim	PERSON	0.99+
Hilary Mason	PERSON	0.99+
Rob Thomas	PERSON	0.99+
ten year	QUANTITY	0.99+
five year	QUANTITY	0.99+
seven months	QUANTITY	0.99+
Asia	LOCATION	0.99+
three year	QUANTITY	0.99+
three	QUANTITY	0.99+
four	QUANTITY	0.99+
Heinrich	PERSON	0.99+
Horton Works	ORGANIZATION	0.99+
Dez Blanchfield	PERSON	0.99+
two types	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
three days	QUANTITY	0.99+
two things	QUANTITY	0.99+
each piece	QUANTITY	0.99+
today	DATE	0.99+
Dav	PERSON	0.99+
each	QUANTITY	0.99+
first	QUANTITY	0.99+
Munich, Germany	LOCATION	0.99+
third	QUANTITY	0.99+
both	QUANTITY	0.99+
billions of dollars	QUANTITY	0.99+
one	QUANTITY	0.99+
One	QUANTITY	0.98+
two different pieces	QUANTITY	0.98+
three things	QUANTITY	0.98+
DB2	TITLE	0.98+
first step	QUANTITY	0.98+
GDPR	TITLE	0.97+
Apache Atlas	ORGANIZATION	0.97+
fourth platform	QUANTITY	0.97+
2017	DATE	0.97+
three pieces	QUANTITY	0.97+
IBM Analytics	ORGANIZATION	0.96+
first time	QUANTITY	0.96+
single	QUANTITY	0.96+
Spark	TITLE	0.95+
Ranger	ORGANIZATION	0.91+
two breakout sessions	QUANTITY	0.88+
about an hour	QUANTITY	0.86+
each decision	QUANTITY	0.85+
Cube	COMMERCIAL_ITEM	0.84+
each one	QUANTITY	0.83+
this afternoon	DATE	0.82+
Cube	ORGANIZATION	0.8+
San Francisco, Toronto	LOCATION	0.79+
GDPRs	TITLE	0.76+
GDBR	TITLE	0.75+

Scott Gnau, Hortonworks - DataWorks Summit 2017

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser-light-show-infused keynote, and we're very excited to be joined by one of today's keynote speakers, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture, and one of the things that I thought was really interesting is that, where Hortonworks is now, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the important thing, kind of as a natural evolution for us as a company and as a community, and I've seen this time and again in the tech industry, is that we've moved from really cool breakthrough tech more into a solutions base. So I think this whole notion is really about how we're making that natural transition. When you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it into value really quickly and in a repeatable fashion? So the notion that I launched today is really about making these three personas successful: focusing on combining all of the technology, usability, and even some services around it, to make each of those folks more successful in their jobs. So I've broken it down into three categories. We know the traditional business analyst, right?
They know SQL, and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop now brings a lot more breadth of data, and frankly a lot more depth of data, than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, and being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier, where you bring together lots and lots of data with science and math and heavy compute, and really enable data scientists to go build out that next generation of high-definition analytics. We're all, certainly I am, captured by the notion of self-driving cars, and when you think about a self-driving car, its success is purely based on successful data science: those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots of data to train those models and make them more accurate than they would otherwise be. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice-activated, voice-response kinds of systems for better customer service; better fraud detection, where, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable.
So it's about bringing together the data and the tool set, so that data scientists can actually act as a team and collaborate, and spend less of their time finding the data and more of their time providing the models. And as I said this morning, last but not least, the operations manager. This is really, really important. A lot of times, especially for geeks like myself, it's, ah, operations guys are just a pain in the neck. But they're really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it, and making sure that data is used in line with our corporate mission is really important. So it's about creating that tool set so that the operations manager can be confident in turning these massive piles of data over to the business analyst and to the data scientist, and be confident that the company's mission and the regulations they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger, and it really makes for an enterprise-grade experience. >> A couple of things to follow on to that. We've heard this notion for years that there is a shortage of data scientists, and now data science is such a core strategic enabler of business transformation. Is this collaboration, this team sport that was talked about earlier, helping to spread data science across these personas and enable more of them to become data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly that really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive... and I think the numbers are astronomical, right?
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90% working on the problem, that's, like, an order of magnitude more breadth of data science coverage that we get from the same pool of data scientists, so from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models onto the business analyst's desktop? So now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run and infused into traditional business processes, which means turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So you're leveraging the value of data scientists. >> Let me follow that up, Scott. Right now, the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in, in terms of having trained some very broad horizontal models for vision, natural language understanding, text to speech? They have accumulated a lot of data assets, and then they created models that were trained and can be customized. Do you see a role not just for next-gen AI-related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models, so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about this morning is this notion, and I said it this morning, of open: open community, open source, and open ecosystem. I think it's now open to the third power, right, and it's talking about open models and algorithms.
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right? Because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or in the open ecosystem, being able to pull tools in. Of course, with the announcement earlier today of IBM's Data Science Experience software as a framework for data scientists to work as a team, that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So it's all about continued availability of open collaboration, which is the hallmark of this wave of technology. >> Okay, so we have this issue of how much we can improve productivity with better tools or with some amount of data. But then the part that everyone's also pointing out, besides the cloud experience, is the ability to operationalize the models and get them into production, either in bespoke apps or packaged apps. How's that going to play out over time? >> Well, I think there are two things you'll see. One, certainly in the near term, again, is our collaboration with IBM and the Data Science Experience. One of the key things there is not just making the data scientists more collaborative, but also the ease with which they can publish their models out into the wild. And so kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this in my keynote this morning, is that in five years we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep.
But truly, scalability is going to be about how many connected devices you have interacting, and how many analytics you can actually push, from a model perspective, out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span: do you get an offer at the right time, and is that offer relevant? It can't be rules based; it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic, and come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IoT network that you're working in. What does that mean for your original question? That means applications have to be portable, and models have to be portable, so that they can execute out at the edge where it's required. And so that's, obviously, part of the key technology that we're working on with Hortonworks DataFlow and the combination of Apache NiFi, Apache Kafka, and Storm, to really answer, "How do I manage not only data in motion, but ultimately, how do I move applications and analytics to the data, and not be required to move the data to the analytics?" >> So, a question for you. You talked about real-time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, those things can start to exist. Certainly, because we now have access to all of the data, and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, real-time analytics or real-time offer delivery could actually, from our human perception, arrive before I thought about it. And isn't that really cool, in a way?
I'm thinking, I need to go do X, Y, Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer; before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data, and more and more and more accurate and sophisticated models, to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. For George Gilbert, I am Lisa Martin. You're watching The Cube live from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though; we'll be right back.

Published Date : Jun 13 2017

SUMMARY :

in the heart of Silicon Valley, it's The Cube, the CTO of Hortonworks, Scott Gnau. One of the things that you talked about So enabling the data scientist to be successful, And a couple things to follow on to that, and the tools that we're rolling out, for the data scientist or the data engineer as a framework for the data scientists to work as a team, is also the ability to operationalize the models not just making the data scientists be able to be You talked about real time offers, for example. And again, to the extent that it's relevant, So that, again, points to more and more and more data of the DataWorks Summit in the heart of Silicon Valley.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Scott	PERSON	0.99+
IBM	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
San Jose	LOCATION	0.99+
10%	QUANTITY	0.99+
90%	QUANTITY	0.99+
Scott Gnau	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
IBMs	ORGANIZATION	0.99+
Python	TITLE	0.99+
two aspects	QUANTITY	0.99+
five seconds	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
One	QUANTITY	0.99+
DataWorks Summit 2017	EVENT	0.98+
Horton Works	ORGANIZATION	0.98+
Hadoop	TITLE	0.98+
one	QUANTITY	0.98+
DataWorks Summit	EVENT	0.98+
today	DATE	0.98+
each	QUANTITY	0.98+
five years	QUANTITY	0.97+
third	QUANTITY	0.96+
second thing	QUANTITY	0.96+
Apache Kafka	ORGANIZATION	0.95+
three personas	QUANTITY	0.95+
this morning	DATE	0.95+
Apache Nifi	ORGANIZATION	0.95+
this morning	DATE	0.94+
three categories	QUANTITY	0.94+
CTO	PERSON	0.93+
The Cube	TITLE	0.9+
Sequel	PERSON	0.89+
Apache Ranger	ORGANIZATION	0.88+
two things	QUANTITY	0.86+
hundred times	QUANTITY	0.85+
Portworks	ORGANIZATION	0.82+
earlier today	DATE	0.8+
Data Science Experience	TITLE	0.79+
The Cube	ORGANIZATION	0.78+
Apache Atlas	ORGANIZATION	0.75+
Storm	ORGANIZATION	0.74+
day one	QUANTITY	0.74+
wave	EVENT	0.69+
one of the keynotes	QUANTITY	0.66+
lots	QUANTITY	0.63+
years	QUANTITY	0.53+
Hortonworks	EVENT	0.5+
lots of data	QUANTITY	0.49+
Sequel	ORGANIZATION	0.46+
Flow	ORGANIZATION	0.39+

Mike Merritt-Holmes, Think Big - DataWorks Summit Europe 2017 - #DW17 - #theCUBE

>> Narrator: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. (uptempo, energetic music) >> Okay, welcome back, everyone. We're here live in Munich, Germany, for DataWorks Summit 2017, formerly Hadoop Summit. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Mike Merritt-Holmes, senior Vice President of Global Services Strategy at Think Big, a Teradata company, and formerly the co-founder of the Big Data Partnership, which merged into Think Big and Teradata. Mike, welcome to The Cube. >> Mike: Thanks for having me. >> Great having an entrepreneur on. You're the co-founder, which means you've got that entrepreneurial blood, and I've got to ask you: you're in the big data space, so you've got to be pretty pumped by all the hype right now around AI, because that certainly gives it a lot of that extra steroid of recognition. People love AI; it gives a face to it, and certainly IoT, the Internet of Things, is booming as well, but big data's cruising along. >> I mean, it's a great place to be. The train is certainly going very, very quickly right now. But the thing for us is, we've been doing data science and AI and trying to build business outcomes and value for businesses for a long time. It's just great now to see data science and AI both really starting to take effect, and companies starting to understand it and really wanting to embrace it, which is amazing. >> It's inspirational too. I mean, I have a bunch of kids in my family, some in college and some in high school, and even the younger generation is getting jazzed up on software, right? But the big data stuff's been cruising along now. It's been a good decade now of really solid DevOps culture, cloud now accelerating, but now the customers are forcing the vendors to be very deliberate in delivering great product, because the demand (chuckling) for real time, the demand for more stuff, is at an all-time high.
Can you elaborate on your reaction to what customers are doing? Because they're the ones driving everyone, not to create friction, but to create simplicity. >> Yeah, and you know, our customers are global organizations trying to leverage this kind of technology, and they are, you know, doing an awesome amount of stuff right now to effectively make a step change in their business, whether it's shipping companies doing preventive asset maintenance, or retailers looking to target customers in a more personalized way, or really understand who their customers are and where they come from. They're leveraging all those technologies, and really what they're doing is pushing the boundaries of all of them, and putting more demands on all of the vendors in the space to say, we want to do this quicker, faster, but more easily as well. >> And the things that you're talking about, I want to get your thoughts on, because this is the conversation that you're having with customers. What I want to extract is, have those data-driven mindset questions come out of the hype of Hadoop? I mean, we've been on a hype cycle for a while, but now it's back to reality. Where are we with the customer conversations, and, from your standpoint, what are they working on? I mean, is it mostly an IT conversation? Is it a front-office conversation? Is it a blend of both? Because, you know, data science kind of threads both sides of the fence there.
>> Yeah, I mean, certainly you can't do big data without IT being involved, but since the start, we've always been engaged with the business. It's always been about business outcome, because you bring data into a platform, you provide all this data science capability, but unless you actually find ROI from that, then there's no point, because you want to be moving the business forward. So it's always been about business engagement, but part of that has always been also about helping them to change their mindset: I don't want a report, I want to understand why you look at that report and what's the thing you're looking for, so we can start to identify that for you quicker. >> What's the coolest conversation you've been in over the past year? >> Uh, I mean, I can't go into too much detail, but I've had some amazing conversations with companies like Lego, for instance; they're an awesome company to work with. But when you start to see some of the things we're doing... we're doing some amazing object recognition with deep learning in Japan. We're doing some fraud analytics in the Nordics with deep learning. We're doing some amazing stuff that's really pushing the boundaries, and when you start to put those deep learning aspects into real-world applications, and you start to see customers clamoring to be part of that, it's a really exciting place to be. >> Let me just double-click on that for a second, because the question I get a lot on The Cube, and certainly off-camera, is: I want to do deep learning, I want to do AI, I love machine learning, I hear it's finally coming to reality, so people see it forming. How do they get started? What are some of the best practices for getting involved in deep learning? Using open source is obviously one avenue, but what advice would you give customers?
>> From a deep learning perspective, first of all, I mean, a lot of the greatest deep learning technologies are open source, as you rightly said, and there are a lot of tutorials and stuff out there, but really what you need is someone who has done it before, who knows where the pitfalls are, but who also knows when to use the right technology at the right time, and whether a deep learning methodology is going to be the right approach for your business problem at all. Because a lot of companies are like, we want to use this deep learning thing, it's amazing, but actually it's not necessarily appropriate for the use case you're trying to address. >> It's the classic holy grail: if you don't know what you're looking for, it's hard to know when to apply it. >> And also, you've got to have enough data to utilize those methods as well, so. >> You hear a lot about the technical complexity associated with Hadoop specifically, but also just ol' big data generally. I wonder if you could address that, in terms of what you're seeing and how people are dealing with that technical complexity, but also what other headwinds there are in adopting these new capabilities. >> Yeah, absolutely. So one of the challenges that we still see is that customers are struggling to get value from their platform, and normally that's because of the technical complexities. So we introduced Kylo to the open-source world last month; it's something you can download free of charge. It's completely open source under the Apache license, and that really was about making it easier for customers to start to leverage the data on the platform, to self-serve ingestion onto it, and for data scientists to wrangle the data better.
So I think there's a real push right now for that next level up, if you like, in the technology stack, to enable non-technical users to start to do interesting things on the platform directly, rather than asking someone to do it for them. And, you know, we've had technologies in the BI space like Tableau, and, obviously, the (mumbling) data-warehouse solutions on Teradata, that have been giving customers something previously, but now they're asking for more, not just that, but more as well. And that's where we are starting to see the increases. >> So that's sort of operationalizing analytics, as an example. What are some of the business complexities and challenges of actually doing that? >> That's a very good question, because when you find a great insight, and you go, wow, you've built this algorithm, I've seen things I've never seen before, then the business wants to have that always on. They want to know about that insight all the time: is it changing, is it going up, is it going down, do I need to change my business decisions? And doing that and making that operational means not only deploying it, but also monitoring those models, being able to keep them up to date regularly, and understanding whether those things are still accurate or not, because you don't want to be making business decisions on algorithms that are now a bit stale. So actually operationalizing it is about building out an entire capability that keeps these things accurate and online, and there's still a bit of work to do in the marketplace around building out that operational capability. >> So you kind of have bottom-up and top-down. Bottom-up is, you know, the Hadoop experiments, and top-down is the CXO saying we need to do big data. Have those two constituencies come together now? Who's driving the bus? Are they aligned, or is it still sort of a mess organizationally?
>> Yeah, I mean, generally, in the organization, there's someone playing the Chief Data Officer, whether they have that as a title or a role; ultimately someone is in charge of generating value from the data they have in the organization. But they can't do that without IT, and where we've seen companies struggle is where they've driven it from the bottom up, and where they succeed is where they drive it from the top down, because by driving it from the top down, you really align what you're doing with the business and the strategy that you have: the company strategy, and what you're trying to achieve. But ultimately, they both need to meet in the middle, and you can't do one without the other. >> One of our practitioner friends was describing this situation in our office in Palo Alto a couple of weeks ago. He said, you know, the challenge we have as an organization is, you've got top people saying alright, we're moving. And they start moving, the train goes, and then you've got middle management sort of behind them, and then you've got the doers that are far behind, and aligning those is a huge challenge for this particular organization. How do you recommend organizations address that alignment challenge? Does Think Big have capabilities to help them through that, or do they sort of have to call Accenture? >> In essence, our reason for being is to help with those kinds of things, whether it's right from the start, so, oh my God, my Chief Data Officer or my CEO is saying we need to be doing this thing right now, come on, let's get on with it, and we help them understand what that means, what the use cases are, where the value is going to come from, what the architecture is going to look like; or whether it's helping them build out capability, in terms of data science or building out the cluster itself, and then managing that and providing training for staff.
Our whole reason for being is supporting that transformation as a business, from, oh my God, what do I do about this thing, to, I'm fully embracing it, I know what's going on, I'm enabling my business, and I'm completely comfortable with that world. >> There was a lot of talk three, four, or five years ago about the ROI of so-called big data initiatives not really being there. You know, there were edge cases with huge ROI, but there was a lot of talk about not a lot of return. My first question is, has that changed? Are you starting to see much bigger numbers coming back, where the executives are saying yeah, let's double down on this? >> Definitely, I'm definitely seeing that. I think it's fair to say that companies are a bit nervous about reporting their ROI around this stuff in some cases, so there's more ROI out there than you necessarily see in the public space, but-- >> Why is that? Because they don't want to expose it to the competition, or they don't want to front-run their earnings, or whatever it is? >> They're trying to get a competitive edge. The minute you start saying, we're doing this, their competitors have an opportunity to catch up. >> John: Very secretive. >> Yeah, and I think it's not necessarily about what they're doing, it's about keeping the edge over their customers, really, over their competitors. But what we're seeing is that many customers are getting a lot of ROI more recently because they're able to execute better, rather than struggling with the IT problems. Just recently, for instance, the CEO of one of our customers phoned us up and said, you know what, we've got this problem with our sales. We don't really know why: in this country, in this part of the world, it's going up; in this country, it's going down; we don't know why, and that's making us very nervous.
Could you come in and just get the data together and work out why it's happening, so that we can understand what it is? And we came in, and within weeks we were able to give them a very good insight into exactly why that was, and they changed their strategy going forward, for the next year, to focus on addressing that problem. That's really amazing ROI for a company, to be able to get that insight. Now we're working with them to operationalize it, so that particular insight is always available to them. That's an example of how companies are now starting to see that ROI come through, and a lot of it is about being able to articulate the right business question, rather than worrying about reports. What is the business question I'm trying to solve or answer? That's when you can start to see the ROI come through. >> Can you talk about the customer orientation when they get to that insight? Because you mentioned earlier that they got used to the reports, and you mentioned visualization, Tableau; they become table states, and once you get addicted to the visualization, you want to extract more insights, so the pressure seems to be on getting more insight. So, two questions: the process gap around what they need to do process-wise, and then just organizational behavior. Are they there mentally? What are some of the criteria, in your mind, in your experience with customers, around the processes that they go through, and then the organizational mindset? >> Yeah, so what I would say is, first of all, from an organizational mindset perspective, it's very important to start educating not just the analysis team, but the entire business, on what this whole machine-learning, big data thing is all about, and how to ask the right questions. So, really starting to think about the opportunities you have to move your business forward, rather than what you already know, and to think forward rather than retrospectively.
So, the other thing we often have to teach people as well is that this isn't about what you can get from the data warehouse, or replacing your data warehouse or anything like that. It's about answering the right questions with the right tools, and here is a whole set of tools that allow you to answer different questions that you couldn't before, so leverage them. That's very important, and that mindset actually requires time to transform a business, and a lot of commitment from the business to make that happen. >> So, mindset first, and then you look at the process, then you get to the product. >> Yep. Basically, once you have that mindset, you need to set up an engine that's going to run and start to drive the ROI out, and the engine includes your technical folks, but also your business users, and that engine will then start to build up momentum. The momentum builds more interest, and, over time, you start to get your entire business using these tools. >> It kind of makes sense, just riffing in real time here, so the product-gap conversation should probably come after you lay that out first, right? >> Totally, yeah. You don't choose a product before you know what you need to do with it. But actually, often companies don't know what they need to do with it, because they've got the wrong mindset in the first place. And so part of the road map work that we do, our road map offering, is about changing that mindset and helping them get through that first stage, where we start to articulate the right use cases, and that really is driving a lot of value for our customers, because they start from the right place-- >> Sometimes we hear stories where the product kind of gives them a blind spot, because they tend to go in with a product mindset first, and that kind of gives them some baggage, if you will.
>> Well, yeah, because you end up in a situation where you get a product in, and then you say what can we do with it. Or, in fact, what happens is the vendor will say, these are the things you could do, and they give you use cases. >> It constrains things, forecloses tons of opportunities, because you're stuck within a product mindset. >> Yeah, exactly that, and you don't want to be constrained. That's why open source, and the kind of ecosystem that we have within the big data space, is so powerful, because there are so many different tools for different things. But don't choose your tool until you know what you're trying to achieve. >> I have a market question; maybe you could just give us your opinion, caveated if you like, sort of a global, macro view. When we first started looking at the big data market, we noticed right away that the dominant portion of revenue was coming from services. Hardware was commodity, so, you know, maybe sort of less than you would see, obviously, in a mainframe world, and open-source software made a smaller contribution, so services dominated and, frankly, have continued to dominate since the early days. Do you see that changing, or do you think those percentages, if you will, will stay relatively constant? >> Well, I think it will change over time, but not in the near future, for sure; there's too much advancement in the technology landscape for that to stop. If you had a set of tools that weren't really evolving, that were becoming very mature, and those were the tools you had, ultimately the skill sets around them would start to grow, it would become much easier to develop stuff, and then companies would start to build out industry- or solution-specific stuff on top, and it would become very easy to build products.
When you have an ecosystem that's evolving and growing at the speed it is, you're constantly trying to keep up with that technology, and, therefore, services have to play an awfully big part in making sure that you are using the right technology at the right time. So, for the near future, for certain, that won't change. >> Complexity is your friend. >> Yeah, absolutely. Well, you know, we live in a complex world, but we live and breathe this stuff, so what's complex to some is not to us, and that's why we add value, I guess. >> Mike Merritt-Holmes here inside The Cube with Teradata Think Big. Thanks for spending the time sharing your insights. >> Thank you for having me. >> Understand the organizational mindset, identify the process, then figure out the products. That's the insight here on The Cube. More coverage of DataWorks Summit 2017, here in Germany, after this short break. (upbeat electronic music)

Published Date : Apr 5 2017

SUMMARY :

brought to you by Horton Works. formerly the co-founder of and I got to ask you, you know, I mean it's a great place to be. but the big data stuffs and they are, you know, of the fence there. that for you quicker. and when you start to put but what advice would you give customers? a lot of the greatest if you don't know what you're looking for, got to have enough data I wonder if you could address that, and for data scientists to and you go, wow you've Bottom-up is the you know and you can't do one without the other. and then you got the is to help with those kind of things, not being really, you know, in the public place, but-- The minute you start and that's when you can start so the pressure seems to and a lot of commitment from the business then you get to the product. and the engine includes, you and helping them to get because they tend to go into, and then you say what can we do with it. because you're stuck and the kind of ecosystem that we have of less than you would, and so, for the near future, Well, you know, we live Thanks for spending the identify the process, then

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
John	PERSON	0.99+
Japan	LOCATION	0.99+
Mike	PERSON	0.99+
John Furrier	PERSON	0.99+
Lego	ORGANIZATION	0.99+
Mike Merritt-Holmes	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Germany	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Think Big	ORGANIZATION	0.99+
two questions	QUANTITY	0.99+
first question	QUANTITY	0.99+
Munich	LOCATION	0.99+
Accenture	ORGANIZATION	0.99+
last month	DATE	0.99+
one	QUANTITY	0.99+
Horton Works	ORGANIZATION	0.99+
Big Data Partnership	ORGANIZATION	0.99+
both	QUANTITY	0.99+
both sides	QUANTITY	0.98+
two constituencies	QUANTITY	0.98+
next year	DATE	0.98+
first	QUANTITY	0.98+
Nordics	LOCATION	0.98+
first stage	QUANTITY	0.98+
#DW17	EVENT	0.97+
Data Works Summit 2017	EVENT	0.97+
DataWorks Summit 2017	EVENT	0.96+
Tableau	TITLE	0.95+
Hadoop	TITLE	0.95+
four	DATE	0.93+
Hadoop Summit	EVENT	0.93+
five years ago	DATE	0.9+
Apache	TITLE	0.89+
The Cube	ORGANIZATION	0.87+
Vice President	PERSON	0.87+
Data Works Summit Europe 2017	EVENT	0.83+
a couple of weeks ago	DATE	0.82+
one avenue	QUANTITY	0.82+
DataWorks Summit Europe 2017	EVENT	0.8+
Kaylo	PERSON	0.8+
past year	DATE	0.79+
Global Services Strategy	ORGANIZATION	0.79+
Teradata Think Big	ORGANIZATION	0.77+
three	QUANTITY	0.76+
double	QUANTITY	0.75+
Think Big -	EVENT	0.71+
Covering	EVENT	0.69+
Hadoob	ORGANIZATION	0.62+
decade	QUANTITY	0.58+
second	QUANTITY	0.58+
Cube	COMMERCIAL_ITEM	0.56+
CXO	PERSON	0.48+
Cube	ORGANIZATION	0.46+
#theCUBE	ORGANIZATION	0.45+

Arun Murthy, Hortonworks - Spark Summit East 2017 - #SparkSummit - #theCUBE

>> [Announcer] Live, from Boston, Massachusetts, it's the Cube, covering Spark Summit East 2017, brought to you by Databricks. Now, your hosts, Dave Vellante and George Gilbert. >> Welcome back to snowy Boston, everybody. This is The Cube, the leader in live tech coverage. Arun Murthy is here; he's the founder and vice president of engineering at Hortonworks, father of YARN. Can I call you that, godfather of YARN, is that fair, or? (laughs) Anyway. He's so, so modest. Welcome back to the Cube, it's great to see you. >> Pleasure to have you. >> Coming off the big keynote, (laughs) you ended the session this morning, so that was great. Glad you made it in to Boston. A lot of talk about security and governance; you know, we've been talking about that for years, but it feels like it's truly starting to come into the mainstream, Arun, so. >> Well, I think it's just a reflection of what customers are doing with the tech now. Three, four years ago, a lot of it was pilots; a lot of it was, you know, people playing with the tech. But increasingly, it's about people actually applying stuff in production, having data, system of record, running workloads both on prem and in the cloud; cloud is becoming more and more real at mainstream enterprises. So what it means is, if you take any of the examples today, any interesting app will have some sort of real-time data feed, probably coming from a cell phone or sensor, which means that data is, in most cases, not coming on prem; it's actually getting collected in a local cloud somewhere. It's just more cost effective; why would you put up 25 data centers if you don't have to, right? So then you've got to connect that data with the production data you have, or the customer data you have, or data you might have purchased, and then join them up, run some interesting analytics, do geo-based real-time threat detection, cyber security.
A lot of it means that you need a common way to secure data and govern it, and that's where we see the action. I think it's a really good sign for the market and for the community that people are pushing on these dimensions, because it means that people are actually using it for real production workloads. >> Well, in the early days of Hadoop you really didn't talk that much about cloud. >> Yeah. >> You know, and now, >> Absolutely. >> It's like, you know, duh, cloud. >> Yeah. >> It's everywhere, and of course the whole hybrid cloud thing comes into play. What are you seeing there? What are things you can do in a hybrid, you know, or on prem, that you can't do in a public cloud, and what does the dynamic look like? >> Well, it's definitely not an either/or, right? What we're seeing is that increasingly, interesting apps need data which is born in the cloud and will stay in the cloud, but they also need transactional data which stays on prem; you might have an EDW, for example, right? >> Right. >> People want to solve business problems, not just move data from one place to another, right? Or back from one place to another. So it's not interesting to move an EDW to the cloud, and similarly it's not interesting to bring your IoT data or sensor data back on prem, right? It just makes sense. So naturally what happens is, at Hortonworks we talk of a modern data app, which means a modern data app has to span both on-prem data and cloud data. >> Yeah, you talked about that in your keynote years ago. Furrier said that data is the new development kit. And now you're seeing the apps are just so dang rich, >> Exactly, exactly. >> And they have to span >> Absolutely. >> physical locations, >> Yeah.
>> But then this whole thing of IoT comes up. We've been having a conversation on The Cube, the last several Cubes, of okay, how much stays out, how much stays in. There's a lot of debate about that, there are reasons not to bring it in, but you talked today about how some of the important stuff will come back. >> Yeah. >> So the way this is all going to be, you know, there's a lot of data that should be born in the cloud and stay there, the IoT data, but what will happen increasingly is that key summaries of the data will move back and forth. So key summaries of your EDW will move to the cloud, and sometimes key summaries of your IoT data, when you want to do some sort of historical training and analytics, will come back on prem. So I think there's bi-directional data movement, but it just won't be all the data, right? It'll be key, interesting summaries of the data, but not all of it. >> And a lot of times, people say well, it doesn't matter where it lives, cloud should be an operating model, not a place where you put data or applications, and while that's true and we would agree with that, from a customer standpoint it matters in terms of performance and latency issues and cost and regulation, >> And security and governance. >> Yeah. >> Absolutely. >> You need to think those things through. >> Exactly. So that's what we're focused on: making sure that you have a common security and governance model regardless of where data is, so you can think of it as infrastructure you own and infrastructure you lease. >> Right. >> Right? Now, the details matter, of course. When you go to the cloud you use S3, for example, or ADLS from Microsoft, but you've got to make sure that there's a common security and governance layer on top of it. As an example, in the open source community, Ranger is a really key project right now from a security, authorization, and authentication standpoint.
We've done a lot of work with our friends at Microsoft to make sure you can now manage data in WASB, which is their object store, data stream, natively with Ranger, so you can set a policy that says only Dave can access these files, or George can access these columns. That sort of stuff is natively done on the Microsoft platform thanks to the relationship we have with them. >> Right. >> So that's actually really interesting for the open source communities. So you've talked about sort of commodity storage at the bottom layer, and even if there are different interfaces and implementations, it's still commodity storage, and now what's really helpful to customers is that they have a common security model, >> Exactly. >> Authorization, authentication, >> Authentication, lineage, provenance, >> Oh okay. >> You want to make sure all of these are common services across. >> But you've mentioned all of the different data patterns, like the stuff that might be streaming in on the cloud. Assuming you're not putting it into just a file system or an object store, and you want to sort of merge it with >> Yeah. >> historical data, what are some of the data stores other than the file system, in other words, newfangled databases, to manage this sort of interaction? >> So I think what you're saying is, we certainly have the raw data, and the raw data is going to land in whatever cloud-native storage, >> Yeah. >> whether it's Amazon, WASB, ADLS, Google Storage. But then increasingly the patterns change: you have raw data, you have some sort of ETL process, and what's interesting in the cloud is that even the processed data, if you take the unstructured raw data and structure it, that structured data also needs to live on the cloud platform, right? The reason that's important is, A, it's cheaper to use the native platform rather than set up your own database on top of it.
The other one is you also want to take advantage of all the native services that the cloud storage provides, for example, replicating your application. So with data in WASB, if you set up a policy, you can easily say this structured data table that I have, which is a summary of all the IoT activity in the last 24 hours, and using the cloud provider's technologies you can make it show up easily in Europe; you don't have to do any work, right? So increasingly what we at Hortonworks have focused a lot on is making sure that all of the compute engines, whether it's Spark or Hive or MapReduce, it doesn't really matter, are all natively working on the cloud provider's storage platform. >> [George] Okay. >> Right, so, >> Okay. >> That's a really key consideration for us. >> And the follow-up to that: there's a bit of a misconception that Spark replaces Hadoop, but it actually can be a processing, a compute engine for, >> Yeah. >> that can complement or replace some of the compute engines in Hadoop. Help us frame how you talk about it with your customers. >> For us it's really simple. In the past, the only option you had on Hadoop to do any computation was MapReduce. I started working on MapReduce 11 years ago, so as you can imagine, it's had a pretty good run for any technology, right? Spark is definitely the interesting engine for anything from machine learning to ETL for data on top of Hadoop. But again, what we focus a lot on is making sure that every time we bring in... When we started HDP, the first HDP had about nine open source projects, literally just nine. Today, the last one we shipped, HDP 2.5, had about 27, I think. It's a huge explosion, right? But the problem with that is not just that we have 27 projects; the problem is that you've got to make sure each of the 27 works with all 26 others. >> It's a QA nightmare.
>> Exactly. So that integration is really key. Same thing with Spark: we want to make sure you have security and YARN (mumbles). Like you saw in the demo today, you can now run Spark SQL but also make sure you get low-level (mumbles) masking, all of the enterprise capabilities that you need. I was at a financial services company three or four weeks ago in Chicago. Today, to do the equivalent of what I showed in the demo, they literally have a classic EDW, and they have to maintain anywhere between 1,500 and 2,500 views of the same database, which is a nightmare, as you can imagine. Now, the fact that you can do this on the raw data, whether it's Hive or Spark or Pig or MapReduce, it doesn't really matter, is really key, and that's why we push to make sure things like YARN security work across all the stacks, all the open source tech. >> So that makes life better, a simplification use case if you will, >> Yeah. >> What are some of the other use cases that you're seeing things like Spark enable? >> Machine learning is a really big one. Increasingly, every product is going to have some of it; people call it machine learning and AI and deep learning, there are a lot of techniques out there, but the key part is you want to build a predictive model. In the past (mumbles) everybody wants to build a model and score what's happening in the real world against the model, but it's equally important to make sure the model gets updated as more data comes in, and actually as the model scores do get smaller over time. So that's something we see all over. For example, even within our own product, it's not just us enabling this for the customer: at Hortonworks we have a product called SmartSense, which allows you to optimize how people use Hadoop. What are the opportunities for you to explore deficiencies within your own Hadoop system, whether it's Spark or Hive, right? So we've now put machine learning into SmartSense.
And it shows you that for customers who are running queries like you are running, Mr. Customer X, other customers like you are tuning Hadoop this way, they're running this sort of config, they're using these sorts of features in Hadoop. That allows us to actually make the product itself better all the way down the pipe. >> So you're improving the scoring algorithm, or you're sort of replacing it with something better? >> What we're doing there is just helping them optimize their Hadoop deployments. >> Yep. >> Right? You know, configuration and tuning and kernel settings and network settings; we do that automatically with SmartSense. >> But the customer, you talked about scoring and trying to, >> Yeah. >> They're tuning that, improving that and increasing the probability of its accuracy, or is it? >> It's both. >> Okay. >> So the thing is, what you do is you initially come with a hypothesis, you have some amount of data, right? I'm a big believer that over time, you're better off spending more, getting more data into the system, than fine-tuning that algorithm, right? >> Interesting, okay. >> Right, so, for example, talk to any of the big guys at Facebook, because they'll do the same. What they'll say is it's much better to spend your time getting 10x data into the system and improving the model that way, rather than spending 10x the time improving the model itself on day one. >> Yeah, but that's a key choice, because you've got to >> Exactly. >> spend money on doing either, >> One of them. >> And you're saying go for the data. >> Go for the data. >> At least now. >> Yeah, go for the data. The good part of that is it's not just the model; what you've really got to get through is the entire end-to-end flow. >> Yeah.
>> All the way from data aggregation to ingestion to collection to scoring, all those aspects; you're better off walking through the paces, building the entire end-to-end product, rather than spending time in a silo trying to make a lot of changes. >> We've talked to a lot of machine learning tool vendors and application vendors, and it seems like we got to the point with Big Data where we put it in a repository, then we started doing better at curating it and understanding it, then started doing a little bit of exploration with business intelligence. But with machine learning, we don't have something that does this end to end, you know, from acquiring the data and building the model to operationalizing it. Where are we on that, and who should we look to for that? >> It's definitely very early. If you look at even the EDW space, for example, what is EDW? EDW is ingestion, ETL, and then a sort of fast query layer, OLAP, BI, on and on, right? That's the full EDW flow. I don't think that, as a market, as an overall industry, we have that end-to-end industrialized design concept yet; it's really early in this space, and it's going to take time. But a lot of people are ahead, you know, the Googles of the world are ahead, and over time a lot of people will catch up. >> We've got to go. I wish we had more time, I had so many other questions for you, but I know time is tight in our schedule, so thanks so much, Arun, >> Appreciate it. >> for coming on, appreciate it. Alright, keep right there everybody, we'll be back with our next guest. It's The Cube, we're live from Spark Summit East in Boston, right back. (upbeat music)

Published Date : Feb 9 2017

SUMMARY :

brought to you by Data Breaks. father of YARN, can I call you that, Glad you made it in to Boston, So a lot of it means, as you take any of the examples today you really didn't talk that has to sort of, you know, it can pass both on-prem data Yeah, you talked about that in your keynote years ago. but you talked today about some of the important stuff So the way this is, this all is going to be, you know, And security and You need to think those so that's what we're focused on, to make sure that you have as an example one of the things that, you know, in the open So that's actually really interesting for the open source You want to make sure all of these are common sources in the last 24 hours, you can, using the cloud provider's in Hadoop, help us frame, how you talk about it with like in the past, the only option you had on Hadoop all of the enterprise capabilities that you need, Where the, what are the opportunities for you to explore What we're doing there is just helping them optimize and network settings, we do that automatically for example, you know, talk to any of the big guys is it's not just the model, it's the, what you got to really like building the entire end to end product rather than but a lot of people are ahead, you know, the Google's everybody, we'll be back with our next guest, it's The Cube,

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
George Gilbert	PERSON	0.99+
Dave Alante	PERSON	0.99+
Arun Murthy	PERSON	0.99+
Europe	LOCATION	0.99+
Microsoft	ORGANIZATION	0.99+
10x	QUANTITY	0.99+
Boston	LOCATION	0.99+
Chicago	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
George	PERSON	0.99+
Arun	PERSON	0.99+
Wasabi	ORGANIZATION	0.99+
25 data centers	QUANTITY	0.99+
Today	DATE	0.99+
Hadoop	TITLE	0.99+
Wasabi	LOCATION	0.99+
YARN	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
ADLS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Horton Works	ORGANIZATION	0.99+
today	DATE	0.99+
Data Breaks	ORGANIZATION	0.99+
1500	QUANTITY	0.98+
SmartSense	TITLE	0.98+
S3	TITLE	0.98+
Boston, Massachusetts	LOCATION	0.98+
One	QUANTITY	0.98+
27 projects	QUANTITY	0.98+
three	DATE	0.98+
Google	ORGANIZATION	0.98+
Furio	PERSON	0.98+
Spark	TITLE	0.98+
2500 views	QUANTITY	0.98+
first	QUANTITY	0.97+
Spark Summit East	LOCATION	0.97+
both	QUANTITY	0.97+
Spark SQL	TITLE	0.97+
Google Storage	ORGANIZATION	0.97+
26	QUANTITY	0.96+
Ranger	ORGANIZATION	0.96+
four weeks ago	DATE	0.95+
one	QUANTITY	0.94+
each	QUANTITY	0.94+
four years ago	DATE	0.94+
11 years ago	DATE	0.93+
27 work	QUANTITY	0.9+
MapReduce	TITLE	0.89+
Hive	TITLE	0.89+
this morning	DATE	0.88+
EDW	TITLE	0.88+
about nine open source	QUANTITY	0.88+
day one	QUANTITY	0.87+
nine	QUANTITY	0.86+
years	DATE	0.84+
Olap	TITLE	0.83+
Cube	ORGANIZATION	0.81+
a lot of data	QUANTITY	0.8+

Shaun Connolly, Hortonworks - BigDataNYC - #BigDataNYC - #theCUBE

(upbeat electronic music) >> Male Voiceover: Live from New York, it's the Cube, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. >> We're back in the Big Apple. This is the Cube, the worldwide leader in live tech coverage. We're here at Big Data NYC; Big Data week is part of Strata + Hadoop World. Shaun Connolly is here, he's the vice president of strategy at Hortonworks, a long-time friend and Cube alum. Great to see you again. >> Thanks for having me. We're back at the same venue as last year, always a pleasure. >> Yeah, it's good, we're growing. I guess the event's growing; we haven't been over there yet, but some of our guys have. What's it like over there? >> You know, it feels the same, with some different use cases. I think last year was streaming; we're hearing more machine learning and things like that as far as use cases, so a similar vibe. >> Yeah, so things are evolving, right? How's Hortonworks evolving? >> We're continuing to report our quarterly earnings as the only publicly traded company in this space, and things from a business perspective are doing well. Our connected data platforms strategy, which we unveiled at the beginning of this year, which is around data in motion and data at rest and enabling these new-gen transformational applications, continues to play out. The data-in-motion piece is sort of decoupled from and unrelated to a Hadoop platform; it's really about acquiring and handling data, the FedEx-for-data-delivery type notions, data logistics, secure transmission. That's based on the Apache NiFi tech that was originally built at the NSA over the past eight years, so it's a really nice, robust piece of technology that we've pushed out to the edge in our latest release, so you can really slim these down into secure site-to-site transmission.
A lot of sophisticated capabilities there, so we're seeing a lot of uptake in that sort of architectural vision. The products are maturing, both on prem and in the cloud; things are pretty exciting. >> Well, this cloud thing seems pretty real. (Shaun laughing) You're getting a lot of traction, right? Everybody kind of knew it was coming, but what are you seeing? >> Yeah, so I guess I started the journey back in 2009, when I was at SpringSource and Paul Maritz was CEO of VMware, and that was sort of pre-cloud at that time. We were talking about this notion of platform as a service and things like that. That resonated really well with folks back then, but their main ask was: how do you solve the data problem? How do you actually get the data to the apps that need it? Fast forward to 2016: there's been a lot of open source innovation, a lot of commercial innovation, and the rise of cloud for providing a fast path to value, booting up these use cases. It's a fascinating transition to watch. With many of our customers, people use the word hybrid. What that means to me is they'll have data center workloads, or multi-data-center workloads, but they also have cloud workloads, sometimes even multi-cloud workloads. That inherent nature of the beast is why I use the term connected data architecture: you need an architecture that is inherently built to span that fact. And that's just increasing; that's just the world we live in today. >> But the fact is, there are speed-of-light issues, there are data fidelity issues. >> Shaun: Yup. >> There are other types of things. How are you starting to see those practical and very physical realities start to impact the whole concept of design as it pertains to data, as it pertains to analytics, as it pertains to the infrastructure associated with the two? >> Yup, so at Hadoop Summit, which we had last June, there were some really good sessions.
Folks like Comcast, Ford, and Schlumberger talked about this connected data architecture reality. I like to use the connected car ecosystem as a good example, 'cause there were insurance providers and others speaking on behalf of that. You have the cars and other data that's inherently born up there, and there's a slug of use cases around edge analytics, streaming analytics, time series analytics. We're seeing that, and I think the cloud lends itself really well to those types of use cases. But we also see manufacturing line data for the cars, where you want to get a 360-degree view of operational issues and dovetail that with manufacturing line elements, and that's inherently what we've seen your classic on-prem data lake, in quotes, being used for, so you can get that 360-degree operational intelligence type of analytics out of it. Whether you apply that type of use case to oil and gas, with sensors on the oil rigs in the Schlumberger example, or elsewhere, that pattern is repeating itself across different industries. British Gas in Europe talks about how they're fundamentally changing the nature of the relationship with their customers because of the smart meters and their connectivity in the homes, and they can deliver better value there. So that's inherently the connected data realm: there are cloud use cases and there are data center use cases. So I see these use cases, you know, being use-case-specific applications that are sprinkled across that fabric, if you will. And that's really what we're seeing. >> At our panel last year here in this venue, we talked about a lot of things. One was the market, the sort of ebbs and flows. You just mentioned you guys are the only public player; Talend's joining that crew. >> Shaun: Yeah. Excellent. >> You've seen some. >> Shaun: We need more.
>> We need more. We've seen some M&A: Platfora got taken out. I don't know the specifics of that deal; it might have been an acqui-hire, might not, I don't know. And Datameer did a raise, so you're seeing these rip currents in all directions. What are you seeing in the marketplace? Lots of funding early on, lots of players, lots of innovation, and now it's like, okay, the music at some point's going to stop, but. >> Yeah. >> What's your take? >> So on our last call, and I think we repeated it on our prior earnings call, our focus, and we'll sort of reiterate where we stand in our Q3 earnings, is that we basically said Q4 is when we look to get to adjusted EBITDA break-even. >> Right. >> And then in 2017 we'll go from there. We reiterated that guidance. We had a little over 62 million in billings for the quarter, so the business is pretty robust and growing. We're only five years into this, I mean we're just five years old, so it's a very fast pace of billings growth, right? That's almost a 250 million annual run rate exiting that quarter. So we see a lot of the use cases really continuing to move on. I think what I see, and what our customers ask us about, is that they're on a digital transformation journey, and they want the industry to start talking about those types of business value drivers, right? So I think we should expect to see a transition away from the piece parts, the animals in the zoo, and which is the right open source piece of technology, toward why you should care, right? As a business, how is this transforming what you do? How does this open up new lines of business? We started seeing that at Hadoop Summit, where I think about two dozen customers were sharing very rich stories, right? So that's where things are. But running a company, you have to run it with a certain sense of rigor, and that was one of the reasons why we chose to go public, right?
>> So, by the way, we totally agree that customers want to stop talking about digital business in platitudes and start actually identifying specifically what about it is new and different, and find ways of doing it. >> Shaun: Sure. >> Coming back to the issue, however, of how you go about making some of those transformations relevant. There is clearly a knowledge gap about what digital business is and what it isn't, certainly. But there's also a fair amount of skills that have yet to be developed that are required for a lot of the use cases companies are pursuing. Not just in terms of implementing the technology appropriately, but actually constructing and conceptualizing the use cases. >> Shaun: Sure. >> So that suggests there are two paths forward. There's a path forward where we do a better job of diffusing knowledge through people, and there's a path forward where we do a better job of building software that's easier to use. >> Shaun: Mm hmm. >> And there's both. How do you see this playing out over the course of the next few years? >> Yep, and I think in any new area, as technology's emerging... one of the things I look at is the Apache Software Foundation. Literally every other week there's a new data-related Apache project that lands. It can be really confusing, but it's exhilarating, from the fact that I participate in that, and I try to figure out which ones we can harness in a consumable platform, whether it's on prem or in the cloud or what have you. What use cases can it light up? So I think you have both of those vectors, and it really depends. I like to use the classic software adoption curve: you have a lot of the left-side-of-the-chasm folks, where a lot of this new stuff is going to have sharper edges, and they're always going to be trailblazers, right? But we are also seeing a lot of these advanced analytics.
Some of these new solutions are automating the pipeline, so you can actually let the infrastructure and these engines do more of the thinking for you, so you get your model's output. Even to the point where you run multi-model simulations in parallel and out pops the best fit. That's where things will head, right? I think it's just a matter of the technology maturing, making sure we address things like security, metadata management, governance, and those -ilities that the enterprise expects, and then really forcing ourselves to simplify and automate as much as possible, right? And on that last one, that was one of the reasons why in October 2011 we chose Teradata and Microsoft as key partners. Teradata because in 2011, clearly, right? >> Peter: Teradata. >> They're Teradata, right? Microsoft because it simplifies technologies and brings them to billions of users, right? So we need to do both. You need to harden it for the most rigorous large enterprises, but you need to simplify it for the meat of the market adopters, right? The early majority and late majority. You have to do both. >> Shaun, you're sitting across from a CEO, and you have to say, these are the three things you need to do to enact this digital transformation. >> Shaun: Yup. >> What are the three things you're telling him? >> So, I think as a business they need to identify how they want to leverage data as capital, and what pockets of value they want to go chase. That's number one. Number two, how is their business being impacted by the rise of IoT and an inherently, increasingly connected society and infrastructure? How is that impacting them? And number three is, how do they evolve what they're used to doing, right? You have to align it, exactly. >> Because in many respects, I like to say there's a difference between invention and innovation.
Invention is the engineering act; innovation's a social act, it's adopting those new practices >> Shaun: Exactly. >> that actually allow you to enact the invention and generate revenue. >> Exactly. Now in our space, I think we have a very compelling renovate value prop, which is cost savings, where you can drive cost out, but the innovate use cases are the ones that matter. If all you're going to do is renovate, then you will fail, you will stall, right? Because it's not just about cost savings. It's about how you actually transform your business. And in the case of the British Gas example I used, how they engage the end consumer is fundamentally changing. So the question I put back in those conversations is: how do you want to evolve your business, and how do you leverage data as capital? Because the beauty of data as capital is you can generate multiple lines of interest off of a single data set, 'cause you can derive different insights from it. So it's not like a dollar with a single compound interest rate; it's multiple compound annual interest rates on that. But they have to chase the right use cases. >> Although we've also learned from great design that if you do the right thing better, you get rid of a lot of waste, and so, coming back to your point, doing the right thing better often leads to cost savings. >> Yes. Exactly. One can inherently drive the other, but if you're just driving it, then >> Peter: Just doing cost. >> you're not going to transform your business. >> Peter: You're just going to continue to do the same or wrong things worse. >> Shaun: Exactly. >> Or wrong things cheaper. >> And that's difficult for enterprises, because there's a certain way to do data management, inherently, in a highly structured manner. But I do think about the rise of IoT: I don't see it as a market, I see it as infinite slices of prosciutto, right? (laughter) It's a very thinly sliced set of market opportunities, right?
But it's forcing people to think about different use cases and how they might impact their business. >> We see those sets of capabilities. >> Yup. >> Which leads to the prosciutto. >> Exactly. >> So you can come up with a really nice sandwich. (laughter) >> It's my Italian. >> Let's keep going. >> I'm loving it. >> I'm getting a little hungry. >> You have always made a big deal out of your partnerships not being Barney deals, but being deep integration relationships. So you mentioned two here, Teradata and Microsoft. As the cloud becomes more prevalent, as things evolve and machine learning becomes the hot buzzword, et cetera, how have you evolved those relationships, specifically in terms of the integration work that you've done? Have you kept up that engineering ethos, or? >> And that was the thing. With Microsoft, we clearly spent a lot of sweat equity on the Azure HDInsight service, but if you look at that ecosystem, they have Azure Machine Learning, right? They have a whole raft of services that you can apply to the data when it's in the cloud. So how that piece integrates with the broader ecosystem of services is a lot of engineering work as well. I've always said there's work to be done in our green box, but the other half of the work is how it plumbs into the rest. And if you look at the AWS ecosystem, how do you optimize for S3 as a storage tier, and for ephemeral workloads where HDFS is maybe a caching mechanism but it's not your primary storage? It brings up really interesting integration modes and how you actually bring your value out into really interesting use cases, right? So I think it's opened up a lot of areas where we can drive a lot more integration and drive the open source tech in a way that's relevant for those use cases. >> Alright, we've got to go, but the summit in Tokyo, is it next month? >> Yes, end of October. >> End of October. >> It's our first time; primarily, summits have been in the US and Europe.
We had Melbourne at the end of August, and we have Tokyo at the end of October. They're bringing the right-hander out of retirement, so I'll be onstage in Tokyo. (laughing) I've usually been behind the scenes. >> Throwing the slurve? (laughter) >> Yeah, exactly. So I'm looking forward to it; it'll be exciting. >> Alright, good, and then in '17 you're going to start again in the spring. >> Shaun: Yup. >> You're in Munich. >> Shaun: Yup. Munich. >> You were in Dublin last year; you're moving to Munich this year. >> Shaun: Exactly. >> Hopefully the Cube will be back in Munich, alright? >> We love you guys, you guys do a good job. >> Let's make it happen, do good stuff in Europe. So thanks again for coming out. >> Shaun: Thanks for having me. >> Always a pleasure. Alright, keep it right there, we'll be back right after this short break. This is the Cube, we're live from New York City. (upbeat electronic music)

Published Date : Sep 29 2016

SUMMARY :

Brought to you by headline sponsors and Cube alum, great to see you again. at the same venue last the same, some of the of at the NSA over the but what are you seeing? nature of the beast is why I use But the fact is because there in the data center use cases. and flows you just mentioned, you guys Shaun: Yeah. okay, the music at some So in our last call, and I think so the business is pretty of doing it. for a lot of the use and there's a path for where we can do a of the next few years? the pipeline, so you can actually let the for the meat of the market and you have to say these by the fact that you have the rise of IOT Invention is the engineering you to enact the invention And in the case of like that if you do the right thing better, One inherently can drive the other, You're not going to to do the same or wrong things worse. But it's forcing people to think about So you, and come up with of the integration work of sweat equity on the of August, and we have to it, it'll be exciting. start again in the spring. Shaun: Yup. to Munich this year. We love you guys, so thanks again for coming out. This is the Cube, we're

ENTITIES

Entity	Category	Confidence
Shaun	PERSON	0.99+
Comcast	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Dublin	LOCATION	0.99+
Nvidia	ORGANIZATION	0.99+
Munich	LOCATION	0.99+
Dave Vellante	PERSON	0.99+
Europe	LOCATION	0.99+
Cisco	ORGANIZATION	0.99+
Ford	ORGANIZATION	0.99+
2011	DATE	0.99+
British Gas	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
Peter	PERSON	0.99+
Shaun Connolly	PERSON	0.99+
October 2011	DATE	0.99+
Tokyo	LOCATION	0.99+
New York City	LOCATION	0.99+
Apache Software Foundation	ORGANIZATION	0.99+
2009	DATE	0.99+
2016	DATE	0.99+
two	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
360 degree	QUANTITY	0.99+
FedEx	ORGANIZATION	0.99+
one	QUANTITY	0.99+
last year	DATE	0.99+
VMware	ORGANIZATION	0.99+
2017	DATE	0.99+
five years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
last year	DATE	0.99+
both	QUANTITY	0.99+
SpringSource	ORGANIZATION	0.99+
this year	DATE	0.99+
Melbourne	LOCATION	0.99+
last June	DATE	0.99+
Schlumberger	ORGANIZATION	0.99+
Big Apple	LOCATION	0.99+
NYC	LOCATION	0.99+
first time	QUANTITY	0.99+
End of October	DATE	0.99+
end of October	DATE	0.99+
next month	DATE	0.99+
end of August	DATE	0.98+
Apache	ORGANIZATION	0.98+
single	QUANTITY	0.98+
two paths	QUANTITY	0.98+
Hortonworks	ORGANIZATION	0.98+
BigDataNYC	ORGANIZATION	0.97+
over 62 million	QUANTITY	0.97+
US	LOCATION	0.96+
Azure	TITLE	0.96+
billions of users	QUANTITY	0.93+
today	DATE	0.92+

Jim Campigli, WANdisco - #BigDataNYC 2015 - #theCUBE

>> Live from New York, it's The Cube, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Now for your hosts, John Furrier and Dave Vellante. >> Hello, everyone. Welcome back, live in New York City, for the Cube, a special Big Data [inaudible 00:00:27], our flagship program. We go out to the events and extract the [inaudible 00:00:30]. We are here live as part of Strata + Hadoop World and Big Data NYC. I'm John Furrier, with my co-host, Dave Vellante. Our next guest is Jim Campigli, the Chief Product Officer at WANdisco. Welcome back to The Cube. Great to see you. >> Thanks, great to be here. >> You've been COO of WANdisco, head of marketing, and now Chief Product Officer for a few years. You guys have always had the patents. David was on earlier. I asked him specifically, why don't the other guys just do what you do? I wanted you to comment deeper on that, because he had a great answer. He said, patents. But you guys do something that's really hard that people can't do. >> Right. >> So let's get into it, because Fusion is a big announcement you guys made. A big deal with EMC, a lot of traction with that, and it's one of these things that is kind of talked about, but not talked about. It's really a big deal. So what is the reason why you guys are so successful on the product side? >> Well, I think, first of all, it starts with the technology that we have patented, and it's this true active active replication capability that we have. Other software products claim to have active active replication, but when you drill down on what they're really doing, typically what's happening is they'll have a set of servers that they replicate across, and you can write a transaction at any server, but then that server is responsible for propagating it to all of the other servers in the implementation.
There's no mechanism for pre-agreeing to that transaction before it's actually written, so there's no way to avoid conflicts up front, and no way to effectively handle scenarios where some of the servers in the implementation go down while the replication is in process. Very frequently, those solutions end up requiring administrators to do periodic resynchronization: go back, manually find out what didn't take, and deal with all the deltas. Whereas we offer guaranteed consistency. Effectively, what happens with us is you can write at any server as well, but the difference is we go through a peer-to-peer agreement process, and once a quorum of the servers in the implementation agree to the transaction, they all accept it, and we make sure everything is written in the same order on every server. And every server knows the last good transaction it processed, so if it goes down at some point, as soon as it comes back up it can grab all the transactions it missed while it was offline and resync itself automatically, without an administrator having to do anything. And you can use that feature not only for network and server outages that cause downtime, but even for planned maintenance, which is one of the biggest causes of Hadoop availability issues. Obviously, if you've got a global deployment, when it's midnight on Sunday in the U.S., it's the start of the business day on Monday in Europe, and it's the middle of the afternoon in Asia. So if you take Hadoop clusters down, somebody somewhere in the world is going to be going without their applications and data. >> It's interesting. I want to get your comments on this, because it's a great lead-in to the conversation we've been hearing all throughout The Cube this week: analytics, outcomes. These are the kinds of things people talk about, because that means checks are being written. Hadoop is moving into production. People have done the clusters.
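The agree-first, write-in-order scheme described in that answer can be illustrated with a deliberately simplified sketch. This is not WANdisco's coordination engine: the classes `Replica` and `QuorumLog` are hypothetical, a single coordinator stands in for the peer-to-peer agreement process, and a "vote" is just a liveness check. It only shows the three properties Jim names: a write commits once a majority agrees, every server applies writes in the same order, and a server that was offline resyncs from its last good transaction without an administrator.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server holding a copy of the replicated data (hypothetical)."""
    name: str
    online: bool = True
    last_applied: int = 0              # sequence number of the last good transaction
    data: dict = field(default_factory=dict)

    def vote(self) -> bool:
        # Simplification: an online replica always agrees; an offline one cannot answer.
        return self.online

    def apply(self, seq: int, key: str, value: str) -> None:
        # Agreed transactions must be applied strictly in sequence order.
        if seq != self.last_applied + 1:
            raise ValueError(f"{self.name}: expected seq {self.last_applied + 1}, got {seq}")
        self.data[key] = value
        self.last_applied = seq


class QuorumLog:
    """Hypothetical coordinator: agree on a transaction first, then write it everywhere in order."""

    def __init__(self, replicas: list) -> None:
        self.replicas = replicas
        self.log = []                  # agreed (seq, key, value) entries, in order

    def propose(self, key: str, value: str) -> bool:
        votes = sum(r.vote() for r in self.replicas)
        if votes <= len(self.replicas) // 2:
            return False               # no majority: rejected up front, nothing to reconcile later
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        for r in self.replicas:
            if r.online:
                r.apply(seq, key, value)
        return True

    def resync(self, replica: Replica) -> None:
        # Replay only the entries agreed after the replica's last good transaction.
        for seq, key, value in self.log[replica.last_applied:]:
            replica.apply(seq, key, value)


nodes = [Replica("us"), Replica("eu"), Replica("asia")]
log = QuorumLog(nodes)
log.propose("/data/a", "v1")
nodes[2].online = False                # planned maintenance in one region
log.propose("/data/b", "v2")           # 2 of 3 still agree, so the write commits
nodes[2].online = True
log.resync(nodes[2])                   # catches up from its last good transaction
print(nodes[2].last_applied)           # prints 2
```

A production system would also have to agree on ordering when several servers propose at once and survive coordinator failure; the sketch leaves both out so the quorum-then-ordered-apply idea stays visible.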
It used to be the conversation: hey, x number of clusters, you do this, you do that, replication here and there, YARN, all these different buzzwords. Really feeds and speeds. Now, Hadoop is relevant, but it's kind of invisible. It's under the hood. >> Right. >> Yet it's part of other things in the network, so high availability and non-disruptive operations are table stakes now. So I want you to talk about that nuance, because that's what we're seeing powering the engine of Hadoop deployments. What is that? Take us through that nuance, because that's one of the things you guys have been doing a lot of work in, making it reliable and stable, so people can actually go out and play with Hadoop, deploy it, and make sure it's always on. >> Well, we really come into play when companies are moving Hadoop out of the lab and into production, when they have defined application SLAs, when they can only have so much downtime. It may be business requirements; it may be regulatory compliance issues, for example in financial services. They pretty much always have to have their data available. They have to have a solid back-up of the data. That's a hard requirement for them to put anything into production in their data centers. >> The other use case we've been hearing is, okay, I've got Hadoop, I've been playing with it, now I need to scale it up big time. I need to double, triple my clusters. I have to put it with my applications. Then the conversation is, okay, wait, do I need to do more sysadmin work? How do you address that particular piece? Because I think that's where Fusion comes in, from how I'm reading it. Is that a Fusion value proposition? Is it a WANdisco thing, and what does the customer... and is that happening? >> Yeah, so there are actually two angles to that, and the first is how do we maintain that up-time? How do we make sure there's the performance and availability to meet the SLAs, the production SLAs?
The active active replication that we have patents for, that I described earlier, is embodied in our DConE distributed coordination engine, which is at the core of Fusion. Once a Fusion server's installed with each of your Hadoop clusters, that active active replication capability is extended to them, and we expose the HDFS API, so the client applications, Sqoop, Flume, Impala, Hive, anything that would normally run against a Hadoop cluster, talk through us. If the data's been defined for replication, we do the active active replication of it; if not, the request passes straight through and processes normally on the local cluster. So how does that address the issues you were talking about? What you're getting by default with our active active replication is effectively continuous hot back-up. That means if one cluster or an entire data center goes offline, that data exists elsewhere. Your users can fail over. They can continue accessing the data and running their applications. As soon as that cluster comes back online, it resyncs automatically. Now what's the other >> No user involvement? No admin? >> No user involvement in that. And this gets back to what I was talking about earlier: if I take servers offline for planned maintenance, to upgrade the hardware, the operating system, whatever it may be, I can take advantage of that feature. I can take the servers of the entire cluster offline, and Fusion knows the last good transaction that was processed on that cluster. As soon as the admin turns it back on, it'll resync itself automatically. So that's how you avoid downtime, even for planned maintenance, if you have to take an entire location offline. Now, to your other question, how do you scale this stuff up? Think about what we do. We eliminate idle standby hardware, because everything is full read-write. You don't have standby read-only back-up clusters and servers when we come into the picture.
So let's say we walk into an existing implementation, and they've got two clusters. One is the active cluster where everything's being written to and read from, actively being accessed by users. The other's just taking snapshots or periodic back-ups, or they're using DistCp or something else, but they really can't get full utilization out of it. We come in with our active active replication capability, and they don't have to change anything, but what suddenly happens is, as soon as they define what they want replicated, we replicate it for them initially to the other clusters. They don't have to pre-sync it, and the cluster that was formerly for disaster recovery, for back-up, is now live and fully usable. So guess what? I'm now able to scale up to twice my original implementation just by leveraging that formerly read-only back-up cluster that I was >> Is there a lot of configuration involved in that, or is it automatic? >> No. Basically, again, you don't have to synchronize the clusters in advance. The way we replicate is based on this concept of folders. You can think of a folder as basically a collection of files and subdirectories that roll up into root directories, effectively, that typically reflect particular applications people are using with Hadoop, or groups of users that have data sets they access for their various applications. You define the replicated folders, basically a high-level directory and everything in it, and as soon as you do that, the rest is automatic. In a new implementation, let's keep it simple: say you just have two clusters, two locations. We'll replicate that folder in its entirety to the target you specify, and from that point on, we're just moving the deltas over the wire. So you don't have to do anything in advance. And then suddenly that back-up hardware is fully usable, and you've doubled the size of your implementation. You've scaled up to 2x.
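The folder-based flow (a full initial copy of each replicated folder, then only new writes crossing the wire, with everything outside those folders passing straight through to the local cluster) can be sketched with plain dictionaries standing in for clusters. `ReplicatingProxy`, `add_replicated_folder` and `write` are hypothetical names, not the Fusion API, and the sketch deliberately ignores deletes, conflicts and the agreement step.

```python
def _under(path: str, folder: str) -> bool:
    """True if path sits inside folder (a high-level replicated directory)."""
    return path.startswith(folder.rstrip("/") + "/")


class ReplicatingProxy:
    """Hypothetical write path in front of a local cluster and its remote peers."""

    def __init__(self, local: dict[str, str], remotes: list[dict[str, str]]) -> None:
        self.local = local
        self.remotes = remotes
        self.folders: list[str] = []
        self.sent = 0                  # files shipped to remote clusters

    def add_replicated_folder(self, folder: str) -> None:
        # Initial replication: copy the folder in its entirety; the target needs no pre-sync.
        self.folders.append(folder)
        for path, body in self.local.items():
            if _under(path, folder):
                for remote in self.remotes:
                    remote[path] = body
                    self.sent += 1

    def write(self, path: str, body: str) -> None:
        self.local[path] = body        # every write lands on the local cluster
        if any(_under(path, f) for f in self.folders):
            # From here on, only the new write (the delta) crosses the wire.
            for remote in self.remotes:
                remote[path] = body
                self.sent += 1


primary = {"/apps/etl/a.csv": "1", "/apps/etl/b.csv": "2", "/tmp/scratch": "x"}
backup: dict[str, str] = {}            # the former back-up cluster, empty at the start
proxy = ReplicatingProxy(primary, [backup])
proxy.add_replicated_folder("/apps/etl")   # initial copy: 2 files
proxy.write("/apps/etl/b.csv", "2b")       # delta: 1 file
proxy.write("/tmp/scratch", "y")           # outside any replicated folder: stays local
print(proxy.sent, sorted(backup))          # prints: 3 ['/apps/etl/a.csv', '/apps/etl/b.csv']
```

Counting shipped files makes the point from the interview concrete: the folder is copied once, and after that a single changed file costs a single transfer.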
>> So what you were describing before really strikes me: the way you tell the complexity of a product, and the value of a product in this space, is what happens when something goes wrong. >> Yep. >> That's the question you always ask. How do you recover? Because recovery's a very hard thing, and in your patents you've got a lot of math inside there. >> Right. >> But you also said something that's interesting, which is that you're an asset utilization play. >> Right. >> You're able to go in relatively simply and say, okay, you've got this asset that's underutilized. I'm now going to give you back some capacity that's on the floor and take advantage of that. >> Right, and you're able to scale up without spending any more on hardware and infrastructure. >> So I'm interested in another company. You've got an EMC partnership this week. And they sort of got into this way back in the mainframe days with SRDF. When I first heard about WANdisco, I always thought it's like SRDF for Hadoop, but it's active active. Then they bought YottaYotta. >> And there are no distance limitations for their active active. >> So what's the nature of the relationship with EMC? >> Okay, so basically EMC, like the other storage vendors that want to play in the Hadoop space, exposes some form of HDFS API. In fact, if you look at Hortonworks or Cloudera, if you go and look at Cloudera Manager, one of the things it asks you when you're installing it is: are you going to run this on regular HDFS storage, effectively a bunch of commodity boxes typically, or are you going to use EMC Isilon or the various other options?
What we're able to do is replicate across Hadoop clusters running on Isilon, running on EMC ECS, running on standard HDFS. That allows these companies, without modifying those storage systems and without migrating the data off of them, to incorporate it into an enterprise-wide data lake, if that's what they want to do, and selectively replicate across all of those different storage systems. It could be a mix of different Hadoop distributions. You could have replication between CDH, HDP, Pivotal, MapR, all of those things, including the EMC storage I just mentioned, which was in the press release: Isilon and ECS, which effectively have Hadoop-compatible API support. And we can create, in effect, a single virtual cluster out of all of those different platforms. >> So is it a go-to-market relationship? Is it an OEM deal? >> Yeah, it was really born out of the fact that we have some mutual customers that want to do exactly what I just described. They have standard Hortonworks or Cloudera deployments in house. They've got data running on Isilon, and they want to deploy a data lake that combines what they've got stored on Isilon with what they've got in HDFS and Hadoop, and replicate across that. >> Was there an onerous EMC certification process? >> Yeah, we went through that process. We actually set up environments in our labs where we had EMC Isilon and ECS running and did demonstration integrations: replication from Isilon to HDP, Hortonworks, Isilon to Cloudera, ECS to Isilon to HDP and Cloudera, and so forth. So we did prove it out. They saw that. In fact, they lent us boxes to actually do this in our labs, so they were very motivated, and they're seeing us in some of their bigger accounts.
>> Talk about two things. One is non-disruptive operations: I want to deploy stuff, because now that Hadoop has a hardened top with some abstraction layer and an analytics focus, there's a lot of work going on under the hood, and a large-scale enterprise might have a zillion versions of Hadoop. They might have a little Hortonworks here and something over there, so there might be some diversity in the distributions. That's one thing. The other one is operational disruption. >> Right. >> What do you guys do there? Is it zero disruption, and how do you deal with multiple versions of the distro? >> Okay, so the simplest way to describe what we're doing is we're providing a common API across all of these different distributions, running on different storage platforms and so forth, so that the client applications are always interacting with us. They're not worrying about the nuances of the particular Hadoop APIs that these different things expose. So we're effectively providing a layer of abstraction, and in that sense we're operationally transparent once we're installed. The other thing, and I mentioned this earlier, is that when we come in, you don't have to pre-sync clusters, and you don't have to make sure they're all the same versions or the same distros or any of that. Just install us, select the data that you want to replicate, we'll replicate it over initially to the target clusters, and from that point on, you just go. It just works. We talked about the core patent for active active replication. We've got other patents approved, three patents now and seven applications pending, that allow this active active replication to take place while servers are being added to and removed from implementations, without disrupting user access or running applications and so forth. >> Final question for you: sum up the show this week. What's the vibe here? What's the aroma?
Is it really Hadoop next? What is the overall Big Data NYC story here at Strata Hadoop? What's the main theme that you're seeing coming out of the show? >> I think the main theme we're starting to see is twofold. One is that we're seeing more and more companies moving this into production. The other is that there's a lot of interest in Spark and the whole fast data concept, and I don't think that Spark is necessarily orthogonal to Hadoop at all. I think the two have to coexist. If you think about Spark Streaming and the whole fast data concept, basically, Hadoop provides the historical data at rest. It provides the historical context. The streaming data provides the point-in-time information. What Spark together with Hadoop allows you to do is real-time analysis and real-time informed decision-making, but within historical context instead of in a single-point-in-time vacuum. So I think that's what's happening, and you notice the vendors themselves aren't saying, oh, it's all Spark, forget Hadoop. They're really talking about coexisting. >> All right, Jim from WANdisco, Chief Product Officer, really in the trenches, talking about what's under the hood and making it all scale in the infrastructure so his analysts can hit the scene. Great to see you again. Thanks for coming and sharing your insight here on The Cube, live in New York City. We are here on day two of three days of wall-to-wall coverage of Big Data NYC in conjunction with Strata. We'll be right back with more live coverage in a moment here in New York City after this short break.
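The active-active replication Jim mentions rests on WANdisco's patented technique, which this conversation doesn't spell out; the summary only hints that changes are applied once a quorum of the servers agrees. As a rough illustration of that general idea, and not WANdisco's algorithm or a full consensus protocol such as Paxos, here is a single-coordinator Python sketch that assumes a simple majority quorum over one shared, ordered log. The `Replica` and `Coordinator` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding its own copy of the agreed operation log."""
    name: str
    up: bool = True
    log: list[str] = field(default_factory=list)

    def accepts(self, seq: int) -> bool:
        # A live replica accepts a proposal only for the next slot it expects,
        # so a replica that missed earlier operations cannot vote until resynced.
        return self.up and seq == len(self.log)


class Coordinator:
    """Commits an operation only when a majority of replicas accept it."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas
        self.committed: list[str] = []  # the single agreed order of operations

    def quorum_size(self) -> int:
        return len(self.replicas) // 2 + 1

    def propose(self, op: str) -> bool:
        seq = len(self.committed)
        voters = [r for r in self.replicas if r.accepts(seq)]
        if len(voters) < self.quorum_size():
            return False  # no majority: nothing is applied anywhere
        self.committed.append(op)
        for r in voters:
            r.log.append(op)
        return True

    def resync(self, replica: Replica) -> None:
        # A replica coming back online copies the agreed log, then rejoins.
        replica.log = list(self.committed)
        replica.up = True
```

With three replicas, losing one still leaves a majority of two, so writes continue; losing a second blocks commits instead of letting the survivors diverge. Real active-active systems let every site propose and must resolve competing proposals for the same slot, which is the hard part this sketch deliberately leaves out.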

Published Date : Oct 6 2015

SUMMARY :

Brought to you by Hortonworks. New York City for the Cube. You guys have always had the patent. on the product side? and once a quorum of the servers These are the kind of things because that's one of the things back-up of the data. and is that happening? So how does that address the issues and the cluster that was and you can think of a folder really strikes me that the way you tell That's the question you always ask. But you also said that's on the floor and Right, and you're able to scale up in the mainframe days with SRDF. And there's no distance limitations one of the things it asks you born out of the fact and Cloudera and so forth. diversity in the distributions. so that the client applications What is the overall Big Data NYC story and the whole fast data concept, in the infrastructure

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Jim	PERSON	0.99+
Jim Campigli	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Europe	LOCATION	0.99+
WANdisco	ORGANIZATION	0.99+
EMC	ORGANIZATION	0.99+
Asia	LOCATION	0.99+
U.S.	LOCATION	0.99+
New York	LOCATION	0.99+
John Furrier	PERSON	0.99+
Horton Works	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
two locations	QUANTITY	0.99+
Strata Hadoop	TITLE	0.99+
first	QUANTITY	0.99+
Pivotal	ORGANIZATION	0.99+
one	QUANTITY	0.99+
two things	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
One	QUANTITY	0.99+
two	QUANTITY	0.99+
two clusters	QUANTITY	0.99+
three days	QUANTITY	0.99+
Monday	DATE	0.99+
three patents	QUANTITY	0.98+
this week	DATE	0.98+
seven pending applications	QUANTITY	0.98+
two angles	QUANTITY	0.98+
Spark	TITLE	0.97+
this week	DATE	0.97+
one cluster	QUANTITY	0.97+
00:00:30	DATE	0.95+
ECS	TITLE	0.95+
HDP	ORGANIZATION	0.94+
Cloudera Manager	TITLE	0.94+
single point	QUANTITY	0.94+
#BigDataNYC	EVENT	0.94+
each	QUANTITY	0.94+
Impala	TITLE	0.93+
NYC	LOCATION	0.93+
twofold	QUANTITY	0.93+
Strata	ORGANIZATION	0.92+
Flume	TITLE	0.92+
00:00:27	DATE	0.92+
Sqoop	TITLE	0.92+
Fusion	TITLE	0.91+
Isilon	ORGANIZATION	0.89+
Cloudera	ORGANIZATION	0.89+
midnight	DATE	0.89+
Sunday	DATE	0.88+
Isilon	TITLE	0.88+
single	QUANTITY	0.88+
HIVE	TITLE	0.87+
one thing	QUANTITY	0.83+
double	QUANTITY	0.83+

Amr Awadallah - Hadoop Summit 2013 - theCUBE - #HadoopSummit

>>Welcome back. This is Silicon Valley coverage of Hadoop Summit. I'm John Furrier, the founder. We're pleased to have a friend inside the Cube. It's rare to have such luminaries: Amr Awadallah, a good friend and also co-founder of Cloudera, really the pioneer in the space who helped build this industry that we're living in here at Hadoop Summit. I'm with Dave Vellante from Wikibon.org. Amr, welcome back to the Cube, Cube alumni. Thank you for having me here. Wow, what a journey. You co-founded Cloudera. I remember when you were in stealth mode: I really can't talk about it. And then of course the history of SiliconANGLE being, you know, founded and kind of built in your office when you only had like 20-something employees. Yep. We owe a great deal of gratitude to you, and congratulations to you, Michael Olson, and the team for building an industry. So I just wanted to say thank you. Thank you. And welcome to the Cube. >>Thank you. It was great to be here. >>So what do you think, what's your take on the current Hadoop ecosystem right now? I mean, obviously a lot's happened. It's big now. It's growing up fast. Yeah. The word "enterprise grade" is out there. You're seeing it move from, you know, trying to change the world. In our first interview, you said, I've seen the future, I want to bring it to the mainstream. It's here. Yeah. It's hitting the mainstream right now. Yeah. What's your take on the current situation of the ecosystem and its value? >>Yeah, so I have a quick question first. Should I look to you or look to the camera? >>Look to the camera, or both. Whatever you'd like. >>So I think the ecosystem is definitely growing, which is very, very healthy. However, there is a side question there, which is: what do you think of all the competition coming into the space? Five years ago, when Cloudera was started, it was just Cloudera. There was no other commercial vendor trying to support or enable Hadoop in the industry for enterprises.
And today there are at least 10 of them trying to compete with us, right? That includes big, established companies that decided, hey, we're gonna start addressing the space, but it also includes many, many newcomers, like Hortonworks, who were founded over the last couple of years. That's a healthy thing. I mean, that's absolutely a sign of a growing market. If the market wasn't growing, if there wasn't money in the market, if it was just hype, there wouldn't have been all of these new companies and new ventures showing up. That said, I never look at competition as something that worries me, that I'm afraid of what's gonna happen to me. That's normal. That's exactly what happens to successful companies. If you look at Red Hat, when Red Hat was launching with Linux, they had 25 competitors, or even more, 30 competitors. That's when Red Hat was forming. And today, of those 25, 30 competitors, six or seven are still left. So I think it's a very, very healthy sign of the growth of this market and the maturity it's reaching. >>What do you think about some of the white spaces that are evolving? You guys have obviously been involved in a lot of deployments at Cloudera. Again, you're doing a lot of work with the top names, and the clients that you have aren't usually disclosed, 'cause you really can't disclose them. What are you seeing right now as the white spaces for things to do in the Hadoop platform? >>It's a very, very good question. So first, I can't talk about the future roadmap. Right now we're becoming a big company, at the level where we can't comment on future roadmaps. >>Ah, that's a sign of the >>Times. You're well media-trained; good to see they're doing a good job keeping you >>Ah, you want more information on that? I can connect you with a pt, >>Please. No, no, no, we're good. We're good. We'll get it out of you.
But >>But our vision, our vision for Cloudera from day one, like you were saying earlier, we saw the future, right? So our vision from day one was really to build this data system where we can have data of any type, whether that data is structured or unstructured or images, it doesn't matter. And then on top of that data, run any type of workload. That workload could be the initial genesis of Hadoop, which is MapReduce, which is batch processing. But now, as we've made many announcements through the last few years, we also have Impala for interactive analytics as a workload. We have a very, very strong partnership with SAS for doing machine learning and statistics as a workload. And a few weeks ago we announced search as another workload. So you have multiple types of workloads that can handle different types of problems that you have within your organization, and you bring all of these workloads to all of your data, regardless of type. And that's the vision that we'll continue to deliver on. That's exactly what we're building going into the future. >>So how does that fit in with YARN, right? We're hearing a lot at this conference about YARN, the ability to, you know, do more with less, a lot of the things that you typically hear within the enterprise. So talk about that a little bit. >>YARN is a very core part of our platform. In fact, YARN has been part of CDH4 for more than a year now, out in the market. We were, I think, the first vendor who brought YARN into a distribution of Hadoop out there. It's very, very fundamental to us because that is how we're gonna coordinate. We're gonna be using YARN to coordinate launching all of these different types of workloads. You're gonna have the MapReduce workload, which is very batch-oriented; the Impala workload, which is very latency-sensitive; the search workload, which is also very latency-sensitive;
the machine learning workload, which is more batch-oriented, et cetera, et cetera. And YARN is a very, very central piece in helping us coordinate all of these different types of workloads onto the platform. >>Cloudera has been a great citizen in the community also. You mentioned, and we witnessed, that your team created the industry. You guys were there, you took the chance, you were the first ones commercially funded by venture capitalists, you know, then others followed, and now you see a huge ecosystem here. Yes. A lot of noise. A lot of people trying to get attention. So I've got to ask you, because I want you to address this, and I know it's been talked about in some of the other blogs: there's a lot of FUD going on around who's doing what. Who's doing what, and in some cases maybe flat-out, you know, misinformation, and that happens in a growing market; you know, the elbows get sharp. Yes. So I want you to share with the audience anything you want to say about the FUD around what people say about Cloudera, or about others, or about what you're doing. Just to clarify, 'cause there has been, I mean, I've gotten back-channel information around, you know, not sure about the committers, this, and it's been well documented. There's a lot of FUD out there. What would you say to the folks out there to clarify that? >>Yes, I would say that our focus should be to continue to work as a community to push the platform forward. I would say that at Cloudera we do a lot of contributions. Hortonworks definitely is one of the top contributors out there as well; I'll acknowledge that. So are many, many other companies, and we want to continue to see the platform evolve. I will stress, though, that at Cloudera we do have a number of the original project founders working at the company. So it's not just the contributions that we bring, but the fact that we have the founders of these projects working at Cloudera.
And some of these projects actually were created at Cloudera from day one, as opposed to being created at some other company, after which you hire the employee and they work for you. So I'll give you examples from Cloudera: Doug Cutting. >>He is the creator of Hadoop. Doug Cutting is also the creator of Lucene, which became Solr, which is part of the search project that we launched recently. Doug Cutting wasn't with Cloudera from day one, right? So when he created these technologies, he was actually at other companies; for example, when he created Hadoop he was at Yahoo, not at Cloudera. However, he now works for Cloudera, so we get that benefit because Doug Cutting works for Cloudera now. So that's one example. On the flip side, there are projects like Flume and Sqoop that are now part of every single distribution out there. And Flume and Sqoop were both created at Cloudera. They were actually created inside of Cloudera. Yeah. So the key point, and that's what I would like all of the vendors out there that are trying to leverage Hadoop and get benefit out of Hadoop to hear, is: please don't be just takers. >>There are some vendors out there who are just takers. They just wanna take from the open source, take from the open source, and don't give back. Right? I'm not gonna name them, but there are a few of them out there. Please, please, please. I mean, that is a very, very selfish behavior. It's not gonna help the ecosystem in the long term. We would like to see you both take and give at the same time. So that would be my core message. And, for example, I thank Hortonworks, because that's exactly what Hortonworks is doing. They're both giving and taking at the same time. >>You guys have always been clear on that. Nobody... I mean, your contribution to open source has been well documented, and there's no question about that. John and I have talked about it a lot, that you guys helped get it all started.
And even Haak, when we had 'em on a couple of years ago when Hortonworks came to the market, said, hey, the more people work on open source, the better. >>Yeah, >>Exactly. So yeah, it's always been your posture. You're not playing games there. Anyway, having said that, you have a strategy to layer some of your own proprietary code on top of that open source. And so you have choices to make. Yes. In terms of how you allocate those resources. So as an engineering manager, how do you allocate those resources in terms of, okay, what do we do for the community and what do we do for our own, you know, future, because of the business model that we chose? How do you make those trade-offs? >>Yes, that's a very, very good question. So first, it's important to stress that our core platform, CDH, is open source. Everything we put in the core platform is open source. So for example, Impala, which we launched very recently as GA (we launched the beta last year, but now it's GA), is 100% Apache-licensed, 100% open source. Search, which we announced very recently, is also open source. So for the platform itself, we're committing to everything in there being open source. Now, we believe fundamentally, from having lots of history in studying the open source markets, from our CEO Mike Olson himself being one of the very first open source people in the world with Sleepycat, the company he sold to Oracle before founding Cloudera, and from our investors helping many other open source companies, that to have a successful open source company, you need a very good engine between the business model that generates revenue and the product that you are creating. If you don't have a good feedback loop between these two, you won't be able to sustain the innovation to continue to push the boundaries of how good the product is.
So we strongly believe that if your product is literally 100% open source, meaning both the management and everything, with nothing proprietary whatsoever inside of your product... I can't tell what that is. >>It's taking a picture. >>Oh, sorry, I thought somebody was waiting for me. Sorry about that. >>It's a cheap signal. >>It was like a... it's really good. >>I thought it's like a card of paper with some writing. >>You have a fan, fans out there. They're storming the concert here. >>Okay, that's good to hear. That's good to hear. Sorry about that interruption. So if you have everything 100% open source, that creates two problems. First, you have no differentiation whatsoever, meaning another big corporation, without naming who the big corporations could be, can just take everything you do, literally every single bit of source code you have, and say, hey, we can do it too. Come to us, don't work with those guys, right? We have the latest, greatest things that they have. Why do you wanna continue to work with them? So no differentiation is number one, which is very dangerous. And number two, if it's 100% open source and there are lots of other vendors able to take the open source artifact and work with it, then it becomes purely about maintenance and insurance on the product, which is a commodity, and obviously the prices for that will go down to the ground. You won't be able to sustain this positive feedback effect between your business model and your product roadmap, and you won't be able to build a long-lasting company. >>So that's why we do have a combination of open source artifacts and proprietary artifacts. Now, our proprietary artifacts are always around the management of the system, right? So how do we manage the security of the system? How do we manage the data flow within the system?
How do we manage the services inside of the system across all layers, right? Not just the Hadoop layer but the HBase layer, the ZooKeeper layer, et cetera, et cetera. So that's where we focus our efforts going forward, and that's how we differentiate ourselves from other vendors out there. Cloudera Manager and Cloudera Navigator are very unique to us. Nobody else has anything close to those capabilities out there. >>So it sounds like the contributions you make to open source are cultural in nature, I mean DNA of sorts, right? And so that's something that you guys do 'cause you've always done it. Absolutely. And then the artifacts that are proprietary are essentially around rationalizing the revenue opportunity with the expense that you're gonna apply there, and making a business case to decide >>How to balance. That's one. And then two, the differentiation from other competitors. So these two things, yes. >>Okay. >>I believe that's fundamental to open source business models. >>Yeah, I mean there are many open source business models, right? You can go pure service, you can go, like you said, you can totally bogart the code. >>There is no pure-service open source company that was able to build a long-lasting, surviving public company; it never happened in history. They always get acquired because it becomes a commodity. >>I mean, right. I mean, even IBM, right? >>Amr, I want to ask you about the storage thing. We were talking before the camera about the Hortonworks storage announcement. What's your take on that? >>Which one? The Gluster one, the one with Red Hat? Yes. Yes. So there has been recent news about Red Hat with Hortonworks having a version of the Hadoop platform that uses MapReduce for the computation but uses Red Hat for the storage, right? So Red Hat has a new storage offering that was built based on a company they acquired called Gluster.
And that news was very, very surprising to me. The reason why it was surprising is that it's also correlated with a shift in messaging from Hortonworks. If you look at Hortonworks at Hadoop Summit last year, one of the key messages that they delivered to us was that within the next five years, or by 2015 (the tagline back then was "by 2015," and you're doing research right now to see if I'm saying the right thing), by 2015, half the world's data will be stored in Hadoop. Yes. If you look today at the slides, it >>Doesn't say that; it says within five years, >>Right? No, no, no. It says, well >>That was the second iteration, "within five years." And now they say something >>Different. Now they say by 2015, half the world's data will be processed by Hadoop instead of stored by Hadoop. And that's very, very fundamental. >>So it's a nuance. >>It's a very important >>Nuance. Well, it's a big deal, because yes, when I first saw that I said, hmm, what does this all mean? And 2015 sounds a little early. Yes. And now you're saying processed by; okay, that's different. >>Yes, exactly. And the reason why is that we believe HDFS is very, very core to the Hadoop platform. HDFS, the storage system of Hadoop, is very core to the platform. It's really the layer that makes Hadoop what it is, more than anything else: how scalable, how reliable, and how economical the HDFS storage layer is. So we really, I mean, ask Hortonworks and ask all the companies working in the Hadoop community not to fragment at the storage layer. We need the storage for Hadoop to stay inside of Hadoop and not to fragment that out. That's very, very critical. >>Okay. So, but so >>You're saying that they're indicating through the gesture... they're not coming out saying we're going to fragment HDFS, but the way that this is positioned might signal >>No, no, no.
The announcement with Red Hat is >>That is the direct signal. It's >>Literally: you'll be able to run MapReduce directly on top of Red Hat Storage instead of HDFS. >>Okay. So >>I >>Interpreted it, I interpreted it as Hortonworks just hedging on its prediction, and I said okay, I'll give 'em a break on that. You're saying it's something different. >>It's a shift in strategy, potentially. Yeah. Which can be dangerous. It's a shift in strategy. >>Is that a compliance issue? 'Cause, you know, the discussion on HDFS POSIX... Yeah. Red Hat does have a lot of enterprise customers. Yeah. So is that just maybe, if >>Then invest in making HDFS POSIX-compliant, which, by the way, we as a community actually are investing in. Yeah. Yes. You must have. Yeah. So we are investing in adding POSIX compliance to HDFS, and we're investing in adding snapshots to HDFS, which will be coming very, very soon. >>Well, do you think that, pick a year, I don't care if it's 2015, 2020, whenever, the majority of the world's data will be running in Hadoop? >>The majority of the world's data that has to do with analytics. Yes. Okay. So there is, >>So that is >>It's very important, the caveat. Yes, exactly. Because there are lots of types of data that are not very suitable for Hadoop at all. For example, the data storage for Oracle database systems. No, you wanna store that in NetApp or EMC; you don't wanna store that in Hadoop. The data storage for streaming video files, right? For just streaming lots and lots of video files. No, you don't wanna store that in Hadoop. >>It's a huge >>Proportion of the data. Yeah. Which is a huge, huge >>Proportion of data files; in fact, that could overwhelm the data. >>Yeah. So the nuance, I would say, is that I agree with the half thing, but the half thing within the world of data for the purpose of analysis. >>Yeah. Okay. So that's >>Narrowing down the >>Yeah, okay.
But it's more reasonable. But I've, I >>Never... It's still a huge market, by the way. It is. Yeah, >>It is. Yes. Okay. So what's next for you, Amr? You've gone on this journey, you started this company. You've been traveling around like crazy working with customers. What's the next phase of Amr Awadallah's, you know, career? >>What >>Do you want to have happen next? I mean, what excites you? What are you working on? >>Yeah, it's just to continue to grow Cloudera to be the biggest company it can be. I mean, we literally want to be one of the very few companies that were able to take an open source model and turn that into a large publicly traded corporation. >>So you've talked about that; you guys brought a new CEO on, right? Look at the background of the CEO, and, you know, clearly he's got some IPO chops. Yes. So that's an aspiration that you guys have put forth. Okay. >>And you're outward-facing now. So you're doing a lot of travel. Yes. So where have your travels taken you? You've been in China; obviously you've got a European office open. So what's going on internationally? Give us some sound bites of what's happening in the field. Yeah, >>So internationally, I mean, Europe definitely is our next big focus right now. We now have a big operation in Europe; we have an office presence in Europe and a big team there, and it's growing very quickly. I would say Europe is about two years behind the US; that's kind of how the growth usually mirrors what's happening here. And yeah, so our next big market is Europe. We are looking at China. We don't have a big presence in China right now. Japan, we have a big presence in Japan, and Japan is growing very quickly. So yeah, obviously Canada, along with the US, is growing very quickly as well.
>>Great to have you on the Cube again, for me personally and for Dave. And I wanna say thanks to Cloudera for some great support over the years. You guys have been fantastic. You know, you've built a great company. It's so hard to build a company. You guys have done a great job. I gotta ask you the final question, because you did bring that first sound bite, which was "I saw the future," back when you guys were just in your B round, in the Palo Alto office, just starting to ramp: what's next? What do you see around the corner? Obviously we're on a trajectory right now. A lot of things are gonna get done. POSIX compliance, a lot of stuff's gonna fill in. The platform's gonna get stronger. Yeah. We think that open source will win. Yeah. Through all the democratization of open source. What's next? What's around the corner that you're watching personally, that's interesting to you, Amr, around where this will take us? >>Yeah. So what's next is having this vision become true. Having this future vision that you referred to become true. Meaning having a single platform that can store all of your data, regardless of the type of that data, and allow you to extract value with different types of workloads, whether that be batch, interactive, machine learning, or search, or more, right? There will be more things that will come to the platform, but it's about how to bring your applications, all of your data applications, to your data, all of your data, as opposed to having the data go to them. >>And what are the landmines out there that you need to avoid? Yes. That the industry and community need to avoid to make that a reality? >>The key landmine, it's a bit technical.
The landmine is a bit technical: making sure that the vision continues to evolve, and that we have the capability to properly have a multi-workload resource management system that allows me to run all of these types of workloads without having them step on each other's toes. That's the key step going forward. >>And of course, playing well together in the sandbox. And as always, competition is good. And again, Hadoop is doing great. Amr Awadallah, co-founder of Cloudera, inside the Cube. This is SiliconANGLE and Wikibon's exclusive coverage of Hadoop Summit here in Silicon Valley. We'll be right back with our next guest after the short break.
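Amr's "key landmine" is multi-workload resource management, which he earlier ties to YARN coordinating batch workloads (MapReduce, machine learning) alongside latency-sensitive ones (Impala, search). The sketch below is a hypothetical Python illustration of one common approach: guaranteed per-queue capacity, with idle capacity lent to busy queues. It is loosely in the spirit of capacity-style scheduling but is not YARN's API or its actual scheduler, and the queue names and percentages are assumptions.

```python
def allocate(total: int, guaranteed_pct: dict[str, int], demand: dict[str, int]) -> dict[str, int]:
    """Split `total` containers among workload queues (hypothetical scheduler).

    guaranteed_pct: queue -> guaranteed share of the cluster, in whole percent
    demand:         queue -> containers the queue is currently asking for
    """
    if sum(guaranteed_pct.values()) > 100:
        raise ValueError("guarantees exceed 100%")
    # Step 1: every queue gets what it asks for, up to its guaranteed share,
    # so a burst of batch jobs cannot starve latency-sensitive queries.
    alloc = {
        q: min(demand.get(q, 0), total * pct // 100)
        for q, pct in guaranteed_pct.items()
    }
    # Step 2: lend idle capacity to queues that still want more,
    # largest unmet demand first.
    spare = total - sum(alloc.values())
    by_need = sorted(alloc, key=lambda q: demand.get(q, 0) - alloc[q], reverse=True)
    for q in by_need:
        extra = min(spare, demand.get(q, 0) - alloc[q])
        alloc[q] += extra
        spare -= extra
    return alloc
```

With 100 containers and guarantees of 40% for an "interactive" queue and 60% for "batch", a quiet interactive queue asking for 10 lets batch borrow up to 90. When interactive asks for 50, it still gets its full 40 because guarantees are honored before anything is lent. Real schedulers also reclaim lent capacity (sometimes by preemption) when the owning queue's demand returns, which this one-shot function doesn't model.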

Published Date : Jun 27 2013

SUMMARY :

We owe a great deal of gratitude to you, and congratulations to you, Michael Olson, It was great to be here. So what do you think, what's your take on the current Hadoop ecosystem right now? Should I look to you or look to the camera? Look to the camera, or both. there is a side question there, which is what do you think of all the competition coming into the space? what are you seeing right now as the white spaces for things to do in the So first, I can't talk about the future roadmap. No, no, no, we're good. So you have multiple types of workloads that can handle different types of problems to, you know, do more with less, a lot of the things that you typically hear within the enterprise. You're gonna have the MapReduce workload, which is very batch-oriented So I want you to share with the audience anything you want to say about the So I'll give you examples from Cloudera: Doug Cutting. So the key point, and that's what I would like all of the vendors out there that We would like to see you both take and give at the same time. John and I have talked about it a lot, that you guys helped get it all started. And so you have choices to make. Yes. So we strongly believe that if your product is I thought it's like a card of paper with some writing. You have a fan, fans out there. big corporations could be, can just take everything you do, literally every single bit of source code you have So how do we manage the security of the system? So it sounds like the contributions you make to open source are cultural in nature, So these two things, yes. You can go pure service, you can go, There is no pure-service open source company I mean, even IBM, right? Amr, I want to ask you about the storage thing. The reason why it was surprising is that it's also correlated with a shift in messaging No, no, no.
It says, well And now they say something half the world's data will be processed by Hadoop instead of stored And now you're saying processed And the reason why is that we believe HDFS is very, That is the direct signal. I interpreted it as Hortonworks just hedging on its prediction, and I said okay, It's a shift in strategy, potentially. So is that just maybe, if So we are investing in adding POSIX compliance to HDFS, and we're investing in adding snapshots So there is, No, you wanna store that in NetApp or EMC; you don't wanna store that in Hadoop Proportion of the data. for the purpose of analysis. But it's more reasonable. But I've, I Never... It's still a huge market, by the way. What's the next phase of Amr Awadallah's, you know, one of the very few companies that were able to take an open source model and turn that into So that's an aspiration that you guys have You've been in China; obviously you've got a European how the growth usually mirrors that first sound bite, which was "I saw the future," back when you guys were just in your B round, and allow you to extract value with different types of workloads, whether that be batch, interactive And what are the landmines out there that you need to avoid? Yes. That's the key step going forward. Amr Awadallah, co-founder of Cloudera, inside the Cube.

ENTITIES

Entity	Category	Confidence
Michael Olson	PERSON	0.99+
John	PERSON	0.99+
Europe	LOCATION	0.99+
Mike Olson	PERSON	0.99+
six	QUANTITY	0.99+
John Furrier	PERSON	0.99+
China	LOCATION	0.99+
Dave	PERSON	0.99+
Amr Awadallah	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Horton Works	ORGANIZATION	0.99+
Japan	LOCATION	0.99+
2015	DATE	0.99+
25	QUANTITY	0.99+
last year	DATE	0.99+
seven	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
25 competitors	QUANTITY	0.99+
Dave Vellante	PERSON	0.99+
two	QUANTITY	0.99+
two problems	QUANTITY	0.99+
Red Hat	ORGANIZATION	0.99+
30 competitors	QUANTITY	0.99+
today	DATE	0.99+
First	QUANTITY	0.99+
both	QUANTITY	0.99+
Hadoop Summit	EVENT	0.99+
Hortonworks	ORGANIZATION	0.99+
five years ago	DATE	0.99+
second iteration	QUANTITY	0.99+
one	QUANTITY	0.98+
22,000	QUANTITY	0.98+
Horton	ORGANIZATION	0.98+
first vendor	QUANTITY	0.98+
five years	QUANTITY	0.98+
hundred percent	QUANTITY	0.98+
Red Hat	TITLE	0.98+
Canada	LOCATION	0.98+
Tia	ORGANIZATION	0.98+
Tom	PERSON	0.98+
Hor Works	ORGANIZATION	0.97+
first	QUANTITY	0.97+
Horton	PERSON	0.97+
two things	QUANTITY	0.97+
first interview	QUANTITY	0.97+
Stealth Mo	LOCATION	0.97+
half	QUANTITY	0.96+
Haak	PERSON	0.96+
one example	QUANTITY	0.96+
Hadoop Summit 2013	EVENT	0.95+

Dr. Amr Awadallah - Interview 2 - Hadoop World 2011 - theCUBE

Yeah, we're here with Amr Awadallah, the other co-founder, back to back. This is the Cube, SiliconANGLE.com and SiliconANGLE.tv's production of the Cube, our flagship telecast. We go out to the events. That was a great conversation. It was really just cool. We could have probably hit on a few more things; he's obviously well read. Awesome co-founder of Cloudera. You did a good job teaming up with that co-founder, huh? He's not bad on the Cube, is he? >>He reads the internet. >>That's what I'm saying. >>Anything that's going on. >>He's a Cube star, you know. >>And technology. Jeff knows it. Yeah. >>I tell you, I'm smarter just from being at Cloudera all those years. And I actually was following what he was saying; it didn't bust my brain. So, okay, you're back. We were talking earlier with Mike Olson about the relational database thing, so I want to pick that up where we left off. He was really excited: hey, we saw the relational database movement happen. He was part of that generation. And things are kind of happening the same way now, in a similar way, still early. So I was trying to really peg with him how early we are on the curve. You know, this is 1,400 people; it's not the Javits Center yet. Maybe Hadoop World next year might be at the Javits Center with 35,000. Just don't go to Vegas. So I'm trying to figure out where we are on that curve. Are we on the upward slope, down here, not even hitting that? >>I think we're moving up quicker than previous waves. If you look, for example, at Oracle, I think it took them 15 to 20 years until they really became a mature company. VMware, which started about, what, 12 or 13 years ago, took maybe eight years to become a big, mature company, and I'm hoping we're gonna do it in five.
So a couple more years. >>Highly accelerated. >>Yes. I've been surprised by the growth. I've been told, warned, about enterprise software, that it takes a long time for production deployments to take place. >>But the consumerization trend is really changing that. It used to be that the enterprise was always last. Why the shorter cycle? >>I think the shorter cycle is coming from having the right solution for the right problem at the right time. I think that's a big part of it, so luck definitely is a big part of this. Now, in terms of why adoption is changing compared to a couple of decades ago, I think that's coming from how quickly the technology itself, the underlying hardware, is evolving. Right now you can buy a single server that has eight to 16 cores and 12 hard drives of two terabytes each, and that is pushing the limits of what you can do with the existing systems, which makes it more likely for new systems to disrupt them. >>Yeah. We've talked about that a lot. It's very easy for people to actually start a big data project, for example. >>Yes. >>And the hardest part is: what problem do I really need to solve? How am I gonna monetize it? Those are the hard parts, not the underlying technology. >>Yes, that's true. I mean... >>You're saying... >>Because I'm seeing both. I'm seeing cases where you're right: some companies are like, oh, this Hadoop thing is so cool, what problem can I solve with it? And I see other companies that have this huge problem and don't know that Hadoop exists. And once they know, they just jump on it right away.
It's like when you have a headache and you're searching for the medicine, and you find aspirin. Wow. It works. >>I was talking to Jeff Hammerbacher before he came on stage, and I didn't even get to it because we were on such a nice riff, like a bunch of musicians playing guitar together. But we talked about IT dynamics, and he said something that I thought was right on the money, and SAP is saying the same thing: they're going to the lines of business. >>Yes. >>Because IT is the gatekeeper. It's like selling minicomputers to a mainframe team, or selling client-server to a minicomputer team. >>Yeah. We're seeing both as well, but more likely the former, meaning that yes, lines of business and departments adopt the technology, and then IT comes in and sees there are already five different departments using it, and they think, okay, now we need to formalize this across the organization. >>So what happens then? What are you seeing out there? When that happens, does it mean people get their hands on it: hey, we've got a problem to solve? Is that what it comes down to? Well, Hadoop exists, go get Hadoop. Oh yeah. They plop it in there, and what does it do? >>So they pop it into their own installation, or on the cloud, and they show that it's actually working and solving the problem for them. And when that happens, it's a very easy adoption from there on, because they just go tell IT: we need this right now, because it's solving this problem and it's gonna make us much more money. >>Moving it right in. >>Yes. No problems. >>Is that another reason why the cycle's compressed? You think of client-server: there was a lot of resistance from IT. Same thing with mobile. I mean, mobile has flipped, right? Okay, bring it in, we gotta deal with it. >>Yep. >>I would think it's the same thing.
We have a data problem; let's turn it into an opportunity. >>Yeah. And it goes back to what I said earlier: the right solution for the right problem at the right time. When you have large amounts of unstructured data, there isn't anything else out there that can even touch what Hadoop can do. >>So Amr, I need to change gears here for a minute: the gaming stuff. We're featured on justin.tv right now on the front page. >>Oh wow. >>But the numbers aren't coming in because there's a competing stream featuring the recently released Modern Warfare 3. >>Yes. >>So we have to compete with Modern Warfare 3. Can we talk about Modern Warfare 3 for a minute and share with the folks what you think of the current version, if you've played it? >>Unfortunately I'm waiting to get back home. I don't have my Xbox with me here. >>A little like... I'm talking about... >>My line of business. >>Boom. Modern Warfare's like a Christmas tree here. >>Sorry. You know, I'm a big gamer, a big video gamer. At Cloudera, every Thursday at 5:30 in the office, we play Call of Duty 4, which is actually Modern Warfare 1. And I challenge people out there to come challenge our team. Just ping me on Twitter and we'll do a Cloudera versus... >>Let's reframe that. Calling the team out there: Amr Awadallah's company. These are the geeks that invent the future. Jeff Hammerbacher, formerly at Facebook, now at Cloudera. Amr leading the charge. These guys are gamers. So all the young gamers out there: Amr is saying they're gonna challenge you. At which version? >>Modern Warfare 1. >>Modern Warfare 1. How do they get in? Can you set up an external... >>We'll figure it out. >>We'll figure it out. Okay. >>Yeah. Just ping me on Twitter and we'll... >>We can carry it live; actually, we can stream that. >>Yeah, that'd be great. >>Great. >>Yeah.
So I'll tell you, some of our best Hadoop committers and Hadoop developers... >>Picture that. Modern Warfare 3 going now. >>Modern Warfare 3: very excited about the game. I saw the trailers for it; the graphics look just amazing. I've loved the series since the first one came out, and I'm looking forward to getting back home to play the game. >>I can't play; my son won't let me play. I'm such a fumbler. I'm a keyboard guy; I can't work the Xbox controller. I have a coordination problem at my age, I'm just a klutz, and he's like, "Dad, sorry, charity's over. Can I play with my friends?" But I'm around big gamers. >>But in terms of... something I wanted to bring up is how to link up gaming with big data and analysis. I'm a big gamer. I love playing games, but at the same time, whenever I play games I feel a little bit guilty, because it's kind of wasted time. Yes, it's fun, I get lots of enjoyment from it, and it makes my life much more cheerful. But still, how can we harness all of these hours that gamers spend playing a game like Modern Warfare 3? How can we instrument and collect all of the data that's coming from that, and come up with something useful with it, for example predictions? >>This is exactly the kind of application that's mainstream: gaming. >>Yeah. >>Danny at Riot Games was telling me (we saw him at Oracle OpenWorld; he was up there for JavaOne) that they don't really have a big data platform, and their business is about understanding user behavior. They have tons of data about user playing time, who they're playing with... >>Yeah. >>How they want to get into currency trading, you know. >>I can't mention the names, but some of the biggest gaming companies out there are using Hadoop right now,
and depending on CDH for doing exactly that kind of thing. >>Creating a good user experience. >>Today, they're doing it for the purpose of enhancing the user experience and improving retention. So they track everything: every single bullet you fire, every basketball hit you get, every home run you do. >>And in a Modern Warfare 3 type of game, every consecutive headshot you get. >>Everything is being tracked. Every headshot you get, and so on. But as you said, they are using that information today to sell more products and retain their users. Now what I'm suggesting is: how can you harness that energy for good as well? Making money is good and everything, but how can you harness that for doing something useful, so that all of this entertainment time is also productive time? I think that would be a holy grail in this environment, if we can achieve that. >>Yeah. It used to be that porn was the telegraph of the future of applications, but gaming really is now. If you look at gaming, you get the headset on; it's a collaborative environment. >>Oh yeah. >>You've got unified communications. >>Yeah. And you see our teenage kids, how many hours they spend on these things. >>You've got play environments that are very social and collaborative. >>Yeah. >>What I'm saying is that that's the future work environment, with Skype evolving. Our multiplayer game is called our job, right? >>Yeah. >>So I'm big on gaming. So all the gamers out there: Amr has challenged you. >>Yeah. >>Got a big data example. What else are we seeing? Let's talk about the software. One of the things you were talking about that I really liked: you were going down the list. On Mike's slide he had all the new features around the core. Can you just go down the core and rattle off your version of what each one means and what it is?
So you start off with, say, HBase; we talked about that already. What are the other ones that are out there? >>So the projects that we have right there... >>The projects around the core, the tools that are being built. >>Yeah. So the foundational ones, as we mentioned before, are HDFS for storage and MapReduce for processing. And then the immediate layer above that is about how to make MapReduce easier for the masses. Not everybody knows how to write MapReduce in Java, but everybody knows SQL, right? So one of the most successful projects right now, the one with the highest attach rate, meaning that when people install Hadoop they usually install it as well, is Hive. Jeff Hammerbacher, my co-founder, built the Hive system with his team when he was at Facebook. Essentially, Hive takes SQL, so you don't have to learn a new language; you already know SQL. And then it converts that into MapReduce for you. That not only expands the developer base, how many people can use Hadoop, but also makes it easier to integrate Hadoop, through ODBC and JDBC, with BI tools like MicroStrategy, Tableau, Informatica, et cetera. >>You mentioned R too, the R programming language. >>Yeah, R is one of our best partnerships; we're very happy with them. So Hive is one of the very key projects. A sister project to Hive is called Pig. Pig Latin is a language that Yahoo invented. You do have to learn the language, but it's very easy to learn compared to MapReduce. And once you learn it, you can specify very deep data pipelines. SQL is good for queries. It's not good for data pipelines, because it becomes very convoluted and very hard for the human brain to understand. Pig is much more natural for humans; it's more like Perl, very similar to scripting languages.
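To make the Hive point concrete: a query such as `SELECT player, COUNT(*) FROM events GROUP BY player` can be expressed as a map step, a shuffle that groups by key, and a reduce step. The toy sketch below runs in memory in plain Python. It is not Hive's compiler or Hadoop's Java API. The table, the columns, and the `map_phase`/`shuffle_phase`/`reduce_phase` names are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical input rows: (player, event) pairs, e.g. from game telemetry logs.
Row = tuple[str, str]


def map_phase(rows: Iterable[Row]) -> list[tuple[str, int]]:
    """Map: emit (key, 1) for every row, keyed on the GROUP BY column."""
    return [(player, 1) for player, _event in rows]


def shuffle_phase(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: collect all emitted values per key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate each key's values; summing the 1s implements COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[Row]) -> dict[str, int]:
    """Toy equivalent of: SELECT player, COUNT(*) FROM events GROUP BY player."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    events = [("alice", "shot"), ("bob", "shot"), ("alice", "headshot")]
    print(group_by_count(events))
```

On a real cluster, the map and reduce steps run in parallel across many machines, and the shuffle moves data over the network. The three-function split above only mirrors the logical structure that a SQL-to-MapReduce translator targets.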
So with Pig you can write very, very long data pipelines. Again, a very successful project doing very well. Another key project is HBase, like you said. HBase allows you to do low-latency access, so you can do very quick lookups, and it also allows you to do transactions: updates, inserts, and deletes. One of the talks here at Hadoop World that we recommend people watch when the videos come out is the talk by Jonathan Gray from Facebook, where he talked about how they use HBase. >>Jonathan's coming on here in the Cube later. >>Yeah. So drill him on that. They use HBase now for many, many things within Facebook. They have a big team committed to building and improving HBase with us and with the community at large. And they're using it for their online messaging system: the live mail system in Facebook is powered by HBase right now. Also eBay, with the Cassini project (they gave a keynote earlier today at the conference), is using HBase as well. So HBase is definitely one of the projects that's growing very quickly right now within the Hadoop ecosystem. Another key project that Jeff alluded to earlier when he was on here is Flume. Flume is very instrumental, because you have this nice system, Hadoop, but Hadoop is useless unless you have data inside it. So how do you get the data inside Hadoop? Flume essentially is a very nice framework for having agents all over your infrastructure, inside your web servers, your application servers, your mobile devices, your network equipment, that collect all of that data and then reliably materialize it inside Hadoop. So Flume does that. Another good project is Oozie. There are so many of them, I don't know how long you want me to keep going here, but Oozie is great. Oozie is a workflow processing system. It allows you to define a series of jobs, some of them in Pig, some of them in Hive, some of them in MapReduce.
You can define a series of them, link them to each other, and say: only start this job when these other two jobs finish, because I'm waiting for the input from them before I can kick off, and so on. Oozie is a very nice framework that will do that. It will manage the whole graph of jobs for you and retry things when they fail, et cetera. Another good project is Whirr, W-H-I-R-R. Whirr allows you to very easily start a Hadoop cluster on top of Amazon EC2, on top of Rackspace, on virtualized environments. It's for kicking off Hadoop instances or HBase instances on any virtual infrastructure: VMware vCloud... it supports all of the major virtualized infrastructure systems out there, Eucalyptus as well, and so on. So that's Whirr. Avro is another key project. It's Doug Cutting's main project right now; Doug Cutting came on stage with you guys. Avro is a project about how we encode, with our files, the schema of those files. Because when you open up a text file and you don't know what the columns mean and how to parse it, it becomes very hard to work with. Avro allows you to do that much more easily. It's also useful for doing RPC, remote procedure calls, for having different services talk to each other; Avro is very useful for that as well. And the list keeps going on and on. >>Mahout. >>Yeah, thanks for reminding me of Mahout. We just added Mahout very recently, actually. >>What is that, Amr? I'm not familiar with it. >>Mahout is a data mining library. Mahout takes some of the most popular data mining algorithms for clustering, regression, and statistical modeling and implements them using the MapReduce model. >>They have machine learning in it too? >>Yes, yes. That's the machine learning. >>So, so yes,
support vector machines, and so on. >>What about Sqoop? >>Sqoop. You know all of them; thanks for feeding me all the names. >>The ones I don't understand. >>But there are so many of them, right? I can't even remember all of them. Sqoop is actually a very interesting project. It's short for SQL-to-Hadoop, hence the name Sqoop: "Sq" from SQL and "oop" from Hadoop, and it also means scoop, as in scooping up ice cream. The idea of Sqoop is to make it easy to move data between relational systems, like Oracle, Teradata, Netezza, Vertica, and so on, and Hadoop. You can very simply give Sqoop the name of the table inside the relational system and the name of the file inside Hadoop, and the table will be copied over to the file. And vice versa: you can give Sqoop the name of the file in Hadoop and the name of the table over there, and it'll move the data into that table. So it's a connectivity tool between the relational world and the Hadoop world. >>Great tutorial. >>And all of these are Apache projects. >>They're not part of your unique proprietary... >>Yes. >>But these are things that you've been contributing to. >>We're contributing to the whole ecosystem, yes. >>And you understand them very well. >>Yes. >>And they contribute to your knowledge of the marketplace. >>Absolutely. We collaborate with the community on creating these projects. We employ committers and founders for many of these projects: Doug Cutting, the founder of Hadoop, works at Cloudera; the founder of the Oozie project works at Cloudera; the founder for ZooKeeper works at Cloudera. So we have a number of them on staff. >>So we had Arun from Hortonworks on. >>Yes. >>And it was really good, because I tell you, I walked away from that conversation, and I gotta say for the folks out there, there really isn't a war going on in Apache. >>There isn't. In Apache, there isn't. I mean, to be honest,
in the developer community we are friends; we're working together. We want to achieve the... >>There's no war. It's all Kumbaya. Everyone understands that a rising tide floats all boats; they're all playing nice in the same sandbox. It's just the competitive landscape with Hortonworks. >>In the business. >>Competitive business, PR. >>PR. We're trying to be friendly, as friendly as we can. >>Yeah, no, I mean, they're hyping it up. But he was cool: hey, you know, we know each other. >>Yes. >>We all know each other, and we're just gonna offer it free and charge for support, and so are they. And that's okay, and they've got other things going on. >>Yes. >>But he brought up the question. He said they're launching a management console. So I said, Cloudera's got a significant lead, and he kind of didn't really answer the question. So the question is: that's your core bread and butter, that's your... >>Yes and no. If you look at Cloudera Enterprise, and I mentioned this earlier when we talked in the morning, it has two main things in it. Cloudera Enterprise has the management suite, but it also has the support and maintenance that we provide to our customers, and all the experience that we have on our team, as part of that subscription. >>Yes. The subscription. >>And I want to stress the point that the fact that I built a sports car doesn't mean that I'm good at driving that sports car. The driver of the car is usually much better at driving it than the guy who built the car, right? So yes, we have many people on staff who help build Hadoop, but we have many more people on staff who help run Hadoop at large scale in the financial industry, retail industry, telecom industry, media industry, health industry, et cetera. That's very, very important for our customers: all the experience that we bring on how to run the system technically... >>Yeah.
Within these verticals. >>But their strategy is clear: we're gonna create an open source project within Apache for a management console. >>Yes. >>And we sell support too. >>Yes. >>So there'll be a free alternative for management. >>So we'll have to see. But if you look at the product, our product... >>It's gotta come down to product differentiation. >>Our product has been in the market for two years; they just started building theirs. >>It's alpha, it's just alpha. >>The product is in alpha right now. >>Yeah. Okay. >>Well, the Apache project... >>It's Apache, right? >>Yeah, the Apache project is out, so we'll see how it compares to ours. But I think ours is way, way ahead of anything else out there. Essentially, people should try it for themselves and see. >>Essentially, John, when I asked Arun why the world needs Hortonworks, eventually the answer we got was, well, it's free; Hadoop needs to be more open. >>No, there's... >>That's not really the reason why Hortonworks exists. >>No, they want to go make money. >>Exactly. >>We weren't gonna say that. >>When I kept pushing and pushing, that's ultimately the closest we could get, because he's not gonna... >>12 open source projects. >>Yes. >>I mean, yeah, you can't get much more open. >>Yeah. Look at the management console, but Cloudera's not contributing on all those? >>I mean, not only... no, we absolutely are. >>No, you are contributing, but those aren't all your projects. There are other people involved. >>Yeah, we didn't start all of these projects. >>That's true, but you're contributing heavily to all of them. >>Yes, we are. >>And that's clear. Todd Lipcon said he contributed his first patch to HBase in 2008. So if you go back through the ranks of your people... >>And Todd is now a committer on HBase and a committer on Hadoop itself.
So on a number of projects... >>You're clearly the lead, and, you know... but there is a concern. We've heard it, and I just want to ask you. >>No, no. >>There's a concern that if I build processes around a proprietary management console, >>Yes. >>I'm gonna end up being locked into that proprietary management console all over again. Now, this is so far from CA... >>Yes. >>Right. But that's a concern that some people have expressed, and I think it's one of the reasons why Hortonworks is getting so much attention. So talk about that. >>It's a very good observation to make. There are actually two separate things here. There's the platform, where all the data sits, and then there's the management console beside the platform. Now, why did Cloudera make the management console? Because it makes our job of supporting customers much more achievable. When a customer calls in and says, we have a problem, help us fix this problem, they go to our management console, where there is a button they click that gives us a dump of the state of the cluster. That's what allows us to very quickly debug what's going on, and within minutes tell them you need to do this and you need to do that. Without that, we just can't offer the support services. >>There's real value there. >>Yes. But you have to keep in mind that the underlying platform is completely open source and free. CDH is a hundred percent open source, a hundred percent free, a hundred percent Apache. So a year from now, when it comes time to renew with us, if the customer is not happy with our management suite or not happy with our support, they can go to Hortonworks. >>People are afraid of... >>Or they can go to IBM. >>The data, you can take the data... >>You don't even need to take the data. You're not gonna move the data. It's the same system, the same software.
Everything in CDH is Apache, right? We're not putting anything in CDH that is not Apache. So a year from now, if you're not happy with our service and the value that we're providing, you can switch. There is no lock-in. >>Your argument would be the switching costs... >>The only lock-in is happiness. >>Happiness, customer satisfaction. By the way, we just wrote a piece about those wars, and we said the risk of lock-in is low. We made that statement and we got some heat for it. >>Yes. >>This is sort of at scale, though. What the people throwing the tomatoes are saying is that, again in theory, at scale the customers are so comfortable with the console that they don't switch. Now my argument was... >>Yes, but that means they're happy with it. That means they're satisfied and happy with it, and it's more economical for them than going and hiring people full-time on staff. >>Yeah. >>So you're always in check, as long as the customer doesn't feel like it's Oracle. >>Yeah, see, that's different. Oracle is very... >>Oracle is different, right? Here it's like Cisco routers: they get nested into the environment and provide value. That's just good competitive product strategy, if they're happy. >>Yeah. >>It's called open-washing with Oracle. >>I mean, our number one core attribute as a company, the number one value for us, is customer satisfaction: keeping our customers happy with the service that we provide. >>So differentiate in the product, keep the commanding lead. That's the strategy, that's what's happening, that's your goal. >>Yes, that's what's happening. Absolutely. >>Okay. Co-founder of Cloudera, always a pleasure to have you on the Cube. We really appreciate all the hospitality over the year and a half.
And I want to personally thank you for letting us sit in your office, and we'll miss you. >>And we'll miss you too. >>We'll see you at the Cube events; swing by. Thanks for coming on the Cube, great to see you, and congratulations on all your success. >>Thank you. >>And thanks for the review of Modern Warfare 3. >>Yeah, yeah. Call me again if there's any gaming stuff, you know.
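In the interview, Awadallah describes Oozie as running a job only after the jobs it depends on have finished, and retrying jobs that fail. That is essentially dependency-ordered execution over a directed acyclic graph. The sketch below illustrates the idea in plain Python using the standard-library `graphlib.TopologicalSorter` (Python 3.9+). It is not Oozie's API or its XML workflow format. The job names, the `run_workflow` helper, and the simple immediate-retry policy are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(
    jobs: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
    max_attempts: int = 3,
) -> list[str]:
    """Run each job only after every job it depends on has finished.

    A failing job is retried immediately, up to max_attempts times in total;
    if it still fails, RuntimeError is raised and no downstream job starts.
    A dependency cycle raises graphlib.CycleError. Returns job names in
    completion order.
    """
    referenced = set(depends_on).union(*depends_on.values())
    unknown = referenced - set(jobs)
    if unknown:
        raise ValueError(f"dependencies refer to undefined jobs: {sorted(unknown)}")

    sorter = TopologicalSorter({name: depends_on.get(name, set()) for name in jobs})
    completed: list[str] = []
    for name in sorter.static_order():  # a job is yielded only after its dependencies
        for attempt in range(1, max_attempts + 1):
            try:
                jobs[name]()
                break
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(f"job {name!r} failed after {attempt} attempts") from exc
        completed.append(name)
    return completed


if __name__ == "__main__":
    attempts = {"hive_aggregate": 0}

    def hive_aggregate() -> None:
        attempts["hive_aggregate"] += 1
        if attempts["hive_aggregate"] == 1:
            raise IOError("transient failure")  # the first try fails; the retry succeeds

    workflow = {
        "pig_clean": lambda: None,
        "hive_aggregate": hive_aggregate,
        "mapreduce_join": lambda: None,
    }
    # mapreduce_join waits for both upstream jobs, like the "two jobs" case in the interview.
    print(run_workflow(workflow, {"mapreduce_join": {"pig_clean", "hive_aggregate"}}))
```

The sketch runs jobs one at a time for clarity. A production scheduler would launch independent jobs, such as the two upstream ones here, in parallel and persist workflow state between retries.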

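Awadallah describes Sqoop's core job as copying a relational table into a file in Hadoop, and copying a file back into a table. The sketch below imitates that round trip on a single machine. It uses Python's built-in `sqlite3` as a stand-in relational database and a local CSV file as a stand-in for a file in HDFS. It does not use Sqoop's command-line interface or HDFS. `table_to_file` and `file_to_table` are hypothetical helper names, and table names are assumed to be trusted input.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def table_to_file(conn: sqlite3.Connection, table: str, path: Path) -> int:
    """Copy every row of `table` into a CSV file, header row first; return the row count."""
    cur = conn.execute(f'SELECT * FROM "{table}"')  # table name is assumed trusted
    columns = [desc[0] for desc in cur.description]
    rows = cur.fetchall()
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    return len(rows)


def file_to_table(conn: sqlite3.Connection, path: Path, table: str) -> int:
    """Append rows from a CSV file (header row first) to an existing table; return the row count."""
    with path.open(newline="") as f:
        reader = csv.reader(f)
        columns = next(reader)
        rows = list(reader)
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # CSV values arrive as text; SQLite's INTEGER column affinity converts numeric strings back.
    conn.executemany(f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
    source.executemany("INSERT INTO scores VALUES (?, ?)", [("alice", 10), ("bob", 7)])
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE scores_copy (player TEXT, points INTEGER)")
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / "scores.csv"  # stands in for a file in HDFS
        table_to_file(source, "scores", staged)
        file_to_table(target, staged, "scores_copy")
    print(target.execute("SELECT * FROM scores_copy ORDER BY player").fetchall())
```

Real Sqoop parallelizes the copy across many map tasks and writes into HDFS. Here, a plain CSV file in a temporary directory stands in for that target so the example stays self-contained.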
Published Date : May 1 2012

SUMMARY :

Yeah, we're here with Amr Awadallah, the other co-founder, back to back. So I want to pick that up where we left off; you know, he was really excited. So a couple more years. It takes a long time for production deployments to take place. But the consumerization trend is really changing that. Right now you can buy a single server that has eight to 16 cores and 12 hard drives of two terabytes each. It's very easy for people to actually start a big data project. Those are the hard parts. It's like when you have a headache and you're searching for the medicine. SAP is saying the same thing: they're going to the lines of business. More likely the former, meaning that lines of business and departments adopt the technology first. What are you seeing out there? They pop it into their own installation, or on the cloud, and they show that it's actually working. You think of client-server: there was a lot of resistance from IT. The right solution for the right problem at the right time. So Amr, I need to change gears here for a minute. Share with the folks what you think of the current version, if you've played it. I don't have my Xbox with me here. And I challenge people out there to come challenge our team. So all the young gamers out there: Amr is saying they're gonna challenge you. We'll figure it out. We can carry it live; actually, we can stream that. I've loved the series since the first one came out. But at the same time, whenever I play games, I feel a little bit guilty, because it's kind of wasted time. Danny at Riot Games was telling me; we saw him at Oracle OpenWorld. I can't mention the names, but some of the biggest gaming companies out there are using Hadoop. So they track everything. Now what I'm suggesting is: how can you harness that energy for good as well? If you look at gaming, you get the headset on.
Can you just go down the core and rattle off your version of what each project means? The projects around the core, the tools that are being built. The foundational ones, as we mentioned before, are HDFS for storage and MapReduce for processing. You mentioned R too. One of the talks here at Hadoop World is by Jonathan Gray from Facebook; Jonathan's coming on here in the Cube later. HBase is definitely one of the projects that's growing very quickly right now. Oozie allows you to define a series of jobs. Whirr supports all of the major virtualized infrastructure systems. Avro allows you to do that much more easily. Mahout takes some of the most popular data mining algorithms and implements them using the MapReduce model. That's the machine learning. The idea of Sqoop is to make it easy to move data between relational systems, like Oracle, and Hadoop. And all of these are Apache projects. We're contributing to the whole ecosystem. So we have a number of them on staff. There really isn't a war going on in Apache; in the developer community we are friends. It's all Kumbaya. So the question is: that's your core bread and butter? Cloudera Enterprise includes the management suite and the experience of our team as part of that subscription. So there'll be a free alternative for management. Our product has been in the market for two years; they just started building theirs. It's just alpha. We'll see how it compares to ours. Eventually the answer we got was, well, it's free. But those aren't all your projects. We didn't start all of these projects. If you go back through the ranks of your people... We've heard it, and I just want to ask you. There's a concern about lock-in. It's a very good observation to make. Within minutes we can tell them you need to do this and you need to do that. So a year from now, when it comes time to renew with us, if the customer is not happy, they can go to Hortonworks. It's the same system, the same software.
The only lock-in is which... Which, by the way, we just wrote a piece about those wars, and we said the risk of lock-in is low. ...the console, so that they don't switch. Yes, but that means they're happy with it. And it's more economical for them than going and hiring people full-time on this stuff. Oracle is very... Oracle. Yeah. I mean, our number one core attribute as a company, the number one value for us, is customer satisfaction. So we differentiate in the product. And I wanna personally thank you for letting us sit in your office, and we'll miss you. And we'll miss you too. ...you, and congratulations on all your success. Yeah, yeah. If there's any gaming stuff, you know, I...