Alan Gates, Hortonworks | Dataworks Summit 2018

(techno music) >> (announcer) From Berlin, Germany it's theCUBE covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates who's one of the founders of Hortonworks and Hortonworks of course is the host of DataWorks Summit and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time but really how the company has evolved specifically with the folks on the community, the Hadoop community, the Open Source community. You have a deepening open source stack that you build upon with Atlas and Ranger and so forth. Give us a sense for all of that Alan. >> Sure. So as I think it's well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks and actually they were a great partner in that. Helped us get that spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks and brought along a number of the other engineering, a bunch of the other engineers to help get started. And really at the beginning, we were. It was Hadoop, Pig, Hive, you know, a few of the very, HBase, the kind of, the beginning projects. 
So pretty small toolkit. And we were, our early customers were very engineering heavy people, or companies who knew how to take those tools and build something directly on those tools right? >> Well, you started off with the Hadoop community as a whole started off with a focus on the data engineers of the world >> Yes. >> And I think it's shifted, and confirm for me, over time that you focus increasingly with your solutions on the data scientists who are doing the development of the applications, and the data stewards from what I can see at this show. >> I think it's really just a part of the adoption curve right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, you get, it becomes wider and wider. There's still plenty of data engineers that are our customers, that are working with us but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code or you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter but some may want to use SQL or even Tableau or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff. So that does drive us to one, put more things into the toolkit so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS and as the kind of data scene has evolved, there's a lot more focus now on a couple things. 
One is data, what we call data-in-motion for our HDF product where you've got in a stream manager like Kafka or something like that >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just oh, okay I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else and I might have it in several clouds as well. >> K, your focus has shifted like the industry in general towards streaming data in multi-clouds where your, it's more stateful interactions and so forth? I think you've made investments in Apache NiFi so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data and you need, you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move upstream? When I have an error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects than I would otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places but my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavyweight technology in terms of memory and disk space and all that so it's not going to be run on some sensor somewhere. But it is that data-in-motion right? 
I've got millions of events streaming through a set of Kafka topics watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working on internet of things projects, is that, we don't often, at least in the industry of popular mind, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things but that certainly has to be part of our strategy 'cause it has to be part of what our customers are doing. >> When I think about the Hortonworks community, now we have to broaden our understanding because you have a tight partnership with IBM which obviously is well-established, huge and global. Give us a sense for as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus or has it? >> I don't know that it's shifted the focus. I mean IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us. 
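As an illustration of the edge-processing idea Alan describes for NiFi (send only a sample of readings upstream in normal operation, and turn up the volume once a warning shows up), here is a minimal Python sketch. It does not use any NiFi or Kafka API; the threshold, the sampling rate, and the function name are all assumptions made up for this example.

```python
from typing import Iterable, List

# Hypothetical values chosen only for illustration.
WARN_TEMP_C = 900.0       # readings at or above this count as a warning event
NORMAL_SAMPLE_EVERY = 10  # in normal operation, forward 1 reading in 10


def select_for_upstream(
    readings: Iterable[float],
    warn_temp: float = WARN_TEMP_C,
    sample_every: int = NORMAL_SAMPLE_EVERY,
) -> List[float]:
    """Return the readings an edge node would ship upstream.

    Before any warning, only every `sample_every`-th reading is kept.
    Once a reading reaches `warn_temp`, that reading and every later
    one in the batch is kept (the alert stays latched for the batch).
    """
    forwarded: List[float] = []
    alert = False
    for index, temp in enumerate(readings):
        if temp >= warn_temp:
            alert = True
        if alert or index % sample_every == 0:
            forwarded.append(temp)
    return forwarded


if __name__ == "__main__":
    normal = [800.0] * 20
    heating = [800.0] * 5 + [950.0] + [800.0] * 4
    print(len(select_for_upstream(normal)))   # sampled only
    print(len(select_for_upstream(heating)))  # sampled, then everything after the warning
```

In a real deployment this decision would sit in the edge flow itself and the forwarded readings would go to a stream such as a Kafka topic; the sketch only shows the filtering decision.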
>> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today, it was announced yesterday. And it looks like a really good tool for the main, the requirements for compliance which is discover and inventory your data which is really set up a consent portal, what I like to refer to. So the data subject can then go and make a request to have my data forgotten and so forth. Give us a sense going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have some reference architecture for these kind o' capabilities so that going forward, either your ecosystem of partners can build add on tools in some common, like the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more Open Source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that but we know one, that some people are going to want to bring their own tools to it. 
They're not necessarily going to want to use that one platform so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others and we want to build the standards on top of that of how do you properly implement these features that GDPR requires like right to be forgotten, like you know, what are the protocols around PII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership or will it be a separate group or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused on is that next layer up of how do we engage, not the programmers 'cause programmers can engage really well at the Apache level but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code and frankly if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks so. They understand each other at the wonk level. >> That's a good way to put it. And so that's where ODPI is really coming is that group of compliance people that speak a completely different language. But we still need to get them all talking to each other as you said, so that there's specifications around. How do we do this? And what is compliance? >> Well Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you and Hortonworks has been evolving very rapidly and it seems to me that, going forward, I think you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level and really in terms of in an Open Source framework. In many ways though, you're not entirely 100% like nobody is, purely Open Source. 
You're still very much focused on open frameworks for building fairly scalable, very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks here at theCUBE on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018


ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
James Kobielus	PERSON	0.99+
Mandy Chessell	PERSON	0.99+
Alan	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
Jim Kobielus	PERSON	0.99+
Europe	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Alan Gates	PERSON	0.99+
four years	QUANTITY	0.99+
James	PERSON	0.99+
ING	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
yesterday	DATE	0.99+
Apache	ORGANIZATION	0.99+
SQL	TITLE	0.99+
Java	TITLE	0.99+
GDPR	TITLE	0.99+
Python	TITLE	0.99+
100%	QUANTITY	0.99+
Berlin, Germany	LOCATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
DataWorks Summit	EVENT	0.99+
Atlas	ORGANIZATION	0.99+
DataWorks Summit 2018	EVENT	0.98+
Data Steward Studio	ORGANIZATION	0.98+
today	DATE	0.98+
one	QUANTITY	0.98+
NiFi	ORGANIZATION	0.98+
Dataworks Summit 2018	EVENT	0.98+
Hadoop	ORGANIZATION	0.98+
one platform	QUANTITY	0.97+
2018	EVENT	0.97+
both	QUANTITY	0.97+
millions of events	QUANTITY	0.96+
Hbase	ORGANIZATION	0.95+
Tablo	TITLE	0.95+
ODPI	ORGANIZATION	0.94+
Big Data Analytics	ORGANIZATION	0.94+
One	QUANTITY	0.93+
theCUBE	ORGANIZATION	0.93+
NiFi	COMMERCIAL_ITEM	0.92+
day two	QUANTITY	0.92+
about five	QUANTITY	0.91+
Kafka	TITLE	0.9+
Zeppelin	ORGANIZATION	0.89+
Atlas	TITLE	0.85+
Ranger	ORGANIZATION	0.84+
Jupyter	ORGANIZATION	0.83+
first	QUANTITY	0.82+
Apache Atlas	ORGANIZATION	0.82+
Hadoop	TITLE	0.79+

Chris Wahl, Rubrik | AWS re:Invent 2017

(upbeat tech music) >> Announcer: Live from Las Vegas, it's the Cube, covering AWS re:Invent 2017, presented by AWS, Intel, and our ecosystem of partners. >> Well, welcome back to the Sands, we're here in Las Vegas, just off the Strip, as the re:Invent show continues here with a really exciting day one. You talk about buzz on the show for it, the place has been jampacked since they opened the doors at 11 o'clock our time this morning, and continues to do so, and I imagine for the next two or three days, you're going to see a lot of people. 50,000 plus. A lot of exhibitors, a lot of people, a lot of buzz, a lot of excitement here around the AWS community. We have with us now, Chris Wahl, who is the chief technologist at Rubrik, and he knows so much about this space, it takes three hosts to surround him. >> It does, to talk to him. >> John Wahl is here, Lisa Martin to my left, Justin Warren on the far right. You're surrounded. >> I am, you've ganged up on me. >> Yeah right, and rightly so. >> Yeah, yeah. >> Justin can't wait. >> He's got the evil eye from Justin. >> Chris: This feels like a trap. >> There's some good history here going on, so we'll find out a little bit later on. First off, Chris, do welcome. Well, welcome to the Cube. Tell us a little bit about Rubrik, and your place here and your feelings about the show. >> Yeah, yeah. So, Rubrik, about two and a half years in the market, about three and a half years old as a company. Really focused on solving the conundrum around, there's all this public cloud stuff out there, and everyone's kind of feeling the elephant with the blindfold on, describing it differently. And we're trying to figure out, how can we take that cloud type architecture that's out there in the world, and combine that with an almost 50 billion dollar TAM that is data protection, backup, recovery, archive. Put those two together to solve challenges that the enterprise is really struggling with. 
Onboarding into the cloud, and using those resources, as well as making sure their assets, be it the application, the data itself, or, a physical server, be protected and available for recovery in a really, really quick way. So that's kind of the high level pitch of Rubrik, around the last ten major releases of the product, and it's been a rocket ship, I've really enjoyed it. >> John: Great. >> So bigger focus in enterprise, or are you also playing with the startups and also helping the transition? >> That's a good question, I mean, originally we were kind of looking mid market, you know, like, let's kind of go for that sweet spot, but very early on, a lot of large enterprise customers came up and said, wow, you're fully RESTful API compliant, the full stack is distributed and scaled out, and really solves their problems, so they kind of pulled us into that space, and ever since, we've really embraced the large enterprise globally. It doesn't really matter where you are in the world, those are challenges that are kind of ubiquitous across verticals and the market. >> So, I've got a good storage and backup background, as you well know. >> Okay. >> There has been such a big shift in data storage and backup, and data protection in the last, say, three or five years. What do you think is driving that, because really, like backup recovery was always a hygiene function, it was boring, no one really wanted to spend any money on it, but now you're part of this guard of brand new ways of doing things, that has that part of the market being kind of exciting again. >> I almost feel like we got used to the horrible nature of that business. Because, as a technologist, I was a customer for about 13 years, I was in the channel for about five. And it was always, well, this is just the way it is, and you've got to put up with slow restores that were clunky, it was seen as an insurance policy. 
I think as the enterprise matured to the point, where everything else was amazing and hyperconverged, and driven by APIs, and cloud is starting to eat up part of the data center, we finally saw, okay, this isn't going to stand, we can't operate in a model where an RTO is days, or even many hours, and it's really heavy lift, and I needed a full team of people to manage this stuff. So I think as the technology advanced, as well as kind of outpaced all of the data protection, software, and solutions are out there, it just kind of had to happen. And another thing, as cloud and object store also permeated the market, it really gave us a great opportunity to use that for long term retention. Beyond just the old tape and things like that. >> Yeah, okay. >> Yeah, that sounds fair. >> And what do you think has been the biggest cultural change? Because, there's a lot of technology that goes into that, but you're talking about having whole teams of people who have to herd this stuff around with small little toothbrushes and stuff just to keep the thing running, whereas now, you can pretty much run it with one or two guys just sitting there, and go yeah, it just works. >> Well, it's similar to, remember when we went through virtualization, and it used to be whole armies of people managing all these pizza boxes, and 2U servers, and there was just a lot of infrastructure and operation people necessary to run the data center, and then we virtualized, and I know my personal story was there was two of us managing 1,300 virtual machines. >> Wow! >> Right? So that scale is astronomical compared to what we're used to, well, then apply that kind of mentality to data protection and it's yeah, it's a few minutes from one person, or distributed team that spends a few minutes a week, maybe a month, something like that, managing things more at the policy and the tag, and the metadata layer, and it's that journey all over again. 
So the nice part is we've done this before, we know it can be done, but kind of the hard part is, people are always the hardest part of the equation, and sometimes it's tough to put your hands off the handlebars of the bike and just say, I trust an intelligent system to manage this part of the stack, and I'm gonna go focus on where are we trying to go. >> Justin: Yeah, you know. >> Speaking of trust, you know, you talked about how your enterprise customers had pulled you into or up the chain there, a lot of what Andy Jassy said recently to John Furrier is, 18 billion dollar run rate, growing up 42% a year ... They haven't gotten this big with just startups alone. So he's talking about enterprise as really being on the precipice of this mass migration. How does being a young company, how does your relationship with AWS help give more credibility to Rubrik as a trusted advisor to these enterprises? >> Yeah, I'll kind of start at the end and work my way backwards. So we recently hit the advanced tier partner status with Amazon. And part of that I would cite a couple of public references with Castalia schools, as well as Fuji Rabio, are two different companies in different parts of the market, but they're very much focused on, we need a partner that can bring us into the cloud, kind of onboard us into that environment. AWS was the specific cloud provider they were looking to get into, without kind of operationally. That's a scary thing, you know? It's tough as an infrastructure, or as an operations focused engineer, or even as a developer I think sometimes, to say, I wanna take this data, and it represents my apps, and my servers, and my solution stack, and put that into public cloud. Either for archive and retention, or potentially to use our cloud instantiation solution that was recently announced, where they can start building workloads into public cloud. 
So I think that's why, kind of at that point, we work backwards a little bit to say, as we work with the customers that we're looking to do that, before it was, well, you have to learn all this stuff, and really become super technically deep on it, and I love that article by the way, I thought it was really deep, but if you looked at this week in AWS, that by QuinnyPig on Twitter, he's always pointing out every week, the S3 bucket failure, because it's hard, cloud is really, really hard. So if you have that kind of abstraction layer that can make it really simple for customers, to onboard in there. It's simple, but it's also abstracted from the nuances across multiple public cloud providers, including AWS. I think that's the magic sauce that really gets people excited about it. >> That abstraction also probably gives them a little bit more comfort, right? >> Exactly. >> Some of the sausage making, they don't have to see. >> Exactly, cuz part of our secret sauce, is as the data is entering into that environment, we're not just saying okay, it's there, done, it's now your problem. Part of our cloud data management story is that the data enters that environment, but we're constantly checking it, making sure it's valid, making sure that it's secure. We handle all of the encryption. The data efficiency. The whole end to end life cycle of the data is respected, whereas traditionally, it was, you just kind of scrape data out of the data center, you drop it off into an S3 bucket, you pray that it's going to be there when you need it, and who knows? Now it's IT offices problem. We don't just do the hand off and say good luck, we handle it from cradle to grave for all the data. >> Now, you mentioned a little bit ago, you were talking to Justin, you talked about the horrible nature of things four or five years ago, right? So, no matter what time you're in, there's always a horrible nature of things. 
There's always a problem, so now that whatever was the issue then, what is the issue now? As you, new capabilities will develop, it will open up a whole new Pandora's box of challenges and problems. You have unforeseen issues, so what do you guys, when you're looking at your headlights, twelve, eighteen months down the road, you say, oh yeah, this is our next one we've got to tackle, this is the baby we've got to get our arms around? >> For me more near term, it's around the transition from trusting infrastructure, to provide high availability and disaster recovery, and moving that more towards the application and the stack itself. So, holistically, in the past, you'd have two data centers, they'd replicate, one's for DR, one's not. The cloud wasn't really in that equation, and all of the redundancies was handled at the infrastructure layer. Well, okay, now, if we can kind of surround metadata around the application, provide instant search, global availability, replication, the ability to actually stand up those applications in a public cloud? Well now the question is, do I really need that infrastructure layer anymore? Do I need the second data center? Can't I just use public cloud, or an MSP, or someone that's providing Rubrik as a service as an example of a service to provide that for me? More long term, I tend to look at, kind of in the discussion that I saw between John and Andy Jassy, was around the part where I get really nerdy is around like serverless, and the ability to provide functions kind of in the data path. And now I start to imagine, okay, we're putting a lot of data for customers into public cloud and even into private object store resources, and there's the ability, I think there was Greengrass as an example, where you can kind of put that shim layer into the edge to do the function as the data's going in there. 
There's a lot of interesting opportunities that I'm looking forward to in the next year where, well, we already have an index of the data, we are already very cognizant and content aware when it comes to what we're protecting. Wouldn't it be cool if we could do more interesting things with the data in flight, as well as where it's ultimately resting, kind of like with the announcements with the media and the transcoding and the video services that I think came out rather recently. So that's kind of the two stage answer to that question that I have. >> So Chris, one of the ways that AWS has succeeded, is by appealing to developers. And you're talking there about things that are in the application layer, that have nothing to do with infrastructure, and developers hate infrastructure. So what are some of the things that you're doing, that Rubrik is doing to appeal to developers specifically in being able to access their data and not have to manage it, as you say, the way we used to do it, which was, the very infrastructure centric problem? What are you using to expose the data and to manage it as a data problem, rather than an infrastructure problem? >> Well, I think that goes back to traditionally how we managed infrastructure. Especially on-prem, and it was all very manual, very imperative, meaning you're pulling the lever, and you're telling the system ... It's a dumb system that you're the intelligent layer of it and you have to control it. And that, it doesn't work in the cloud model at all, and it really, I don't think it works long term in the data center model. Because then I need, I always have to scale literally people to data. And that doesn't work. >> Yeah, humans don't scale. >> Right, we can't just get magically more of us. So what we've done differently from day one was designed a system where every component within the stack, even internal communications, are calling RESTful APIs, and the whole system is distributed. 
So there's no controller that you have to deal with, you don't have to become, you don't have to know anything about storage to use the product. It's not infrastructure bound, so you're able to control it completely through RESTful APIs, or through configuration management tools, cloud management platforms, etc. So if you're a developer looking to, alright, I have an application, I want to make sure that it's automatically protected as part of that process, and sent to AWS, and automatically build me a cloud instantiation, and EC2 is an instance ... Great, make one, two, maybe three API calls, you don't have to know anything about infrastructure, which is the panacea, no developer wants to like, dig into VLANs and things like that. That's really cool, and it solves a very valid business case in that if one person can write the code, and it works, just repeat that process, and it scales infinitely. I don't need extra developers for that. >> So to be able to do that, I need to understand that that API exists, so what are you doing to actually show developers that hey, this thing is here and this is how it works? Here's something that you actually know how to do! How are you exposing that idea to the developers? >> Well, very early on, we worked with Swagger, which ultimately has become the OpenAPI spec 2.0, and so every node within our distributed system actually surfaces the entire API suite, in two formats. One, is like a playground, so you can, even if you're newer to APIs, maybe on the infrastructure side, you can kind of do a, try it now button, you can kind of say, what would happen? And it surfaces what the call would look like, and how to structure properly, and what the return codes are, but more importantly, there's also the why and the how of the API in a different kind of documentation suite, using ReDoc, where you can go in and literally see, okay, what's the mindset here? What's the use case? What's the example? 
And I feel like that's typically what's missing in a lot of these equations, where it's just, here is the nuts and bolts, here is the tactical information, here is, push this button, things happen. It's more like, here is why you would use it, here's an example, and a lot of the code to do that has already been created by our ranger team internally and made either exposed publicly as open source or privately as something that we share with our customer base. >> Cool. >> Well, Chris, you described, like you said, a rocket ship, right? You've been on for two and a half years, I think you better fasten that seatbelt, it's not going to slow down for you, I don't think. >> Chris: (laughing) I appreciate that. >> Which is a good thing, right? >> Chris: Yeah, yeah. >> Yeah, it's all good. >> Chris: Yeah, I know. >> John: I hope you didn't feel ganged up on either, right? You came out here, it was okay? >> It's a pretty friendly crowd, I appreciate that. >> I think so. Chris Wahl from Rubrik joining us here as we continue our coverage live here on the Cube, we're at AWS's re:Invent, live in Las Vegas, back with more in just a bit. (soft tech music)

Published Date : Nov 28 2017


ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Chris	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Justin Warren	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Andy Jussy	PERSON	0.99+
John Wahl	PERSON	0.99+
Justin	PERSON	0.99+
Lisa Martin	PERSON	0.99+
two	QUANTITY	0.99+
Chris Wahl	PERSON	0.99+
Andy Jassy	PERSON	0.99+
three	QUANTITY	0.99+
one	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
twelve	QUANTITY	0.99+
John Furrier	PERSON	0.99+
five years	QUANTITY	0.99+
Pandora	ORGANIZATION	0.99+
Quinny Pig	PERSON	0.99+
two formats	QUANTITY	0.99+
1,300 virtual machines	QUANTITY	0.99+
two and a half years	QUANTITY	0.99+
next year	DATE	0.99+
a month	QUANTITY	0.99+
two guys	QUANTITY	0.99+
about three and a half years	QUANTITY	0.99+
three hosts	QUANTITY	0.99+
Fuji Rabio	ORGANIZATION	0.99+
four	DATE	0.99+
50,000 plus	QUANTITY	0.98+
about 13 years	QUANTITY	0.98+
11 o'clock	DATE	0.98+
one person	QUANTITY	0.98+
First	QUANTITY	0.98+
two data centers	QUANTITY	0.98+
about two and a half years	QUANTITY	0.97+
second data center	QUANTITY	0.96+
almost 50 billion dollar	QUANTITY	0.96+
One	QUANTITY	0.96+
Rubrik	ORGANIZATION	0.95+
two stage	QUANTITY	0.95+
five years ago	DATE	0.94+
EC2	TITLE	0.93+
42% a year	QUANTITY	0.93+
about five	QUANTITY	0.92+
18 billion dollar	QUANTITY	0.91+
Green Grass	ORGANIZATION	0.91+
re:Invent show	EVENT	0.9+
eighteen months	QUANTITY	0.9+
three days	QUANTITY	0.9+
two different companies	QUANTITY	0.88+
TAM	ORGANIZATION	0.87+
day one	QUANTITY	0.87+
S3	COMMERCIAL_ITEM	0.87+
a few minutes a week	QUANTITY	0.86+
this week	DATE	0.83+
minutes	QUANTITY	0.83+
Twitter	ORGANIZATION	0.81+
Cube	PERSON	0.81+
Rubrik	PERSON	0.8+
re:Invent 2017	EVENT	0.78+
Invent	EVENT	0.76+
Castalia schools	ORGANIZATION	0.73+
this morning	DATE	0.7+

Jeff Weidner, Director Information Management | Customer Journey

>> Welcome back everybody. Jeff Frick here with theCube. We're in the Palo Alto studio talking about customer journeys today. And we're really excited to have a professional who's been doing this for a long time, he's Jeff Weidner, he's an Information Management Professional at this moment in time, and still, in the past and future. Jeff, welcome. >> Well thank you for having me. >> So you've been playing in the spheres for a very long time, and we talked a little bit before we turned the cameras on, about one of the great topics that I love in this area, the customer, the 360 view of the customer. And that's the Nirvana that everyone says, you know, we're there, we're pulling in all these data sets, we know exactly what's going on, the person calls into the call center and they can pull up all their records, and there's this great vision that we're all striving for. How close are we to that? >> I think we're several years away from that perfect vision that we've talked about for the last, I would say, 10 to 15 years that I've dealt with, from folks who were doing catalogs, like Sears catalogs, all the way to today, where we're trying to mix and match all this information, but most companies are not turning that into actionable data, or actionable information, in any way that's reasonable. >> And it's just because of the historic kind of siloed nature of all these different systems, I mean, you know, I keep hearing about, we're gonna do it, all these things can tie together, we can dump all the data in a single data lake and pull it out, what are some of the inhibitors and what are some of the approaches to try to break some of those down? >> Most of it has been around getting that data lake, in order to put the data in its spot, basically trying to make sure that, do I have the environment to work in?
Many times a traditional enterprise warehouse doesn't have the right processing power, for you, the individual, who wants to do the work, or, doesn't have the capacity that'll allow you to just bring all the data in, try to ratify it. That's really just trying to do the data cleansing, and trying to just make some sense of it, cause many times, there aren't those domain experts. So I usually work in marketing, and on our Customer 360 exercise, was around, direct mail, email, all the interactions from our Salesmaker, and alike. So, when we look at the data, we go, I don't understand why the Salesmaker is forgetting X, of that behavior that we want to roll together. >> Right. >> But really it's finding that environment, second is the harmonization, is I have Bob Smith and Robert Smith, and Master Data Management Systems, are perhaps few and far between, of being real services that I can call as a data scientist, or as a data worker, to be able to say, how do I line these together? How can I make sure that all these customer touchpoints are really talking about the same individual, the company, or maybe just the consumer? >> Right. >> And finally, it is in those Customer 360 projects getting those teams to want to play together, getting that crowdsourcing, either to change the data, such as, I have data, as you mentioned around Chat, and I want you to tell me more about it, or I want you to tell me how I can break it down. >> Right, right. >> And if I wanna make changes to it, you go, we'll wait, where's your money, in order to make that change. >> Right, right. >> And there's so many aspects to it, right. So there's kind of the classic, you know, ingest, you gotta get the data, you gotta run it through the processes you said did harmonize it to bring it together, and then you gotta present it to the person who's in a position at the moment of truth, to do something with it. And those are three very very different challenges. 
They've been the same challenges forever, but now we're adding all this new stuff to it, like, are you pulling data from other sources outside of the system of record, are you pulling social data, are you pulling other system data that's not necessarily part of the transactional system. So, we're making the job harder, at the same time, we're trying to give more power to more people and not just the data scientists. But as you said I think, the data worker, so how's that transformation taking place where we're enabling more kind of data workers if you will, that aren't necessarily data scientists, to have the power that's available with the analytics, and an aggregated data set behind them? >> Right. Well we are creating or have created the wild west, we gave them tools, and said, go forth and make, make something out of it. Oh okay. Then we started having this decentralization of all the tools, and when we finally gave them the big tools, the, quote unquote, big data tools, the ones that process billions of records, that still is the wild west, but at least we've got them centralized with certain tools. So we were able to at least standardize on the tool set, standardize on the data environment, so that at least when they're working in that space, we get to go, well, what are you working on? How are you working on that? What type of data are you working with? And how do we bring that back as a process, so that we can say, you did something on Chat data? Great! Bob over here, he likes to work with that Chat data. So there's that exposure and transparency because of this centralization of data. Now, new tools are adding on top of that, data catalogs, and putting in tools that make it so that you can actually record that known information, all in one wiki-like interface.
So we're trying to add more around putting the right permissions on top of that data, cataloging them in some way, with these either worksheets, or these information management tools, so that, if you're starting to deal with privacy data, you've got a flag, from, it's ingest all the way to the end. >> Right. >> But more controls are being seen as a way that a business is improving its maturity. >> Yeah. Now, the good news bad news is, more and more of the actual interactions are electronic. You want it going to places, they're not picking up the phone as much, as they're engaging with the company either via web browser or more and more a mobile browser, a mobile app, whatever. So, now the good news is, you can track all that. The bad news is, you can track all that. So, as we add more complexity, then there's this other little thing that everybody wants to do now, which is real-time, right, so with Kafka and Flink and Spark and all these new technologies, that enable you to basically see all the data as it's flowing, versus a sampling of the data from the past, a whole new opportunity, and challenge. So how are you seeing it and how are you gonna try to take advantage of that opportunity as well as address that challenge in your world. >> Well in my data science world, I've said, hey, give me some more data, keep on going, and when I have to put on the data sheriff hat, I'm now having to ask the executives, and our stakeholders, why streaming? Why do you really need to have all of this? >> It's the newest shiny toy. >> New shiny toy! So, when you talk to a stakeholder and you say, you need a shiny toy, great. I can get you that shiny toy. But I need an outcome. I need a, a value. And that helps me in tempering the next statement I give to them, you want streaming, so, or you want real time data, it's gonna cost you, three X. Are you gonna pay for it? Great. Here's my shiny toy. 
But yes, with the influx of all of this data, you're having to change the architecture and many times IT traditionally hasn't been able to make that, that rapid transition, which lends itself to shadow IT, or other folks trying to cobble something together, to make that happen. >> And then there's this other pesky little thing that gets in the way, in the form of governance, and security. >> Compliance, privacy and finally marketability, I wanna give you a, I want you to feel that you're trusting me, in handling your data, but also that when I respond back to you, I'm giving you a good customer experience, the so-called, don't be creepy. >> Right, right. >> Lately, the new compliance rule in Europe, GDPR, a policy that comes with a, well, a shotgun, that says, if there are violations of this policy, which involves privacy, or the ability for me to be forgotten, of the information that a corporation collects, it can mean four percent of a company's total revenue. >> Right. >> And that's on every instance, that's giving a lot of motivation for information governance today. >> Right. >> That risk, but the rules are around, trying to be able to say, where did the data come from? How did the data flow through the system? Who's touched that data? And those information management tools are mostly the human interaction, hey what are you guys working on? How are you guys working on it? What type of assets are you actually driving, so that we can bring it together for that privacy, that compliance, and workflow, and then layer on top of that, that deliverability. How do you want to be contacted? How do you, what are the areas that you feel, are the ways that we should engage with you? And of course, everything that gets missed in any optimization exercise, the feedback loop. I get feedback from you that says, you're interested in puppies, but your data set says you're interested in cats. How do I make that go into a Customer 360 product?
So, privacy, and being, and coming at, saying, oh, here's an advertisement for, for hippos and you go, what do you know about me that I don't know? >> Wrong browser. >> So you chose Datameer, along the journey, why did you choose them, how did you implement them, and how did they address some of these issues that we've just been discussing? >> Datameer was chosen primarily to take on that self-service data preparation layer from the beginning. Dealing with large amounts of online data, we moved from taking the digital intelligence tools that are out there, knowing about browser activities, the cookies that you have to get your identity, and said, we want the entire feed. We want all of that information, because we wanna make that actionable. I don't wanna just give it to a BI report, I wanna turn it into marketing automation. So we got the entire feed of data, and we worked on that with the usual SQL tools, but after a while, it wasn't manageable, either because of all of the 450 to 950 columns of data, or the fact that there were multiple teams working on it, and I had no idea what they were able to do. So I couldn't share in that value, I couldn't reuse the insights that they could have. So Datameer allowed for a visual interface, that was not in a coding language, that allowed people to start putting all of their work inside one interface, that didn't have to worry about saving it up to the server, it was all being done inside one environment. So that it could take not only the digital data, but the Salesforce CRM data, marry them together and let people work with it. And it broadened into the other areas, again allowing that crowdsourcing of other people's analytics. Why? Mostly because the state we are in around IT is an inability to change rapidly, at least for us, in our field. >> Right. >> The biggest problem we had was there wasn't a scheduler.
We didn't have the ability to get value out of our work, without having someone to press the button and run it, and if they ran it, it took eight hours, they walked away, it would fail. And you had no, you had to go back and do it all over again. >> Oh yeah. >> So Datameer allows us to have that self-service interface, that had management that IT could agree upon, to let us have our own lab environment, and execute our work. >> So what were the results, when you suddenly give people access to this tool? I mean, were they receptive, did you have to train them a lot, did some people just get it and some people just don't, they don't wanna act on data, what was kind of the real-world results of rolling this out, within the population? >> Real-world results allowed us to get ten million dollars in uplift, in our marketing activities across multiple channels. >> Ten million dollars in uplift? How did you measure that? >> That was measured through the operating expenses, by, one, not sending that work outside, some of the management of the data was sent outside, and that team builds their own models off of them, we said, we should be able to drink our own champagne, second, it was on the uplift of a direct mail and email campaign, so having a better response rate, and generally, not sending out a bunch of app store messages that we weren't needing to. And then turning that into a list that could be sent out to our email and direct mail vendors, to say, this is what we believe this account or contact is engaged with on the site. Give those a little bit more context. So we add that in, so that we were hopefully getting and resonating a better message. >> Right. >> And where did you start? What was the easiest way to provide an opportunity for people new to this type of tooling access to have success?
>> Mostly it was taking pre-doctored worksheets, or already pre-packaged output, and one of the challenges that we had was people saying, well, I don't wanna work in a visual language. While there are users of tools like Tableau or Qlik and others that are happy to drag-and-drop in their data, many of the data workers, the tried-and-true, are saying, I wanna write it in SQL. >> Mm hm. >> So, we had to give at least that last mile, analytical data set to them, and say, okay. Yeah, go ahead and move it over to your SQL environment, move it over into the space that you feel comfortable and you feel confident to control, but let's come on back and we'll translate it back to this tool, we'll show you how easy it was, to go from working with IT, which would take months, to doing it yourself, which would take weeks, and the processing and the cost of your siloed, shadow IT environment will go down in days. We're able to show them that, that acceleration of time to market of their data. >> What was your biggest surprise? An individual user, an individual use case, something that really you just didn't see coming, that's kind of a pleasant, you know the law of unintended consequences on the positive side. >> That was such a wide adoption, I mean honestly, beginning back from the data science background, we thought it would just be, bring your data in, throw it on out there, and we're done. We went from maybe about 20 large datasets of AdTech and MarTech information, advertising technology and marketing technology data, to CRM information, order activity, and many other categories, just within marketing alone, and I think perhaps the other big ah-ha moment was, since we brought in other divisions' data, those teams came in and said, hey, we can use this too. >> Right. >> The adoption really surprised me, that you would have people that say, oh I can work with this, I have this freedom to work with this data. >> Right right.
>> Well we see it time and time again, it's a recurring theme of all the things we cover, which is, you know, a really big piece of the innovation story is giving, you know, more people access to more data, and the tools to actually manipulate it. So that you can unlock that brain power, as opposed to keeping it with the data scientists on Mahogany Row, and the super-big brain. So, sounds like that really validates that whole hypothesis. >> I went through reviewing, hands-on, 11 different tools when I chose Datameer. This was everything from big name companies to small start-up companies that have wild artificial intelligence slogans in their marketing material, and we chose it mostly because it had the right fit, as an end-to-end approach. It had the scheduler, it had the visual interface, it had enough management and other capabilities that IT would leave us alone. Some of the other products that we were looking at gave you, piecemeal, ways to work with data, would allow you to schedule data, but they never came all together. And for the value we get out of it, we needed to have something altogether. >> Right. Well Jeff, thanks for taking a few minutes and sharing your story, really appreciate it, and it sounds like it was a really successful project. >> Was! >> All right. He's Jeff Weidner, I'm Jeff Frick, you're watching theCube from Palo Alto. Thanks for watching.
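
The harmonization problem raised in the interview (deciding that "Bob Smith" and "Robert Smith" are the same customer before rolling their touchpoints together) can be illustrated with a minimal sketch. It is not how Datameer or any master data management service works; the nickname table, the function names, and the sample records are all hypothetical, and a real system would need far richer matching than a first-name lookup.

```python
# Hypothetical nickname table; a real master-data service would hold far more.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def canonical_name(full_name: str) -> str:
    """Lower-case the name and map a known nickname to its formal given name."""
    parts = full_name.lower().split()  # split() also collapses repeated spaces
    if not parts:
        return ""
    parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def group_touchpoints(touchpoints):
    """Group (name, channel) records under one canonical customer key."""
    grouped = {}
    for name, channel in touchpoints:
        grouped.setdefault(canonical_name(name), []).append(channel)
    return grouped


# Illustrative records only: three touchpoints that should resolve to one person.
touchpoints = [
    ("Bob Smith", "email"),
    ("Robert Smith", "direct mail"),
    ("Robert  Smith", "chat"),
]
print(group_touchpoints(touchpoints))
```

With these sample records, all three channels land under the single key "robert smith". Matching on names alone would also merge two different people who happen to share a name, which is one reason real harmonization usually relies on more than one attribute.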

Published Date : Nov 16 2017

SUMMARY :

We're in the Palo Alto studio talking And that the Nirvana that of the approaches to try to the environment to work in? and I want you to tell me to it, you go, we'll wait, the processes you said did harmonize it so that we can say, you that a business is improving its maturity. of the actual interactions are electronic. I give to them, you want gets in the way, in the form I wanna give you a, I want you of the information that of motivation for that you feel, are the ways of the 450 to 950 columns That my, the biggest problem we had, that self-service interface, of the real-world results the data, is being, was sent and others that are happy to that you feel comfortable that really you just didn't back from the data science me that it would, you would So that you can unlock that And for the value we it was a really successful project. Thanks for watching.

ENTITIES

Entity	Category	Confidence
Jeff Weidner	PERSON	0.99+
Jeff Weidener	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Jeff	PERSON	0.99+
Europe	LOCATION	0.99+
eight hours	QUANTITY	0.99+
Bob	PERSON	0.99+
ten million dollars	QUANTITY	0.99+
Datameer	ORGANIZATION	0.99+
Ten million dollars	QUANTITY	0.99+
10	QUANTITY	0.99+
Palo Alto	LOCATION	0.99+
450	QUANTITY	0.99+
11 different tools	QUANTITY	0.99+
four percent	QUANTITY	0.99+
Sears	ORGANIZATION	0.99+
GDPR	TITLE	0.99+
three	QUANTITY	0.99+
15 years	QUANTITY	0.99+
second	QUANTITY	0.98+
AdTech	ORGANIZATION	0.98+
Martech	ORGANIZATION	0.98+
SQL	TITLE	0.98+
360 view	QUANTITY	0.97+
950 columns	QUANTITY	0.97+
today	DATE	0.97+
theCube	ORGANIZATION	0.97+
one	QUANTITY	0.95+
Tableau	TITLE	0.95+
one interface	QUANTITY	0.93+
single	QUANTITY	0.93+
Pig-El-Lee	ORGANIZATION	0.93+
Master Data Management Systems	ORGANIZATION	0.89+
Mahogany Row	TITLE	0.86+
Spark	TITLE	0.81+
one environment	QUANTITY	0.8+
about 20 large datasets	QUANTITY	0.79+
Clicks	TITLE	0.77+
360	QUANTITY	0.77+
Robert Smith	PERSON	0.73+
Bob	ORGANIZATION	0.7+
Salesmaker	ORGANIZATION	0.7+
Smith	PERSON	0.67+
Salesforce	ORGANIZATION	0.66+
Flink	ORGANIZATION	0.66+
instance	QUANTITY	0.63+
Kafka	ORGANIZATION	0.52+
Nirvana	ORGANIZATION	0.43+
CRN	TITLE	0.39+

David Richards, WANdisco | AWS Summit 2017

>> Narrator: Live from Manhattan, it's theCUBE, covering AWS Summit New York City 2017, brought to you by Amazon Web Services. >> And welcome back to New York, here. AWS Summit, theCUBE continues our coverage of what's happening here in the Big Apple. I'm John Walls along with Stu Miniman, and what this is is maybe not the most prolific CUBE guest of all time, but he's in the hall of fame. He really is a CUBE MVP for sure. It's good to have David Richards with us, the president, chairman, CEO of WANdisco. Good to see you, sir. >> It's a pleasure to be back again. It feels like home. >> It is like home. We need to get you your own microphone, I think, you know? >> David: I know it. I need my name on the back of the seat or something. >> This isn't quite a home game for you. All right, so you've got an office in Sheffield, England. >> David: Yeah. >> You've got an office out in the valley, Silicon Valley. We got ya right in the middle, I think. >> David: Yeah. >> Almost, don't we? So-- >> Exactly. >> We kind of split the difference for you this one. >> I always tell people I'm recolonizing the United States. I've been here for about 20 years. I can change the accent. >> Right. >> I'll get you all, eventually. >> All right, well, another year or two, we'll see how that works for ya. Big, big, I guess six, seven months for you, right? As far as some acquisitions you've done, some nice partnerships and arrangements you've done. >> Yes, as a business, we've really progressed well in the first half of the year. I've got to be a little bit careful. We've got results coming out September the sixth in London, but we did do a pre-announcement of a business update. We signed a record big data cloud contract with a very large bank for over four million dollars. That was our largest ever contract win. We signed a major retailer who we can't name, obviously, which is another sort of cloud ObjectStore on premises.
A big data win, and interestingly, we stopped burning cash and investors really like this kind of perfect storm of, 175%, 173% growth in our cloud big data revenue, booking, sorry, combined with a flat cost-base, which meant, first half of last year, burning five point four million dollars down to virtually zero, just $600,000 in the first half. So, investors really like that. We really like that, and it demonstrates that perfect storm of flat cost-base and growing sales. >> David, I'm curious, does working with Amazon, and your customers being on Amazon, does the speed and agility and everything like that contribute to that profitability? >> Well, Amazon kind of changes the game for all vendors, right? Because nobody, it used to be this sort of big four, five, six, whatever it is these days, consulting companies that had to implement ERP systems and all those complex applications. I don't necessarily think they're the people, they're not the go-to people anymore for cloud. So, it's down to uniqueness of technology. Amazon have got such a wide array, we were talking earlier about some of their announcements out today as they continue to go up the stack with applications and so on. So, it does lend itself very well to small vendors with sticky, unique intellectual property and unique products and services that are going to really thrive in this kind of cloud environment. So, we've really enjoyed working with Amazon, but we're also working with the other cloud vendors, as well, and I have to say, when we first saw the Snowmobile and the Snowball, well, actually, the Snowmobile, drive out on stage in New York, was it 12, 18 months ago? It's dog years, so everything goes seven times faster. >> John: Right, right, right. >> I was laughing. I was like, "How on Earth can you possibly use a truck to move data?" But a customer came to us, a prospect came to us the other day, he wanted to move a hundred petabytes of data. 
Now, if you're going to use the public internet to do that, that's going to take a hell of a long time. So, this idea of a mix between physical and digital data movement I think is, when moving to cloud, is actually fascinating. I think it's a really fascinating subject area. One that customers are definitely going to use. >> Yeah, you've got a great vantage point looking at customers' migrations. >> David: Yeah. >> It was actually something big in the keynote talking about, there are so many migrations out there that Amazon released an AWS Migration Hub. So, obviously, physics is always a challenge, my legacy mindset. Customers, we heard a customer up onstage and it's usually not lift and shift maybe for the private cloud, but for public cloud, I usually, I need to rewrite, I need to do micro-services. What is the friction for customers, and how are you and Amazon and the other clouds helping customers work through those challenges? >> OK, so, just to take a step back and think about the problems that happen at hyper-scale data movement. So, small-scale data, gigabyte-scale data, the stuff that you typically see in a relational database, they're not particularly big problems. It's kind of minimal outage, press pause, move data, make it consistent, and you're done. You can have a sort of, a small outage, maybe 15 minutes or even a day to move data, but when it gets to hyper-scale, when it gets to petabyte-scale, multi-terabyte-scale data moves, that's when you have a problem, and that's really the problem that we solve. So, the idea that you can move data that's moving and changing without an interruption to service from on-premise to cloud and support a hybrid cloud topology for an elongated period of time is fascinating.
I was listening at an investor conference to the CEO of VMware who was talking about, we're going to be in a situation of hybrid cloud for the next 20, 25 years because, overnight, not everybody can just repurpose every single application that they're running on-premise, whether it's a mainframe application, or a relational data application, or whether it's an ERP application, and repurpose that in cloud overnight. So, we're going to have to gradually move and migrate those applications over. So, it's highly likely we're going to be in a hybrid cloud environment for the foreseeable future, and that's actually fantastic news for us. We're moving companies into cloud, as I said, at scale, with transactional data, and nobody else can touch us in terms of the uniqueness of the IP, which is fantastic news for us. >> In terms of just big data in general, Stu has one use for it, I have a different use for it. It's going to live in a lot of different places. How are you responding to different needs within your clients and trying to make them more effective, make them more efficient? And yet, when you're dealing with more and more data, that's a big storm to handle. >> That's a great question. I went to speak a couple of months ago to a new customer of ours who is a major healthcare provider on the east coast, and I kind of said to him, "OK, you've had this Hadoop cluster for the past three years. Why are you calling us? Why now?" Which is the question that I always ask our customers. "Why? What changed? Why are you doing this right now?" And maybe for the past three years they've been putting legal data into the system. That's data, but who cares if you can't get access to it? We can move to telephone. We can move to e-mails. We can go into an archive, into a paper archive even, to find it, but the why now is that they're now putting patient record data, patient information with regulated SLA's into this system, and that really is our sweet spot.
As you get to, remember that investment thesis, small-scale gigabyte outage is small outage, when you get into petabyte, exabyte-scale, when you've got data sets that are a thousand, a million times greater, it's linear to the quantum of data. That outage becomes a thousand or a million times greater. So, that's kind of intolerable. So, we love it when strategic applications, regardless of what the use case is, we could all have different, it might be patient data, it might be retail information, it might be banking data, it might be customer retention information, when those strategic applications move onto this hyper-scale infrastructure, you have to support RTO and RPO, and that's what we do. >> And is a byte a byte a byte? You have these thousands of needles in haystacks, right? How do you assign value to one as opposed to another? >> So, this is another great question and one that investors kind of ask me a lot. So, we used to model our business from kind of the ground up. So, we take the classic enterprise sales team, you have a sales and marketing organization that's quite large, you would multiply that by their quota and then multiply it by 66% because that's how many of them are going to be successful in selling product. Well, we completely threw that away when we launched WANdisco Fusion, our new technology, early 2016. Then, we moved to a channel-based approach. So, we have IBM, we have an OEM, 5,000 quota-carrying enterprise sales guys at IBM selling our products. That was a fantastic deal for us. We signed it in April 2016, and they've done the first half of this year, and made at least six million dollars in sales that we have also announced, and then, we've got strategic partnerships with Amazon, with Microsoft, with Google, and we model our business by those channels. So, we're not looking for needles in haystacks.
We don't, we could never hire another, I mean, if we had to come into the market and say, "We need to go and hire 5,000 enterprise sales guys," we'd have to be raising, doing fund-raisers like Uber or something. It'd just be untenable. We couldn't do it. So, we have a product that lends itself very well to a channel-based approach, and that's working very nicely for us. So, we're not looking for, we're just looking for haystacks. Somebody else can go and find the needles. >> John: Find me and you, right? >> Right. >> David, how are your customers managing the pace of change these days? We've said Amazon is an example. It's like every day there's three new services coming out. Are they excited? Are they completely overwhelmed? What do you see these days? >> So, I think it's classic sort of product adoption lifecycle stuff. The sort of technical enthusiasts, they love all this change. The early-stage companies that are implementing this new cloud-based technology, ObjectStore technology and so on, they're managing very well. It's the later-stage companies you might go to and say, "ObjectStore," and they'll go, "What's ObjectStore? We're just getting our head around Hadoop, and Hive, and Pig, and all this other stuff that you were talking about three years ago," and sales guys go in there now and say, "Oh, no, no, no, don't worry about Hadoop. Nobody's going to run Hadoop in the cloud." It's like, "Well, that's what you told me three years ago." So, I think the market's certainly divided. I think you're going to see, as we move up the product adoption lifecycle, you're going to see lots and lots and lots of interesting moves happen. The companies that seem to be owning cloud, I think Alibaba is coming up really fast. We're seeing them doing some interesting things. Obviously, they've got dominance in the Chinese market. Amazon, first mover, Microsoft's future's dependent on cloud.
So, they all have their different spin and different take on applications that they're going to run in cloud. I think there is, I think it's a bit like the cellphone industry. There's lots and lots of different plans, lots and lots of different confusing nomenclature, but that's going to settle out in the next couple of years, but there's unquestionably, if you look at the audience here today, unquestionably large-scale movement of applications and data to cloud. >> Well, we appreciate the time, as always. Great to see you. Another notch in your CUBE belt. (laughing) So, congratulations for that, and maybe you can settle in to New York for a day or two. You said your travels have had you flip-floppin' back and forth between England and here. So, maybe you can settle in for a day or two. >> Yeah, I need to replicate myself. I need to put myself in at least two different places at the same time. >> Live data replication right here. (laughing) All right, David, thanks for bein' with us. David Richards. >> Thank you. Thanks guys. >> Back with more here on theCUBE, we continue our coverage of AWS Summit from New York City right after this break. (upbeat music)
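
The remark in the interview that moving a hundred petabytes over the public internet would take "a hell of a long time" can be checked with a rough back-of-the-envelope estimate. The sketch below is only that: the link speeds are assumptions chosen for illustration (the interview gives none), petabytes are taken as decimal (10^15 bytes), and the link is assumed to be fully and continuously used, which real transfers rarely manage.

```python
def transfer_days(petabytes: float, gigabits_per_second: float, efficiency: float = 1.0) -> float:
    """Days needed to move `petabytes` of data over a link of the given speed.

    `efficiency` is the assumed fraction of the nominal link speed actually achieved.
    """
    bits = petabytes * 1e15 * 8  # decimal petabytes converted to bits
    seconds = bits / (gigabits_per_second * 1e9 * efficiency)
    return seconds / 86_400  # seconds in a day


# Assumed link speeds, for illustration only.
for gbps in (1, 10, 100):
    print(f"100 PB at {gbps} Gbit/s: about {transfer_days(100, gbps):,.0f} days")
```

Under these assumptions a dedicated 10 Gbit/s link would need roughly 926 days, about two and a half years, to move 100 PB, which is the kind of arithmetic that makes physically shipping storage look reasonable at that scale.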

Published Date : Aug 14 2017

ENTITIES

Entity	Category	Confidence
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
David	PERSON	0.99+
Google	ORGANIZATION	0.99+
John	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
John Walls	PERSON	0.99+
Stu Miniman	PERSON	0.99+
April 2016	DATE	0.99+
London	LOCATION	0.99+
Amazon Web Services	ORGANIZATION	0.99+
David Richards	PERSON	0.99+
six	QUANTITY	0.99+
New York	LOCATION	0.99+
15 minutes	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
WANdisco	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
England	LOCATION	0.99+
New York City	LOCATION	0.99+
two	QUANTITY	0.99+
September	DATE	0.99+
175%	QUANTITY	0.99+
66%	QUANTITY	0.99+
a day	QUANTITY	0.99+
173%	QUANTITY	0.99+
Big Apple	LOCATION	0.99+
Earth	LOCATION	0.99+
VMware	ORGANIZATION	0.99+
seven months	QUANTITY	0.99+
early 2016	DATE	0.99+
five	QUANTITY	0.99+
three years ago	DATE	0.99+
AWS	ORGANIZATION	0.99+
Sheffield, England	LOCATION	0.99+
four million dollars	QUANTITY	0.98+
over four million dollars	QUANTITY	0.98+
United States	LOCATION	0.97+
three new services	QUANTITY	0.97+
zero	QUANTITY	0.97+
ObjectStore	ORGANIZATION	0.97+
today	DATE	0.97+
5,000	QUANTITY	0.97+
thousands of needles	QUANTITY	0.96+
about 20 years	QUANTITY	0.96+
AWS Summit	EVENT	0.96+
first	QUANTITY	0.95+
Silicon Valley	LOCATION	0.95+
Hive	ORGANIZATION	0.95+
first half	QUANTITY	0.95+
AWS Summit New York City 2017	EVENT	0.94+
AWS Summit 2017	EVENT	0.93+
sixth	DATE	0.92+
CUBE	ORGANIZATION	0.9+
first half of last year	DATE	0.89+
5,000 enterprise sales guys	QUANTITY	0.88+
a million times	QUANTITY	0.88+
couple of months ago	DATE	0.88+
$600,000	QUANTITY	0.87+
seven times	QUANTITY	0.87+
theCUBE	ORGANIZATION	0.86+
Snowmobile	ORGANIZATION	0.86+
two different places	QUANTITY	0.85+
a thousand	QUANTITY	0.85+
One	QUANTITY	0.85+
Pig	ORGANIZATION	0.85+
at least six million dollars	QUANTITY	0.84+
past three years	DATE	0.83+
four	QUANTITY	0.83+

Fireside Chat with Andy Jassy, AWS CEO, at the AWS Summit SF 2017

>> Announcer: Please welcome Vice President of Worldwide Marketing, Amazon Web Services, Ariel Kelman. (applause) (techno music) >> Good afternoon, everyone. Thank you for coming. I hope you guys are having a great day here. It is my pleasure to introduce to come up on stage here, the CEO of Amazon Web Services, Andy Jassy. (applause) (techno music) >> Okay. Let's get started. I have a bunch of questions here for you, Andy. >> Just like one of our meetings, Ariel. >> Just like one of our meetings. So, I thought I'd start with a little bit of a state of the state on AWS. Can you give us your quick take? >> Yeah, well, first of all, thank you, everyone, for being here. We really appreciate it. We know how busy you guys are. So, hope you're having a good day. You know, the business is growing really quickly. In the last financials we released, in Q4 of '16, AWS is a 14 billion dollar revenue run rate business, growing 47% year over year. We have millions of active customers, and we consider an active customer as a non-Amazon entity that's used the platform in the last 30 days. And it's really a very broad, diverse customer set, in every imaginable size of customer and every imaginable vertical business segment. And I won't repeat all the customers that I know Werner went through earlier in the keynote, but here are just some of the more recent ones that you've seen, you know, Enel is moving their digital and their connected devices, meters, real estate to AWS. McDonald's is re-inventing their digital platform on top of AWS. FINRA is moving all in to AWS, yeah. You see at Reinvent, Workday announced AWS was its preferred cloud provider, and to start building on top of AWS further. Today, in press releases, you saw both Dunkin' Donuts and HERE, the geo-spatial map company, announced they'd chosen AWS as their provider. 
You know, and then I think if you look at our business, we have a really large non-US or global customer base and business that continues to expand very dramatically. And we're also aggressively increasing the number of geographic regions in which we have infrastructure. So last year in 2016, on top of the broad footprint we had, we added Korea, India, and Canada, and the UK. We've announced that we have regions coming, another one in China, in Ningxia, as well as in France, as well as in Sweden. So we're not close to being done expanding geographically. And then of course, we continue to iterate and innovate really quickly on behalf of all of you, of our customers. I mean, just last year alone, we launched what we considered over 1,000 significant services and features. So on average, our customers wake up every day and have three new capabilities they can choose to use or not use, but at their disposal. You've seen it already this year, if you look at Chime, which is our new unified communication service. It makes meetings much easier to conduct, be productive with. You saw Connect, which is our new global call center routing service. If you look even today, you look at Redshift Spectrum, which makes it easy to query all your data, not just locally on disk in your data warehouse but across all of S3, or DAX, which puts a cache in front of DynamoDB using the same interface, or all the new features in our machine learning services. We're not close to being done delivering and iterating on your behalf. And I think if you look at that collection of things, it's part of why, as Gartner looks out at the infrastructure space, they estimate that AWS is several times the size of the next 14 providers combined. It's a pretty significant market segment leadership position. >> You talked a lot about adoption in there, a lot of customers moving to AWS, migrating large numbers of workloads, some going all in on AWS. 
And with that as kind of backdrop, do you still see a role for hybrid as being something that's important for customers? >> Yeah, it's funny. The quick answer is yes. I think the, you know, if you think about a few years ago, a lot of the rage was this debate about private cloud versus what people call public cloud. And we don't really see that debate very often anymore. I think relatively few companies have had success with private clouds, and most are pretty substantially moving in the direction of building on top of clouds like AWS. But, while you increasingly see more and more companies every month announcing that they're going all in to the cloud, we will see most enterprises operate in some form of hybrid mode for the next number of years. And I think in the early days of AWS and the cloud, I think people got confused about this, where they thought that they had to make this binary decision to either be all in on the public cloud and AWS or not at all. And of course that's not the case. It's not a binary decision. And what we know many of our enterprise customers want is they want to be able to run the data centers that they're not ready to retire yet as seamlessly as they can alongside of AWS. And it's why we've built a lot of the capabilities we've built the last several years. These are things like VPC, which is our Virtual Private Cloud, which allows you to cordon off a portion of our network, deploy resources into it and connect to it through VPN, or Direct Connect, which is a private connection between your data centers and our regions, or our Storage Gateway, which is a virtual storage appliance, or identity federation, or a whole bunch of capabilities like that. 
But what we've seen, even though the vast majority of the big hybrid implementations today are built on top of AWS, as more and more of the mainstream enterprises are now at the point where they're really building substantial cloud adoption plans, they've come back to us and they've said, well, you know, actually you guys have made us make kind of a binary decision. And that's because the vast majority of the world is virtualized on top of VMware. And because VMware and AWS, prior to a few months ago, had really done nothing to try and make it easy to use the VMware tools that people have been using for many years seamlessly with AWS, customers were having to make a binary choice. Either they stick with the VMware tools they've used for a while but have a really tough time integrating with AWS, or they move to AWS and they have to leave behind the VMware tools they've been using. And it really was the impetus for VMware and AWS to have a number of deep conversations about it, which led to the announcement we made late last fall of VMware and AWS, which is going to allow customers who have been using the VMware tools to manage their infrastructure for a long time to seamlessly be able to run those on top of AWS. And they get to do so as they move workloads back and forth and they evolve their hybrid implementation without having to buy any new hardware, which is a big deal for companies. Very few companies are looking to find ways to buy more hardware these days. And customers have been very excited about this prospect. We've announced that it's going to be ready in the middle of this year. You see companies like Amadeus and Merck and Western Digital and the state of Louisiana, a number of others, we've a very large, private beta and preview happening right now. And people are pretty excited about that prospect. So we will allow customers to run in the mode that they want to run, and I think you'll see a huge transition over the next five to 10 years. 
>> So in addition to hybrid, another question we get a lot from enterprises is around the concept of lock-in and how they should think about their relationship with the vendor and how they should think about whether to spread the workloads across multiple infrastructure providers. How do you think about that? >> Well, it's a question we get a lot. And Oracle has sure made people care about that issue. You know, I think people are very sensitive about being locked in, given the experience that they've had over the last 10 to 15 years. And I think the reality is when you look at the cloud, it really is nothing like being locked into something like Oracle. The APIs look pretty similar between the various providers. We build on open standards, like Linux and MySQL and Postgres. All the migration tools that we build allow you to migrate in or out of AWS. It's up to customers based on how they want to run their workload. So it is much easier to move away from something like the cloud than it is from some of the old software services that have created some of this phobia. But I think when you look at most CIOs, enterprise CIOs particularly, as they think about moving to the cloud, many of them started off thinking that they, you know, very well might split their workloads across multiple cloud providers. And I think when push comes to shove, very few decide to do so. Most predominately pick an infrastructure provider to run their workloads. And the reason that they don't split it across, you know, pretty evenly across clouds is a few reasons. Number one, if you do so, you have to standardize on the lowest common denominator. And these platforms are in radically different stages at this point. And if you look at something like AWS, it has a lot more functionality than anybody else by a large margin. And we're also iterating more quickly than you'll find from the other providers. 
And most folks don't want to tie the hands of their developers behind their backs in the name of having the ability of splitting it across multiple clouds, cause they actually are, in most of their spaces, competitive, and they have a lot of ideas that they want to actually build and invent on behalf of their customers. So, you know, they don't want to actually limit their functionality. It turns out the second reason is that they don't want to force their development teams to have to learn multiple platforms. And most development teams, if any of you have managed multiple stacks across different technologies, and many of us have had that experience, it's a pain in the butt. And trying to make a shift from what you've been doing for the last 30 years on premises to the cloud is hard enough. But then forcing teams to have to get good at running across two or three platforms is something most teams don't relish, and it's wasteful of people's time, it's wasteful of natural resources. That's the second thing. And then the third reason is that you effectively diminish your buying power because all of these cloud providers have volume discounts, and then you're splitting what you buy across multiple providers, which gives you a lower amount you buy from everybody at a worse price. So when most CIOs and enterprises look at this carefully, they don't actually end up splitting it relatively evenly. They predominately pick a cloud provider. Some will just pick one. Others will pick one and then do a little bit with a second, just so they know they can run with a second provider, in case that relationship with the one they choose to predominately run with goes sideways in some fashion. But when you really look at it, CIOs are not making that decision to split it up relatively evenly because it makes their development teams much less capable and much less agile. 
>> Okay, let's shift gears a little bit, talk about a subject that's on the minds of not just enterprises but startups and government organizations and pretty much every organization we talk to. And that's AI and machine learning. At Reinvent, we introduced our Amazon AI services and just this morning Werner announced the general availability of Amazon Lex. So where are we overall on machine learning? >> Well it's a hugely exciting opportunity for customers, and I think, we believe it's exciting for us as well. And it's still in the relatively early stages, if you look at how people are using it, but it's something that we passionately believe is going to make a huge difference in the world and a huge difference with customers, and that we're investing a pretty gigantic amount of resource and capability for our customers. And I think the way that we think about, at a high level, the machine learning and deep learning spaces are, you know, there's kind of three macro layers of the stack. I think at that bottom layer, it's generally for the expert machine learning practitioners, of which there are relatively few in the world. It's a scarce resource relative to what I think will be the case in five, 10 years from now. And these are folks who are comfortable working with deep learning engines, know how to build models, know how to tune those models, know how to do inference, know how to get that data from the models into production apps. And for that group of people, if you look at the vast majority of machine learning and deep learning that's being done in the cloud today, it's being done on top of AWS, on our P2 instances, which are optimized for deep learning, and our deep learning AMIs, that package, effectively, the deep learning engines and libraries inside those AMIs. 
And you see companies like Netflix, Nvidia, and Pinterest and Stanford and a whole bunch of others that are doing significant amounts of machine learning on top of those optimized instances for machine learning and the deep learning AMIs. And I think that you can expect, over time, that we'll continue to build additional capabilities and tools for those expert practitioners. I think we will support and do support every single one of the deep learning engines on top of AWS, and we have a significant amount of those workloads with all those engines running on top of AWS today. We also are making, I would say, a disproportionate investment of our own resources in the MXNet community just because if you look at running deep learning models once you get beyond a few GPUs, it's pretty difficult to have those scale as you get into the hundreds of GPUs. And most of the deep learning engines don't scale very well horizontally. And so what we've found through a lot of extensive testing, 'cause remember, Amazon has thousands of deep learning experts inside the company that have built very sophisticated deep learning capabilities, like the ones you see in Alexa, we have found that MXNet scales the best and almost linearly, as we continue to add nodes, as we continue to horizontally scale. So we have a lot of investment at that bottom layer of the stack. Now, if you think about most companies with developers, it's still largely inaccessible to them to do the type of machine learning and deep learning that they'd really like to do. And that's because the tools, I think, are still too primitive. And there's a number of services out there, we built one ourselves in Amazon Machine Learning that we have a lot of customers use, and yet I would argue that all of those services, including our own, are still more difficult than they should be for everyday developers to be able to build machine learning and access machine learning and deep learning. 
And if you look at the history of what AWS has done, in every part of our business, and a lot of what's driven us, is trying to democratize technologies that were really only available and accessible before to a select, small number of companies. And so we're doing a lot of work at what I would call that middle layer of the stack to get rid of a lot of the muck associated with having to do, you know, building the models, tuning the models, doing the inference, figuring how to get the data into production apps, a lot of those capabilities at that middle layer that we think are really essential to allow deep learning and machine learning to reach its full potential. And then at the top layer of the stack, we think of those as solutions. And those are things like, pass me an image and I'll tell you what that image is, or show me this face, does it match faces in this group of faces, or pass me a string of text and I'll give you an MP3 file, or give me some words and what your intent is and then I'll be able to return answers that allow people to build conversational apps like the Lex technology. And we have a whole bunch of other services coming in that area, on top of Lex and Polly and Rekognition, and you can imagine some of those that we've had to use in Amazon over the years that we'll continue to make available for you, our customers. So very significant level of investment at all three layers of that stack. We think it's relatively early days in the space but have a lot of passion and excitement for that. >> Okay, now for ML and AI, we're seeing customers wanting to load in tons of data, both to train the models and to actually process data once they've built their models. And then outside of ML and AI, we're seeing just as much demand to move in data for analytics and traditional workloads. So as people are looking to move more and more data to the cloud, how are we thinking about making it easier to get data in? >> It's a great question. 
And I think it's actually an often overlooked question because a lot of what gets attention with customers is all the really interesting services that allow you to do everything from compute and storage and database and messaging and analytics and machine learning and AI. But at the end of the day, if you have a significant amount of data already somewhere else, you have to get it into the cloud to be able to take advantage of all these capabilities that you don't have on premises. And so we have spent a disproportionate amount of focus over the last few years trying to build capabilities for our customers to make this easier. And we have a set of capabilities that really is not close to matched anywhere else, in part because we have so many customers who are asking for help in this area that it's, you know, that's really what drives what we build. So of course, you could use the good old-fashioned wire to send data over the internet. Increasingly, we find customers that are trying to move large amounts of data into S3 are using our S3 Transfer Acceleration service, which basically uses our points of presence, or POPs, all over the world to expedite delivery into S3. You know, a few years ago, we were talking to a number of companies that were looking to make big shifts to the cloud, and they said, well, I need to move lots of data that just isn't viable for me to move over the wire, given the connection we can assign to it. It's why we built Snowball. And so we launched Snowball a couple years ago, which is really, it's a 50 terabyte appliance that is encrypted, the data's encrypted three different ways, and you ingest the data from your data center into Snowball, it has a Kindle connected to it, it allows you to, you know, that makes sure that you send it to the right place, and you can also track the progress of your high-speed ingestion into our data centers. 
And when we first launched Snowball, we launched it at Reinvent a couple years ago, I could not believe that we were going to order as many Snowballs to start with as the team wanted to order. And in fact, I reproached the team and I said, this is way too much, why don't we first see if people actually use any of these Snowballs. And so the team thankfully didn't listen very carefully to that, and they really only pared back a little bit. And then it turned out that we, almost from the get-go, had ordered 10X too few. And so this has been something that people have used in a very broad, pervasive way all over the world. And last year, at the beginning of the year, as we were asking people what else they would like us to build in Snowball, customers told us a few things that were pretty interesting to us. First, one that wasn't that surprising was they said, well, it would be great if they were bigger, you know, if instead of 50 terabytes it was more data I could store on each device. Then they said, you know, one of the problems is when I load the data onto a Snowball and send it to you, I have to still keep my local copy on premises until it's ingested, cause I can't risk losing that data. So they said it would be great if you could find a way to provide clustering, so that I don't have to keep that copy on premises. That was pretty interesting. And then they said, you know, there's some of that data that I'd actually like to be loading synchronously to S3, and then, or some things back from S3 to that data that I may want to compare against. That was interesting, having that endpoint. And then they said, well, we'd really love it if there was some compute on those Snowballs so I can do analytics on some relatively short-term signals that I want to take action on right away. Those were really the pieces of feedback that informed Snowball Edge, which is the next version of Snowball that we launched, announced at Reinvent this past November. 
So it has, it's a hundred-terabyte appliance, still the same level of encryption, and it has clustering so that you don't have to keep that copy of the data local. It allows you to have an endpoint to S3 to synchronously load data back and forth, and then it has a compute inside of it. And so it allows customers to use these on premises. I'll give you a good example. GE is using these for their wind turbines. And they collect all kinds of data from those turbines, but there's certain short-term signals they want to do analytics on in as close to real time as they can, and take action on those. And so they use that compute to do the analytics and then when they fill up that Snowball Edge, they detach it and send it back to AWS to do broad-scale analytics in the cloud and then just start using an additional Snowball Edge to capture that short-term data and be able to do those analytics. So Snowball Edge is, you know, we just launched it a couple months ago, again, amazed at the type of response, how many customers are starting to deploy those all over the place. I think if you have exabytes of data that you need to move, it's not so easy. An exabyte of data, if you wanted to move from on premises to AWS, would require 10,000 Snowball Edges. Those customers don't want to really manage a fleet of 10,000 Snowball Edges if they don't have to. And so, we tried to figure out how to solve that problem, and it's why we launched Snowmobile back at Reinvent in November, which effectively, it's a hundred-petabyte container on a 45-foot trailer that we will take a truck and bring out to your facility. It comes with its own power and its own network fiber that we plug in to your data center. And if you want to move an exabyte of data over a 10 gigabit per second connection, it would take you 26 years. But using 10 Snowmobiles, it would take you six months. So really different level of scale. 
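(Editor's note: the transfer figures quoted above can be sanity-checked with a short calculation. The sketch below is not from the talk. It assumes decimal units, so 1 EB is 10^18 bytes, and a 10 Gbit/s link running at full line rate with no protocol overhead; all names in it are illustrative. Under those assumptions an exabyte fills 10,000 hundred-terabyte Snowball Edges or 10 hundred-petabyte Snowmobiles, and takes roughly 25 years over the wire, which is in line with the 26 years quoted.)

```python
# Back-of-the-envelope check of the data-transfer figures quoted in the talk.
# Assumptions (not from the source): decimal units and a fully utilized link.

TERABYTE = 10**12   # bytes
PETABYTE = 10**15   # bytes
EXABYTE = 10**18    # bytes

SNOWBALL_EDGE_BYTES = 100 * TERABYTE    # "hundred-terabyte appliance"
SNOWMOBILE_BYTES = 100 * PETABYTE       # "hundred-petabyte container"
LINK_BITS_PER_SECOND = 10 * 10**9       # "10 gigabit per second connection"
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def devices_needed(total_bytes: int, device_bytes: int) -> int:
    """Return how many devices are needed to hold total_bytes, rounding up."""
    return -(-total_bytes // device_bytes)


def years_over_link(total_bytes: int, bits_per_second: int) -> float:
    """Return the transfer time in years at a constant line rate."""
    return total_bytes * 8 / bits_per_second / SECONDS_PER_YEAR


snowball_edges = devices_needed(EXABYTE, SNOWBALL_EDGE_BYTES)
snowmobiles = devices_needed(EXABYTE, SNOWMOBILE_BYTES)
wire_years = years_over_link(EXABYTE, LINK_BITS_PER_SECOND)

print(f"Snowball Edges per exabyte: {snowball_edges}")
print(f"Snowmobiles per exabyte:    {snowmobiles}")
print(f"Years over 10 Gbit/s:       {wire_years:.1f}")
```

(The six-month figure for 10 Snowmobiles depends on how fast each container can be filled on site, which the talk does not state, so it is not modeled here.)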
And you'd be surprised how many companies have exabytes of data at this point that they want to move to the cloud to get all those analytics and machine learning capabilities running on top of them. Then for streaming data, as we have more and more companies that are doing real-time analytics of streaming data, we have Kinesis, where we built something called the Kinesis Firehose that makes it really simple to stream all your real-time data. We have a storage gateway for companies that want to keep certain data hot, locally, and then asynchronously be loading the rest of their data to AWS to be able to use in different formats, should they need it as backup or should they choose to make a transition. So it's a very broad set of storage capabilities. And then of course, if you've moved a lot of data into the cloud or into anything, you realize that one of the hardest parts that people often leave to the end is ETL. And so we have announced an ETL service called Glue, which we announced at Reinvent, which is going to make it much easier to move your data, be able to find your data and map your data to different locations and do ETL, which of course is hugely important as you're moving large amounts. >> So we've talked a lot about moving things to the cloud, moving applications, moving data. But let's shift gears a little bit and talk about something not on the cloud, connected devices. >> Yeah. >> Where do they fit in and how do you think about edge? >> Well, you know, I've been working on AWS since the start of AWS, and we've been in the market for a little over 11 years at this point. And we have encountered, as I'm sure all of you have, many buzzwords. And of all the buzzwords that everybody has talked about, I think I can make a pretty strong argument that the one that has delivered fastest on its promise has been IOT and connected devices. Just amazing to me how much is happening at the edge today and how fast that's changing with device manufacturers. 
And I think that if you look out 10 years from now, when you talk about hybrid, I think for most companies, the majority on-premises piece of hybrid will not be servers, it will be connected devices. There are going to be billions of devices all over the place, in your home, in your office, in factories, in oil fields, in agricultural fields, on ships, in cars, in planes, everywhere. You're going to have these assets that sit at the edge that companies are going to want to be able to collect data on, do analytics on, and then take action. And if you think about it, most of these devices, by their very nature, have relatively little CPU and have relatively little disk, which makes the cloud disproportionately important for them to supplement them. It's why you see most of the big, successful IOT applications today are using AWS to supplement them. Illumina has hooked up their genome sequencing to AWS to do analytics, or you can look at Major League Baseball Statcast, which is an IOT application built on top of AWS, or John Deere has over 200,000 telematically enabled tractors that are collecting real-time planting conditions and information that they're doing analytics on and sending it back to farmers so they can figure out where and how to optimally plant. Tata Motors manages their truck fleet this way. Philips has their smart lighting project. I mean, there's an innumerable amount of these IOT applications built on top of AWS where the cloud is supplementing the device's capability. But when you think about these becoming more mission-critical applications for companies, there are going to be certain functions and certain conditions by which they're not going to want to connect back to the cloud. They're not going to want to take the time for that round trip. They're not going to have connectivity in some cases to be able to make a round trip to the cloud. 
And what they really want is customers really want the same capabilities they have on AWS, with AWS IOT, but on the devices themselves. And if you've ever tried to develop on these embedded devices, it's not for mere mortals. It's pretty delicate and it's pretty scary and there's a lot of archaic protocols associated with it, pretty tough to do it all and to do it without taking down your application. And so what we did was we built something called Greengrass, and we announced it at Reinvent. And Greengrass is really like a software module that you can effectively have inside your device. And it allows developers to write Lambda functions, it's got Lambda inside of it, and it allows customers to write Lambda functions, some of which they want to run in the cloud, some of which they want to run on the device itself through Greengrass. So they have a common programming model to build those functions, to take the signals they see and take the actions they want to take against that, which is really going to help, I think, across all these IOT devices to be able to be much more flexible and allow the devices and the analytics and the actions you take to be much smarter, more intelligent. It's also why we built Snowball Edge. Snowball Edge, if you think about it, is really a purpose-built Greengrass device. We have Greengrass, it's inside of the Snowball Edge, and you know, the GE wind turbine example is a good example of that. And so it's to us, I think it's the future of what the on-premises piece of hybrid's going to be. I think there're going to be billions of devices all over the place and people are going to want to interact with them with a common programming model like they use in AWS and the cloud, and we're continuing to invest very significantly to make that easier and easier for companies. >> We've talked about several feature directions. We talked about AI, machine learning, the edge. 
What are some of the other areas of investment that this group should care about? >> Well there's a lot. (laughs) That's not a short question, Ariel. But there's a lot. I think, I'll name a few. I think first of all, as I alluded to earlier, we are not close to being done expanding geographically. I think virtually every tier-one country will have an AWS region over time. I think many of the emerging countries will as well. I think the database space is an area that is radically changing. It's happening at a faster pace than I think people sometimes realize. And I think it's good news for all of you. I think the database space over the last few decades has been a lonely place for customers. I think that they have felt particularly locked into companies that are expensive and proprietary and have high degrees of lock-in and aren't so customer-friendly. And I think customers are sick of it. And we have a relational database service that we launched many years ago and has many flavors that you can run. You can run MySQL, you can run Postgres, you can run MariaDB, you can run SQL Server, you can run Oracle. And what a lot of our customers kept saying to us was, could you please figure out a way to have a database capability that has the performance characteristics of the commercial-grade databases but the customer-friendliness and pricing model of the more open engines like MySQL and Postgres and MariaDB. Which you can do on your own, we do a lot of it at Amazon, but it's hard, I mean, it takes a lot of work and a lot of tuning. And our customers really wanted us to solve that problem for them. And it's why we spent several years building Aurora, which is our own database engine that we built, but that's fully compatible with MySQL and with Postgres. It's at least as fault tolerant and durable and performant as the commercial-grade databases, but it's a tenth of the cost of those. 
And it's also nice because if it turns out that you use Aurora and you decide for whatever reason you don't want to use Aurora anymore, because it's fully compatible with MySQL and Postgres, you just dump it to the community versions of those, and off you go. So there's really hardly any transition there. So that is the fastest-growing service in the history of AWS. I'm amazed at how quickly it's grown. I think you may have heard earlier, we've had 23,000 database migrations just in the last year or so. There's a lot of pent-up demand to have database freedom. And we're here to help you have it. You know, I think on the analytic side, it's just never been easier and less expensive to collect, store, analyze, and share data than it is today. Part of that has to do with the economics of the cloud. But a lot of it has to do with the really broad analytics capability that we provide you. And it's a much broader capability than you'll find elsewhere. And you know, you can manage Hadoop and Spark and Presto and Hive and Pig and YARN on top of AWS, or we have a managed Elasticsearch service, and you know, of course we have a very high scale, very high performing data warehouse in Redshift, that just got even more performant with Spectrum, which now can query across all of your S3 data, and of course you have Athena, where you can query S3 directly. We have a service that allows you to do real-time analytics of streaming data in Kinesis. We have a business intelligence service in QuickSight. We have a number of machine learning capabilities I talked about earlier. It's a very broad array. And what we find is that it's a new day in analytics for companies. A lot of the data that companies felt like they had to throw away before, either because it was too expensive to hold or they didn't really have the tools accessible to them to get the learning from that data, it's a totally different day today. 
And so we have a pretty big investment in that space, I mentioned Glue earlier to do ETL on all that data. We have a lot more coming in that space. I think compute, super interesting, you know, I think you will find, I think we will find that companies will use full instances for many, many years and we have, you know, more than double the number of instances than you'll find elsewhere in every imaginable shape and size. But I would also say that the trend we see is that more and more companies are using smaller units of compute, and it's why you see containers becoming so popular. We have a really big business in ECS. And we will continue to build out the capability there. We have companies really running virtually every type of container and orchestration and management service on top of AWS at this point. And then of course, a couple years ago, we pioneered the event-driven serverless capability in compute that we call Lambda, which I'm just again, blown away by how many customers are using that for everything, in every way. So I think the basic unit of compute is continuing to get smaller. I think that's really good for customers. I think the ability to be serverless is a very exciting proposition, and we're continuing to fulfill that vision that we laid out a couple years ago. And then, probably, the last thing I'd point out right now is, I think it's really interesting to see how the basic procurement of software is changing. In significant part driven by what we've been doing with our Marketplace. If you think about it, in the old world, if you were a company that was buying software, you'd have to go find a bunch of the companies that you should consider, you'd have to have a lot of conversations, you'd have to talk to a lot of salespeople. 
Those companies, by the way, have to have a big sales team, an expensive marketing budget to go find those companies and then go sell those companies and then both companies engage in this long tap-dance around doing an agreement and the legal terms and the legal teams and it's just, the process is very arduous. Then after you buy it, you have to figure out how you're going to actually package it, how you're going to deploy it to infrastructure and get it done, and it's just, I think in general, both consumers of software and sellers of software really don't like the process that's existed over the last few decades. And then you look at AWS Marketplace, and we have 3,500 product listings in there from 1,200 technology providers. If you look at the number of hours that software has been running on EC2 just in the last month alone, it's several hundred million hours, EC2 hours, of that software being run on top of our Marketplace. And it's just completely changing how software is bought and procured. I think that if you talk to a lot of the big sellers of software, like Splunk or Trend Micro, there's a whole number of them, they'll tell you it totally changes their ability to be able to sell. You know, one of the things that really helped AWS in the early days and still continues to help us, is that we have a self-service model where we don't actually have to have a lot of people talk to every customer to get started. I think if you're a seller of software, that's very appealing, to allow people to find your software and be able to buy it. And if you're a consumer, to be able to buy it quickly, again, without the hassle of all those conversations and the overhead associated with that, very appealing. And I think it's why the marketplace has just exploded and taken off like it has. It's also really good, by the way, for systems integrators, who are often packaging things on top of that software to their clients. 
This makes it much easier to build kind of smaller catalogs of software products for their customers. I think when you layer on top of that the capabilities that we've announced to make it easier for SaaS providers to meter and to do billing and to do identity, it's just, it's a very different world. And so I think that also is very exciting, both for companies and customers as well as software providers. >> We certainly touched on a lot here. And we have a lot going on, and you know, while we have customers asking us a lot about how they can use all these new services and new features, we also tend to get a lot of questions from customers on how we innovate so quickly, and how they can think about applying some of those lessons learned to their own businesses. >> So you're asking how we're able to innovate quickly? >> Mmm hmm. >> I think there's a few things that have helped us, and it's different for every company. But some of these might be helpful. I'll point to a few. I think the first thing is, I think we disproportionately index on hiring builders. And we think of builders as people who are inventors, people who look at different customer experiences really critically, are honest about what's flawed about them, and then seek to reinvent them. And then people who understand that launch is the starting line and not the finish line. There's very little that any of us ever built that's a home run right out of the gate. And so most things that succeed take a lot of listening to customers and a lot of experimentation and a lot of iterating before you get to an equation that really works. So the first thing is who we hire. I think the second thing is how we organize. And we have, at Amazon, long tried to organize into as small and separable and autonomous teams as we can, that have all the resources in those teams to own their own destiny. And so for instance, the technologists and the product managers are part of the same team. 
And a lot of that is because we don't want the finger pointing that goes back and forth between the teams, and if they're on the same team, they focus all their energy on owning it together and understanding what customers need from them, spending a disproportionate amount of time with customers, and then they get to own their own roadmaps. One of the reasons we don't publish a 12 to 18 month roadmap is we want those teams to have the freedom, in talking to customers and listening to what you tell us matters, to re-prioritize if there are certain things that we assumed mattered more than it turns out it does. So, you know I think that the way that we organize is the second piece. I think a third piece is all of our teams get to use the same AWS building blocks that all of you get to use, which allow you to move much more quickly. And I think one of the least told stories about Amazon over the last five years, in part because people have gotten interested in AWS, is people have missed how fast our consumer business at Amazon has iterated. Look at the amount of invention in Amazon's consumer business. And they'll tell you that a big piece of that is their ability to use the AWS building blocks like they do. I think a fourth thing is many big companies, as they get larger, what starts to happen is what people call the institutional no, which is that leaders walk into meetings on new ideas looking to find ways to say no, and not because they're ill intended but just because they get more conservative or they have a lot on their plate or things are really managed very centrally, so it's hard to imagine adding more to what you're already doing. At Amazon, it's really the opposite, and in part because of the way we're organized in such a decoupled, decentralized fashion, and in part because it's just part of our DNA. When the leaders walk into a meeting, they are looking for ways to say yes. And we don't say yes to everything, we have a lot of proposals. 
But we say yes to a lot more than I think virtually any other company on the planet. And when we're having conversations with builders who are proposing new ideas, we're in a mode where we're trying to problem-solve with them to get to yes, which I think is really different. And then I think the last thing is that we have mechanisms inside the company that allow us to make fast decisions. And if you want a little bit more detail, you should read our founder and CEO Jeff Bezos's shareholder letter, which just was released. He talks about the fast decision-making that happens inside the company. It's really true. We make fast decisions and we're willing to fail. And you know, we sometimes talk about how we're working on several of our next biggest failures, and we hope that most of the things we're doing aren't going to fail, but we know, if you're going to push the envelope and if you're going to experiment at the rate that we're trying to experiment, to find more pillars that allow us to do more for customers and allow us to be more relevant, you are going to fail sometimes. And you have to accept that, and you have to have a way of evaluating people that recognizes the inputs, meaning the things that they actually delivered as opposed to the outputs, cause on new ventures, you don't know what the outputs are going to be, you don't know how consumers or customers are going to respond to the new thing you're trying to build. So you have to be able to reward employees on the inputs, you have to have a way for them to continue to progress and grow in their career even if they work on something that didn't work. And you have to have a way of thinking about, when things don't work, how do I take the technology that I built as part of that, that really actually does work, but I didn't get it right in the form factor, and use it for other things. 
And I think that when you think about a culture like Amazon, that disproportionately hires builders, organizes into these separable, autonomous teams, and allows them to use building blocks to move fast, and has a leadership team that's looking to say yes to ideas and is willing to fail, you end up finding not only do you do more inventing but you get the people at every level of the organization spending their free cycles thinking about new ideas because it actually pays to think of new ideas cause you get a shot to try it. And so that has really helped us and I think most of our customers who have made significant shifts to AWS and the cloud would argue that that's one of the big transformational things they've seen in their companies as well. >> Okay. I want to go a little bit deeper on the subject of culture. What are some of the things that are most unique about the AWS culture that companies should know about when they're looking to partner with us? >> Well, I think if you're making a decision on a predominant infrastructure provider, it's really important that you decide that the culture of the company you're going to partner with is a fit for yours. And you know, it's a super important decision that you don't want to have to redo multiple times cause it's wasted effort. And I think that, look, I've been at Amazon for almost 20 years at this point, so I have obviously drank the Kool Aid. But there are a few things that I think are truly unique about Amazon's culture. I'll talk about three of them. The first is I think that we are unusually customer-oriented. And I think a lot of companies talk about being customer-oriented, but few actually are. I think most of the big technology companies truthfully are competitor-focused. They kind of look at what competitors are doing and then they try to one-up one another. You have one or two of them that I would say are product-focused, where they say, hey, it's great, you Mr. and Mrs. 
Customer have ideas on a product, but leave that to the experts, and you know, you'll like the products we're going to build. And those strategies can be good ones and successful ones, they're just not ours. We are driven by what customers tell us matters to them. We don't build technology for technology's sake, we don't become, you know, smitten by any one technology. We're trying to solve real problems for our customers. 90% of what we build is driven by what you tell us matters. And the other 10% is listening to you, and even if you can't articulate exactly what you want, trying to read between the lines and invent on your behalf. So that's the first thing. Second thing is that we are pioneers. We really like to invent, as I was talking about earlier. And I think most big technology companies at this point have either lost their will or their DNA to invent. Most of them acquire it or fast follow. And again, that can be a successful strategy. It's just not ours. I think in this day and age, where we're going through as big a shift as we are in the cloud, which is the biggest technology shift in our lifetime, as dynamic as it is, being able to partner with a company that has the most functionality, is iterating the fastest, has the most customers, has the largest ecosystem of partners, has SIs and ISVs, that has had a vision for how all these pieces fit together from the start, instead of trying to patch them together in a following act, you have a big advantage. I think that the third thing is that we're unusually long-term oriented. And I think that you won't ever see us show up at your door the last day of a quarter, the last day of a year, trying to harass you into doing some kind of deal with us, not to be heard from again for a couple years when we either audit you or try to re-up you for a deal. That's just not the way that we will ever operate. We are trying to build a business, a set of relationships, that will outlast all of us here. 
And I think something that always ties it together well is this Trusted Advisor capability that we have inside our support function, which is, you know, we look at dozens of programmatic ways that our customers are using the platform and reach out to you if you're doing something we think's suboptimal. And one of the things we do is if you're not fully utilizing resources, or hardly, or not using them at all, we'll reach out and say, hey, you should stop paying for this. And over the last couple of years, we've sent out a couple million of these notifications that have led to actual annualized savings for customers of 350 million dollars. So I ask you, how many of your technology partners reach out to you and say stop spending money with us? To the tune of 350 million dollars lost revenue per year. Not too many. And I think when we first started doing it, people thought it was gimmicky, but if you understand what I just talked about with regard to our culture, it makes perfect sense. We don't want to make money from customers unless you're getting value. We want to reinvent an experience that we think has been broken for the prior few decades. And then we're trying to build a relationship with you that outlasts all of us, and we think the best way to do that is to provide value and do right by customers over a long period of time. >> Okay, keeping going on the culture subject, what about some of the quirky things about Amazon's culture that people might find interesting or useful? >> Well there are a lot of quirky parts to our culture. And I think any, you know lots of companies who have strong culture will argue they have quirky pieces but I think there's a few I might point to. You know, I think the first would be the first several years I was with the company, I guess the first six years or so I was at the company, like most companies, all the information that was presented was via PowerPoint. 
And we would find that it was a very inefficient way to consume information. You know, you were often shaded by the charisma of the presenter, sometimes you would overweight what the presenters said based on whether they were a good presenter. And vice versa. You would very rarely have a deep conversation, cause you have no room on PowerPoint slides to have any depth. You would interrupt the presenter constantly with questions that they hadn't really thought through cause they didn't think they were going to have to present that level of depth. You constantly have the, you know, you'd ask the question, oh, I'm going to get to that in five slides, you want to do that now or you want to do that in five slides, you know, it was just maddening. And we would often find that most of the meetings required multiple meetings. And so we made a decision as a company to effectively ban PowerPoints as a communication vehicle inside the company. Really the only time I do PowerPoints is at Reinvent. And maybe that shows. And what we found is that it's a much more substantive and effective and time-efficient way to have conversations because there is no way to fake depth in a six-page narrative. So what we went to from PowerPoint was six-page narrative. You can write, have as much as you want in the appendix, but you have to assume nobody will read the appendices. Everything you have to communicate has to be done in six pages. You can't fake depth in a six-page narrative. And so what we do is we all get to the room, we spend 20 minutes or so reading the document so it's fresh in everybody's head. And then where we start the conversation is a radically different spot than when you're hearing a presentation one kind of shallow slide at a time. We all start the conversation with a fair bit of depth on the topic, and we can really hone in on the three or four issues that typically matter in each of these conversations. 
So we get to the heart of the matter and we can have one meeting on the topic instead of three or four. So that has been really, I mean it's unusual and it takes some time getting used to but it is a much more effective way to pay attention to the detail and have a substantive conversation. You know, I think a second thing, if you look at our working backwards process, we don't write a lot of code for any of our services until we write and refine and decide we have a crisp press release and frequently asked questions document, or FAQ, for that product. And in the press release, what we're trying to do is make sure that we're building a product that has benefits that will really matter. How many times have we all gotten to the end of products and by the time we get there, we kind of think about what we're launching and think, this is not that interesting. Like, people are not going to find this that compelling. And it's because you just haven't thought through and argued and debated and made sure that you drew the line in the right spot on a set of benefits that will really matter to customers. So that's why we use the press release. The FAQ is to really have the arguments up front about how you're building the product. So what technology are you using? What's the architecture? What's the customer experience? What's the UI look like? What are the pricing dimensions? Are you going to charge for it or not? All of those decisions, what are people going to be most excited about, what are people going to be most disappointed by. All those conversations, if you have them up front, even if it takes you a few times to go through it, you can just let the teams build, and you don't have to check in with them except on the dates. And so we find that if we take the time up front we not only get the products right more often but the teams also deliver much more quickly and with much less churn. 
And then the third thing I'd say that's kind of quirky is it is an unusually truth-seeking culture at Amazon. I think we have a leadership principle that we say have backbone, disagree, and commit. And what it means is that we really expect people to speak up if they believe that we're headed down a path that's wrong for customers, no matter who is advancing it, what level in the company, everybody is empowered and expected to speak up. And then once we have the debate, then we all have to pull the same way, even if it's a different way than you were advocating. And I think, you always hear the old adage where two people look at a ceiling and one person says it's 14 feet and the other person says, it's 10 feet, and they say, okay let's compromise, it's 12 feet. And of course, it's not 12 feet, there is an answer. And not all things that we all consider have that black and white answer, but most things have an answer that really is more right if you actually assess it and debate it. And so we have an environment that really empowers people to challenge one another and I think it's part of why we end up getting to better answers, cause we have that level of openness and rigor. >> Okay, well Andy, we have time for one more question. >> Okay. >> So other than some of the things you've talked about, like customer focus, innovation, and long-term orientation, what is the single most important lesson that you've learned that is really relevant to this audience and this time we're living in? >> There's a lot. But I'll pick one. I would say I'll tell a short story that I think captures it. In the early days at Amazon, our sole business was what we called an owned inventory retail business, which meant we bought the inventory from distributors or publishers or manufacturers, stored it in our own fulfillment centers and shipped it to customers. And around the year 1999 or 2000, this third party seller model started becoming very popular. 
You know, these were companies like Half.com and eBay and folks like that. And we had a really animated debate inside the company about whether we should allow third party sellers to sell on the Amazon site. And the concerns internally were, first of all, we just had this fundamental belief that other sellers weren't going to care as much about the customer experience as we did cause it was such a central part of everything we did DNA-wise. And then also we had this entire business and all this machinery that was built around owned inventory business, with all these relationships with publishers and distributors and manufacturers, who we didn't think would necessarily like third party sellers selling right alongside us having bought their products. And so we really debated this, and we ultimately decided that we were going to allow third party sellers to sell in our marketplace. And we made that decision in part because it was better for customers, it allowed them to have lower prices, so more price variety and better selection. But also in significant part because we realized you can't fight gravity. If something is going to happen, whether you want it to happen or not, it is going to happen. And you are much better off cannibalizing yourself or being ahead of whatever direction the world is headed than you are at howling at the wind or wishing it away or trying to put up blockers and find a way to delay moving to the model that is really most successful and has the most amount of benefits for the customers in question. And that turned out to be a really important lesson for Amazon as a company and for me, personally, as well. You know, in the early days of doing Marketplace, we had all kinds of folks, even after we made the decision, that despite the have backbone, disagree and commit weren't really sure that they believed that it was going to be a successful decision. 
And it took several months, but thankfully we really were vigilant about it, and today roughly half of the units we sell in our retail business are third party seller units. Been really good for our customers. And really good for our business as well. And I think the same thing is really applicable to the space we're talking about today, to the cloud, as you think about this gigantic shift that's going on right now, moving to the cloud, which is, you know, I think in the early days of the cloud, the first, I'll call it six, seven, eight years, I think collectively we consumed so much energy with all these arguments about are people going to move to the cloud, what are they going to move to the cloud, will they move mission-critical applications to the cloud, will the enterprise adopt it, will public sector adopt it, what about private cloud, you know, we just consumed a huge amount of energy and it was, you can see both in the results in what's happening in businesses like ours, it was a form of fighting gravity. And today we don't really have if conversations anymore with our customers. They're all when and how and what order conversations. And I would say that this is going to be a much better world for all of us, because we will be able to build in a much more cost effective fashion, we will be able to build much more quickly, we'll be able to take our scarce resource of engineers and not spend their resource on the undifferentiated heavy lifting of infrastructure and instead on what truly differentiates your business. And you'll have a global presence, so that you have lower latency and a better end user customer experience being deployed with your applications and infrastructure all over the world. And you'll be able to meet the data sovereignty requirements of various locales. 
So I think it's a great world that we're entering right now, I think we're at a time where there's a lot less confusion about where the world is headed, and I think it's an unprecedented opportunity for you to reinvent your businesses, reinvent your applications, and build capabilities for your customers and for your business that weren't easily possible before. And I hope you take advantage of it, and we'll be right here every step of the way to help you. Thank you very much. I appreciate it. (applause) >> Thank you, Andy. And thank you, everyone. I appreciate your time today. >> Thank you. (applause) (upbeat music)

Published Date : May 3 2017


ENTITIES

Entity	Category	Confidence
Amadeus	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Western Digital	ORGANIZATION	0.99+
Andy	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
France	LOCATION	0.99+
Sweden	LOCATION	0.99+
Ningxia	LOCATION	0.99+
China	LOCATION	0.99+
Andy Jassy	PERSON	0.99+
Stanford	ORGANIZATION	0.99+
six months	QUANTITY	0.99+
Ariel Kelman	PERSON	0.99+
Jeff Bezos	PERSON	0.99+
two	QUANTITY	0.99+
three	QUANTITY	0.99+
2000	DATE	0.99+
Oracle	ORGANIZATION	0.99+
12	QUANTITY	0.99+
26 years	QUANTITY	0.99+
20 minutes	QUANTITY	0.99+
Ariel	PERSON	0.99+
two people	QUANTITY	0.99+
10 feet	QUANTITY	0.99+
six pages	QUANTITY	0.99+
90%	QUANTITY	0.99+
GE	ORGANIZATION	0.99+
six-page	QUANTITY	0.99+
second piece	QUANTITY	0.99+
last year	DATE	0.99+
14 feet	QUANTITY	0.99+
six	QUANTITY	0.99+
PowerPoint	TITLE	0.99+
47%	QUANTITY	0.99+
50 terabytes	QUANTITY	0.99+
Amazon Web Services	ORGANIZATION	0.99+
12 feet	QUANTITY	0.99+
seven	QUANTITY	0.99+
five slides	QUANTITY	0.99+
Today	DATE	0.99+
four	QUANTITY	0.99+
one	QUANTITY	0.99+
10%	QUANTITY	0.99+
2016	DATE	0.99+
350 million dollars	QUANTITY	0.99+
10X	QUANTITY	0.99+
Netflix	ORGANIZATION	0.99+
November	DATE	0.99+
US	LOCATION	0.99+
second reason	QUANTITY	0.99+
McDonalds	ORGANIZATION	0.99+

Joel Cumming, Kik - Spark Summit East 2017 - #SparkSummit - #theCUBE

>> Narrator: Live from Boston, Massachusetts this is the Cube, covering Spark Summit East 2017 brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, where it's a blizzard outside and a blizzard of content coming to you from Spark Summit East, #SparkSummit. This is the Cube, the worldwide leader in live tech coverage. Joel Cumming is here. He's the head of data at Kik. Kicking butt at Kik. Welcome to the Cube. >> Thank you, thanks for having me. >> So tell us about Kik, this cool mobile chat app. Checked it out a little bit. >> Yeah, so Kik has been around since about 2010. We're, as you mentioned, a mobile chat app, start-up based in Waterloo, Ontario. Kik really took off in 2010, when it got 2 million users in the first 22 days of its existence. So it was insanely popular, specifically with U.S. youth, and the reason for that really is Kik started off in a time where chatting through text cost money. Text messages cost money back in 2010, and really not every kid had a phone like they do today. So if you had an iPod or an iPad all you needed to do was sign up, and you had a user name and now you could text with your friends, so kids could do that just like their parents could with Kik, and that's really where we got our entrenchment with U.S. youth. >> And you're the head of data. So talk a little bit about your background. What does that mean to be a head of data? >> Yes, so prior to working at Kik I worked at Blackberry, and I like to say I worked at Blackberry probably around the time just before you bought your first Blackberry and I left just after you bought your first iPhone. So kind of in that range, but was there for nine years. >> Vellante: Can you do that with real estate? >> Yeah, I'd love to be able to do that with real estate. But it was a great time at Blackberry. It was very exciting to be part of that growth. 
When I was there, we grew from three million to 80 million customers, from three thousand employees to 17 thousand employees, and of course, things went sideways for Blackberry, but conveniently at the end I was working in BBM, and leading a team of data scientists and data engineers there. And BBM if you're not familiar with it is a chat app as well, and across town is where Kik is headquartered. The appeal to me of moving to Kik was a company that was very small and fast moving, but they actually weren't leveraging data at all. So when I got there, they had a pile of logs sitting in S3, waiting for someone to take advantage of them. They were good at measuring events, and looking at those events and how they tracked over time, but not really combining them to understand or personalize any experience for their end customers. >> So they knew enough to keep the data. >> They knew enough to keep the data. >> They just weren't sure what to do with it. Okay so, you come in, and where did you start? >> So the first day that I started that was the first day I used any AWS product, so I had worked on the big data tools at the old place, with Hadoop and Pig and Hive and Oracle and those kinds of things, but had never used an AWS product until I got there and it was very much sink or swim and on my first day our CEO in the meeting said, "Okay, you're the data guy here now. I want you to tell me in a week why people leave Kik." And I'm like, man we don't even have a database yet. The first thing I did was I fired up a Redshift cluster. First time I had done that, looked at the tools that were available in AWS to transform the data using EMR and Pig and those kinds of things, and was lucky enough, fortunate enough that I could figure that out in a week and I didn't give him the full answer of why people left, but I was able to give him some ideas of places we could go based on some preliminary exploration. 
So I went from leading this team of about 40 people to being a team of one and writing all the code myself. Super exciting, not the experience that everybody wants, but for me it was a lot of fun. Over the last three years have built up the team. Now we have three data engineers and three data scientists and indeed it's a lot more important to people every day at Kik. >> What sort of impact has your team had on the product itself and the customer experience? >> So in the beginning it was really just trying to understand the behaviors of people across Kik, and that took a while to really wrap our heads around, and any good data analysis combines behaviors that you have to ask people their opinion on and also behaviors that we see them do. So I had an old boss that used to work at Rogers, which is a telecom provider in Canada, and he said if you ask people the things that they watch they tell you documentaries and the news and very important stuff, but if you see what they actually watch it's reality TV and trashy shows, and so the truth is really somewhere in the middle. There's an aspirational element. So for us really understanding the data we already had, instrumenting new events, and then in the last year and a half, building out an A/B testing framework is something that's been instrumental in how we leverage data at Kik. So we were making decisions by gut feel in the very beginning, then we moved into this era where we were doing A/B testing and very focused on statistical significance, and rigor around all of our experiments, but then stepping back and realizing maybe the bets that we have aren't big enough. So we need to maybe bet a little bit more on some bigger features that have the opportunity to move the needle. So we've been doing that recently with a few features that we've released, but data is super important now, both to stimulate creativity of our product managers as well as to measure the success of those features.
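A minimal sketch of the kind of significance check an A/B testing framework like the one described above might run. The interview does not say which test Kik uses, so the two-proportion z-test, the function name, and the sample numbers here are all assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rate, B minus A.

    conv_a / conv_b are the number of users who converted in each arm,
    n_a / n_b are the number of users exposed to each arm.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10% vs 15% conversion on 1,000 users per arm.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the difference is unlikely to be noise; it says nothing about whether the feature was a big enough bet, which is the point made above about stepping back from pure significance testing.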
>> And how do you map to the product managers who are defining the new features? Are you a central group? Are you sort of point guards within the different product groups? How does that, your evidence-based decisions or recommendations but they make ultimately, presumably, the decisions. What's the dynamic? >> So it's a great question. In my experience, it's very difficult to build a structure that's perfect. So in the purely centralized model you've got this problem of people are coming to you to ask for something, and they may get turned away because you're too busy, and then in the decentralized model you tend to have lots of duplication and overlap and maybe not sharing all the things that you need to share. So we tried to build a hybrid of both. And so we had our data engineers centralized and we tried doing what we called tours of duty, so our data scientists would be embedded with various teams within the company so it could be, it could be the core messenger team. It could be our bot platform team. It could be our anti-spam team. And they would sit with them and it's very easy for product managers and developers to ask them questions and for them to give out answers, and then we would rotate those folks through a different tour of duty after a few months and they would sit with another team. So we did that for a while, and it worked pretty well, but one of the major things we found was a problem was there's no good checkpoint to confirm that what they're doing is right. So in software development you're releasing a version of software. There's QA, there's code review and there's structure in place to ensure that yes, this number I'm providing is right. It's difficult when you've got a data scientist who's out with a team for him to come back to the team and get that peer review. So now we're kind of reevaluating that. We use an agile approach, but we have primes for each of these groups but now we all sit together. 
>> So the accountability is after the data scientist made a recommendation that the product manager agrees with, how do you ensure that it measured up to the expectation? Like sort of after the fact. >> Yeah, so in those cases, with our A/B tests, it's nice to have that unbiased data resource on the team that's embedded with them that can step back and say yes, this idea worked, or it didn't work. So that's the approach that we're taking. It's not a dedicated resource, but a prime resource for each of these teams that's a subject matter expert and then is evaluating the results in an unbiased kind of way. >> So you've got this relatively small, even though it's quadruple the size it was when you started, data team and then application development team as sort of colleagues or how do you interact with them? >> Yeah, we're actually part of the engineering organization at Kik, part of R and D, and in different times in my life I've been part of different organizations whether it's marketing or whether it's I.T. or whether it's R and D, and R and D really fits nicely. And the reason why I think it's the best is because if there's data that you need to understand users more there's much more direct control over getting that element instrumented within a product that you have when you're part of R and D. If you're in marketing, you're like hey, I'd love to know how many times people tap on that red button, but no event fires when that red button is tapped. Good luck trying to get the software developers to put that in. But when there's an inherent component of R and D that's dependent on data, and data has that direct path to those developers, getting that kind of thing done is much easier. >> So from a tooling standpoint, thinking about data scientists and data engineers, a lot of the tools that we've seen in this so-called big data world have been quite bespoke. Different interfaces, different experience. How are you addressing that? Does Spark help with that?
Maybe talk about that a bit more. >> Yeah, so I was fortunate enough to do a session today that sort of talked about data V1 at Kik versus data V2 at Kik, and we drew this kind of a line in the sand. So when I started it was just me. I'm trying to answer these questions very quickly on these three or five day timelines that we get from our CEO. >> Vellante: You've been here a week, come on! >> Yeah exactly, so you sacrifice data engineering and architecture when you're living like that. So you can answer questions very quickly. It worked well for a while, but then all of a sudden we come up and we have 300 data pipelines. They're a mess. They're hard to manage and control. We've got code sometimes in SQL or sometimes in Python scripts, or sometimes on people's laptops. We have no real plan for GitHub integration. And then no real scalability out of Redshift. We were doing a lot of our workloads in Redshift to do transformations just because, get the data into Redshift, write some SQL and then have your results. We're running into contention problems with that. So what we decided to do is sort of stop, step back and say, okay so how are we going to house all of this atomic data that we have in a way that's efficient. So we started with Redshift, our database was 10 terabytes. Now it's 100, except for we get five terabytes of data per day that's new coming in, so putting that all in Redshift, it doesn't make sense. It's not all that useful. So if we cull that data under supervision, we don't want to get rid of the atomic data, how do we control that data under supervision. So we decided to go the data lake route, even though we hate the term data lake, but basically a folder structure within S3 that's stored in a query optimized format like Parquet, and now we can access that data very quickly at an atomic level, at a cleansed level and also at an aggregate level.
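A minimal sketch of what "a folder structure within S3" with atomic, cleansed and aggregate levels could look like. The interview gives no actual paths, so the bucket name, dataset name, and the date-based `key=value` partition folders are all hypothetical; the `key=value` naming is simply a common convention that query engines such as Spark can use to skip folders a query does not need.

```python
from datetime import date

# The three levels named in the interview; everything else is an assumption.
LAYERS = ("atomic", "cleansed", "aggregate")


def lake_prefix(bucket, layer, dataset, day):
    """Build the S3 prefix under which one day's Parquet files would be stored."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return (
        f"s3://{bucket}/{layer}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket and dataset names, for illustration only.
for layer in LAYERS:
    print(lake_prefix("example-data-lake", layer, "events", date(2017, 2, 9)))
```

Keeping the same dataset and date layout under each level means a job can read one day of atomic data and write the matching cleansed or aggregate folder without touching a warehouse.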
So for us, this data V2 was the evolution of stopping doing a lot of things the way we used to do, which was lots of data pipelines, kind of code that was all over the place, and then aggregations in Redshift, and starting to use Spark, specifically Databricks. Databricks we think of in two ways. One is kind of managed Spark, so that we don't have to do all the configuration that we used to have to do with EMR, and then the second is notebooks that we can align with all the work that we're doing and have revision control and GitHub integration as well. >> A question to clarify, when you've put the data lake, which is the file system and then the data in Parquet format, or Parquet files, so this is where you want to have some sort of interactive experience for business intelligence. Do you need some sort of MPP server on top of that to provide interactive performance, or, because I know a lot of customers are struggling at that point where they got all the data there, and it's kind of organized, but then if they really want to munge through that huge volume they find it slows to lower than a crawl. >> Yeah, it's a great point. And we're at the stage right now where our data lake at the top layer of our data lake where we aggregate and normalize, we also push that data into Redshift. So Redshift what we're trying to do with that is make it a read-only environment, so that our analysts and developers, so they know they have consistent read performance on Redshift, where before when it's a mix of batch jobs as well as read workload, they didn't have that guarantee. So you're right, and we think what will probably happen over the next year or so is the advancements in Spark will make it much more capable as a data warehousing product, and then you'd have to start to question do I need both Redshift and Spark for that kind of thing?
But today I think some of the cost-based optimizations that are coming, at least the promise of them coming I would hope that those would help Spark become more of a data warehouse, but we'll have to see. >> So carry that thread a little further through. I mean in terms of things that you'd like to see in the Spark roadmap, things that could be improved. What's your feedback to Databricks? >> We're fortunate, we work with them pretty closely. We've been a customer for about half a year, and they've been outstanding working with us. So structured streaming is a great example of something we worked pretty closely with them on. We're really excited about. We don't have, you know we have certain pockets within our company that require very real-time data, so obviously your operational components. Are your servers up or down, as well as our anti-spam team. They require very low latency access to data. We haven't typically, if we batch every hour that's fine in most cases, but structured streaming when our data streams are coming in now through Kinesis Firehose, and we can process those without having to worry about checking to see if it's time we should start this or is all the data there so we can run this batch. Structured streaming solves a lot of those, it simplifies a lot of that workload for us. So that's something we've been working with them on. The other things that we're really interested in. We've got a bit of a list, but the other major ones are how do you start to leverage this data to use it for personalization back in the app? So today we think of data in two ways at Kik. It's data as KPIs, so it's like the things you need to run your business, maybe it's A/B testing results, maybe it's how many active users you had yesterday, that kind of thing. And then the second is data as a product, and how do you provide personalization at an individual level based on your data science models back out to the app.
So we do that, I should point out at Kik we don't see anybody's messages. We don't read your messages. We don't have access to those. But we have the metadata around the transactions that you have, like most companies do. So that helps us improve our products and services under our privacy policy to say okay, who's building good relationships and who's leaving the platform and why are they doing it. But we can also surface components that are useful for personalization, so if you've chatted with three different bots on our platform that's important for us to know if we want to recommend another bot to you. Or you know the classic people you may know recommendations. We don't do that right now, but behind the scenes we have the kind of information that we could help personalize that experience for you. So those two things are very different. In a lot of companies there's an R and D element, like at Blackberry, the app world recommendation engine was something that there was a team that ran in production but our team was helping those guys tweak and tune their models. So it's the same kind of thing at Kik where we can build, our data scientists are building models for personalization, and then we need to surface them back up to the rest of the company. And the process right now of taking the results of our models and then putting them into a real time serving system isn't that clean, and so we do batches every day on things that don't need to be near real-time, so things like predicted gender. If we know your first name, we've downloaded the list of baby names from the U.S. Social Security website and we can say the frequency of the name Pat 80 percent of the time it's a male, and 20 percent it's a female, but Joel is 99 percent of the time it's male and one percent of the time it's a female, so based on your tolerance for whatever you want to use this personalization for we can give you our degrees of confidence on that.
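A minimal sketch of the name-frequency lookup described above, assuming a simple table of per-name counts and a caller-supplied confidence threshold. The counts below just encode the 80/20 and 99/1 splits quoted in the interview; they are not real Social Security figures, and the function and field names are hypothetical.

```python
# Illustrative counts per 100 births, taken from the percentages quoted above.
NAME_COUNTS = {
    "pat": (80, 20),   # (male, female)
    "joel": (99, 1),
}


def predict_gender(first_name, counts, min_confidence):
    """Return (label, confidence); label is None when below the caller's tolerance."""
    entry = counts.get(first_name.strip().lower())
    if entry is None:
        return None, 0.0
    male, female = entry
    total = male + female
    if total == 0:
        return None, 0.0
    p_male = male / total
    if p_male >= 0.5:
        label, confidence = "male", p_male
    else:
        label, confidence = "female", 1 - p_male
    # The consumer decides how sure it needs to be for its own use case.
    if confidence < min_confidence:
        return None, confidence
    return label, confidence


print(predict_gender("Pat", NAME_COUNTS, 0.9))
print(predict_gender("Joel", NAME_COUNTS, 0.9))
```

Returning the confidence alongside the label lets each consuming feature apply its own tolerance, which matches the "based on your tolerance" framing in the interview.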
That's one example of what we surface right now in our API back to our own first party components of our app. But in the future with more real-time data coming in from Spark streaming with more real-time model scoring, and then the ability to push that over into some sort of capability that can be surfaced up through an API, it gives our data team the capability of being much more flexible and fast at surfacing things that can provide personalization to the end user, as opposed to what we have now which is all this batch processing and then loading once a day and then knowing that we can't react on the fly. >> So if I were to try and turn that into a sort of a roadmap, a Spark roadmap, it sounds like the process of taking the analysis and doing perhaps even online training to update the models, or just rescoring if you're doing a little slightly less fresh, but then serving it up from a high speed serving layer, that's when you can take data that's coming in from the game and send it back to improve the game in real time. >> Exactly. Yep. >> That's what you're looking for. >> Yeah. >> You and a lot of other people. >> Yeah I think so. >> So how's the event been for you? >> It's been great. There's some really smart people here. It's humbling when you go to some of these sessions and you know, we're fortunate where we try and not have to think about a lot of the details that people are explaining here, but it's really good to understand them and know that there are some smart people that are fixing these problems. As like all events, been some really good sessions, but the networking is amazing, so meeting lots of great people here, and hearing their stories too. >> And you're hoping to go to the hockey game tonight. >> Yeah, I'd love to go to the hockey game. See if we can get through the snow. >> Who are the Bruins playing tonight. >> San Jose. >> Oh, good. >> It could be a good game. >> Yeah, the rivalry. You guys into the hockey game? Alright, good.
Alright, Joel, listen, thanks very much for coming on the Cube. Great segment. I really appreciate your insights and sharing. >> Okay, thanks for having me. >> You're welcome. Alright, keep it right there, everybody. George and I will be back right after this short break. This is the Cube. We're live from Spark Summit in Boston.

Published Date : Feb 9 2017

David Richards, WANdisco - BigDataNYC - #BigDataNYC - #theCUBE

(silence) (upbeat techno music) >> Narrator: Live from New York, it's theCUBE, covering Big Data NYC 2016, brought to you by headline sponsors: Cisco... IBM... Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. >> Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. David Richards is here. He's the CEO of WANdisco, a long time CUBE alum. Great to see you again. >> Great to be back. >> It was good fun hanging out with you last night and a good surprise at the IBM event. There was good action across the street. >> Yeah, you're both looking surprisingly well, actually. >> (Dave laughs) Yes. >> Well, we also heard about the WANdisco versus theCUBE golf tournament, that apparently theCUBE just did really, really well in it and WANdisco went running away with their tail between their legs. >> Well, I talked to Furrier last night. I said, "David Richards was telling me that he kicked your butt on the golf course." He goes, "Yeah, that's true, actually." (laughter) >> I think I've got some video proof that he actually gave me $20 live on air because, of course, his wallet was empty. (laughter) He was blowing the dust off it, you know? >> Of course, yeah, the body swerve. >> Alligator arms. >> So David, it's, again, great to see you again. You guys have been in this business since day one, and things are evolving. How are things changing for WANdisco? >> So, when we first came into this market, back in the mid-2006, 2007, and then we obviously made a bunch of acquisitions around 2011 and 2012 that took us headlong into the big data marketplace. We pretty much had a completely different business model to our business model now. Then, we had a product called Non-Stop NameNode... My God, can you imagine that?
(Dave laughs) That was very focused on the Hadoop marketplace because, at that time, we believed, like everybody else, that Hadoop was going to take over the world, people were going to move to commoditized servers, open-source software, and solve the huge storage problems that they were going to have from both a cost and efficiency perspective. What I think has happened, or is happening right now, is this evolution, and it really is more of a revolution than an evolution is taking place, where workloads, and we were discussing this last night, are moving at massive scale to cloud, and people are really skipping that step, where we thought they were going to have 5, 10,000 sort of clusters on-premise, but now they have some clusters on-prem, but the bulk of the workloads are actually moving into cloud. I was just discussing with George, off-camera a few minutes ago, why that is happening, and there's a lot of applications that are very efficient. The cloud packs are up there ready to use, off the shelf, and it becomes very simplistic, and to be quite frank, do we really care anymore about all these different open-source components? Is the CIO waking up in the middle of the night thinking, oh, my God, am I going to use Ignite, am I going to use Spark, am I going to use Pig, am I going to use Hive, et cetera, et cetera, et cetera? Of course they're not. They really just want to-- Let's inverse the question to ourselves. If you were going to start a competitor to Uber tomorrow, would you go and build a data center (Dave laughs) or would you just throw up a thousand servers up in the cloud and have done with it, and use all the apps that are up there? Of course, the answer's simple, so that's really what's happening. >> Well, one of the things that I... I wrote a piece of research a million years ago in which I prognosticated, the Dictionary Word of the Day, that the value of middleware was inversely proportional to the degree to which anybody knew anything about it. 
(Dave laughs) CIOs are waking up and asking those questions today, which is an indication that they're creating a problem. >> Yep. >> Infrastructure has to do no harm in the organization. I had a CIO friend for years who still asks his chief CTO, "To what degree is infrastructure creating a problem for me today?" >> Yeah. >> And if it's creating a problem, it's a problem. >> Mm-hmm. >> You don't want to have to know about this stuff, so to what degree are you helping companies mask some of those... that visibility, so that people can spend less time worrying about the infrastructure? >> So, what we're focused on is a business model that has gone from direct, where we were hiring out a very large direct sales force enterprise, the classic enterprise sales guys that would go knock on doors, knock deals down, go and sell to the Global 1000s, to an indirect model, and we announced that OEM, recently with IBM, IBM Big Replicate, that is under the covers, is WANdisco Fusion, which is a great deal for us. So, our focus very much is on data movement, and data movement between data centers, for companies that want to stay on-prem, and between data centers and in and out of cloud seamlessly, and the word there is seamlessly. So, we worked very hard for the past 18 months on our product such that anybody can go to, if you want to go to the AWS Marketplace, you can, in a few clicks, begin to replicate petabyte-scale in and out of cloud, and we think, and we were discussing this last night, that the hybrid-cloud model is really fascinating, so the ability to take data on-premise, query it in cloud, get complete consistency between on-prem and cloud, but also have all the efficiency in the cloud economics, the elasticity, all the applications that exist in cloud, and I think that model is really interesting, and what's interesting is, I'm not sure that the little guys can execute in that model other than, like we're doing, via an OEM, an indirect model.
So, I'm not sure whether or not, just to go back to the conversation, CIOs are as concerned as they used to be about which Hadoop distribution, for example, they're using. I never hear that question anymore. That question was a 2012, 2013 question. What the CIOs are now concerned about is the economics of cloud, and how do I get that less than $5 per terabyte of data economics that I get in a cloud environment. >> Well, but also increasingly, they're talking about the use cases. >> David: Yeah. >> They want to get their people... They don't want to replicate the Linux or Unix versus NT wars of the 1990s, which was made possible because they were focused on what accounting package am I going to run? Am I going to run it-- >> Yeah. >> on this or that? You know, it was known process, unknown technology. In today's universe, it's unknown process, and they don't want to know as much about the technology, so they're focused on how do I get my men and women focused on use cases that are delivering value for their business. >> Exactly, and the economics question is really simple. Am I going to build a massive, partially used, elastic infrastructure on-premise or am I just going to go and use the elastic infrastructure that already exists in the cloud? That's a no-brainer. That's already happening, and the good news for us, the good news for WANdisco, is it's precisely what we do. It's a data movement problem. Now, I'm bound to say that, but it is actually a data movement problem. In this idea that you have data that changes, active transactional data, as we call it, so the active transactional data movement is a really hard problem. You can't just take a snapshot, right? A file scan and then a snapshot and then move the data, and that's the problem that all the other data replication guys have got. 
That's what IBM, OEM, that's why we've got strategic partnerships with companies like Oracle, like Amazon, and why I'm sure we'll be announcing things in due course with the other cloud vendors, like Google, for example, and Microsoft with their Azure products. They all have that problem, so data movement, in and out of cloud, if it's batch, if it's static, if it's archival data, easy problem to solve. There's a million and one different replication products. >> Dave: Right. >> You can use rsync if you really wanted to do that, but active transactional data, data that changes, data that moves, you know, at petabyte scale, hard problem. That's the problem that we solve. >> Because you've got speed of light problems and you're exposing yourself to data loss-- >> Yep. >> if something goes wrong. >> Peter: Fidelity is a problem. >> An eventual consistency replication model-- >> Yeah, it... >> doesn't work. You can't... If I'm query... We've got a customer that's trying to look at cardiographs, right, in and out of cloud. I mean, would you really feel comfortable in your cardiograph eventually getting into the cloud and being analyzed? You know, would you? You've got to be absolutely crystal clear that the data is completely consistent from the stuff that I'm generating on-premise versus the models that I'm building in cloud. It's vitally important. >> Well, I would imagine there's regulations, in certain industries anyway, that-- >> Oh, yeah, absolutely. >> require that eventual consistency doesn't fit, right? >> Yeah. Well, I mean, at the moment, without us, that's all you got, I'm afraid... >> Okay. >> Well, so, I'm on a mission, let me and I want to get your take on it, that we always talk about elastic infrastructure, which is a given workload, being able to scale up and scale down. >> David: Yeah. >> I think it's time to start talking about plastic infrastructure-- >> David: Oh, yeah, I like it.
>> where a given workload, but a reconfiguration of how that workload is applied because of the value of data, because of integration, because of the need to be able to move in response to business needs. So we talk about plastic infrastructure, where we are reconfiguring based on policy and rules and some other things. What do you think about that? >> I love it, and the reason I love it is because, just to take a step back, the definition of hybrid cloud is... You would imagine it would be relatively simple, but to me, a hybrid means that you have... You know, it's a bit like a hybrid golf club. It's neither a driver nor an iron. It's somewhere in between. So, you have the same workload that can exist both on-premise and in the cloud. I can use both the cloud and on-premise interchangeably. What hybrid cloud actually means, for all the vendors, and this is their dirty little secret, it means that you have some workloads running against some data in the cloud and others that will run against some data on-premise. Now, why do they do that? Because they have to. Because they can't guarantee complete consistency between on-premise and cloud. Our definition of hybrid cloud is exactly the same data, if you want, between on-premise and cloud, and I love this plastic phrase, the idea of repurposing all of those applications, and they can live anywhere. It doesn't matter 'cause it's the same data. >> Yeah, so we have two terms we have to copyright here, plastic infrastructure. >> Plastic... >> What was the other one we heard? >> Data portfolio. >> Data portfolio, yeah. We'll run the tape back >> Plastic infrastructure. (laughter) >> Plastic infrastructure. >> I'm going to steal it (laughs). >> Please do, you know? But the key thing is, as these technologies get more deeply embedded within business and how the business runs, it's incumbent upon the technology leadership to be able to rapidly be able to reconfigure the infrastructure in response to what the business needs. 
That's not elasticity. >> Yeah. >> That's plasticity. >> I love it, absolutely. (Peter laughs) And I think you're touching on something that's changing, and what we discussed earlier, which is that CIOs aren't waking up in the middle of the night thinking, am I going to use Pig or Hive or any of those other open source components. They're thinking about the applications that they're going to build. How am I actually going to start using this data? And I think the agenda's kind of moved on, and walking around the whole... There's still a little bit of confusion. You still have people talking about infrastructure like it really still matters. I'm not absolutely sure it does. >> Well, so let's talk about that. We got a few minutes or something like that. >> Dave: It matters when it breaks, you know? >> What's that? >> It matters when it breaks. >> It sure does matter when it breaks. >> You know, but otherwise, nobody wants to think about it. >> No, yeah, because like I said earlier, it's the degree to which-- >> We have time, but I want to explore the new distribution model as well. >> Yeah, go ahead. >> Let me do that, get that out, tick that box, if I can. Help me understand, David, how it all works. So you, the partnership with IBM and others, you mentioned Amazon, how does it work? You are in the IBM cloud offering? IBM is actually selling that offering? Is it a branded IBM product? >> So, it's in the big data analytics and cloud offerings. So, at the moment, IBM are very focused, as you know, on owning the platform. IBM, as a company, have to own the platform. >> Dave: Yeah, absolutely. >> So, I'm delighted to say that we're embedded into their platform. Now, they had a big launch of some products last night. >> Yeah. >> I know that they were talking about IBM Big Replicate, which is 100% white label OEM of WANdisco Fusion to solve some very specific problems, primarily around data movement.
So, in the hybrid cloud, how do I punch data out into clouds, run the analytics against it, and be sure that I'm going to get the right results? That's what Big Replicate solves, and also, they're moving into mixed environments, whether they're NetApp, just kind of Teradata environment, SAS-based environments, or whether a customer already has an existing distribution of, say, Cloudera or Hortonworks, so they can live alongside that, so we can replicate data between existing deployments, where they may have already made a strategic decision to go with one of those distributions, and also be able to migrate not just into IBM Big Insights, but also into their cloud offering, so that's a great deal for us. We're not... They're selling it themselves. I mean, obviously we've done a lot of field enablement, trained 5,000 or so IBM sales reps, and, you know, for a small company like WANdisco, or a small company like virtually any of the vendors in there that are not in the Global 1000 list, the go-to-market has to be indirect. >> And so you're... Totally agree, and so you're basically, if I understand it correctly, you're moving what are conventional filers into the cloud. Customers are doing that. >> Oh. >> How fast is that happening and why are they doing that? >> My God. I mean, we have not announced this product yet, but we're in the middle of launching it. It's, at scale, moving petabyte-scale data from, and this is transactional data, so it's a hard problem to solve, right, so it's an active data... It's an active transactional data replication problem. So, a lot of... The dirty little secret in the cloud is that a lot of those NFS filers have not moved yet-- >> Right. >> And why haven't they moved? 'Cause they can't. Because you can't just... 
You know, if you were to travel, one of the customaries of banks and travel companies is they can't press pause in their organization, do a file scan that's going to take six months, and then turn it back on again, and hey, presto, it's in the cloud. You can't do that. So, you kind of have to... Every single migration of those filers, of any sort of data, is a hybrid model, so you have to be able to run both on-prem and cloud while that migration is happening, and there, I can tell you, are a lot, a hell of a lot of NetApp filers that are going to move very soon here, in time. >> Dave: Oh, 'cause that's the problem that you solve. Otherwise, you'd have to freeze everything, which would kill your business, so you can't do it. >> Yeah, so when human beings imagine things, we're always imagining small use cases, small sets, like moving a few files into Dropbox or something, and that's okay that I can't edit those files for the few seconds it takes to move. I took a look at a deal the other day that was 3 billion files. (Dave laughs) Right, 3 billion. You can't even... My brain can't even calculate that, right? That's a three- to six-month data movement, and Amazon, for example, thought of this product called Snowball, which-- >> Yeah. >> You know, no techy ever believes this story, but, of course, they FedEx a box, a ruggedized hard drive to you essentially, a ruggedized server that you pour your data into and then you mail it back to them and they can put it there. That doesn't work, of course, for transactional data, for data that changes all the time. >> These are hard problems to solve, and the go-to-market, getting back to your question, it is all about indirect, you know? So, AWS, a strategic partnership, that, Oracle, a strategic partnership, that, IBM... And as I said, I'm sure that we'll be doing things with Google and Microsoft soon, and they're the five partnerships that I really care about, to be quite frank with you. >> Mm-hmm. 
and this comes back to this notion of infrastructure, the value of infrastructure, and just to touch on it for a second, so many years ago, when we were doing client-server, >> David: Mm-hmm. >> We would test it on a local area network and deploy it on a WAN (David laughs) and wonder why it blew up. >> David: Yeah. >> The realities of the speed of light and the practical limitations have a real impact on design, and so where infrastructure still matters is we still have to worry about design, we still have to worry about legacy financial assets, how we're deploying those assets, and I want to come back to this because we were talking earlier about data as an asset, the value of data within the business, and you don't want to be limited by the legacy as you try to find new ways of generating value out of your data, and what you guys are trying to allow is that the data can be moved in response to the use case as opposed to the use case not being made possible because of the legacy decisions about where to put your data. >> David: That's precisely it, and I don't think that any CIO, in their right mind, wants to continue with the huge maintenance costs, maintenance payments they have to make to some of those vendors, some of those NFS-based vendors. They need to shut them down. They have to figure out a way to move them into cloud so you get cloud economics, and also be able to query the data in a massively efficient way. You simply cannot do that at the moment. They simply cannot do that at the moment, so, as I said, as we continue to launch these products in the marketplace, I'm sure you'll see, at scale, some pretty large companies surprising-- You know, the two that spring to my mind are that the regulators in the US and the UK, FINRA and the FCA, are both in the process of moving all into cloud, 100% into cloud, and I would expect to see that trend continue. I mean, the re:Invent... 
I don't want to talk about another-- and we're here at Strata, but the AWS re:Invent, I would expect to see several major financial service companies announcing cloud strategy. >> Yeah, and FINRA's a big user of the AWS cloud. They talk about it pretty aggressively, and really interesting use case there. So, yeah, so we got to end. What's next for you guys? You've mentioned you're going to be at re:Invent, you're going to be at World of Watson (laughs)? Where are we going to find you next? >> Both of those. Obviously, the white label with IBM is a really interesting deal for us. I can't talk about deal flow yet 'cause it's our end of quarter at the moment, but I can tell you that they're doing a pretty damn good job of selling, so we're in execution mode at the moment, where we've already announced some key partnerships. There'll be more key partnerships to come, I'm sure. We're obviously chasing deals down with some of the other cloud vendors, and I'd expect to see us announcing some interesting new customer wins in the coming days and weeks. >> Dave: Great. Well, congratulations on the momentum and the renewed strategy. I love it, and I appreciate you coming to theCUBE. >> Always a pleasure. >> All right, keep it right there, buddy. We'll be back with our next guest. This is theCUBE. We're live at Big Data NYC, Strata and Hadoop World. Be right back. (spacey electronica music)

Published Date : Sep 29 2016


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
FCA	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
FINRA	ORGANIZATION	0.99+
David Richards	PERSON	0.99+
$20	QUANTITY	0.99+
Cisco	ORGANIZATION	0.99+
George	PERSON	0.99+
2012	DATE	0.99+
three	QUANTITY	0.99+
100%	QUANTITY	0.99+
New York City	LOCATION	0.99+
Peter	PERSON	0.99+
WANdisco	ORGANIZATION	0.99+
OAM	ORGANIZATION	0.99+
six months	QUANTITY	0.99+
3 billion	QUANTITY	0.99+
two terms	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
5,000	QUANTITY	0.99+
US	LOCATION	0.99+
two	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
FedEx	ORGANIZATION	0.99+
Linux	TITLE	0.99+
mid-2006	DATE	0.99+
Both	QUANTITY	0.99+
both	QUANTITY	0.99+
five partnerships	QUANTITY	0.99+
2013	DATE	0.99+
tomorrow	DATE	0.99+
Unix	TITLE	0.98+
2011	DATE	0.98+
one	QUANTITY	0.98+
six month	QUANTITY	0.98+
5, 10,000	QUANTITY	0.98+
last night	DATE	0.97+
less than $5 per terabyte	QUANTITY	0.97+
Hadoop World	LOCATION	0.97+
1990s	DATE	0.96+
Dropbox	ORGANIZATION	0.96+

Joel Horwitz, IBM & David Richards, WANdisco - Hadoop Summit 2016 San Jose - #theCUBE

>> Narrator: From San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016. Brought to you by Hortonworks. Here's your host, John Furrier. >> Welcome back everyone. We are here live in Silicon Valley at Hadoop Summit 2016, actually San Jose. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. Our next guests, David Richards, CEO of WANdisco. And Joel Horwitz, strategy and business development, IBM Analytics. Guys, welcome back to theCUBE. Good to see you guys. >> Thank you for having us. >> It's great to be here, John. >> Give us the update on WANdisco. What's the relationship with IBM and WANdisco? 'Cause, you know. I can just almost see it, but I'm not going to predict. Just tell us. >> Okay, so, I think the last time we were on theCUBE, I was sitting with Re-ti-co who works very closely with Joel. And we began to talk about how our partnership was evolving. And of course, we were negotiating an OEM deal back then, so we really couldn't talk about it very much. But this week, I'm delighted to say that we announced, I think it's called IBM Big Replicate? >> Joel: Big Replicate, yeah. We have a big everything and Replicate's the latest addition. >> So it's going really well. It's OEM'd into IBM's analytics, big data products, and cloud products. >> Yeah, I'm smiling and smirking because we've had so many conversations, David, on theCUBE with you on and following your business through the bumpy road or the wild seas of big data. And it's been a really interesting tossing and turning of the industry. I mean, Joel, we've talked about it too. The innovation around Hadoop and then the massive slowdown and realization that cloud is now on top of it. The consumerization of the enterprise created a little shift in the value proposition, and then a massive rush to build enterprise grade, right? And you guys had that enterprise grade piece of it. IBM, certainly you're enterprise grade. 
You have enterprise everywhere. But the ecosystem had to evolve really fast. What happened? Share with the audience this shift. >> So, it's classic product adoption lifecycle and the buying audience has changed over that time continuum. In the very early days when we first started talking more at these events, when we were talking about Hadoop, we all really cared about whether it was Pig and Hive. >> You once had a distribution. That's a throwback. Today's Thursday, we'll do that tomorrow. >> And the buying audience has changed, and consequently, the companies involved in the ecosystem have changed. So where we once used to really care about all of those different components, we don't really care about the machinations below the application layer anymore. Some people do, yes, but by and large, we don't. And that's why cloud for example is so successful because you press a button, and it's there. And that, I think, is where the market is going to very, very quickly. So, it makes perfect sense for a company like WANdisco who've got 20, 30, 40, 50 sales people to move to a company like IBM that have 4 or 5,000 people selling our analytics products. >> Yeah, and so this is an OEM deal. Let's just get that news on the table. So, you're an OEM. IBM's going to OEM their product and brand it IBM, Big Replication? >> Yeah, it's part of our Big Insights Portfolio. We've done a great job at growing this product line over the last few years, with last year talking about how we decoupled all the value-adds from the core distribution. So I'm happy to say that we're both part of the ODPI. It's an ODPI-certified distribution. That is Hadoop that we offer today for free. But then we've been adding not just in terms of the data management capabilities, but the partnership here that we're announcing with WANdisco and how we branded it as Big Replicate is squarely aimed at the data management market today. But where we're headed, as David points out, is really much bigger, right? 
We're talking about support for not only distributed storage and data, but we're also talking about a hybrid offering that will get you to the cloud faster. So not only does Big Replicate work with HDFS, it also works with the Swift object store, which, as you know, is kind of the underlying storage for our cloud offering. So what we're hoping to see from this great partnership is as you see around you, Hadoop is a great market. But there's a lot more here when you talk about managing data that you need to consider. And I think hybrid is becoming a lot larger of a story than simply distributing your processing and your storage. It's becoming a lot more about okay, how do you offset different regions? How do you think through that there are multiple, I think there's this idea that there's one Hadoop cluster in an enterprise. I think that's factually wrong. I think what we're observing is that there's actually people who are spinning up, you know, multiple Hadoop distributions at the line of business for maybe a campaign or for maybe doing fraud detection, or maybe doing log file, whatever. And managing all those clusters, and they'll have Cloudera. They'll have Hortonworks. They'll have IBM. They'll have all of these different distributions that they're having to deal with. And what we're offering is sanity. It's like give me sanity for how I can actually replicate that data. >> I love the name Big Replicate, fantastic. Big Insights, Big Replicate. And so go to market, you guys are going to have bigger sales force. It's a nice pop for you guys. I mean, it's good deal. >> We were just talking before we came on air about sort of a deal flow coming through. It's coming through, this potential deal flow coming through, which has been off the charts. I mean, obviously when you turn on the tap, and then suddenly you enable thousands and thousands of sales people to start selling your products. I mean, IBM, are doing a great job. 
And I think IBM are in a unique position where they own both cloud and on-prem. There are very few companies that own both the on-prem-- >> They're going to need to have that connection for the companies that are going hybrid. So hybrid cloud becomes interesting right now. >> Well, actually, it's, there's a theory that says okay, so, and we were just discussing this, the value of data lies in analytics, not in the data itself. It lies in you've been able to pull out information from that data. Most CIOs-- >> If you can get the data. >> If you can get the data. Let's assume that you've got the data. So then it becomes a question of, >> That's a big assumption. Yes, it is. (laughs) I just had Nancy Handling on about metadata. No, that's an issue. People have data they store they can't do anything with it. >> Exactly. And that's part of the problem because what you actually have to have is CPU slash processing power for an unknown amount of data any one moment in time. Now, that sounds like an elastic use case, and you can't do elastic on-prem. You can only do elastic in cloud. That means that virtually every distribution will have to be a hybrid distribution. IBM realized this years ago and began to build this hybrid infrastructure. We're going to help them to move data, completely consistent data, between on-prem and cloud, so when you query things in the cloud, it's exactly the same results and the correct results you get. >> And also the stability too on that. There's so many potential, as we've discussed in the past, that sounds simple and logical. To do an enterprise grade is pretty complex. And so it just gives a nice, stable enterprise grade component. >> I mean, the volumes of data that we're talking about here are just off the charts. >> Give me a use case of a customer that you guys are working with, or has there been any go-to-market activity or an ideal scenario that you guys see as a use case for this partnership? 
>> We're already seeing a whole bunch of things come through. >> What's the number one pattern that bubbles up to the top? Use case-wise. >> As Joel pointed out, that he doesn't believe that any one company just has one version of Hadoop behind their firewall. They have multiple vendors. >> 100% agree with that. >> So how do you create one, single cluster from all of those? >> John: That's one problem you solved. >> That's of course a very large problem. Second problem that we're seeing in spades is I have to move data to cloud to run analytics applications against it. That's huge. That requires completely guaranteed consistent data between on-prem and cloud. And I think those two use cases alone account for pretty much every single company. >> I think there's even a third here. I think the third is actually, I think frankly there's a lot of inefficiencies in managing just HDFS and how many times you have to actually copy data. If I looked across, I think the standard right now is having like three copies. And actually, working with Big Replicate and WANdisco, you can actually have more assurances and actually have to make fewer copies across the cluster and actually across multiple clusters. If you think about that, you have three copies of the data sitting in this cluster. Likely, analysts have dragged a bunch of the same data into other clusters, so that's another multiple of three. So there's an amount of waste in terms of the same data living across your enterprise. That I think there's a huge cost-savings component to this as well. >> Does this involve anything with Project Atlas at all? You guys are working with, >> Not yet, no. >> That project? It's interesting. We're seeing a lot of opening up the data, but all they're doing is creating versions of it. And so then it becomes version control of the data. You see a master or a centralization of data? Actually, not centralize, pull all the data in one spot, but why replicate it? Do you see that going on? 
I guess I'm not following the trend here. I can't see the mega trend going on. >> It's cloud. >> What's the big trend? >> The big trend is I need an elastic infrastructure. I can't build an elastic infrastructure on-premise. It doesn't make economic sense to build massive redundancy maybe three or four times the infrastructure I need on premise when I'm only going to use it maybe 10, 20% of the time. So the mega trend is cloud provides me with a completely economic, elastic infrastructure. In order to take advantage of that, I have to be able to move data, transactional data, data that changes all the time, into that cloud infrastructure and query it. That's the mega trend. It's as simple as that. >> So moving data around at the right time? >> And that's transaction. Anybody can say okay, press pause. Move the data, press play. >> So if I understand this correctly, and just, sorry, I'm a little slow. End of the day today. So instead of staging the data, you're moving data via the analytics engines. Is that what you're getting at? >> You use data that's being transformed. >> I think you're accessing data differently. I think today with Hadoop, you're accessing it maybe through like Flume or through Oozie, where you're building all these data pipelines that you have to manage. And I think that's obnoxious. I think really what you want is to use something like Apache Spark. Obviously, we've made a large investment in that earlier, actually, last year. To me, what I think I'm seeing is people who have very specific use cases. So, they want to do analysis for a particular campaign, and so they may just pull a bunch of data into memory from across their data environment. And that may be on the cloud. It may be from a third-party. It may be from a transactional system. It may be from anywhere. And that may be done in Hadoop. It may not, frankly. 
>> Yeah, this is the great point, and again, one of the themes on the show is, this is a question that's kind of been talked about in the hallways. And I'd love to hear your thoughts on this. Is there are some people saying that there's really no traction for Hadoop in the cloud. And that customers are saying, you know, it's not about just Hadoop in the cloud. I'm going to put in S3 or object store. >> You're right. I think-- >> Yeah, I'm right as in what? >> Every single-- >> There's no traction for Hadoop in the cloud? >> I'll tell you what customers tell us. Customers look at what they actually need from storage, and they compare whatever it is, Hadoop or any on-premise proprietary storage array and then look at what S3 and Swift and so on offer to them. And if you do a side-by-side comparison, there isn't really a difference between those two things. So I would argue that it's a fact that functionally, storage in cloud gives you all the functionality that any customer would need. And therefore, the relevance of Hadoop in cloud probably isn't there. >> I would add to that. So it really depends on how you define Hadoop. If you define Hadoop by the storage layer, then I would say for sure. Like HDFS versus an object store, that's going to be a difficult one to find some sort of benefit there. But if you look at Hadoop, like I was talking to my friend Blake from Netflix, and I was asking him so I hear you guys are kind of like replatforming on Spark now. And he was basically telling me, well, sort of. I mean, they've invested a lot in Pig and Hive. So if you think it now about Hadoop as this broader ecosystem which you brought up Atlas, we talk about Ranger and Knox and all the stuff that keeps coming out, there's a lot of people who are still invested in the peripheral ecosystem around Hadoop as that central point. My argument would be that I think there's still going to be a place for distributed computing kind of projects. 
And now whether those will continue to interface through Yarn via and then down to HDFS, or whether that'll be Yarn on say an objects store or something and those projects will persist on their own. To me that's kind of more of how I think about the larger discussion around Hadoop. I think people have made a lot of investments in terms of that ecosystem around Hadoop, and that's something that they're going to have to think through. >> Yeah. And Hadoop wasn't really designed for cloud. It was designed for commodity servers, deployment with ease and at low cost. It wasn't designed for cloud-based applications. Storage in cloud was designed for storage in cloud. Right, that's with S3. That's what Swift and so on were designed specifically to do, and they fulfill most of those functions. But Joel's right, there will be companies that continue to use-- >> What's my whole argument? My whole argument is that why would you want to use Hadoop in the cloud when you can just do that? >> Correct. >> There's object store out. There's plenty of great storage opportunities in the cloud. They're mostly shoe-horning Hadoop, and I think that's, anyway. >> There are two classes of customers. There were customers that were born in the cloud, and they're not going to suddenly say, oh you know what, we need to build our own server infrastructure behind our own firewall 'cause they were born in the cloud. >> I'm going to ask you guys this question. You can choose to answer or not. Joel may not want to answer it 'cause he's from IBM and gets his wrist slapped. This is a question I got on DM. Hadoop ecosystem consolidation question. People are mailing in the questions. Now, keep sending me your questions if you don't want your name on it. Hold on, Hadoop system ecosystem. When will this start to happen? What is holding back the M and A? >> So, that's a great question. 
First of all, consolidation happens when you sort of reach that tipping point or leveling off, that inflection point where the market levels off, and we've reached market saturation. So there's no more market to go after. And the big guys like IBM and so on come in-- >> Or there was never a market to begin with. (laughs) >> I don't think that's the case, but yes, I see the point. Now, what's stopping that from happening today, and you're a naughty boy by the way for asking this question, is a lot of these companies are still very well funded. So while they still have cash on the balance sheet, of course, it's very, very hard for that to take place. >> You picked up my next question. But that's a good point. The VCs held back in 2009 after the crash of 2008. Sequoia's memo, you know, the good times role, or RIP good times. They stopped funding companies. Companies are getting funded, continually getting funding. Joel. >> So I don't think you can look at this market as like an isolated market like there's the Hadoop market and then there's a Spark market. And then even there's like an AI or cognitive market. I actually think this is all the same market. Machine learning would not be possible if you didn't have Hadoop, right? I wouldn't say it. It wouldn't have a resurgence that it has had. Mahout was one of the first machine learning languages that caught fire from Ted Dunning and others. And that kind of brought it back to life. And then Spark, I mean if you talk to-- >> John: I wouldn't say it creates it. Incubated. >> Incubated, right. >> And created that Renaissance-like experience. >> Yeah, deep learning, Some of those machine learning algorithms require you to have a distributed kind of framework to work in. And so I would argue that it's less of a consolidation, but it's more of an evolution of people going okay, there's distributed computing. 
Do I need to do that on-premise in this Hadoop ecosystem, or can I do that in the cloud, or in a growing Spark ecosystem? But I would argue there's other things happening. >> I would agree with you. I love both areas. My snarky comment there was never a market to begin with, what I'm saying there is that the monetization of commanding the hill that everyone's fighting for was just one of many hills in a bigger field of hills. And so, you could be in a cul-de-sac of being your own champion of no paying customers. >> What you have-- >> John: Or a free open-source product. >> Unlike the dotcom era where most of those companies were in the public markets, and you could actually see proper valuations, most of the companies, the unicorns now, most are not public. So the valuations are really difficult to, and the valuation metrics are hard to come by. There are only few of those companies that are in the public market. >> The cash story's right on. I think to Joel's point, it's easy to pivot in a market that's big and growing. Just 'cause you're in the wrong corner of the market pivoting or vectoring into the value is easier now than it was 10 years ago. Because, one, if you have a unicorn situation, you have cash in the bank. So they have a good flush cash. Your runway's so far out, you can still do your thing. If you're a startup, you can get time to value pretty quickly with the cloud. So again, I still think it's very healthy. In my opinion, I kind of think you guys have good analysis on that point. >> I think we're going to see some really cool stuff happen working together, and especially from what I'm seeing from IBM, in the fact that in the IT crowd, there is a behavioral change that's happening that Hadoop opened the door to. That we're starting to see more and more IT professionals walk through. In the sense that, Hadoop has opened the door to not thinking of data as a liability, but actually thinking about data differently as an asset. 
And I think this is where this market does have an opportunity to continue to grow as long as we don't get carried away with trying to solve all of the old problems that we solved for on-premise data management. Like if we do that, then we're just, then there will be a consolidation. >> Metadata is a huge issue. I think that's going to be a big deal. And on the M and A, my feeling on the M and A is that, you got to buy something of value, so you either have revenue, which means customers, and/or intellectual property. So, in a market of open source, it comes back down to the valuation question. If you're IBM or Oracle or HP, they can pivot too. And they can be agile. Now slower agile, but you know, they can literally throw some engineers at it. So if there's no customers and IP, they can replicate, >> Exactly. >> That product. >> And we're seeing IBM do that. >> They don't know what they're buying. My whole point is if there's nothing to buy. >> I think it depends on, ultimately it depends on where we see people deriving value, and clearly in WANdisco, there's a huge amount of value that we're seeing our customers derive. So I think it comes down to that, and there is a lot of IP there, and there's a lot of IP in a lot of these companies. I think it's just a matter of widening their view, and I think WANdisco is probably the earliest to do this frankly. Was to recognize that for them to succeed, it couldn't just be about Hadoop. It actually had to expand to talk about cloud and talk about other data environments, right? >> Well, congratulations on the OEM deal. IBM, great name, Big Replicate. Love it, fantastic name. >> We're excited. >> It's a great product, and we've been following you guys for a long time, David. Great product, great energy. So I'm sure there's going to be a lot more deals coming your way. Good strategy, this OEM strategy thing, huh? >> Oh yeah. >> It reduces sales cost. >> Gives us tremendous operational leverage. 
Getting 4,000, 5,000-- >> You get a great partner in IBM. They know the enterprise, great stuff. This is theCUBE bringing all the action here at Hadoop. IBM OEM deal with WANdisco all happening right here on theCUBE. Be back with more live coverage after this short break.

Published Date : Jul 1 2016


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Joel	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Joe	PERSON	0.99+
David Richards	PERSON	0.99+
Joel Horowitz	PERSON	0.99+
2009	DATE	0.99+
John	PERSON	0.99+
4	QUANTITY	0.99+
WANdisco	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
20	QUANTITY	0.99+
San Jose	LOCATION	0.99+
HP	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Joel Horwitz	PERSON	0.99+
Ted Dunning	PERSON	0.99+
Big Replicate	ORGANIZATION	0.99+
last year	DATE	0.99+
Silicon Valley	LOCATION	0.99+
40	QUANTITY	0.99+
30	QUANTITY	0.99+
third	QUANTITY	0.99+
today	DATE	0.99+
Hadoop	TITLE	0.99+
San Jose, California	LOCATION	0.99+
three	QUANTITY	0.99+
two things	QUANTITY	0.99+
2008	DATE	0.99+
5,000 people	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
Blake	PERSON	0.99+
4,000, 5,000	QUANTITY	0.99+
S3	TITLE	0.99+
two classes	QUANTITY	0.99+
tomorrow	DATE	0.99+
Second problem	QUANTITY	0.99+
both areas	QUANTITY	0.99+
three copies	QUANTITY	0.99+
Hadoop Summit 2016	EVENT	0.99+
Swift	TITLE	0.99+
both	QUANTITY	0.99+
Big Insights	ORGANIZATION	0.99+
one problem	QUANTITY	0.98+
Today	DATE	0.98+

Dr. Amr Awadallah - Interview 2 - Hadoop World 2011 - theCUBE

Yeah, Amr Awadallah, the co-founder, back to back. This is theCUBE, SiliconANGLE.com and SiliconANGLE.tv's production of theCUBE, our flagship telecast. We go out to the events. That was a great conversation. It was really just cool. We could have probably hit on a few more things. Obviously well read. Awesome. Co-founder of Cloudera. You did a good job teaming up with that co-founder, huh? He's not bad on theCUBE, is he? >>He reads the internet. >>That's what I'm saying. >>Anything that is going on. >>He's a CUBE star, you know. >>Technology, Jeff knows it. Yeah. >>I tell you, I'm smarter just by being in Cloudera all those years. And I actually was following what he was saying, and it didn't bust my brain. So, okay, you're back. We were talking earlier with Mike Olson about the relational database thing, so I want to pick that up where we left off with you. He was really excited. It's like, hey, we saw that relational database movement happen. He was part of that generation. And things are happening in a similar way now, still early. So I was trying to really peg with him how early we are on the curve. You know, this is 1,400, it's not the Javits Center yet. Maybe Hadoop World next year might be at the Javits Center, 35,000. Just don't go to Vegas. So I'm trying to figure out where we are on that curve. Are we on the upward slope, or down here, not even hitting that? >>I think we're moving up quicker than previous waves. If you look, for example, at Oracle, I think it took them 15, 20 years until they really became a mature company. VMware, which started about, what, 12, 13 years ago, it took them maybe eight years to be a big company, a mature company, and I'm hoping we're gonna do it in five. 
So a couple more years. >>Highly accelerated. >>Yes. I mean, I've been surprised by the growth. I had been warned about enterprise software, and that it takes long for production to take place. >>But the consumerization trend is really changing that. I mean, it seems that the enterprise is always last. Why the shorter >>Cycle? I think the shorter cycle is coming from having the right solution for the right problem at the right time. I think that's a big part of it. So luck definitely is a big part of this. Now, in terms of why the adoption is changing compared to a couple of decades ago, I think that's coming just because of how quickly the technology itself, the underlying hardware, is evolving. Right now, the fact that you can buy a single server that has eight to 16 cores and 12 hard drives, two terabytes each, is something that's just pushing the limits of what you can do with the existing systems, and hence making it more likely for new systems to disrupt them. >>Yeah. We can talk about that a lot. It's very easy for people to actually start a big data >>Project. >>Yes. For >>Example. Yes. And the hardest part is, okay, what problem do I really need to solve? How am I gonna monetize it? Right? Those are the hard parts. It's not the underlying >>Technology. Yes, that's true. I mean, >>You're saying, eh, you're saying >>Because I'm seeing both. I'm seeing cases where you're right. There are some companies that are like, oh, this Hadoop thing is so cool, what problem can I solve with it? And I see other companies that have this huge problem, and they don't know that Hadoop exists. And once they know, they just jump on it right away. 
It's like, you know, when you have a headache and you're searching for the medicine, and it's aspirin. Wow. It >>Works. I was talking to Jeff Hammerbacher before he came on stage, and I didn't even get to it because we were on such a nice riff there, right? A bunch of musicians playing the guitar together. But we talked about the IT dynamics, and he said something that I thought was right on the money, and SAP is saying the same thing: they're going to the lines of business. Yes. Because IT is the gatekeeper. It's like selling minicomputers to a mainframe team, or selling client-server to a minicomputer team. Yeah. >>We're seeing both as well. More likely the former one, meaning that yes, lines of business and departments adopt the technology, and then IT comes in and sees there are already these five different departments having it, and they think, okay, now we need to formalize this across the organization. >>So what happens then? What are you seeing out there? When that happens, does that mean people get their hands on it, hey, we've got a problem to solve? Yeah. Is that what it comes down to? Well, Hadoop exists. Go get Hadoop. Oh yeah. They plop it in there, and what does it do? They, >>So they pop it into their own installation or on the cloud, and they show that this actually is working and solving the problem for them. Yeah. And when that happens, it's a very easy adoption from there on, because they just go tell IT, we need this right now because it's solving this problem and it's gonna make us much >>More money. Moving it right in. Yes. No problems. >>Is that another reason why the cycle's compressed? I mean, you think of client-server, there was a lot of resistance from IT, and now it's much more... Same thing with mobile. I mean, mobile has flipped, right? So, okay, bring it in. We've gotta deal with it. Yep. I would think the same thing. 
We have a data problem. Let's turn it into an >>Opportunity. Yeah. And it goes back to what I said earlier: the right solution for the right problem at the right time. When you have larger amounts of unstructured data, there isn't anything else out there that can even touch what Hadoop can >>Do. So Amr, I need to just change gears here a minute. The gaming stuff. We're featured on justin.tv right now, on the front page. Oh wow. But the numbers aren't coming in because there's a competing stream of a recently released Modern Warfare 3 feature. Yes. Yes. So >>I was looking for, we >>Have to compete with Modern Warfare 3. So can we talk about Modern Warfare 3 for a minute and share with the folks what you think of the current version, if you've played it? Yeah. So >>Unfortunately I'm waiting to get back home. I don't have my Xbox with me here. >>A little like a, I'm talking about >>My lines of business. >>Boom. Modern Warfare's like a Christmas >>Tree here. Sorry. You know, I'm a big gamer. I'm a big video gamer. At Cloudera, every Thursday at 5:30 in the office, we play Call of Duty 4, which is Modern Warfare 1, actually. And I challenge people out there to come challenge our team. Just ping me on Twitter and we'll do a Cloudera versus >>Let's reframe that. The team out there, Amr Awadallah's company. These are the geeks that invent the future. Jeff Hammerbacher, at Facebook, now at Cloudera. Hammerbacher leading the charge. These guys are gamers. So all the young gamers out there, Amr is saying he's gonna challenge you. At which version? >>Modern Warfare 1. >>Modern Warfare 1. Yes. How do they fire in? Can you set up an >>External We'll >>We'll figure it out. We'll figure it out. Okay. >>Yeah. Just ping me on Twitter and we'll, >>We can carry it live, actually. We can stream that. Yeah, >>That'd be great. >>Great. >>Yeah. 
So I'll tell you, some of our best Hadoop committers and Hadoop developers pitch >>A picture. Modern Warfare >>3. Going now to Modern Warfare 3: very excited about the game. I saw the trailers for it, and the graphics look just amazing. I love the series, since the first one that came out. And I'm looking forward to getting back home to play the game. >>I can't play, my son won't let me play. I'm such a fumbler with the Hub. I'm a keyboard controller. I can't work the Xbox controller. I have a coordination problem at my age and I'm just a klutz, and it's like, 'Dad, sorry, charity's over. Can I play with my friends?' You the box. But I'm around big gamer. >>But something I wanted to bring up is how to link up gaming with big data and analysis and so on. I'm a big gamer. I love playing games, but at the same time, whenever I play games, I feel a little bit guilty because it's kind of like wasted time. I mean, yeah, it's fun and I'm getting lots of enjoyment and it makes my life much more cheerful. But still, how can we harness all of these hours that gamers spend playing a game like Modern Warfare 3? How can we collect, instrument all of the data that's coming from that, and come up, for example, with something useful and predictive? >>This is exactly the kind of application that's going mainstream: gaming. Yeah. Danny at Riot Games was telling me, we saw him at Oracle OpenWorld, he was up there for JavaOne. He said that they don't really have a big data platform, and their business is about understanding user behavior: tons of data about user playing time, who they're playing with. Yeah. How they want to get into currency trading, you know, >>I can't mention the names, but some of the biggest gaming companies out there are using Hadoop right now. 
And depending on CDH for doing exactly that kind of thing, creating >>A good user experience. >>Today, they're doing it for the purpose of enhancing the user experience and improving retention. So they do track everything. Like every single bullet you fire, every hit you get in baseball, every home run you do. And in a Modern Warfare 3 >>Type of game, consecutive headshots you get, >>Everything, everything is being... yeah, headshots you get and so on. But as you said, they are using that information today to sell more products and retain their users. Now what I'm suggesting is, how can you harness that energy for the good as well? I mean, making money is good and everything, but how can you harness that for doing something useful, so that all of this entertainment time is also actually productive time? I think that'd be a holy grail in this environment if we >>Can achieve that. Yeah. It used to be that porn used to be the telegraph of the future of applications, but gaming really is. If you look at gaming, you get the headset on, it's a collaborative environment. Oh yeah. You've got unified communications. >>Yeah. And you see our teenage kids, how many hours they spend on these things. >>You've got play environments, very social, collaborative. Yeah. What I'm saying is that that's the future work environment, with Skype evolving. Our multiplayer game's called our job. Right? Yeah. So I'm big on gaming. So all the gamers out there, Amr has challenged you. Yeah. Got a big data example. What else are we seeing? So let's talk about the software. One of the things you were talking about that I really liked, you were going down the list. On Mike's slide he had all the new features. So around the core, can you just go down the list and rattle off your version of what it means and what it is? 
So you start off with, say, HBase; we talked about that already. What are the other ones that are out there? >>So the projects that we have right there, >>The projects that are around, those tools that are being built. 'Cause >>Yeah, so the foundational one, as we mentioned before, is HDFS for storage, MapReduce for processing. Yeah. And then the immediate layer above that is how to make MapReduce easier for the masses. Not everybody knows how to write MapReduce in Java, but everybody knows SQL, right? So one of the most successful projects right now, the one that has the highest attach rate, meaning people usually install it as well when they install Hadoop, is Hive. Jeff Hammerbacher, my co-founder, when he was at Facebook, his team built the Hive system. Essentially Hive takes SQL, so you don't have to learn a new language, you already know SQL, and then converts that into MapReduce for you. That not only expands the developer base for how many people can use Hadoop, but also makes it easier to integrate Hadoop, through ODBC and JDBC, with BI tools like MicroStrategy and Tableau and Informatica, et cetera, et cetera. >>You mentioned R too. You mentioned the R program. R >>As well. Yeah, R is one of our best partnerships. We're very, very happy with them. So that's one of the very key projects, Hive. A sister project to Hive is called Pig. Pig Latin is a language that Yahoo invented; you have to learn the language. It's very easy to learn compared to MapReduce. But once you learn it, you can specify very deep data pipelines, right? SQL is good for queries. It's not good for data pipelines because it becomes very convoluted. It becomes very hard for the human brain to understand it. So Pig is much more natural to the human. It's more like Perl, very similar to scripting kinds of languages. 
So with Pig you can write very, very long data pipelines. Again, a very successful project, doing very, very well. Another key project is HBase, like you said. HBase allows you to do low latency, so you can do very, very quick lookups, and it also allows you to do transactions, so you can do updates and inserts and deletes. One of the talks here at Hadoop World that I recommend people watch when the videos come out is the talk by Jonathan Gray from Facebook. He talked about how they use HBase, >>Jonathan's coming on here on theCUBE later. Yeah. So >>Drill him on that. They use HBase now for many, many things within Facebook. They have a big team now committed to building and improving HBase with us and with the community at large. And they're using it for their online messaging system. The live mail system in Facebook is powered by HBase right now. Again, at eBay, the Cassini project, which they gave a keynote about earlier today at the conference, is using HBase as well. So HBase is definitely one of the projects that's growing very, very quickly right now within the Hadoop ecosystem. Another key project that Jeff alluded to earlier when he was on here is Flume. Flume is very instrumental because you have this nice system, Hadoop, but Hadoop is useless unless you have data inside it. So how do you get the data inside Hadoop? >>Flume essentially is this very nice framework for having these agents all over your infrastructure, inside your web servers, inside your application servers, inside your mobile devices, your network equipment, that collect all of that data and then reliably materialize it inside Hadoop. So Flume does that. Another good project is Oozie. There are so many of them, I don't know how long you want me to keep going here. But Oozie is great. Oozie is a workflow processing system. Oozie allows you to define a series of jobs, some of them in Pig, some of them in Hive, some of them in MapReduce. 
You can define a series of them and then link them to each other and say, only start this job when these other two jobs finish, because I'm waiting for the input from them before I can kick off, and so on. >>So Oozie is a very nice framework that will do that. It will manage the whole graph of jobs for you and retry things when they fail, et cetera, et cetera. Another good project is Whirr, W-H-I-R-R, and Whirr allows you to very easily start a Hadoop cluster on top of Amazon EC2, on top of Rackspace, a virtualized environment. It's for kicking off Hadoop instances or HBase instances on any virtual infrastructure. Okay. VMware, vCloud. So it supports all of the major virtualized infrastructure systems out there, Eucalyptus as well, and so on. So that's Whirr, W-H-I-R-R. Avro is another key project. It's Doug Cutting's main project right now. I don't know if Doug Cutting came on stage with you guys. So Avro is a project about how do we encode, within our files, the schema of these files, right? >>Because when you open up a text file and you don't know what the columns mean and how to parse it, it becomes very hard to work with it. So Avro allows you to do that much more easily. It's also useful for doing RPC, remote procedure calls, for having different services talk to each other. Avro is very useful for that as well. And the list keeps going on and on. Mahout, yeah, thanks for reminding me of Mahout. We just added Mahout very recently, actually. What is that, >>Amr? I'm not >>Familiar with it. So Mahout is a data mining library. Mahout takes some of the most popular data mining algorithms for doing clustering and regression and statistical modeling and implements them using the MapReduce model. >>They have machine learning in it too? Yes, yes. So that's the machine learning. >>So yes. 
Support vector machines and so on. >>What about Sqoop? >>So Sqoop, you know all of them. Thanks for feeding me all the names. >>The ones I don't understand. >>But there are so many of them, right? I can't even remember all of them. So Sqoop actually is a very interesting project. It's short for SQL to Hadoop, hence the name Sqoop, right? SQ from SQL and OOP from Hadoop, and it also means scoop as in scooping up stuff, like when you scoop up ice cream. Yeah. And the idea for Sqoop is to make it easy to move data between relational systems, like Oracle, Teradata, Netezza, Vertica, and so on, and Hadoop. So you can very simply say Sqoop, the name of the table inside the relational system, the name of the file inside Hadoop, and the table will be copied over to the file. And vice versa: you can say Sqoop, the name of the file in Hadoop, the name of the table over there, and it'll move it over there. So it's a connectivity tool between the relational world and the Hadoop world. >>Great, great tutorial. >>And all of these are Apache projects. They're all projects built, >>It's not part of your unique proprietary... >>Yes. But >>These are things that you've been contributing >>To. We're contributing to the whole ecosystem. Yes. >>And you understand them very well. Yes. And >>And they contribute to your knowledge of the marketplace. >>Absolutely. We collaborate with the community on creating these projects. We employ committers and founders for many of these projects. Like Doug Cutting, the founder of Hadoop: he works at Cloudera. The founder of the Oozie project works at Cloudera. The founder of ZooKeeper works at Cloudera. So we have a number of them on staff. >>So we had Arun from Hortonworks on. Yes. And it was really good, because I tell you, I walk away from that conversation and I've gotta say, for the folks out there, there really isn't a war going on in Apache. There isn't. And >>In Apache, there isn't. I mean, I'll be honest: in the developer community, we are friends, we're working together. We want to achieve the, there's 
Like, and in the developer community, we are friends, we're working together. We want to achieve the, there's >>No war. It's all Kumbaya. Everyone understands the rising tide floats, all boats are all playing nice in the same box. Yes. It's just a competitive landscape in Horton. Works >>In the business, >>Business business, competitive business, PR and >>Pr. We're trying to be friendly, as friendly as we can. >>Yeah, no, I mean they're, they're, they're hying it up. But he was like, he was cool. Like, Hey, you know, we know each other. Yes. We all know each other and we're just gonna offer free Yes. And charge with support. And so are they. And that's okay. And they got other things going on. Yes. But he brought up the question. He said they're, they're launching a management console. So I said, Tyler's got a significant lead. He kind of didn't really answer the question. So the question is, that's your core bread and butter, That's your yes >>And no. Yes and no. I mean if you look at, if you look at Cloudera Enterprise, and I mentioned this earlier and when we talked in the morning, it has two main things in it. Cloudera Enterprise has the management suite, but it also has the, the the the support and maintenance that we provide to our customers and all the experience that we have in our team part That subscription. Yes. For a description. And I, I wanna stress the point that the fact that I built a sports car doesn't mean that I'm good at running that sports car. The driver of the car usually is much better at driving the car than the guy who built the car, right? So yes, we have many people on staff that are helping build had, but we have many more people on stuff that helped run Hado at large scale, at at financial indu, financial industry, retail industry, telecom industry, media industry, health industry, et cetera, et cetera. So that's very, very important for our customer. All that experience that we bring in on how to run the system technically Yeah. 
Within these verticals. >>But their strategy's clear: we're gonna create an open source project within Apache for a management console. Yes. And we sell support too. Yes. So there'll be a free alternative for management. >>So we have to see. But I mean, look at the product, I mean our product, >>It's gotta come down to product differentiation. >>Our product has been in the market for two years; they just started building their product. It's >>Alpha. It's just alpha. The >>Product is in alpha right now. Yeah. Okay. >>Well, the Apache project, it is >>Apache, right? Yeah. The Apache project is out. So we'll see how it compares to ours. But I think ours is way, way ahead of anything else out there. Yeah. I encourage people to try that for themselves and >>See. Essentially, John, when I asked Arun why the world needs Hortonworks, eventually the answer we got was, well, it's free. It needs to be more open. Hadoop needs to be more open. >>No, there's, >>It's going to be... That's not really the reason for Hortonworks. >>No, they want to go make money. >>Exactly. He wasn't >>Gonna say that to you. >>When I kept pushing and pushing, that's ultimately the closest we could get, 'cause you >>Just listen. Not gonna >>12 open source projects. Yes. >>I >>Mean, yeah, yeah. You can't get much more open. Yeah. Look >>At management >>Console, but Airs not shooting on all those. I mean, I mean not only we are No, no, not >>No, no, we absolutely >>Are. No, you are contributing. You're not. But that's not all your projects. There's other people >>Involved. Yeah, we didn't start all of these projects. Yeah, that's >>True. You're contributing heavily to all of them. >>Yes, we >>Are. And that's clear. Todd Lipcon said that he contributed his first patch to HBase in 2008. Yes. So I mean, you go back through the ranks >>Of your people. And Todd now is a committer on HBase and a committer on Hadoop itself. 
So on a number >>Of them you're clearly the lead, you know, and, but >>There is a concern. We've heard it, and I just wanna ask you. So there's a concern that if I build processes around a proprietary management console, yes, I'm gonna end up being locked into that proprietary management console, CA all over again. Now, this is so far from CA. Yes. >>Right. >>But that's a concern that some people have expressed. And I think it's one of the reasons why Hortonworks is getting so much attention. So, yes. >>Talk about that. It's a very good observation to make, actually. >>There are two separate things here. There's the platform, where all the data sits, and then there's this management console beside the platform. Now why did we make the management console? Why did Cloudera make the management console? Because it makes our job of supporting the customers much more achievable. When a customer calls in and says, we have a problem, help us fix this problem, when they go to our management console, there is a button they click that gives us a dump of the state of the cluster. And that's what allows us to very quickly debug what's going on, and within minutes tell them, you need to do this and you need to do that. Yeah. Without that we just can't offer the support services. There's >>Real value there. >>Yes. But you have to keep in mind that the underlying platform is completely open source and free. CDH is a hundred percent open source, a hundred percent free, a hundred percent Apache. So a year from now, when it comes time to renew with us, if the customer is not happy with our management suite, is not happy with our support, they can go to Hortonworks. >>People are afraid >>Or they can go to IBM. >>The data, you can take the data, >>You don't even need to take the data. You're not gonna move the data. It's the same system, the same software. 
Everything in CDH is Apache. Right? We're not putting anything in CDH which is not Apache. So a year from now, if you're not happy with our service to you and the value that we're providing, you can switch. There is no lock-in. >>Your argument would be the switching costs... >>The only lock-in is happiness. The only lock-in is >>Happiness, satisfaction, customer delight. By the way, we just wrote a piece about those wars, and we said the risk of lock-in is low. We made that statement. We've got some heat for it. Yes. And >>This is sort of at scale, though. What the people throwing the tomatoes are saying is, again, in theory, at scale, the customers are so comfortable with the console that they don't switch. Now my argument was >>Yes, but that means they're happy with it. That means they're satisfied and happy >>With it. >>And it's more economical for them than going and hiring people full-time on staff. Yeah. >>So you're always in check, as long as the customer doesn't feel like it's Oracle. >>Yeah. See, that's different. Oracle is very, Oracle >>Is different, right? Yeah. Here it's like Cisco routers: they get nested into the environment and provide value. That's just good competitive product strategy. Yes. If they're happy. Yeah. It's >>Called open washing with >>Oracle. >>I mean, the number one core attribute of the company, the number one value for us, is customer satisfaction. Keeping our customers happy with the service that we provide. >>So differentiate in the product. Yes. Keep the commanding lead. That's the strategy. That's what's happening. That's your goal. Yes. >>That's what's happening. >>Absolutely. Okay. Co-founder of Cloudera, always a pleasure to have you on theCUBE. We really appreciate all the hospitality over the year and a half. 
And I wanna personally thank you for letting us sit in your office, and we'll miss you. >>And we'll miss you too. We'll >>See you at theCUBE events. Swing by. Thanks for coming on theCUBE, great to see you, and congratulations on all your success. >>Thank >>You. And thanks for the review of Modern Warfare 3. Yeah, yeah. >>Love me again. If there's any gaming stuff, you know, I.
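
The interview describes Hive as taking a SQL query and converting it into MapReduce. As a rough illustration only, the toy sketch below shows how a query such as `SELECT page, COUNT(*) FROM visits GROUP BY page` breaks down into a map step, a shuffle, and a reduce step. It is not Hive's planner or API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_group_by_count`) and the sample `visits` rows are hypothetical, and everything runs in a single Python process rather than on a Hadoop cluster.

```python
from collections import defaultdict


def map_phase(rows, key_column):
    """Emit one (key, 1) pair per input row, as a mapper would."""
    for row in rows:
        yield row[key_column], 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer computing COUNT(*) would."""
    return {key: sum(values) for key, values in grouped.items()}


def run_group_by_count(rows, key_column):
    """Hypothetical driver: chain map, shuffle, and reduce in one process."""
    return reduce_phase(shuffle(map_phase(rows, key_column)))


# Hypothetical sample data standing in for a "visits" table.
visits = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
counts = run_group_by_count(visits, "page")
print(counts)
```

On a real cluster the map and reduce functions would run on many machines over data stored in HDFS; the point here is only the shape of the decomposition that lets a SQL user avoid writing those functions by hand.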

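The Oozie description in the interview (only start a job when the jobs it waits on have finished, and retry things when they fail) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Oozie's workflow format or API: `run_workflow`, the job names, and the `max_retries` parameter are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_job, max_retries=1):
    """Run jobs in dependency order.

    dependencies maps each job name to the set of jobs it must wait for.
    run_job is called with a job name and may raise RuntimeError on failure,
    in which case the job is retried up to max_retries more times.
    """
    completed = []
    for job in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_retries + 1):
            try:
                run_job(job)
                completed.append(job)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
    return completed


# Hypothetical workflow: the Hive report waits for a Pig job and a MapReduce job.
dependencies = {
    "hive_report": {"pig_clean", "mr_join"},
    "pig_clean": set(),
    "mr_join": set(),
}
order = run_workflow(dependencies, lambda job: print("running", job))
print(order)
```

The two upstream jobs may run in either order, but `hive_report` always comes last, which matches the "I'm waiting for the input from them before I can kick off" behavior described in the interview.
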
Published Date : May 1 2012

ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
Jeff Hiba	PERSON	0.99+
Todd Lipkin	PERSON	0.99+
2008	DATE	0.99+
Cisco	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
John	PERSON	0.99+
Mike	PERSON	0.99+
Modern Warfare three	TITLE	0.99+
Apache	ORGANIZATION	0.99+
Danny	PERSON	0.99+
Jonathan Gray	PERSON	0.99+
Jeff Haer Baer	PERSON	0.99+
15	QUANTITY	0.99+
two years	QUANTITY	0.99+
Calera	ORGANIZATION	0.99+
Modern Warfare	TITLE	0.99+
16 cores	QUANTITY	0.99+
Jeff Harm Becker	PERSON	0.99+
Todd	PERSON	0.99+
eight cores	QUANTITY	0.99+
Jonathan	PERSON	0.99+
both	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Java	TITLE	0.99+
next year	DATE	0.99+
Skype	ORGANIZATION	0.99+
two jobs	QUANTITY	0.99+
Vegas	LOCATION	0.99+
Michaels	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Hadoop	TITLE	0.99+
hundred percent	QUANTITY	0.99+
35,000	QUANTITY	0.99+
Horton Works	ORGANIZATION	0.99+
Today	DATE	0.99+
Peggy	PERSON	0.99+
eBay	ORGANIZATION	0.99+
Horton	LOCATION	0.99+
12 hards	QUANTITY	0.99+
Each	QUANTITY	0.99+
vCloud	TITLE	0.99+
HPAC	ORGANIZATION	0.99+
Aala	PERSON	0.99+
Adam	PERSON	0.99+
Tyler	PERSON	0.98+
UIE	ORGANIZATION	0.98+
Hadoop World	TITLE	0.98+
first one	QUANTITY	0.98+
12 open source projects	QUANTITY	0.98+
Edge Base	TITLE	0.98+
W H I R R	TITLE	0.98+
five	QUANTITY	0.98+
Hammerer	PERSON	0.98+
Xbox	COMMERCIAL_ITEM	0.98+
Port Works	ORGANIZATION	0.98+
Hive	TITLE	0.98+
Amar	PERSON	0.98+
five different departments	QUANTITY	0.98+
today	DATE	0.98+
Christmas	EVENT	0.98+
SQL	TITLE	0.97+
Silicon angle dot TV	ORGANIZATION	0.97+
Tableau	TITLE	0.97+
two	QUANTITY	0.97+
W H I R R	TITLE	0.97+

Aaron T. Myers, Cloudera Software Engineer, Talking Cloudera & Hadoop

>> So Aaron, you're an engineer at Cloudera, a whiz kid from Brown. How many Brown people are engineers here at Cloudera? >> As of Monday, we have five full-timers and two interns at the moment, and we're trying to hire more all the time. >> So how many interns? >> Two interns from Brown this summer, and a few more from other schools. >> Cool. I'm John Furrier with SiliconANGLE.com and SiliconANGLE.tv. We're here in the Cloudera office, in my little mini studio that hasn't been built out yet. It was a studio; we had to break it down for Dr. Ralph Kimball (not Richard Kimble, as I called him on Twitter), the data warehouse guru, who was in here. And you guys are attracting a lot of talent, Aaron. So tell us a little bit about how Cloudera is making it happen and what the big deal is here. People are smart here, and it's mature; it's not the first time around for this company. This company has some senior execs. A lot of people in the market have been talking about first-time entrepreneurs doing their startups, and I've been hearing from some folks in the trenches that there's a frustration with startups out there: there are a lot of first-time entrepreneurs, everyone wants to be the next Twitter, and there are some companies that are straddling failure. I was having that conversation with someone just today, and they said, what's it like at Cloudera? And I said, this is not a first-time crew here at Cloudera. So share with the folks out there what you're seeing at Cloudera and in the management team. >> Sure. 
Well, one of the most attractive parts about working at Cloudera for me, one of the reasons I really came here, was the incredibly experienced management team. Mike, Charles, they're all there at the top of this org, and they have all done this before: they've founded startups, grown startups, sold startups. Especially in contrast with the place where I worked previously, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would. >> And I mean, Mike Olson is a veteran. He's an adviser to startups, and I know he's been an investor in some. Amr was obviously a PhD candidate who bolted out to do a startup, sold it to Yahoo, worked at Yahoo, and came back to finish his PhD at Stanford under Mendel over there in the PhD program (inaudible). He came back as an entrepreneur in residence at Accel Partners, and now he does Cloudera. When did you join the company? Just take us through who you are and when you joined Cloudera. I want your background. >> Sure. So I joined a little over a year ago; it was about 30 people at the time. I came from a small startup, an online music store in New York City, which doesn't really exist all that much anymore. But I sort of followed my other colleagues from Brown who worked here, and I was really sold by the management team and also by the tremendous market opportunity that Hadoop has right now. Cloudera was very much the first commercial player there, which is really a unique experience, and I think you've covered this pretty well before. I think we all around here believe that the market is only growing, and we're going to see the market, and the big data market in general, get bigger and bigger in the next few years. >> So obviously computer science is all the rage, and I'm particularly proud to hang out here; we've had conversations in the hallway while you're tweeting about this and that. 
But SiliconANGLE's home is here, and I've had a chance to watch you and the other guys here grow from your other office, which was in San Mateo or San Bruno, somewhere in there. >> It was originally in Burlingame, then we relocated the headquarters to Palo Alto, and now we have a satellite office up in San Francisco. >> So you guys bolted out. You have a full-on, blown-out San Francisco office. It was busting at the seams here in Palo Alto, with people commuting down, even building their Burning Man... >> Oh yeah, sure. >> ...huts here, constructing their homes here for Burning Man. So what's the vibe like in San Francisco? Tell us what's going on. >> San Francisco is great. I live in San Francisco, as do a lot of us. About half the engineering team works up there now. We're running out of space there, certainly. >> And you're already... >> Oh yeah, we're hiring as fast as we absolutely can. So there's definitely not space to build the Burning Man huts there like there is down in Palo Alto, but it's great up there. >> What are you working on right now, project-wise? Computer science is one of the hot topics we've been covering on SiliconANGLE, taking more of a social angle. Social media has moved from this PR kind of check-in, Facebook-fan-page hype to a real-deal social marketplace where data (social data, gestural data, mobile data, geo data) is the center of the value proposition. You live that every day. So talk about your view on the computer science landscape around data and why it's such a big deal. >> Oh sure. 
I think data is one of those fundamental things that can be mined for value across every industry. There's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing, et cetera. And that's the unique value proposition of stuff like Hadoop. We truly see interest from every sector that exists, which is great. As for the project I'm specifically working on right now, I primarily work on HDFS, the Hadoop Distributed File System, which underlies pretty much all the other projects in the Hadoop ecosystem. I'm particularly working with other colleagues at Cloudera and at other companies, Yahoo and Facebook, on high availability for HDFS, which in some deployments has been a serious concern. Hadoop is primarily a batch processing system, so it's less of a concern than in other systems. But when you start talking about running HBase, which needs to be up all the time serving live traffic, then having highly available HDFS is a necessity, and we're looking forward to delivering that. >> Talk about the criticism that HDFS has been having. Well, I wouldn't say criticism. It's been a great product, and HDFS is a core part of Hadoop. You guys have been contributing to the standard at Apache, and it's no secret to the folks out there that Cloudera leads that effort. But there are new companies out there trying a new approach and saying they're doing it better. What are they saying, and what's really happening? There's some argument like, oh, we can do it better. Why are they doing it? Is it just to make money, to do a new venture? What's your opinion on that? >> Sure. 
I mean, I think it's natural to want to go after parts of the core Hadoop system and say, Hadoop is a great ecosystem, but what if we just swapped out this part or that part? Couldn't we get some really easy gains? And sometimes that will be true. I have confidence that that will simply not be true in the very near future. One of the great benefits of Apache Hadoop being open source is that we have a huge worldwide network of developers, working at some of the best engineering organizations in the world, who are all collaborating on this stuff. I firmly believe that the collaborative open source process produces the best software, and that's what Hadoop is at its very core. >> What about the argument that says, oh, I need to commercialize it differently for my installed base and bolt on a few proprietary extensions? That's a legitimate argument. EMC might take that approach, or MapR, which is trying to rewrite HDFS. Is it legitimate? Is there fighting going on in the standards? Maybe that's a political question, but give it a shot. >> There's no open standard for Hadoop. You can't say this is Hadoop-compatible or anything like that. What you can say is, this is Apache Hadoop. So in that sense there's no fighting to be had there. >> So, Yahoo is struggling as a company, but there's strong Hadoop DNA at Yahoo, certainly. I talked with the founder of the startup Hortonworks, and they just announced today that they have a new board member. He's the guy who's the CEO of Hortonworks, and now on Gluster... I'm sorry, Gluster announced they have Rob from Benchmark on the board. 
He's the CEO of Hortonworks, and one of my, not criticisms, but points about Hortonworks was that this guy's an engineer who's never run a company before. He's no Mike Olson. Mike Olson has long experience. So this guy comes in to run it, and he's obviously into open source. Is that good for Yahoo and open source? They say they're going to continue to invest in Hadoop, and they clearly are still using a lot of Hadoop. How is that changing Apache? Is that causing more consolidation? Is that causing more energy? What's your view on the whole Hortonworks thing? >> Yahoo has been and will continue to be a huge contributor to Hadoop. I can't say for sure, but I feel pretty confident that they have more data under management in Hadoop than anyone else in the world, and there's no question in my mind that they'll continue to invest huge amounts of both QA effort and engineering effort, and all of the things that Hadoop needs to advance. I'm sure that Hortonworks will continue to work very closely with Yahoo, and we're excited to see more and more contributors to Hadoop, both from Hortonworks and from Yahoo proper. >> Cool. Well, I just want to clarify for the folks out there who don't understand this whole Yahoo thing. It was not a spin-out; these were key Hadoop core guys who left the company to form a startup, which Yahoo financed with Benchmark Capital. Yahoo told me, and reaffirmed with me, that they are clearly investing more in Hadoop internally as well. There are more people inside Yahoo who work on Hadoop than there are in the entire Hortonworks company. So that's very clear, just to clear that up out there. Aaron, you're a young gun, right? 
You're a young whiz like Todd, who's been on here. Explain to the folks out there who are a little bit older, maybe guys in their thirties, or CIOs: a lot of people are kicking the tires on big data. They're hearing about real-time analytics and about benefits they've never heard of before. Dave Vellante and I talk on theCUBE about the transformations that are going on. You're seeing EMC getting into big data; everyone's transforming at the enterprise and service provider level. Explain to the folks why Hadoop is so important. Why is Hadoop, if not the fastest, one of the fastest-growing projects in Apache ever? Even faster than the web server project, which is one of the bigger ones. Why is Hadoop so important? Explain to them what it is. >> Well, it's been pretty well covered that there's been an explosion of data, that more data is produced every year, over and over. We talk about exabytes, a quantity of data so large that pretty much no one can really comprehend it. More and more organizations want to store, process, and get insights from that data. In addition to the explosion of data, the fact that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is that it's so very inexpensive per terabyte to store data that you don't have to think up front about what you want to store and what you want to discard. Store it all and figure out later which are the most useful bits. We call that schema on read, as opposed to figuring out the schema a priori. That is a very powerful shift in the dynamics of data storage in general, and I think it's very attractive to all sorts of organizations. 
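As a rough illustration of the schema-on-read idea Aaron describes, the sketch below keeps raw records exactly as they arrived and only decides which fields matter when a question is asked. It is a minimal sketch in plain Python, not Hadoop code: the event fields, `RAW_EVENTS`, `read_with_schema`, and `total_purchases` are all hypothetical names invented for the example, and an in-memory list stands in for files on a cluster.

```python
import json

# Raw events are stored exactly as they arrived; no schema is enforced
# at write time. (Hypothetical sample data standing in for files in HDFS.)
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ad_id": 7}',
    '{"user": "bob", "action": "purchase", "amount": 19.99}',
    '{"user": "alice", "action": "purchase", "amount": 5.0, "coupon": "X1"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: project each raw record onto the
    fields this particular query cares about (missing fields become None)."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def total_purchases(raw_lines):
    """One possible question asked later: total purchase amount per user."""
    totals = {}
    for row in read_with_schema(raw_lines, ["user", "action", "amount"]):
        if row["action"] == "purchase":
            totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(total_purchases(RAW_EVENTS))
```

A different question, such as which ads were clicked, could reuse the same stored lines with a different field list (for example `["user", "ad_id"]`), which is the point of deferring the schema until read time.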
>> You're a Brown graduate, and you have some interns from Brown too. Brown is a premier computer science program, almost as good as when I went to school at Northeastern University, you know, the unsung heroes of computer science. Only kidding, Brown's a great program. But cutting-edge computer science... Hadoop in general is known as an area where you've got to be pretty savvy, at a master's or PhD level, to play. There's not a lot of adoption among what I call the grassroots developers. What's your vision, and how do you see the younger computer science generation, even younger than you, growing up into this? Because those tools aren't yet developed; you still have to be pretty strong from a computer science perspective. Also explain to the folks who aren't necessarily at the Browns of the world, or getting into computer science, what this revolution is about and where it's going. What are some of the things you see happening around the corner that might not be obvious? >> Sure, there are a few questions there. Part of it is, how do people coming out of college get into this thing? It's not taught all that much in school. How do you make the leap from the standard computer science curriculum into this sort of thing? Part of it is that we're seeing more and more schools offering distributed computing classes, or they have grids available to do this stuff. There is some research coming out of Brown, actually, and lots of other schools, about Hadoop proper and the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting. 
Google actually has classes that they teach, I believe in conjunction with the University of Washington, where they teach undergraduates and master's-level graduate students about MapReduce and distributed computing, and they actually use Hadoop to do it, because the architecture of Hadoop is modeled after Google's internal infrastructure. So that's one way: we're seeing more and more people just coming out of college who have distributed systems knowledge like this. The other part of the question you asked is, how does the ordinary developer get into this stuff? The answer is that we and others in the Hadoop community are working hard on making Hadoop much easier to consume. We released, and you covered this a fair bit, the SCM Express project, which lets you install Hadoop with minimal effort, as close to one click as possible. And there are lots of layers built on top of Hadoop to make it more easily consumed by developers: Hive is a sort of SQL-like interface on top of MapReduce, and Pig has its own DSL for programming against MapReduce. So you don't have to write straight MapReduce code or anything like that. And it's getting easier for operators every day. >> Well, I mean, evolution... you guys are actually working on that at Cloudera. What about some of the abstractions you're seeing? Those are the big rage. Look back a year ago. VMworld is coming up (a little plug: SiliconANGLE.tv will be broadcasting live at VMworld), and theCUBE has been at VMworld, where SpringSource was a big announcement that they made. 
Heroku was bought by Salesforce. Cloud software frameworks are big. What does that look like, and how does it relate to Hadoop and the ecosystem around Hadoop, where the rage is software frameworks and networks colliding? You've got the intersection of software frameworks and networks. With the big players, we talk about EMC and these guys, it's clear they realize that software is going to be their key differentiator, so it's got to get to a framework standard. What are Hadoop and Apache talking about in terms of this kind of evolution for Hadoop? >> Sure. Well, I think we're seeing very much the commoditization of hardware. You just can't buy bigger and bigger computers anymore; they just don't exist. So you're going to need something that can take a lot of little computers and make them look like one big computer, and that's what Hadoop is especially good at. We talk about scaling out instead of scaling up: you can just buy more relatively inexpensive computers, and that's great. The beauty of Hadoop is that it will grow linearly as your data set, your scale, your traffic, whatever, grows. You don't have the exponential price increase of buying bigger and bigger computers; you can just buy more. That's the beauty of it: it's a software framework where, if you write against it, you don't have to think about the scaling anymore. It will do that for you. >> Okay, a question for you. It's kind of a weird question, but try to tackle it. You're at a party, having a few cocktails, having a few beers with your buddies, and your buddy who works at a big enterprise says, man, we've got all these legacy structured data systems, and I need to implement some big data strategy, all this stuff. What do I do? >> Sure, sure. Not the question I thought you were going to ask me. >> This is a G-rated program here. >> Okay. 
I thought you were going to ask me how I explain what I do to people. >> We'll get to that next. >> Okay. Yeah, I would say that the first thing to do is to start small. Implement a proof of concept: get a subset of the data that you would like to analyze, put Hadoop on a few machines, four or five, something like that, and start writing some Hive queries and some Pig scripts. I think you'll pretty quickly and easily see the value you can get out of it, and you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to trivially scale to that size. >> Okay. So now the question I want to ask: you're at a party, and someone says, what do you do? >> I usually tell people I'm a hedge fund manager. No, but seriously, I tell people I work on software for distributed supercomputers. People have some idea what "distributed" means, and "supercomputers", and they figure it out. >> So, final question, because I know you've got to get back to programming some code here. What's the future of Hadoop from a developer standpoint? I was having a conversation with a developer who's a big data jockey. He gets anything he can get his hands on, geo data, text data, because he's a data junkie, and he says, I just don't know what to build. What are some of the enabling apps that you see out there, or that you're just conceiving or brainstorming? What's possible with data? Can you envision the next five years: what are you going to see evolve, and what are some of the coolest things you've seen that are happening right now? >> Sure. Sure. 
I think you're going to see the front ends to these things getting easier and easier to interact with, and at some point you won't even know that you're interacting with a Hadoop cluster. That will be the engine under the hood, but from your perspective you'll be driving a Ferrari. By that I mean standard BI tools and the standard SQL query language will all be implemented on top of this stuff, and from that perspective you could implement really anything you want. We're seeing a lot of great work coming out of identifying trends among masses of data that, if you tried to analyze them with any other tool, you'd either have to distill down so far that you would question your results, or you could only run the very simplest sorts of queries over, and not really get those powerful, deep, correlative insights that we're seeing people get. So you'll continue to see great recommendation systems coming out of this stuff. You'll see root cause analysis. You'll see great work coming out of the advertising industry to really say which ad was responsible for a purchase. Was it really the last ad they clicked on, or was it the ad they saw five weeks ago that put the thought in mind? That sort of correlative analysis is being empowered by big data systems like Hadoop. >> Well, I'm bullish on big data. I think it's going to be even bigger than people think. You're going to have some kids come out of college and say, I could use big data to create a differentiation and build an airline based on that one differentiation. These are cool new ways of using data that we've never seen before. So Aaron, thanks for coming on. You're inside the Palo Alto studio, and we're going to.

Published Date : Sep 28 2011

ENTITIES

Entity	Category	Confidence
Mike Olson	PERSON	0.99+
yahoo	ORGANIZATION	0.99+
Mike Charles	PERSON	0.99+
san Francisco	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Yahoo	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Aaron T. Myers	PERSON	0.99+
University of Washington	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
facebook	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
richard Kimble	PERSON	0.99+
Michaelson	PERSON	0.99+
two interns	QUANTITY	0.99+
Oregon	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Todd	PERSON	0.99+
Claudia	PERSON	0.99+
AMC	ORGANIZATION	0.99+
five weeks ago	DATE	0.99+
Northeastern University	ORGANIZATION	0.99+
monday	DATE	0.99+
first time	QUANTITY	0.99+
both	QUANTITY	0.99+
Dave	PERSON	0.99+
TMC	ORGANIZATION	0.99+
ralph kimball	PERSON	0.99+
burlingame	LOCATION	0.99+
Ferrari	ORGANIZATION	0.98+
today	DATE	0.98+
five	QUANTITY	0.98+
Brown	ORGANIZATION	0.98+
thirties	QUANTITY	0.98+
one	QUANTITY	0.98+
Horton	ORGANIZATION	0.98+
Apache	ORGANIZATION	0.98+
Hadoop	ORGANIZATION	0.98+
erin	PERSON	0.98+
google	ORGANIZATION	0.97+
One	QUANTITY	0.97+
twitter	ORGANIZATION	0.97+
Brown	PERSON	0.97+
a year ago	DATE	0.97+
Salesforce	ORGANIZATION	0.97+
john furry	PERSON	0.96+
one big computer	QUANTITY	0.95+
new york city	LOCATION	0.95+
Mendel	PERSON	0.94+


Search Results for Pig: