Aaron T. Myers, Cloudera Software Engineer, Talking Cloudera & Hadoop


 

>>So Aaron, you're an engineer at Cloudera, you're a whiz kid from Brown. How many Brown people are engineers here at Cloudera?
>>As of Monday, we have five full-timers and two interns at the moment, and we're trying to hire more all the time.
>>Mhm. So how many interns?
>>Two interns from Brown this summer, and a few more from other schools.
>>Cool. I'm John Furrier with SiliconANGLE.com and SiliconANGLE.tv. We're here in the Cloudera office, in my little mini studio that hasn't been built out yet. It was a studio; we had to break it down for Dr. Ralph Kimball (not Richard Kimble, as I called him on Twitter). The data warehouse guru was in here. You guys are attracting a lot of talent, Aaron, so tell us a little bit about how Cloudera is making it happen and what the big deal is here. People are smart here, it's mature, it's not the first time around for this company; this company has some senior execs. A lot of people in the market have been talking about first-time entrepreneurs doing their startups, and I've been hearing from some folks in the trenches that there's frustration at startups out there: there are a lot of first-time entrepreneurs, everyone wants to be the next Twitter, and there are some companies that are straddling failure. I was having that conversation with someone just today. They said, "What's it like at Cloudera?" and I said, "This is not a first-time crew here at Cloudera." So share with the folks out there what you're seeing at Cloudera and in the management team.
>>Sure.
Well, one of the most attractive parts about working at Cloudera for me, one of the reasons I really came here, was the incredibly experienced management team. Mike, Charles, they're all there at the top of this org, and they have all done this before: they've founded startups, grown startups, sold startups. Especially in contrast with the place where I worked previously, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would.
>>And I mean, Mike Olson is a veteran. He's an adviser to startups; I know he's been in some as an investor. Amr was obviously a PhD candidate who bolted out to do a startup, sold it to Yahoo, worked at Yahoo, and came back to finish his PhD at Stanford under Mendel over there in the PhD program [unclear]. He came back as an entrepreneur in residence at Accel Partners, and now he does Cloudera. When did you join the company? Take us through who you are and when you joined Cloudera. I want your background.
>>Sure. I joined a little over a year ago; it was about 30 people at the time. I came from a small startup, an online music store in New York City, which doesn't really exist all that much anymore. I sort of followed my other colleagues from Brown who worked here, and I was really sold by the management team and also by the tremendous market opportunity that Hadoop has right now. Cloudera was very much the first commercial player there, which is really a unique experience, and I think you've covered this pretty well before. I think we all around here believe that the market is only growing, and we're going to see this market, and the big data market in general, get bigger and bigger in the next few years.
>>So obviously computer science is all the rage, and I'm particularly proud that we hang out; we've had conversations in the hallway while you're tweeting about this and that.
But SiliconANGLE's home is here, and I've had a chance to watch you and the other guys here grow from your other office, which was in San Mateo or San Bruno, somewhere in there.
>>It was originally in Burlingame, then we relocated the headquarters to Palo Alto, and now we have a satellite office up in San Francisco.
>>So you guys bolted out. You have a full-blown San Francisco office. It was busting at the seams here in Palo Alto, with people commuting down, even building their Burning Man...
>>Oh yeah, sure.
>>...kits here; they're constructing their homes for Burning Man here. So are you doing that in San Francisco? What's the vibe like in San Francisco? Tell us what's going on.
>>San Francisco is great. I live in San Francisco, as do a lot of us. About half the engineering team works up there now. We're running out of space there, certainly.
>>And you're already...
>>Oh yeah, oh yeah, we're hiring as fast as we absolutely can. So there's definitely not space to build the Burning Man huts there like there is down in Palo Alto, but it's great up there.
>>What are you working on right now for projects, Aaron? Computer science is one of the hot topics we've been covering on SiliconANGLE, taking more of a social angle. Social media has moved from this PR kind of check-in, Facebook fan page hype to a real-deal social marketplace where data (social data, gestural data, mobile data, geo data) is the center of the value proposition. You live that every day, so talk about your view on the computer science landscape around data and why it's such a big deal.
>>Oh sure.
I think data is one of those fundamental things that can be mined for value across every industry. There's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing, et cetera. That's the unique value proposition of stuff like Hadoop. Truly, we see interest from every sector that exists, which is great. As for the project that I'm specifically working on right now, I primarily work on HDFS, the Hadoop Distributed File System, which underlies pretty much all the other projects in the Hadoop ecosystem. In particular, I'm working with other colleagues at Cloudera and at other companies, Yahoo and Facebook, on high availability for HDFS, which in some deployments is a serious concern. Hadoop is primarily a batch processing system, so it's less of a concern than in other systems. But when you start talking about running HBase, which needs to be up all the time serving live traffic, then having a highly available HDFS is a necessity, and we're looking forward to delivering that.
>>Talk about the criticism that HDFS has been getting. Well, I wouldn't say criticism; it's been a great, great product. HDFS is a core part of Hadoop, and you guys have been contributing to it at Apache. It's no secret to the folks out there that Cloudera leads that effort. But there are new companies out there trying a new approach, and they're saying they're doing it better. What are they saying, and what's really happening? There's some argument like, "Oh, we can do it better." Why are they doing it? Is it just to make money, to do a new venture? What's your opinion on that?
>>Yeah, sure.
I mean, I think it's natural to want to go after parts of the core Hadoop system and say, "Hadoop is a great ecosystem, but what if we just swapped out this part or swapped out that part? Couldn't we get some really easy gains?" And sometimes that will be true. I have confidence that it simply will not be true in the very near future. One of the great benefits of Apache Hadoop being open source is that we have a huge worldwide network of developers, working at some of the best engineering organizations in the world, who are all collaborating on this stuff. I firmly believe that the collaborative open source process produces the best software, and that's what Hadoop is at its very core.
>>What about the argument that says, "I need to commercialize it differently for my installed base and bolt on a few proprietary extensions"? That's a legitimate argument. EMC might take that approach, or MapR, which is trying to rewrite HDFS. To me, is it legitimate? Is there fighting going on in the standards? Maybe that's a political question you might not want to answer, but give it a shot.
>>I mean, there's no open standard for Hadoop. You can't say "this is Hadoop-compatible" or anything like that. But what you can say is "this is Apache Hadoop." So in that sense there's no fighting to be had there.
>>Yeah. So Yahoo is struggling as a company, but there's strong Hadoop DNA at Yahoo, certainly. I talked with the founder of the startup Hortonworks, which just announced today that they have a new board member. He's the guy who's the CEO of Hortonworks, and now [unclear], I'm sorry, [unclear] announced they have Rob from Benchmark on the board.
He's the CEO of Hortonworks, and one of my, not criticisms, but points about Hortonworks was that this guy's an engineer who has never run a company before. He's no Mike Olson. Mike Olson has long experience. So this guy comes in to run it, and he's obviously in open source. Is that good for Yahoo and open source? They say they're going to continue to invest in Hadoop, and they clearly are still using a lot of Hadoop. How is that changing Apache? Is that causing more consolidation? Is that causing more energy? What's your view on the whole Hortonworks thing?
>>Yahoo has been and will continue to be a huge contributor to Hadoop. I can't say for sure, but I feel pretty confident that they have more data under management under Hadoop than anyone else in the world, and there's no question in my mind that they'll continue to invest huge amounts of both QA effort and engineering effort, and all of the things that Hadoop needs to advance. I'm sure that Hortonworks will continue to work very closely with Yahoo. And we're excited to see more and more contributors to Hadoop, both from Hortonworks and from Yahoo proper.
>>Cool. I just want to clarify for the folks out there who don't understand what this whole Yahoo thing is: it was not a spin-out. These were key Hadoop core guys who left the company to form a startup, which Yahoo financed with Benchmark Capital. Yahoo told me, and reaffirmed with me, that they are clearly investing more in Hadoop internally as well. So there are more people inside Yahoo who work on Hadoop than there are in the entire Hortonworks company. That's very clear, just to clear that up out there. Aaron, you're a young gun, right?
You're a young whiz like Todd; we had him on here. Explain to the folks out there who are a little bit older, maybe guys in their thirties, or CIOs. A lot of people are kicking the tires on big data; they're hearing about real-time analytics, they're hearing about benefits they've never heard of before. Dave Vellante and I on theCUBE talk about the transformations that are going on. You're seeing EMC getting into big data; everyone's transforming at the enterprise level and the service provider level. Explain to the folks why Hadoop is so important. Why is Hadoop, if not the fastest, then one of the fastest-growing projects in Apache ever?
>>Sure.
>>Even faster than the web server project, which is one of the bigger ones. Why is Hadoop so important? Explain to them what it is.
>>Well, it's pretty well covered that there's been an explosion of data, that more data is produced every year, over and over. We talk about exabytes, a quantity of data so large that pretty much no one can really comprehend it. More and more organizations want to store, process, and get insights from that data. In addition to the explosion of data, the fact that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is that it's so inexpensive per terabyte to store data that you don't have to think up front about what you want to store and what you want to discard. Store it all, and figure out later which are the most useful bits. We call that schema on read, as opposed to figuring out the schema a priori. That is a very powerful shift in the dynamics of data storage in general, and I think it's very attractive to all sorts of organizations.
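As an illustration of the schema-on-read idea described in this answer, here is a minimal sketch in plain Python rather than Hadoop itself. The sample events, the field names, and the `read_with_schema` helper are all hypothetical; the point is only that raw records are stored untouched and each question applies its own schema when reading.

```python
import json

# Raw events are kept exactly as they arrived; nothing is validated or
# discarded at write time. (Hypothetical sample data.)
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ad_id": 7}',
    '{"user": "bob", "action": "purchase", "amount": "19.99"}',
    '{"user": "carol", "action": "click"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    `schema` maps a field name to a converter function. Fields missing
    from a record come back as None instead of raising an error.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two different questions, two different schemas, same stored bytes.
click_schema = {"user": str, "ad_id": int}
revenue_schema = {"user": str, "amount": float}

clicks = list(read_with_schema(RAW_EVENTS, click_schema))
revenue = list(read_with_schema(RAW_EVENTS, revenue_schema))
```

Deciding the schema a priori would have forced a choice about `ad_id` and `amount` before any data was written; here that choice is deferred until someone actually asks a question of the data.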
>>You're obviously a Brown graduate, and you have some interns from Brown. Brown has a premier computer science program, almost as good as when I went to school at Northeastern University.
>>Um.
>>You know, the unsung heroes of computer science. Only kidding; Brown's a great program, cutting edge, known for leading in a lot of computer science areas. Hadoop in general is known as something where you've got to be pretty savvy, at a master's or PhD level, to play in this area. There's not a lot of adoption among what I call the grassroots developers. What's your vision, and how do you see the younger computer science generation, even younger than you, growing up into this? Because those tools aren't yet developed; you still have to be pretty strong from a computer science perspective. And also explain to the folks who aren't necessarily at the Browns of the world, or getting into computer science: what is this revolution about, and where is it going? What are some of the things you see happening around the corner that might not be obvious?
>>Sure, there are a few questions there. Part of it is how people coming out of college get into this thing. It's not taught all that much in school. How do you make the leap from the standard computer science curriculum into this sort of thing? Part of it is that we're seeing more and more schools offering distributed computing classes, or they have grids available to do this stuff. There is some research coming out of Brown, actually, and lots of other schools, about Hadoop proper and the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting.
Google actually has classes that they teach, I believe in conjunction with the University of Washington, where they teach undergraduates and master's-level graduate students about MapReduce and distributed computing, and they actually use Hadoop to do it, because the architecture of Hadoop is modeled after Google's internal infrastructure. So that's one way: we're seeing more and more people just coming out of college who have distributed systems knowledge like this. The other part of the question you asked is how the ordinary developer gets into this stuff. The answer is that we and others in the Hadoop community are working hard on making Hadoop much easier to consume. We released, and you covered this a fair bit, the SCM Express project, which lets you install Hadoop with minimal effort, as close to one click as possible. And there are lots of layers built on top of Hadoop to make it more easily consumed by developers. Hive is a sort of SQL-like interface on top of MapReduce, and Pig has its own DSL for programming against MapReduce, so you don't have to write straight MapReduce code or anything like that. And it's getting easier for operators every day.
>>Well, I mean, evolution was... you guys are actually working on that at Cloudera. What about some of the abstractions you're seeing? Those are the big rage. Look back a year ago. VMworld is coming up (a little plug: SiliconANGLE.tv will be broadcasting live at VMworld). theCUBE has been at VMworld, where SpringSource was a big announcement that they made.
Heroku was bought by Salesforce. Cloud software frameworks are big. What does that look like, and how does it relate to Hadoop and the ecosystem around Hadoop? The rage is that software frameworks and networks kind of collide; you've got the intersection of software frameworks and networks. With the big players, we talk about EMC and these guys, it's clear that they realize that software is going to be their key differentiator, so it's got to get to a framework. What are Hadoop and Apache talking about in terms of this kind of evolution for Hadoop?
>>Sure. Well, I think we're seeing very much the commoditization of hardware. You just can't buy bigger and bigger computers anymore; they just don't exist. So you need something that can take a lot of little computers and make them look like one big computer, and that's what Hadoop is especially good at. We talk about scaling out instead of scaling up: you can just buy more relatively inexpensive computers. And the beauty of Hadoop is that it will grow linearly as your data set, your scale, your traffic, whatever, grows. You don't have this exponential price increase of buying bigger and bigger computers; you can just buy more. That's the beauty of it: it's a software framework where, if you write against it, you don't have to think about the scaling anymore. It will do that for you.
>>Okay, a question for you. It's kind of a weird question, but try to tackle it. You're at a party having a few cocktails, a few beers with your buddies, and a buddy who works at a big enterprise says, "Man, we've got all these legacy structured data systems, and I need to implement some big data strategy, all this stuff. What do I do?"
>>Sure, sure. Not the question I thought you were going to ask me.
>>This is a G-rated program here.
>>Okay.
I thought you were going to ask me how I explain what I do to people.
>>We'll get to that next.
>>Okay. Yeah, I would say that the first thing to do is start small. Implement a proof of concept: get a subset of the data that you would like to analyze, put Hadoop on a few machines, four or five, something like that, and start writing some Hive queries and some Pig scripts. I think you'll pretty quickly and easily see the value that you can get out of it, and you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to trivially scale to that size.
>>Okay. So now the question that I want to ask: you're at a party, and someone says, "What do you do?"
>>I usually tell people I'm a hedge fund manager. No, but seriously, I tell people I work on software for distributed supercomputers. People have some idea what "distributed" means and what "supercomputers" means, and they figure it out.
>>So, final question, because I know you've got to get back to programming some code here. What's the future of Hadoop from a developer standpoint? I was having a conversation with a developer who's a big data jockey, and [unclear] he gets anything he can get his hands on, geo data, text data, because he's a data junkie, and he says, "I just don't know what to build." What are some of the enabling apps that you see out there, or that you can conceive of, just brainstorming? What's possible with data? Can you envision the next five years? What are you going to see evolve, and what are some of the coolest things you've seen that are happening right now?
>>Sure. Sure.
I think you're going to see the front ends to these things getting easier and easier to interact with, and at some point you won't even know that you're interacting with a Hadoop cluster. That will be the engine under the hood, but from your perspective you'll be driving a Ferrari. By that I mean standard BI tools and the standard SQL query language will all be implemented on top of this stuff, and from that perspective you could implement really anything you want. We're seeing a lot of great work coming out of just identifying trends among masses of data that, if you tried to analyze them with any other tool, you'd either have to distill down so far that you would question your results, or you could only run the very simplest sorts of queries over, and not really get those powerful, deep, correlative insights that we're seeing people get. So you'll continue to see great recommendation systems coming out of this stuff. You'll see root cause analysis. You'll see great work coming out of the advertising industry to really say which ad was responsible for a purchase: was it really the last ad they clicked on, or was it the ad they saw five weeks ago that put the thought in their mind? That sort of correlative analysis is being empowered by big data systems like Hadoop.
>>Well, I'm bullish on big data. I think it's going to be even bigger than people think. I think you're going to have some kids come out of college and say, "I could use big data to create a differentiation and build an airline based on one differentiation." These are cool new ways and data we've never seen before. So Aaron, thanks for coming on. You're inside the Palo Alto studio, and we're going to.
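The ad attribution question raised in the final answer (the last ad clicked versus an ad seen weeks earlier) can be made concrete with a toy sketch. This is not how any particular ad system or Hadoop job works; the function names, the ad names, and the equal-split rule are assumptions chosen only to contrast two simple ways of assigning credit for a purchase.

```python
from collections import defaultdict


def last_click_credit(touches):
    """Give all credit for a purchase to the most recent ad touched.

    `touches` is a list of ad names ordered oldest first.
    """
    credit = defaultdict(float)
    if touches:
        credit[touches[-1]] += 1.0
    return dict(credit)


def linear_credit(touches):
    """Split credit for a purchase equally across every ad touch."""
    credit = defaultdict(float)
    for ad in touches:
        credit[ad] += 1.0 / len(touches)
    return dict(credit)


# Hypothetical path to one purchase: an ad seen weeks ago, then later touches.
path = ["video_ad", "banner_ad", "banner_ad", "search_ad"]

last = last_click_credit(path)
linear = linear_credit(path)
```

Under the last-click rule the early `video_ad` gets no credit at all, while the equal split gives it a quarter. Keeping every raw touch, rather than only the final click, is what makes the second kind of analysis possible in the first place.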

Published Date : Sep 28 2011
