Yuvi Kochar, GameStop | Mayfield People First Network

>> Announcer: From Sand Hill Road in the heart of Silicon Valley, it's theCUBE, presenting the People First Network, insights from entrepreneurs and tech leaders. (bright electronic music) >> Everyone, welcome to this special CUBE conversation. We're here at Sand Hill Road at Mayfield Fund. This is theCUBE, co-creation of the People First Network content series. I'm John Furrier, host of theCUBE. Our next guest, Yuvi Kochar, who's the Data-centric Digital Transformation Strategist at GameStop. Variety of stints in the industry, going in cutting-edge problems around data, Washington Post, comScore, among others. You've got your own practice. From Washington, DC, thanks for joining us. >> Thank you, thanks for hosting me. >> This is a awesome conversation. We were just talking before we came on camera about data and the roles you've had over your career have been very interesting, and this seems to be the theme for some of the innovators that I've been interviewing and were on the People First is they see an advantage with technology, and they help companies, they grow companies, and they assist. You did a lot of different things, most notably that I recognized was the Washington Post, which is on the mainstream conversations now as a rebooted media company with a storied, historic experience from the Graham family. Jeff Bezos purchased them for a song, with my opinion, and now growing still, with the monetization, with subscriber base growing. I think they're number one in subscribers, I don't believe, I believe so. Interesting time for media and data. You've been there for what, how many years were you at the Washington Post? >> I spent about 13 years in the corporate office. So the Washington Post company was a conglomerate. They'd owned a lot of businesses. Not very well known to have owned Kaplan, education company. We owned Slate, we owned Newsweek, we owned TV stations and now they're into buying all kinds of stuff. 
So I was involved with a lot of varied businesses, but obviously, we were in the same building with the Washington Post, and I had front row seat to see the digital transformation of the media industry. >> John: Yeah, we-- >> And how we responded. >> Yeah, I want to dig into that because I think that illustrates kind of a lot what's happening now, we're seeing with cloud computing. Obviously, Cloud 1.0 and the rise of Amazon public cloud. Clearly, check, done that, a lot of companies, startups go there. Why would you provision a data center? You're a startup, you're crazy, but at some point, you can have a data center. Now, hybrid cloud's important. Devops, the application development market, building your own stack, is shifting now. It seems like the old days, but upside down. It's flipped around, where applications are in charge, data's critical for the application, infrastructure's now elastic. Unlike the old days of here's your infrastructure. You're limited to what you can run on it based on the infrastructure. >> Right. >> What's your thoughts on that? >> My thoughts are that, I'm a very, as my title suggests, data-centric person. So I think about everything data first. We were in a time when cloud-first is becoming old, and we are now moving into data-first because what's happening in the marketplace is the ability, the capability, of data analytics has reached a point where prediction, in any aspect of a business, has become really inexpensive. So empowering employees with prediction machines, whether you call them bots, or you call them analytics, or you call them machine learning, or AI, has become really inexpensive, and so I'm thinking more of applications, which are built data-out instead of data-in, which is you build process and you capture data, and then you decide, oh, maybe I should build some reporting. That's what we used to do. Now, you need to start with what's the data I have got? What's the data I need? What's the data I can get? 
We were just talking about, everybody needs a data monetization strategy. People don't realize how much asset is sitting in their data and where to monetize it and how to use it. >> It's interesting. I mean, I got my computer science degree in the 80s and one of the tracks I got a degree in was database, and let's just say that my main one was operating system. Database was kind of the throwaway at that time. It wasn't considered a big field. Database wasn't sexy at all. It was like, database, like. Now, if you're a database, you're a data guru, you're a rock star. The world has changed, but also databases are changing. It used to be one centralized database rules the world. Oracle made a lot of money with that, bought all their competitors. Now you have open source came into the realm, so the world of data is also limited by where the data's stored, how the data is retrieved, how the data moves around the network. This is a new dynamic. How do you look at that because, again, lagging in business has a lot to do with the data, whether it's in an application, that's one thing, but also having data available, not necessarily in real time, but if I'm going to work on something, I want the data set handy, which means I can download it or maybe get real-time. What's your thoughts on data as an element in all that moving around? >> So I think what you're talking about is still data analytics. How do I get insights about my business? How do I make decisions using data in a better way? What flexibility do I need? So you talk about open source, you think about MongoDB and those kind of databases. They give you a lot of flexibility. You can develop interesting insights very quickly, but I think that is still very much thinking about data in an old-school kind of way. I think what's happening now is we're teaching algorithms with data. So data is actually the software, right? So you get an open source algorithm. 
I mean Google and everybody else is happy to open source their algorithms. They're all available for free. But what, the asset is now the data, which means how you train your algorithm with your data, and then now, moving towards deploying it on the edge, which is you take an algorithm, you train it, then you deploy it on the edge in an IoT kind of environment, and now you're doing decision-making, whether it's self-driving cars, I mean those are great examples, but I think it's going down into very interesting spaces in enterprise, which is, so we have to all think about software differently because, actually, data is a software. >> That's an interesting take on it, and I love that. I mean I wrote a blog post in 2007 when we first started playing with the, in looking at the network effects on social media and those platforms was, I wrote a post, it was called Data is the New Development Kit. Development kit was what people did back then. They had a development kit and they would download stuff and then code, but the idea was is that data has to be part of the runtime and the compilation of, as software acts, data needs to be resident, not just here's a database, access it, pull it out, use it, present it, where data is much more of a key ingredient into the development. Is that kind of what you're getting at? >> Yes. >> Notion of-- >> And I think we're moving from the age of arithmetic-based machines, which is we put arithmetic onto chips, and we then made general-purpose chips, which were used to solve a huge amount of problems in the world. We're talking about, now, prediction machines on a chip, so you think about algorithms that are trained using data, which are going to be available on chips. 
And now you can do very interesting algorithmic work right on the edge devices, and so I think a lot of businesses, and I've seen that recently at GameStop, I think business leaders have a hard time understanding the change because we have moved from process-centric, process automation, how can I do it better? How can I be more productive? How can I make better decisions? We have trained our business partners on that kind of thinking, and now we are starting to say, no, no, no, we've got something that's going to help you make those decisions. >> It's interesting, you mentioned GameStop. Obviously, well-known, my sons are all gamers. I used to be a gamer back before I had kids, but then, can't keep up anymore. Got to be on that for so long, but GameStop was a retail giant in gaming. Okay, when they had physical displays, but now, with online, they're under pressure, and I had interviewed, again, at an Amazon event, this Best Buy CIO, and he says, "We don't compete with price anymore. If they want to buy from Amazon, no problem, but our store traffic is off the charts. We personalize 50,000 emails a day." So personalization became their strategy, it was a data strategy. This is a user experience, not a purchase decision. Is this how you guys are thinking about it at GameStop? >> I think retail, if you look at the segment per se, personalization, Amazon obviously led the way, but it's obvious that personalization is key to attract the customer. If I don't know what games you play, or if I don't know what video you watched a little while ago, about which game, then I'm not offering you the product that you are most prone or are looking for or what you want to buy, and I think that's why personalization is key. I think that's-- >> John: And data drives that, and data drives that. >> Data drives that, and for personalization, if you look at retail, there's customer information. You need to know the customer. 
You need to know, understand the customer preferences, but then there's the product, and you need to marry the two. And that's where personalization comes into play. >> So I'll get your thoughts. You have, obviously, a great perspective on how tech has been built and now working on some real cutting-edge, clear view on what the future looks like. Totally agree with you, by the way, on the data. There's kind of an old guard/new guard, kind of two sides of the street, the winners and the losers, but hey, look, I think the old guard, if they don't innovate and become fresh and new and adopt the modern things that need to attract the new expectations and new experiences from their customers, are going to die. That being said, what is the success formula, because some people might say, hey, I'm data-driven. I'm doing it, look at me, I'm data. Well, not really. Well, how do you tell if someone's really data-driven or data-centric? What's the difference? Is there a tell sign? >> I think when you say the old guard, you're talking about companies that have large assets, that have been very successful in a business model that maybe they even innovated, like GameStop came up with pre-owned games, and for the longest of times, we've made huge amount of revenue and profit from that segment of our business. So yes, that's becoming old now, but I think the most important thing for large enterprises at least, to battle the incumbent, the new upstarts, is to develop strategies which are leveraging the new technologies, but are building on their existing capability, and that's what I drive at GameStop. >> And also the startups too, that they were here in a venture capital firm, we're at Mayfield Fund, doing this program, startups want to come and take a big market down, or come in on a narrow entry and get a position and then eat away at an incumbent. They could do it fast if they're data-centric. >> And I think it's speed is what you're talking about. 
I think the biggest challenge large companies have is an ability to play the field at the speed of the new upstarts and the firms that Mayfield and others are investing in. That's the big challenge because you see this, you see an opportunity, but you're, and I saw that at the Washington Post. Everybody went to meetings and said, yes, we need to be digital, but they went-- >> They were talking. >> They went back to their desk and they had to print a paper, and so yes, so we'll be digital tomorrow, and that's very hard because, finally, the paper had to come out. >> Let's take us through the journey. You were the CTO, VP of Technology, Graham Holdings, Washington Post, they sold it to Jeff Bezos, well-documented, historic moment, but what a storied company, Washington Post, local paper, was the movie about it, all the historic things they've done from a reporting and journalism standpoint. We admire that. Then they hit, the media business starts changing, gets bloated, not making any money, online classifieds are dying, search engine marketing is growing, they have to adjust. You were there. What was the big, take us through that journey. >> I think the transformation was occurring really fast. The new opportunities were coming up fast. We were one of the first companies to set up a website, but we were not allowed to use the brand on the website because there was a lot of concern in the newsroom that we are going to use or put the brand on this misunderstood, nearly misunderstood opportunity. So I think it started there, and then-- >> John: This is classic old guard mentality. >> Yes, and it continued down because people had seen downturns. It's not like media companies hadn't been through downturns. They had, because the market crashes and we have a recession and there's a downturn, but it always came back because-- >> But this was a wave. I mean the thing is, downturns are economic and there's business that happens there, advertisers, consumption changes. 
This was a shift in their user base based upon a technology wave, and they didn't see it coming. >> And they hadn't ever experienced it. So they were experiencing it as it was happening, and I think it's very hard to respond to a transformation of that kind in a very old-- >> As a leader, how did you handle that? Give us an example of what you did, how you make your mark, how do you get them to move? What were some of the things that were notable moments? >> I think the main thing that happened there was that we spun out washingtonpost.com. So it became an independent business. It was actually running across the river. It moved out of the corporate offices. It went to a separate place. >> The renegades. >> And they were given-- >> John: Like Steve Jobs and the Macintosh team, they go into separate building. >> And we were given, I was the CTO of the dotcom for some time while we were turning over our CTO there, and we were given a lot of flexibility. We were not held accountable to the same level. We used the, obviously, we used-- >> John: You were running fast and loose. >> And we were, yes, we had a lot of flexibility and we were doing things differently. We were giving away the content in some way. On the online side, there was no pay wall. We started with a pay wall, but advertising kind of was so much more lucrative in the beginning, that the pay wall was shut down, and so I think we experimented a lot, and I think where we missed, and a lot of large companies miss, is that you need to leave your existing business behind and scale your new business, and I think that's very hard to do, which is, okay, we're going to, it's happening at GameStop. We're no longer completely have a control of the market where we are the primary source of where, you talk about your kids, where they go to get their games. 
They can get the games online and I think-- >> It's interesting, people are afraid to let go because they're so used to operating their business, and now it has to pivot to a new operating model and grow. Two different dynamics, growth, operation, operating and growing. Not all managers have that growth mindset. >> And I think there's also an experience thing. So most people who are in these businesses, who've been running these businesses very successfully, have not been watching what's happening in technology. And so the technology team comes out and says, look, let me show you what we can do. I think there has to be this open and very, very candid discussion around how we are going to transform-- >> How would you talk about your peer, developed peers out there, your peers and other CIOs, and even CISOs on the security side, have been dealing with the same suppliers over, and in fact, on the security side, the supplier base is getting larger. There's more tools coming out. I mean who wants another tool? So platform, tool, these are big decisions being made around companies, that if you want to be data-centric, you want to be a data-centric model, you got to understand platforms, not just buying tools. If you buy a hammer, they will look like a nail, and you have so many hammers, what version, so platform discussions come in. What's your thoughts on this? Because this is a cutting-edge topic we've been talking about with a lot of senior engineering leaders around Platform 2.0 coming, not like a classic platform to... >> Right, I think that each organization has to leverage or build their, our stack on top of commodity platforms. You talked about AWS or Azure or whatever cloud you use, and you take all their platform capability and services that they offer, but then on top of that, you structure your own platform with your vertical capabilities, which become your differentiators, which is what you take to market. 
You enable those for all your product lines, so that now you are building capability, which is a layer on top of, and the commodity platforms will continue to bite into your platform because they will start offering capabilities that earlier, I remember, I started at this company called BrassRing, recruitment automation. One of the first software-as-a-service companies, and I, we bought a little company, and the CTO there had built a web server. It was called, it was his name, it was called Barrett's Engine. (chuckles) And so-- >> Probably Apache with something built around it. >> So, in those days, we used to build our own web servers. But now today, you can't even find an engineer who will build a web server. >> I mean the web stack and these notions of just simple Web 1.0 building blocks of change. We've been calling it Cloud 2.0, and I want to get your thoughts on this because one of the things I've been riffing on lately is this, I remember Marc Andreessen wrote the famous article in Wall Street Journal, Software is Eating the World, which I agree with in general, no debate there, but also the 10x Engineer, you go into any forum online, talking about 10x Engineers, you get five different opinions, meaning, a 10x Engineer's an engineer who can do 10 times more work than an old school, old classical engineer. I bring this up because the notion of full stack developer used to be a real premium, but what you're talking about here with cloud is a horizontally scalable commodity layer with differentiation at the application level. That's not full stack, that's half stack. So you think the world's kind of changing. If you're going to be data-centric, the control plane is data. The software that's domain-specific is on top. That's what you're essentially letting out. 
>> That's what I'm talking about, but I think that also, what I'm beginning to find, and we've been working on a couple of projects, is you put the data scientists in the same room with engineers who write code, write software, and it's fascinating to see them communicate and collaborate. They do not talk the same language at all. >> John: What's it like? Give us a mental picture. >> So a data scientist-- >> Are they throwing rocks at each other? >> Well, nearly, because the data scientists come from the math side of the house. They're very math-oriented, they're very algorithm-oriented. Mathematical algorithms, whereas software engineers are much more logic-oriented, and they're thinking about scalability and a whole lot of other things, and if you think about, a data scientist develops an algorithm, it rarely scales. You have to actually then hand it to an engineer to rewrite it in a scalable form. >> I want to ask you a question on that. This is why I got you and you're an awesome guest. Thanks for your insights here, and we'll take a detour into machine learning. Machine learning really is what AI is about. AI is really nothing more than just, I love AI, it gets people excited about computer science, which is great. I mean my kids talk about AI, they don't talk about IoT, which is good that AI does that, but it's really machine learning. So there's two schools of thought on machine. I call it the Berkeley school on one end, not Berkeley per se but Berkeley talks about math, machine learning, math, math, math, and then you have other schools of thought that are on cognition, that machine learning should be more cognitive, less math-driven, spectrum of full math, full cognition, and everything in between. What's your thoughts on the relationship between math and cognition? >> Yeah, so it's interesting. You get gray hair and you kind of move up the stack, and I'm much more business-focused. These are tools. 
You can get passionate about either school of thought, but I think that what that does is you lose sight of what the business needs, and I think it's most important to start with what are we here trying to do, and what is the best tool? What is the approach that we should utilize to meet that need? Like the other day, we were looking at product data from GameStop, and we know that the quality of data should be better, but we found a simple algorithm that we could utilize to create product affinity. Now whether it's cognition or math, it doesn't matter. >> John: The outcome's the outcome. >> The outcome is the outcome, and so-- >> They're not mutually exclusive, and that's a good conversation debate but it really gets to your point of does it really matter as long as it's accurate and the data drives that, and this is where I think data is interesting. If you look at folks who are thinking about data, back to the cloud as an example, it's only good as what you can get access to, and cybersecurity, the transparency issue around sharing data becomes a big thing. Having access to the data's super important. How do you view that for, as CIOs, and start to think about they're re-architecting their organizations for these digital transformations. Is there a school of thought there? >> Yes, so I think data is now getting consolidated. For the longest time, we were building data warehouses, departmental data warehouses. You can go do your own analytics and just take your data and add whatever else you want to do, and so the part of data that's interesting to you becomes much more clean, much more reliable, but the rest, you don't care much about. I think given the new technologies that are available and the opportunity of the data, data is coming back together, and it's being put into a single place. >> (mumbles) Well, that's certainly a honeypot for a hacker, but we'll get that in a second. 
If you and I were doing a startup, we say, hey, let's, we've got a great idea, we're going to build something. How would we want to think about the data in terms of having data be a competitive advantage, being native into the architecture of the system? I'll say we use cloud unless we need some scale on premise for privacy reasons or whatever, but we would, how would we go to market, and we have an app, as apps defined, great use case, but I want to have extensibility around the data, I don't want to foreclose any future options. How should I think about my, how should we think about our data strategy? >> Yes, so there was a very interesting conversation I had just a month ago with a friend of mine who's working at a startup in New York, and they're going to build a solution, take it to market, and he said, "I want to try it only in a small market and learn from it," and he's going very old school, focus groups, analytics, analysis, and I sat down, we sat at Grand Central Station, and we talked about how, today, he should be thinking about capturing the data and letting the data tell him what's working and what's not working, instead of trying to find focus groups and find very small data points to make big decisions. He should actually utilize the target, the POC market, to capture data and get ready for scale because if you want to go national after having run a test in... >> Des Moines, Iowa. >> Part of New York or wherever, then you need to already have built the data capability to scale that business in today's-- >> John: Is it a SaaS business? >> No, it's a service and-- >> So he can instrument it, just watch the data. 
>> And yes, but he's not thinking like that because most business people are still thinking the old way, and if you look at Uber and others, they have gone global at such a rapid pace because they're very data-centric, and they scale with data, and they don't scale with just let's go to that market and then let's try-- >> Yeah, ship often, get the data, then think of it as part of the life cycle of development. Don't think it as the old school, craft, launch it, and then see how it goes and watch it fail or succeed, and know six months later what happened, know immediately. >> And if you go data-centric, then you can turn the R&D crank really fast. Learn, test and learn, test and learn, test and learn at a very rapid pace. That changes the game, and I think people are beginning to realize that data needs to be thought about as the application and the service is being developed, because the data will help scale the service really fast. >> Data comes into applications. I love your line of data is the new software. That's better than the new oil, which has been said before, but data comes into the app. You also mentioned that app throws off data. >> Yuvi: Yes. >> We know that humans have personal, data exhaust all the time. Facebook made billions of dollars on our exhaust and our data. The role of data in and out of the application, the I/O of the application, is a new concept, you brought that up. I like that and I see that happening. How should we capture that data? This used to be log files. Now you got observability, all kinds of new words kind of coming into this cloud equation. How should people think about this? >> I think that has to be part of the design of your applications, because data is application, and you need to design the application with data in mind, and that needs to be thought of upfront, and not later. >> Yuvi, what's next for you? 
We're here in Sand Hill Road, VC firm, they're doing a lot of investments, you've got a great project with GameStop, you're advising startups, what's going on in your world? >> Yes, so I'm totally focused, as you probably are beginning to sense, on the opportunity that data is enabling, especially in the enterprise. I'm very interested in helping business understand how to leverage data, because this is another major shift that's occurring in the marketplace. Opportunities have opened up, prediction is becoming cheap and at scale, and I think any business runs on their capability to predict, what is the shirt I should buy? How many I should buy? What color should I buy? I think data is going to drive that prediction at scale. >> This is a legit way that everyone should pay attention to. All businesses, not just one-- >> All businesses, everything, because prediction is becoming cheap and automated and granular. That means you need to be able to not just, you need to empower your people with low-level prediction that comes out of the machines. >> Data is the new software. Yuvi, thanks so much for great insight. This is theCUBE conversation. I'm John Furrier here at Sand Hill Road at the Mayfield Fund, for the People First Network series. Thanks for watching. >> Yuvi: Thank you. (bright electronic music)
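
Editor's sketch: in the conversation, Yuvi mentions finding "a simple algorithm" to create product affinity from GameStop product data, without saying which algorithm it was. The snippet below is only an illustration of one common simple approach, counting how often two products show up in the same basket. The function names, the basket format, and the sample products are all assumptions made for this example and are not taken from the interview.

```python
from collections import Counter
from itertools import combinations


def product_affinity(baskets):
    """Count how often each pair of products appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


def top_affinities(baskets, product, n=3):
    """Return up to n products most often bought together with `product`."""
    scores = Counter()
    for (a, b), count in product_affinity(baskets).items():
        if a == product:
            scores[b] += count
        elif b == product:
            scores[a] += count
    return scores.most_common(n)


# Hypothetical purchase baskets, invented purely for illustration.
baskets = [
    ["console", "controller", "racing_game"],
    ["console", "controller"],
    ["racing_game", "steering_wheel"],
    ["console", "racing_game"],
]

print(top_affinities(baskets, "console"))
```

A raw co-occurrence count like this favors popular products, so a real system would likely normalize the counts in some way. As the speakers note, the choice of technique matters less than whether the outcome is useful to the business.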

Published Date : Sep 11 2019


ENTITIES

Entity	Category	Confidence
Marc Andreessen	PERSON	0.99+
Yuvi Kochar	PERSON	0.99+
John	PERSON	0.99+
Jeff Bezos	PERSON	0.99+
GameStop	ORGANIZATION	0.99+
2007	DATE	0.99+
Facebook	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Graham	PERSON	0.99+
New York	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
10 times	QUANTITY	0.99+
Washington Post	ORGANIZATION	0.99+
Yuvi	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
Washington, DC	LOCATION	0.99+
Steve Jobs	PERSON	0.99+
Kaplan	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Macintosh	ORGANIZATION	0.99+
two schools	QUANTITY	0.99+
Berkeley	ORGANIZATION	0.99+
Sand Hill Road	LOCATION	0.99+
One	QUANTITY	0.99+
today	DATE	0.99+
Mayfield Fund	ORGANIZATION	0.99+
a month ago	DATE	0.99+
Graham Holdings	ORGANIZATION	0.99+
one	QUANTITY	0.98+
People First Network	ORGANIZATION	0.98+
Slate	ORGANIZATION	0.98+
Mayfield	ORGANIZATION	0.98+
comScore	ORGANIZATION	0.98+
six months later	DATE	0.98+
tomorrow	DATE	0.98+
Newsweek	ORGANIZATION	0.98+
Best Buy	ORGANIZATION	0.98+
BrassRing	ORGANIZATION	0.98+
two sides	QUANTITY	0.98+
washingtonpost.com	OTHER	0.97+
50,000 emails a day	QUANTITY	0.97+
about 13 years	QUANTITY	0.97+
MongoDB	TITLE	0.97+
80s	DATE	0.97+
Software is Eating the World	TITLE	0.96+
Apache	ORGANIZATION	0.96+
Des Moines, Iowa	LOCATION	0.96+
dotcom	ORGANIZATION	0.96+
five different opinions	QUANTITY	0.96+
Cloud 1.0	TITLE	0.95+
CUBE	ORGANIZATION	0.95+
one end	QUANTITY	0.95+
Mayfield People First Network	ORGANIZATION	0.94+
Grand Central Station	LOCATION	0.94+
each organization	QUANTITY	0.94+

Jack Norris - Hadoop Summit 2014 - theCUBE - #HadoopSummit

>> theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks, "We do Hadoop," and headline sponsor WANdisco, "We make Hadoop invincible." >> Okay, welcome back, everyone. We're live here in Silicon Valley, in San Jose. This is Hadoop Summit. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my cohost, Jeff Kelly, top big data analyst in the community. Our next guest is Jack Norris, CMO of MapR. Security, enterprise: that's the buzz of the show, and it was the buzz of OpenStack Summit, another open source show. And here this year, you're just seeing move after move, talking about a couple of critical issues around enterprise-grade Hadoop. Hortonworks announced a big acquisition, went all in, as they said, and now Cloudera follows suit with their news today. Are you sitting back saying they're catching up to you guys? I mean, how do you look at that? Because you guys have the security stuff nailed down. So what do you feel about that now? >> I think if you look at the Hadoop market, it's definitely moving from a test, experimental phase into a production phase. We've got tremendous customers across verticals that are doing some really interesting production use cases. And we recognized very early on that to really meet the needs of customers required some architectural innovation. So we're combining the open source ecosystem packages with some innovations underneath to really deliver high availability, data protection, and disaster recovery features. Security is part of that. But if you can't protect the data, if you can't have multitenancy and separate workflows across the cluster, then it doesn't matter how secure it is. You need those. >> I've got to ask you a direct question since we're here at Hadoop Summit, because we get this question all the time.
SiliconANGLE, Wikibon is so successful, but I just don't understand your business model. Well, we're free content and we have some underwriters. So you guys have been very successful, yet people aren't looking at MapR as the quiet leader. You're doing your business, you're making money. Jeff, you had some numbers for us that in the Hadoop community, about 20% are paying subscriptions. That's unlike your business model. So explain to the folks out there the business model, and specifically the traction, because you have customers. >> Yeah. Oh no, we've got over 500 paying customers. We've got at least a $1 million customer in seven different verticals. So we've got breadth and depth, and our business model is simple. We're an enterprise software company that's looking at how to provide the best of open source as well as innovations underneath. >> The most open distribution of Hadoop, but you add that value separately to that, right? So it's not so much that you're proprietary at all, right? Okay, can you clarify that? >> Right. So if you look at this exciting ecosystem, Hadoop is fairly early in its life cycle. If it's a commoditization phase, like Linux or relational database with MySQL, open source kind of equates the whole technology. Here, at the beginning of this life cycle, the early stages of the life cycle, there are some architectural innovations that are really required. If you look at Hadoop, it's an append-only file system relying on Linux, and that really limits the types of operations, the types of use cases that you can do. What MapR's done is provide some deep architectural innovations: provide a complete read-write file system, integrate data protection with snapshots and mirroring, et cetera. So there's a whole host of capabilities that make it easy to integrate, enterprise secure, and scale much better.
Do you think... >> I feel like you were maybe a little early to the market, in the sense that we heard Merv Adrian in his keynote this morning talk about, you know, it's about 10 years in when you start to get these questions about security and governance, and we're about nine years into Hadoop. Do you feel like maybe you guys were a little early and now you're at a tipping point, where as more and more deployments get ready to go to production, this is going to be an area that's going to become increasingly important? >> I think our timing has been spectacular, because we came out at a time when there were some customers that were really serious about Hadoop. We were able to work closely with them and prove our technology. And now, as the market is just ramping, we're here with all of those features that they need. And what's an issue is that an incremental improvement to provide those kinds of key features is not really possible if the underlying architecture isn't there, and it's hard to provide online, real-time capabilities in an underlying platform that's append-only. So the HDFS layer, written in Java, relying on the Linux file system, is kind of the weak underbelly, if you will, of the ecosystem. There are a lot of important developments happening, YARN on top of it, a lot of really exciting things. So we're actively participating in those, including Apache Drill, and on top of a complete read-write file system and integrated Hadoop database, it just makes it all come to life. >> Yeah. I mean, those things on top are critical, but it's the underlying infrastructure. We asked the Wikibon community about that: what are the things that are really holding you back from Hadoop in production? And the biggest challenges they cited were high availability, backup and recovery, and maintaining performance at scale.
Those are the top three, and that's kind of where MapR has been focused since day one. >> So if you look at a major retailer: 2,000 nodes on MapR, 50 unique applications running on a single cluster, 10,000 jobs a day running on top of that. If you look at the Rubicon Project, they recently went public: a hundred million ad auctions, a hundred billion ad auctions a day, on top of that platform. Beats Music, that just got acquired for $3 billion: basically it's the underlying MapR engine that allowed them to scale and personalize that music service. So there are a lot of proof points in terms of how quickly we scale, the enterprise-grade features that we provide, and the blending of deep predictive analytics in a batch environment with online capabilities. >> So I've got to ask you about your go-to-market. Obviously Cloudera and Hortonworks have different business models. Just talk about that, but Cloudera got the massive funding. So you get this question all the time. How do you counter that army and the arms race? >> I think someone just wrote an article in Forbes, and he says cash is not a strategy. And I think that was an excellent, excellent article. And he goes in and, you know, in this fast-growing market, an amount of money doesn't necessarily translate to architectural innovations or speeding the development of that. This is a fairly fragmented ecosystem in terms of the stack that runs on top of it. There's no single application or single vendor that kind of drives value. So an acquisition strategy is... >> So is your field sales force direct or indirect, or both, mixed? How do you handle that? Because Cloudera has got feet on the street, and every squirrel will find a nut. If they're parking sales reps and SEs in all the enterprise accounts, you know, the squirrel's going to find a nut once in a while. Yeah. And they're going to actually try to engage the clients.
So, you know, I guess it is a strategy if they're deploying sales and marketing, right? >> So the beauty about that, and in fact we're all in this together in terms of sharing an API and driving an ecosystem, is it's not a fragmented market. You can start with one distribution and move to another without recompiling or without doing any sort of changes. So it's a fairly open community. If this were a vendor lock-in, then spending money on brand, et cetera, would be important. Our focus is on the sales execution. Direct sales, yes, we have direct sales. We also have partners, and it depends on the geographies as to what that percentage is. >> And John Schroeder, on theCUBE at BigDataNYC, updated us on the HP relationship. >> Oh, excellent. In fact, we just launched our application gallery, App Gallery, to make it very easy for administrators and developers and analysts to get access and understand what's available in the ecosystem. That's available directly on our website. And one of the featured applications there today is an integration with the MapR Sandbox and HP Vertica. So you can get early access, try it, and get the best of kind of enterprise-grade SQL. >> First Hadoop app store, basically. Yeah, if you want to call it that way. Right. So like... >> Sure, available. We launched with close to 30, with, you know, a whole wave kind of following that. >> So talk a little bit about, you know, speaking of Vertica and kind of the SQL on Hadoop. There's a lot of talk about that, some confusion about the different methods for applying SQL on Hadoop. MapR takes an open approach. I know you'll support things like Impala from a competitor, Cloudera. Talk about that approach from MapR's perspective. >> So I guess our perspective is kind of unbiased open source.
We don't try to pick and choose and dictate what's the right open source based on either our participation or some community involvement. And the reality is, with multiple applications being run on the platform, there are different use cases that make different sense. So whether it's a Hive solution or Drill, Drill's available, or HP Vertica, people have the choice. And it's part of a broad range of capabilities that you want to be able to run on the platform for your workflows, whether it's SQL access or MapReduce or a Spark framework, Shark, et cetera. >> So, yeah, I mean, because there are so many different options, there's Spark, you can run HP Vertica, you've got Impala, you've got Hive and the Stinger initiative, is that whole kind of SQL-on-Hadoop ecosystem still working itself out? Are we going to have this many options in a year or two years from now? Or are they complementary, and potentially each has its role? >> I think the major difference is kind of how it deals with the new data formats. Can it deal with self-describing data sources? Can it leverage JSON files? Does it require centralized metadata? And those are some of the perspectives and advantages, say, that Apache Drill has: to expand the data sets that are possible, to enable data exploration without dependency on an IT administrator to define that metadata. >> So another, maybe not always as exciting, but taking workloads from existing systems and moving them to Hadoop is one of the ways that a lot of people get started with Hadoop, whether it's associated transformation workloads or something in that vein. So I know you've announced a partnership with Syncsort, and that's one of the things that they focus on, really making it as easy as possible to move those.
Talk a little bit about that partnership, why that makes sense for you and your customers. >> I think it's a great proof point, because we announced that partnership around mainframe offload, and we have comScore and Experian in that press release. And if you look at a workload on a mainframe going to Hadoop, that seems like that's really an oxymoron, but by having the capabilities that MapR has and making that a system of record with that full high availability and that data protection, we're actually an option to offload from mainframe, offload from SAN processing, and provide a really cost-effective, scalable alternative. And we've got customers that had tried to offload from the mainframe multiple times in the past unsuccessfully, and have done it successfully with MapR. >> So talk a little bit more about the broader partnership strategy. I mean, we're here at Hadoop Summit. Of course, Hortonworks talks a lot about their partnerships and their reseller arrangements. Cloudera seems to take a little bit more of a direct approach. What's MapR's approach to partnering, and as that relates to resell arrangements and things like that? >> I think the App Gallery is probably a great proof point there. The strategy is an ecosystem approach. It's having a collection of tools and applications and management facilities as well as applications on top. So it's a very open strategy. We focus on making sure that we have open APIs at that application layer, that it's very easy to get data in and out. And part of that architecture: by presenting a standard file system format, by allowing non-Java applications to run directly on our platform, to support standard database connections, ODBC and JDBC, to provide database functionality in addition to this deep predictive analytics, really it's about supporting the broadest set of applications on top of a single platform.
What we're seeing in this modern architecture is data gravity matters. And the more processing you can do on a single platform, the better off you are, the more agile, the more competitive, right? >> So you're partnering with people like SAS, for example, to bring some of the analytic capabilities into the platform. Can you tell us a little bit about any... >> Companies like SAS and Revolution Analytics and Skytree, and, I mean, just a whole host of companies on the analytics side, as well as on the tools and visualization, et cetera. Yeah. >> Well, I bring up SAS because I think they get the fact that, with the whole data gravity situation, they've got to go to where the data is and not have the data come to them. So I give them credit for acknowledging that kind of big data truism, that it's... >> All going to the data, not bringing the data... >> To the compute. Jack, talk about the success you've had with customers. You had some pretty impressive numbers, talking about 500 customers. Merv Adrian of Gartner was on with us earlier, essentially reiterating, not mentioning MapR, he was just saying what you guys are doing is right where the puck is going. And some think the puck is not even in the same rink for some other vendors. So I've got to give you props on that. So I want you to talk about the success you have, specifically around where you're winning and where you're successful, and where you guys have struggled... >> Need to improve on, yeah. There's a whole class of applications that I think Hadoop is enabling, which is about operations and analytics. It's taking this high-arrival-rate, machine-generated data and doing analytics as it happens and then impacting the business. So whether it's fraud detection or recommendation engines or supply chain applications using sensor data, it's happening very, very quickly.
So a system that can tolerate and accept streaming data sources, that has real-time operations, that is 24 by seven and highly available, is what really moves the needle. And those are the examples I used with, you know, ads, the Rubicon Project, and, you know, cable TV. >> The outcome: what's the primary outcome your clients want with your product? Is it stability? Is it that the platform has enabled development? Is there an outcome that's consistent across all your wins? >> Well, big picture, some of them are focused on revenues, like how do we optimize revenue? Either it's a new data source, or it's a new application, or it's an existing application where we're exploding the data set. Some of it's reducing costs, so they want to do things like a mainframe offload or data warehouse offload. And then there are some that are focused on risk mitigation. And if there's anything that they have in common, it's that as they move from test to production, it's the key capabilities that they have in enterprise systems today that they want to make sure are in Hadoop. So it's not anything new. It's just like, hey, we've got SLAs, and I've got data protection policies, and I've got a disaster recovery procedure. Why can't I expect the same level of capabilities in Hadoop that I have today in those other systems? >> Final question. Where are you guys heading this year? What are your key objectives? Obviously you're getting these announcements out, a flurry of announcements, good success. State of the company: how many employees are you guys at? Give us a quick update on the numbers. >> So, you know, we just reported this incredible momentum where we've tripled core growth year over year, we've added a tremendous amount of customers. We're over 500 now. So we're basically sticking to our knitting, focusing on the customers, elevating the proof points here.
Some of the most significant customers we have in the telco and financial services and healthcare and retail areas view this as a strategic weapon, view this as a huge competitive advantage, and it's helping them impact their business. That's really spurring our success. We're growing at an incredible clip here, and it's a great time to have made those calls and those investments early on and to be reaping the benefits. >> Now, I've always said, since the first Hadoop Summit, when Hortonworks came out of Yahoo and this whole community kind of burst open: you had Hadoop World, which now O'Reilly runs; it's a whole different vibe of itself. This one, look at the developer vibe. So I've got to ask you, and we've been a big fan. I mean, everyone has enough beachhead to be successful. It's not about MapR versus Hortonworks or Cloudera. And this is why I always kind of smile when everyone goes, oh, Cloudera or Hortonworks. I mean, they're two different animals at this point. They do different things. You guys are over here. Everyone has their, quote, swim lanes or beachhead; there's not a lot of super competition. Do you think it's going to be this way for a while? What's your forecast? At what point do you see more competition? 10 years out? I mean, Merv was talking a 10-year horizon for innovation. >> I think that the more people learn and understand about Hadoop, the more they'll appreciate this set of capabilities that matter in production and post-production, and it'll migrate earlier. And as we focus on more developer tools like our sandbox, so people can easily get experience and understand what MapR is, I think we'll start to see a lot more understanding and momentum. >> Awesome. Jack Norris here, inside theCUBE. CMO of MapR, a very successful enterprise-grade Hadoop player, a leader in the space. Thanks for coming on. We really appreciate it.
We'll be right back after this short break. We're live in Silicon Valley at Hadoop Summit 2014. We'll be right back.

Published Date : Jun 4 2014

SUMMARY :

theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor I mean, because you guys have the security stuff nailed down. I think if you look at the Hadoop market, I've got to ask you a direct question since we're here at Hadoop Summit, because we get this question all the time. that's looking at how to provide the best of open source but you add that value separately to So if you look at this exciting ecosystem, talk about, you know, it's about 10 years in when you start to get these questions about security and governance, and we're about isn't there, and it's hard to provide online, real-time what are the things that are really holding you back from Hadoop So if you look at a major retailer: 2,000 nodes on MapR, 50 So I've got to ask you about your go-to-market. you know, in this fast-growing market, an amount of money doesn't necessarily all the enterprise accounts, you know, the squirrel's going to find a nut once in a while. We also have partners, and it depends on the geographies as to what that percentage So you can get early if you want to call it that way. a whole wave kind of following that. So talk a little bit about, you know, speaking of Vertica and kind of the SQL on Hadoop. And it's part of a broad range of capabilities that you want So, yeah, I mean, because there are so many different options, there's Spark, you can run HP Vertica, of the perspectives and advantages, say, that Apache Drill has: to expand the data sets why that makes sense for you and And if you look at a workload on a mainframe going to Hadoop, So talk a little bit more about the broader partnership strategy.
And the more processing you can do on a single platform, the better off you are, Can you and, I mean, just a whole host of companies on the analytics side, as well as on the tools So I give them credit for acknowledging that kind of big data truism So I want you to talk about the success you have, specifically around where you're winning and you know, ads, the Rubicon Project, and, you know, cable TV. Is it that the platform has enabled development? the key capabilities that they have in enterprise systems today that they want to make sure are in Hadoop. Where are you guys heading this year? So, you know, we just reported this incredible momentum where we've tripled core and this whole community kind of burst open: you had Hadoop World, And as we focus on more developer tools like our sandbox, a Hadoop player, a leader in the space.

ENTITIES

Entity	Category	Confidence
Jeff Kelly	PERSON	0.99+
Jack Norris	PERSON	0.99+
John Schroeder	PERSON	0.99+
HP	ORGANIZATION	0.99+
Jeff	PERSON	0.99+
$3 billion	QUANTITY	0.99+
December, 2014	DATE	0.99+
Jason	PERSON	0.99+
Matt BARR	PERSON	0.99+
10,000 jobs	QUANTITY	0.99+
Today	DATE	0.99+
10 year	QUANTITY	0.99+
Syncsort	ORGANIZATION	0.99+
Dan	PERSON	0.99+
Silicon valley	LOCATION	0.99+
John barrier	PERSON	0.99+
Java	TITLE	0.99+
Yahoo	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
24	QUANTITY	0.99+
Hadoop	TITLE	0.99+
Cloudera	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
this year	DATE	0.99+
Jack	PERSON	0.99+
fifth	QUANTITY	0.99+
Linux	TITLE	0.99+
Skytree	ORGANIZATION	0.99+
each	QUANTITY	0.99+
both	QUANTITY	0.99+
today	DATE	0.98+
one	QUANTITY	0.98+
Merv	PERSON	0.98+
about 10 years	QUANTITY	0.98+
San Jose	LOCATION	0.98+
Hadoop	EVENT	0.98+
about 20%	QUANTITY	0.97+
seven	QUANTITY	0.97+
over 500	QUANTITY	0.97+
a year	QUANTITY	0.97+
about 500 customers	QUANTITY	0.97+
SQL	TITLE	0.97+
seven different verticals	QUANTITY	0.97+
two years	QUANTITY	0.97+
single platform	QUANTITY	0.96+
2014	DATE	0.96+
Apache	ORGANIZATION	0.96+
Hadoop	LOCATION	0.95+
SiliconANGLE	ORGANIZATION	0.94+
comScore	ORGANIZATION	0.94+
single vendor	QUANTITY	0.94+
day one	QUANTITY	0.94+
Salesforce	ORGANIZATION	0.93+
about nine years	QUANTITY	0.93+
Hadoop Summit 2014	EVENT	0.93+
Merv	ORGANIZATION	0.93+
two different animals	QUANTITY	0.92+
single application	QUANTITY	0.92+
top three	QUANTITY	0.89+
SAS	ORGANIZATION	0.89+
Riley	PERSON	0.88+
First	QUANTITY	0.87+
Forbes	TITLE	0.87+
single cluster	QUANTITY	0.87+
Mapbox	ORGANIZATION	0.87+
map R	ORGANIZATION	0.86+
map	ORGANIZATION	0.86+

Jack Norris - BigDataNYC 2013 - theCUBE - #BigDataNYC

>> Live from Midtown Manhattan, theCUBE's coverage of BigDataNYC, a SiliconANGLE and Wikibon production, made possible by Hortonworks, "We do Hadoop," and WANdisco, "We make Hadoop invincible." And now your hosts, John Furrier and Dave Vellante. >> Hi, buddy. We're back. This is Dave Vellante with Jeff Kelly of Wikibon, and this is theCUBE, SiliconANGLE's continuous production. We're here at BigDataNYC, right across the street from the Hilton, where Strata Conference and Hadoop World is going on. We've got a multi-time CUBE guest, Jack Norris, the CMO of MapR, here. Jack, welcome back to theCUBE. First, by the way, thank you so much for the support. As you know, we're across the street here at the Warwick Hotel. MapR, you guys have always been so generous supporting theCUBE. We can't thank you enough for that. So really appreciate it. Thank you. So we were able to listen to your keynote yesterday. We weren't broadcasting head to head yesterday, and had an opportunity to hear your keynote. So, first of all, how did that go? I want to ask you some questions about it. >> It was really well received, and I think people were kind of clamoring to try to separate the myths from reality on Hadoop. >> You had three myths that you talked about, one related to the distributions. I'd like to get into some of those. So the first myth was around the distribution battle. So take us through that. >> So, you know, the impression that it's a knock-down, drag-out competitive battle across Hadoop distributions was the first myth. And the reality is that all of the distributions share the same open source Apache code. And this is one of the first markets that's really created, or the first open-source technologies that's really created, a market.
I mean, look what's happened here with this whole big data and Hadoop. But given that early stage, there's the requirement to really combine that open source code with additional innovations to meet customer needs. And so what you see is those aggregators that are taking open source; you see others that are taking the open source and then adding maybe a management utility, a couple of different applications on top. And then our approach at MapR is we're taking the open source with those management innovations, doing some development in the open source community with things like Apache Drill, and then really focusing on the underlying architecture, the data platform, and providing innovations at that layer. >> So that's actually sort of the three major distros that we talk about all the time: you guys, Hortonworks, and Cloudera. You guys have been consistent the whole time, as has Hortonworks, right? Cloudera basically put out a post recently saying, hey, we're kind of going in a different direction, sort of what I call tapped out of the Hadoop distro piece of it. So there's a lot of discussion around it. You're putting forth the, hey, it's not an internecine war, but does it matter, is my question. >> Well, I think if you take a step back, the Hadoop ecosystem is incredibly strong, growing very, very quickly: fastest-growing big data technology, one of the top 10 technologies overall. And I think it's because we are sharing the same API. It is possible for customers to learn on one, develop, and move seamlessly to another. And in the keynote, I talked about the difference with the NoSQL market, where there is no consensus, and customers have to figure out not only what's the right workload, but what's the technology that's actually going to have some staying power, right? >> That's a powerful comment.
Amazon turned the data center into an API, and you as the Hadoop community are essentially turning data access into an API. And that is a very powerful and leverageable concept. Okay. Your second myth was around the whole NoSQL piece of it. You put up a slide. I've read Jeff Kelly's reports, and I thought I knew them all, but there were a couple in there that I didn't recognize. You probably knew them all, but take us through myth number two. >> I'm sure we missed some. >> There wasn't room on the slide for any more. >> Yeah, it's basically about the consensus. There is no real consensus. There's no common API. There's no ability to move applications seamlessly across NoSQL solutions. If you look at one NoSQL solution, and that's HBase, it has a big inherent advantage because it's integrated with Hadoop. You know, this whole trend is about compute and data together. So if you've got a NoSQL solution that's on that same massive data store, that's a big leg up. And then we got into the, well, if you've got HBase, it's included in all the distributions, and all the distributions share the same open source, then obviously it must run the same across all distributions. And there we shared some pretty interesting data to show the difference. When you do architectural differences and innovations underneath, you can dramatically change the performance of not only MapReduce but of NoSQL. >> Okay. So not all NoSQL is created equal. Not all HBase is created equal, is essentially what you're saying there. Now the third piece was, Hadoop is enterprise ready, right? Yeah. So you guys were first to say, well, we have a Hadoop platform that's enterprise ready, way ahead on that. Got criticized a lot for going down that path, shrugged and said, okay, we'll just keep doing business with customers. And you've been, again, very clear and consistent on that.
So talk about the third myth. >> And that's, you know, is Hadoop ready for prime time? And I think the way to combat that myth is by customer examples and showing the tremendous success that customers are enjoying with Hadoop. We don't have time on theCUBE here to go through all of them, but I like to point out: 90 billion auctions a day with Rubicon; they've surpassed Google in terms of ad reach. They're doing that on MapR. 1.7 trillion events a month with comScore; that's on MapR. You look in traditional enterprise: a single retailer with over 2,000 nodes of Hadoop. I mean, it's a key part of their merchandising and retail operations, combining all sorts of data feeds and all sorts of use cases there. Financial services: over a thousand nodes for risk mitigation, personalized offers, streamlining their operations. I mean, it's dramatic. And then we shared some of the more interesting ones, esoteric ones, like garbage and whiskey and weather prediction. >> And they consider these, even as diverse and eclectic as they are, mission-critical applications? >> Oh, absolutely. And I think that's the difference, because what we're talking about is not Hadoop as this cache, right? This temporary processing where we can do some interesting batch analytics and then take that and put that someplace else. And yes, there are applications like that, but companies soon realize that if I'm going to use this as a key part of my operations, and it's about data on compute, then I want a consistent, permanent store. I want a system of record. So all of the SLAs and high availability and data protection features that they expect in their enterprise applications should be present in Hadoop, right? That's where we focus. Let's run down a couple of those. >> What are some of the key capabilities that you need in an enterprise-grade platform?
That MapR is... >> Well, let's take business continuity, because that's important if you're really going to trust data there. And one of the big drivers as you expand data is how much am I going to spend on it? If you look at a large investment bank: $270 million of their budget, not total but incremental, to address the additional capacity. There's a big emphasis on, let's look at a better way to do that. So instead of spending $15,000 a terabyte, if you can spend a few hundred dollars a terabyte, that's a huge, huge advantage. And that's the focus of Hadoop. But to do that well, the features that are in this enterprise storage have to be present. And we're talking about mirroring, and not a copy-table function but replication; that's how organizations do it, right? If you're going to do recovery, you can't back up a petabyte of information through a copy function, right? You have to do a snapshot, and the snapshots have to be consistent, right? And we're not saying anything that an enterprise administrator doesn't know. There is some confusion when you're more on the developer side as to what these features are, and the difference between a fuzzy snapshot and a point-in-time, consistent snapshot. >> Got it. So let's talk a little bit about the enterprise data hub, this concept that Mike Olson with Cloudera introduced yesterday. Tell us a little bit about your take on Mike's, I guess, definition, essentially trying to name the category of what Hadoop can do and where it sits in the architecture. Did you agree with his... >> Yeah. I mean, if you look at that description, it's about, I'm taking important data and I'm putting it in Hadoop, and I'm combining a lot of different data sources, and it's been referred to as a data lake and a data reservoir and a data ocean. I mean, we've heard a lot of terms.
We worked with an outside consultant who was originally an architect at Teradata. It's been about eight months, almost a year ago now, since he defined it as an enterprise data hub. And he went through kind of the list of requirements. Once you move from a transitory to a permanent store, then that becomes an enterprise data hub. And an enterprise data hub can be used to select and process information, maybe it's ETL, and serve some downstream applications. It can also be useful to do analysis directly on it, you know, to serve different business functions. But the system requirements that he established for that, I think, are absolutely true. You have to have the full data protection. You have to have the full disaster recovery. You have to have the full high availability, because this is going to be important data serving the organization. If it's data that you can lose, if it's data that you don't really care about having highly available, then it's a very narrow use case that that data hub serves. >>So you're saying the enterprise data hub isn't ready for prime time? >>No, I'm saying that there are requirements. And we have companies today that have deployed an enterprise data hub, and they are quite successful with it. And, you know, the quotes are that the ETL functions they're doing on that hub are 10 times faster, and it's 10 times cheaper than what they were seeing. >>Soundbite, Dave. >>I agree, but it's nuanced, right? And so, you know, the customers, 'cause there are a lot of vendors, right? They're all saying the same thing to the customers, right? So you've got your messaging that you've proven out over the last several years, and then the entire market starts to use the same terminology. So this is why I think, what are those things? >>We're in a little bit of this kind of marketing fog here in the relatively early stages. I think the best response there is customer proof points.
And I think some education: in the very beginning, you know, when they're in development and test, it's really important to understand what is Hadoop, what can I use it for, and what data source am I going to leverage? I think the features that we're talking about really start to show up as you deploy in production, and as you expand its use in production. And there we've enjoyed tremendous success. >>But you would argue that you have a lead in this space. I wouldn't, I don't think you would either. The space being robustness, enterprise readiness, mission criticality: is your lead increasing, decreasing, staying the same? What's your sense? >>Well, it's hard, 'cause there's no external service out there interviewing every customer and giving numbers. I do know that we passed 500 paying customers. I do know that we've got significant deployments, and you can measure those in terms of number of nodes, you know, in the thousands of nodes. You can measure those in terms of use cases; we've got one company that has passed 20 different use cases on the same cluster. I think that's an interesting proof point. We're scaling in terms of the number of people in an organization who are trained in leveraging the data in MapR, again in the thousands. So, you know, I think this market is so big and so dynamic that this isn't about one company's success at the expense of everyone else's; it's not a zero-sum game. I think we're all here kind of raising this boat and focusing on this paradigm shift. But when it comes to production success, that's our focus, and I think that's where we've proven that. >>One thing I really want to get your opinion on: as Hadoop matures, and with some of the innovations you guys are doing in making the platform basically a multi-application platform, you can do more things with Hadoop.
And we've been talking about this on theCUBE: as that happens, you as an industry are going to start bumping up against the EDW vendors and some of the other database vendors in the traditional world. You're now doing some of the things that those tools can do. You know, two years ago, it was very much just: this is all very complementary, Hadoop and your EDW. There's no overlap. We're all gonna play nice. But increasingly we're seeing that there is an overlap. How do you view that? What is your relationship with those EDW vendors, and what are you hearing from customers when you go into a customer? >>Okay. So, I mean, there's a lot in that question. I think the first comment, though, is: don't look at Hadoop through this single data warehouse lens. If you look at trying to use Hadoop to completely replace an enterprise data warehouse, well, there's a few decades of experience there, and there are many organizations that have a lot of activities that are based in that data warehouse. And that's where we're seeing data warehouse offload that is complementary, but it gives organizations this lever to say: well, I'm going to control the fill rate, and I'm going to take some of the data that's no longer, you know, really active and put that on Hadoop, and really change my ability to manage the costs in a data warehouse environment. The other thing that's interesting is that the types of applications that Hadoop is doing, I think, are creating a new class. It's about operations and analytics kind of combined together: taking high-arrival-rate data and making very quick micro-changes to optimize, whether that's fraud detection or recommendation engines, or taking sensor data and predictive analytics for maintenance, et cetera. There is just a tremendous number of applications.
In some cases leveraging a new data source, in some cases doing new applications, but it's just opening things up. And I think organizations are moving to be very data-driven, and Hadoop is at the center of that. >>And you control the fill rate. That's another really good soundbite. And these things you mentioned, this high-arrival-rate data, fraud detection, predictive analytics, maintenance, these are things that you're doing today with MapR, right? >>Yeah, absolutely. >>Great. All right, Jack. Well, listen, always a pleasure. Thanks very much for coming by. Great to see you again. All right, keep it right there, everybody. We'll be right back with our next guest. This is theCUBE; we're live from the Big Apple.

Published Date : Oct 30 2013



Jack Norris | Strata-Hadoop World 2012

>>Okay, we're back here live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Hadoop World, Strata plus Hadoop World, a big event in Big Data Week. And we just wrote a blog post on siliconangle.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, it's packed, and it's an amazing event. And we're excited. This is siliconangle.com. I'm the founder, John Furrier. I'm joined by my co-host, Dave >>Vellante of Wikibon.org, where people go for free research and peers collaborate to solve problems. And we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to theCUBE. >>Thank you, Dave. >>I've got to hand it to you. You know, we met quite a while ago now, it was well over a year ago, and we were pushing at you guys and saying, well, you know, open source, and you said, look, we're solving problems for customers. We've got the right model, we think. This is our strategy. We're sticking to it. Watch what happens. And like I said, I have to hand it to you. You guys really have some great traction in the market, and you're doing what you said. So congratulations on that. I know you've got a lot more work to do, but... >>Yeah, and actually the topic of openness is pretty interesting. You know, if you look at the different options out there, all of them are combining open source with some proprietary. Now, in the case of some distributions, it's very small, like an ODBC driver with a proprietary driver. But I think it represents that any solution combining to make it more open is important. So what we've done is make innovations, but where we've made those innovations, we've opened up and provided APIs.
Like NFS for standard access, like REST, like ODBC drivers, et cetera. >>So it's a spectrum. I mean, actually, we were at Oracle OpenWorld a few weeks ago, and you listen to Larry Ellison talk about the Oracle public cloud; he makes actually a very strong case that it's open. You can move data, it's all Java, so it's all about standards. And, yeah, it was really all about the business value; that's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I both were very impressed with essentially what he described as your philosophy: we have customers when we announce a product. And, you know, that's impressive. >>And he was also giving some good feedback to startup entrepreneurs out there; obviously there's a lot of action going on with the startup community. And he basically said the same thing: get customers. And that's it, that's all. Use your tech, but don't be so locked into the tech; get the customers, understand the needs, and then deliver that. So you guys have done great. And I want to talk about the show here, because you guys have a big booth and a big presence here at the show. What are you guys learning? How's the positioning, how's the news hitting? Give us a quick update. >>So, a lot of news. It first started on Tuesday, where we announced the M7 edition. And yeah, I brought a demo here for you all, because the big thing about M7 is what we don't have. So we're not demoing region servers, we're not demoing compactions, we're not demoing a lot of manual administrative tasks. So what that really means is that we took this stack, and if you look at HBase today, about half of Hadoop users are adopting HBase.
So there's a lot of momentum in the market, and, you know, it's used for everything from real-time analytics to kind of lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM, that stores its data in the Hadoop Distributed File System, that sits on a JVM, that stores its data in a Linux file system, that writes to disk. And so a lot of the complexity is that stack. As an administrator, you have to worry about how data gets, you know, kind of basically written across that. And you've got region servers to keep up; when you're doing writes, you have things called compactions, which increase response time. So it's a complex environment, and we've spent quite a bit of time collapsing that infrastructure. With the M7 edition, you've got files and tables together in the same layer, writing directly to disk. So there are no region servers, there are no compactions to deal with, there's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler. >>Let's talk about some of your customers, in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, hey, we can solve some of those problems. Is that right? Can you talk about that a little bit? >>On that characterization: I think part of it is, when you're in the evaluation process and when you first hear about Hadoop, it's kind of like the Gartner hype curve, right? And, you know, this stuff, it does everything. And of course you've got data protection, 'cause you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes and so forth.
Well, once you start using it, you realize that yes, I've got data replicated across the cluster, but if I accidentally delete something, or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to, you know, what it was five minutes before. Performance, where you can get the most out of your hardware. Ease of administration, where I can cut this up into logical volumes and have policies at that whole level instead of at an individual file. So there's a bunch of features that really resonate with users after they've had some experience, and those tend to be our kind of key customers. There's another phase, too, which is: when you're testing Hadoop, you're looking at what's possible with this platform, what type of analytics can I do. When you go into production, now all of a sudden you're looking at: how does this fit in with my SLAs? How does this fit in with my data protection policies? How do I integrate with my different data sources? And can I leverage existing code? You know, we had one customer, a large systems integrator for the federal government. They have a million lines of code that they were told to rewrite to run with other distributions, that they could use just out of the box with MapR. >>So let's talk about some of those customers. Can you name some names and give... >>Sure. So actually, we had a keynote today, and we had this beautiful customer video that they had to cut because of time. It's running in our booth and it's streaming on our website, and I think we've actually got some of it in the bumper here that we kind of inserted. But I want to shout out to those customers, because they ended up on the cutting-room floor. Running it here, yeah.
So one was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number-one ad reach, as mentioned by comScore, and there was a lot of press on that. I particularly liked the headline that mentioned those three companies, because it was measured by comScore, and comScore's a customer too, a MapR customer. And Google's a key partner. And yesterday we announced a world record for the Hadoop TeraSort, running on Google. So M7 for Rubicon allows them to address and replace different point solutions that were running alongside Hadoop. And, you know, it potentially simplifies their architecture, because now they have more things done with a single platform; it increases performance and simplifies administration. Another customer is Ancestry.com, who, you know, maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing to help with family services and genealogy and figure out, you know, family backgrounds. One of the things they do is DNA testing. So for an internet service to do that advanced technology is pretty impressive. And, you know, it's $99, I believe, and they'll send you a DNA kit; you spit in the tube, you send it back, and then they process that and match and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. So, you know, in Melinda Graham's words, it's simpler because they're just not there, those components. >>Jack, I want to ask you about enterprise-grade Hadoop, and then Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company.
And with your management team, we had your CEO on, we've interviewed M.C. Srivas at Google I/O, and we were on a panel together. So I've gotten to know your team; solid team. So let's talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? I mean, obviously you guys were very successful at first. Again, we were skeptics at first, but now your traction and your performance have proven there is a market for that kind of platform. What does that mean now, at this event today, as this is evolving, as the Hadoop ecosystem is not just Hadoop anymore? It's other things. >>Yeah, there's three dimensions to enterprise grade. The first is ease of use, and ease of use from an administrator standpoint: how easily does it integrate into an existing environment? How easily does it fit into my IT policies? You know, do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension. A key to that is, you know, complete NFS support, so it functions like standard storage. A second dimension is dependability, reliability. So it's not just, you know, do you have a checkbox HA feature; it's: do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures and, you know, automated recovery? So, in a lights-out data center, can you actually go there once a week and then just, you know, replace drives? And a great example of that is one of our customers had a test cluster with MapR. It was a POC; they went on and did other things. They had a power failure, they came back a week later, and the cluster was up and running, and they hadn't done any manual tasks there.
And they were just blown away, as the recovery process for the other distributions is a long laundry list of... >>So I've got to ask you this. The third one, what's the third one? >>The third one is performance, and performance is, you know, kind of raw speed. It's also: how do you leverage the infrastructure? Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? And it's really about sharing a cluster for different use cases and different users. And there's a lot of features there. It's not just raw speed. >>So: the existing IT infrastructure policies, that whole what happens when something goes wrong, can you automate that, and then performance. >>Easy, dependable, fast. And it's the same thing with M7: making HBase easy, dependable, fast. >>So the talk of the show right now, you had the keynote this morning, is that MapR marketing has dropped the big data term and is going with "datacosm." Is that true? So Joe Hellerstein just had a tweet. Joe, the famous Cal Berkeley computer science professor, is now CEO of a startup, Trifacta, and he's had a couple of epic tweets this week. So shout out to Joe Hellerstein. But Joe Hellerstein's tweet says MapR marketing has decided to drop the term big data and go with datacosm, with a shout out to George Gilder. So it's kind of like middle-intellectual kind of humor. So what's your response to that? Is it true? What's happening? You're the VP of marketing. >>Well, if you look at the big data term, I think, you know, there's a lot of big data washing going on, where architectures that have been out there for 30 years are, you know, all about big data. So I think there's the need for a more descriptive term.
The purpose of datacosm was not to try to coin something or try to, you know, change a big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And, you know, with a shout out to George Gilder, acknowledging that he recognized what the impact of making compute available meant; he recognized with Telecosm what bandwidth would mean. And if you look at the combination, we've got all this compute efficiency and bandwidth; now datacosm is basically taking those resources and unleashing them and changing the way we do things. And I think one of the ways to look at that is the new things that will be possible. There's been a lot of focus on, you know, SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly and having automated operational analytics that can respond in a very fast time to change how you do business: either how you're communicating with customers, how you're responding to different risk factors in the environment, for fraud, et cetera, or just improving your response time to kind of cost events. >>We heard it earlier called actionable insight. Then he said, assigning intent, you'd be able to respond. It's interesting that you talk about George Gilder, 'cause we like to kind of riff and get into the abstract concepts, but he was also very big in supply-side economics. And so if you look at the business value conversation, one of the things we pointed out yesterday and this morning in the opening review was, you know, the top conversations: insight and analytics, you know, as a killer app. Right now the app market has not developed.
And that's why we like companies like Continuuity, and what you guys are doing under the hood is being worked on right now at many levels, performance and those three things. But analytics is a no-brainer, insight, and the other one's business value. So when you look at that kind of datacosm, I can see where you're going with that. And that's kind of what people want, because it's not so much like, I'm Republican because George Gilder is Republican, and he bought The American Spectator; everyone knows that. So obviously he's a Republican, but politics aside, the business side of what big data is implementing is massive. Now, I guess that's a Republican concept, but not really. I mean, business is all parties. So relative to datacosm: I mean, no one talks about e-business anymore. We were talking to IBM at the IBM conference, and they were saying, hey, that was a great marketing campaign, but no one says, hey, are you an e-business today? So we think that big data is going to have the same effect, which is: hey, do you have big data? No, it's just assumed. So that's what you're basically trying to establish, that it's not just about big. >>Yeah. Let me give you one small example from a business value standpoint. Ted Dunning, you mentioned Ted earlier, is our chief application architect and one of the coauthors of the Mahout book, which deals with machine learning. He dealt with one of our large financial services companies, and, you know, one of the techniques on Hadoop is clustering, you know, k-nearest neighbors, different algorithms. And they looked at a particular process, and they sped up that process by 30,000 times. So there's a blog post that's on our website; you can find additional information on that.
And I... >>There's one... >>One point on this, but I think, you know, to your point about business value and what datacosm really means: that's an incredible speedup in terms of performance, and it changes how companies can react in real time. It changes how they can do pattern recognition. And Google did a really interesting paper called "The Unreasonable Effectiveness of Data." And in there they say simple algorithms on big data, on massive amounts of data, beat a complex model every time. And so I think what we'll see is a movement away from data sampling and trying to do an 80/20, to looking at all your data and identifying where the exceptions are: ones that we want to increase because they're, you know, revenue exceptions, or that we want to address because it's a cost or a fraud. >>Well, on that I would give a shout out to the guys at Digital Reasoning. Tim Estes plugged Ted; he idolized him in terms of his work. Obviously his work is awesome. But he also brought up this concept of the understanding gap, and he showed an interesting chart in his keynote, which was the data explosion: you know, it's straight up, right? It's a massive amount of data, 64% unstructured by his calculation. Then he showed a flat line called attention. So as data's been exploding over time, going up, attention, meaning user attention, is flat, with some uptick maybe. So users and humans, they can't expand their minds fast enough. So machine learning technologies have to bridge that gap. That's analytics, that's insight. >>Yeah. There's a big conversation now going on about more data versus better models, people trying to squint through some of the comments that Google made and say, all right, does that mean we just throw out the models? >>Data trumps algorithms. >>Data trumps algorithms. But the question I have is, do you think, and are your customers talking about: okay, well, now they have more data.
Can I actually develop better algorithms that are simpler? And is it a virtuous cycle? >>Yeah, I think, I mean, there's a lot of debate here, a lot of information. But I think one of the interesting things is, given the compute cycles, given, you know, kind of that compute efficiency that we have, and given the bandwidth, you can take a model and then iterate very quickly on it and kind of arrive at insight. In the past, it was just that amount of data and that amount of time to process; it could take you 40 days to get to the point where you can get to now in hours. >>Right. So, I mean, the great example is fraud detection, right? We used to sample, and six months later: hey, your credit card might've been hacked. And now, you know, you get a phone call, or you can't use your credit card, or whatever it is. But there are still a lot of use cases, you know, weather is an example, where modeling and better modeling would be very helpful. Excellent. So, datacosm: are you planning other marketing initiatives around that? Or is this sort of tongue-in-cheek fun, throw it out there, a little red meat, a little chum in the waters? >>You know, what really motivated us was, you know, theCUBE's here talking for the whole day; what could we possibly do to help give them a topic of conversation? >>Okay. Datacosm. Now, of course, we found that on our proprietary HBase tools. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great. We've been following you and continue to follow you. You've been a great supporter of theCUBE. I want to thank you personally while we're here. MapR has been a generous underwriter, supportive of our great independent editorial. We want to recognize you guys; thanks for your support. And we continue to look forward to watching you guys grow and kick ass. So thanks for all your support.
And we'll be right back with our next guest after this short break. >>Thank you.

Published Date : Oct 25 2012


ENTITIES

Entity	Category	Confidence
Joe Hellerstein	PERSON	0.99+
George Gilder	PERSON	0.99+
Ted Dunning	PERSON	0.99+
Kristin Filetti	PERSON	0.99+
Joel Hellison	PERSON	0.99+
John Schroeder	PERSON	0.99+
Joe	PERSON	0.99+
Jack	PERSON	0.99+
Larry Ellison	PERSON	0.99+
Jack Norris	PERSON	0.99+
John	PERSON	0.99+
40 days	QUANTITY	0.99+
Melinda Graham	PERSON	0.99+
64%	QUANTITY	0.99+
$99	QUANTITY	0.99+
comScore	ORGANIZATION	0.99+
Tim	PERSON	0.99+
Dave	PERSON	0.99+
Tuesday	DATE	0.99+
MapR	ORGANIZATION	0.99+
Hellerstein	PERSON	0.99+
Google	ORGANIZATION	0.99+
George Gilder	PERSON	0.99+
Ted	PERSON	0.99+
John ferry	PERSON	0.99+
30 years	QUANTITY	0.99+
30,000 times	QUANTITY	0.99+
today	DATE	0.99+
IBM	ORGANIZATION	0.99+
a week later	DATE	0.99+
yesterday	DATE	0.99+
two	QUANTITY	0.99+
three companies	QUANTITY	0.99+
Dana	PERSON	0.99+
Tim SDS	PERSON	0.99+
one point	QUANTITY	0.99+
Java	TITLE	0.99+
first	QUANTITY	0.99+
six months later	DATE	0.99+
one	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
one customer	QUANTITY	0.99+
Linux	TITLE	0.98+
once a week	QUANTITY	0.98+
18 months	QUANTITY	0.98+
Rubicon	ORGANIZATION	0.98+
HBase	TITLE	0.98+
Kozum	PERSON	0.98+
Gartner	ORGANIZATION	0.98+
this morning	DATE	0.97+
Telekom	ORGANIZATION	0.97+
this week	DATE	0.97+
10 years ago	DATE	0.97+
second dimension	QUANTITY	0.97+
both	QUANTITY	0.97+
Kozum	ORGANIZATION	0.95+
third one	QUANTITY	0.95+
One	QUANTITY	0.94+
three things	QUANTITY	0.94+
a year ago	DATE	0.94+
Hadoop	TITLE	0.93+
siliconangle.com	OTHER	0.93+
Knicks	ORGANIZATION	0.93+
Regents	ORGANIZATION	0.92+

Jack Norris | Hadoop Summit 2012

>>Okay. We're back live in Silicon Valley, in San Jose, California, for the continuous coverage of SiliconANGLE.tv at Hadoop Summit 2012. This is ground zero for the alpha geeks in big data, just the tech elite. We call them tech athletes, and we're excited to cover it on the ground and extract the signal from the noise here. This is theCUBE, our flagship telecast. I'm joined by my co-host, Jeff Kelly from Wikibon.org, the best analyst in the business. Jeff, welcome back for another segment. End of the day, day one, loving every minute. Okay. We're here with our guest, Jack Norris, CMO of MapR. Jack, welcome back to theCUBE. You've been on a few times. So you guys have some news. Yes. So let's get right to the news. You guys are a player in the business, so share your news with the folks. Excellent, jump right in. >>So, two big announcements today. We announced that Amazon is integrating MapR as part of their Elastic MapReduce service, and both editions, our free edition, M3, as well as M5, are available directly with Amazon, in the cloud. >>So what's the value proposition? Why would a customer say, all right, I want to do this in the cloud, MapR in the Amazon cloud, rather than doing it on premise? >>Okay. I mean, there are a lot of value propositions all balled up into one here. First of all, the cloud allows them to spin up very quickly. Within a couple of minutes you can get, you know, hundreds of nodes available. And depending on where you're processing the data, if you've got a lot of data in the cloud already, it makes a lot of sense to do the Hadoop processing directly there. So that's one area. A second is that you might have an on-premise cloud deployment and need to have disaster recovery. MapR provides point-in-time snapshots as well as wide-area replication, so you can use mirroring, and having Amazon available as a target is a huge advantage. 
And then there's also a third application area, where you can do processing of the data in the cloud and then synchronize those results to an on-premise cluster. So basically, process where the data is and combine the results into a cluster on premise. >>So you don't have to move the raw data on-premise. >>Actually, it's all about, let's do the processing on the data. >>Well, you know, the whole value proposition in big data in general is, let's move data as little as possible. Yep. You know, so you bring the computation to the data, if you can. So what's your take on this event? I mean, this is, you know, the Hadoop Summit, and Hortonworks has now fully taken over the show. Talk about what you see out here in terms of the other vendors at play, and just kind of the attendees, the vibe you're seeing. >>It's a lot of excitement. I think a big difference from last year, which seemed to be very developer focused, is that we're seeing a lot of presentations by customers. A lot of information was shared by our customers today. It was fun to see that comScore shared their success. Boeing gap map is, uh, it was great for us. >>Fantastic. Look at Amazon. Amazon, first of all, is the gold standard for public cloud, right? They've knocked it out of the park. Everyone knows Amazon. But they've been criticized on the big data front because of the cycle times involved. I mean, for web services, spinning up and down, no problem, and we're seeing businesses like Netflix run on Amazon. So Amazon is no stranger to running at scale in the cloud, but Hadoop has kind of been a kludgy thing for Amazon. So, you know, talk about why Amazon and you guys are a good fit for the market. The market reach is great, so you guys now have a huge addressable market. Are you guys helping solve some of that complexity on the MapReduce side? 
What's, >>What's the core... I guess the first comment, the first response, would be: I think every customer should have that type of kludge. They could have the success that Amazon has had in Hadoop. They have a huge number of Hadoop deployments and have been very, very successful. >>I mean, you know what I mean by kludgy. It's kludgy everywhere right now; that's the problem. But Amazon has huge scale, and Hadoop's not a natural fit there. >>It's not a natural fit? >>For the data component. And, uh, HBase, for example. >>So where Amazon's made it very frictionless is the ability to spin up Hadoop to do the analysis. The gap that was missing is some of the HA capabilities, the data protection features, the disaster recovery, and, you know, that's where MapR now gives options to those customers. If they want those kinds of enterprise-grade features, now they have an option within EMR. They can select M5 and get moving. If they want performance and NFS, they've got the M3 option. >>Well, congratulations. I think it's a great deal for you guys and for Amazon customers. My question for you is, as you guys explore the enterprise-ready equation, which has been a big topic this week, what does that mean to you guys? Because it means different things to different people. It depends on how high up toward OLTP you go, right? I mean, how far from batch to real-time transactional levels you go. Batch, no problem. But as you start to get more near real time, it's going to be a little bit different. Gray in this house used security HDFS. Yeah. >>Yeah. So Hadoop represents a strategic platform, right? Deploying that in an organization, you know, moving from kind of an experimental, lab-based environment to a production environment, creates a different set of feature requirements. How available is it? How easy is it to integrate, right? 
How do I protect that information, and how do I share it? So when we say enterprise grade, we mean you can have SLAs; you can put the data there and be confident that the data will remain there; you can have point-in-time recovery from an application error or user mistake; and you can have disaster recovery features in place. And then the integration is about not reinventing the wheel to get access to the information. Hadoop is very powerful, but it requires interacting through the HDFS API. With MapR you can leverage standard file-based access through NFS and standard ODBC access, which opens it up. >>So I can use a standard file browser and applications to see and manipulate the data, which really opens up the use cases. And then finally, what we announced in 2.0 was multitenancy features. As you share that information, all of a sudden the SLAs of different groups come into play: well, these guys need it immediately, and if you've got some low-grade batch jobs, they're going to impact that. So you want the ability to protect, to isolate, to secure information, and basically have virtual clusters within a cluster. Those features are important to cloud, but they're also important on-premise. >>So, great for the hybrid cloud environments out there. I mean, cracking the code on multitenancy, that's huge. Right now most enterprises like private cloud because it's basically an extension of their data center, and you're seeing a lot more activity in the hybrid cloud as a gateway to the public cloud. So, >>And, you know, frankly, people are kind of struggling. With Apache Hadoop and the other distributions, the policies are either at the individual file level or the whole cluster, and that almost forces the creation of separate physical clusters, which kind of goes against the whole Hadoop concept. 
So the ability to manage it at a logical layer, to have separate volumes where you can apply policies that apply to all the content underneath, really makes it much, much easier for administrators to deal with these multiple use cases. >>Amazon has always been one of those cases for the enterprise, and this has been talked about for years: put the credit card down, go play on Amazon, but then bring it back into the IT group for certification. And so I think this is a nice product for you guys to bring that comfort. You know, we're very >>Excited, the enterprise saying, hey, >>Come play in Amazon. It's bulletproof, enterprise ready. So congratulations. >>I wonder, can we talk use cases? What are you seeing in terms of evolving use cases as Hadoop continues to become more enterprise grade, depending on your definition? How is that impacting what you're seeing, even if it's just the mindset? People think, okay, now it's enterprise grade. Well, depending on who you talk to, it's been that way for a bit. But what kind of use cases are you seeing develop now that it's starting to gain acceptance? It's like, okay, we can trust our data is going to be there, et cetera. >>So there's a huge range of use cases that differ by industry and by the kind of dataset that's being used, everything from really a deep store where you can do analytics on it, so you're selecting the content, to something that's very analytic and machine learning intensive, where you're doing sophisticated clustering algorithms, et cetera. Where we've seen an expansion of use cases is around real-time streaming, where you get streaming data sets that are entering into the cloud. 
And, um, some of the more mission-critical data, moving beyond just maybe clickstream data, or things where if you happen to drop a few, you know, it's not a big deal, right, versus the kind of trust-the-business type of content. >>Talk a little bit about the streaming aspects, because of course, when we think of Hadoop, we think of a batch system. Streaming data into Hadoop, that's something we haven't heard a lot about. So how do you guys approach that? >>So, one of the artifacts of HDFS, which is a distributed file system that stores in the underlying Linux file system, is that it's append only. So as an administrator, you decide: how frequently do I close the file? Am I going to do that on an hourly basis, or every eight hours? Because you have to close the file for other applications to see the data that's been written, right? So one of the innovations that we pursued was to rewrite that and create this dynamic read-write layer, so you can continue to write data, and any application sees the latest data that's written. You can mount the cluster as if it's storage and just continue to write data there. That really opens up what's possible. Companies like Informatica: their Ultra Messaging product integrates directly with MapR and provides. >>So what kind of advantage does that provide to the end user? Translate that into real business value. Why is that important? >>Well, one example is comScore. comScore handles 30 billion objects a day as they go out and try to measure the use of the web, and being able to continually write and stream that information, to scale and handle that in real time, and to do analytics and turn around data faster has tremendous business value to them. 
If they're stuck in a batch environment where the load times lengthen to the point where all of a sudden they can't keep up, they're actually reporting on, you know, old news. And I think the analogy is that forecasting rain a day after it's wet isn't exactly valuable. >>Yeah. So, you guys, obviously a great deal, enterprise ready for Amazon, big story, big coup for the company. What's next for you? I want to ask that and make sure you get your agenda for the next year out there, but then I want you to take a step back a year, maybe a year and a half. Look back at how much has changed in this landscape and share your perspective, because the market has gone through an evolution where there's been a market opportunity, and then everyone goes, oh my God, it's bigger than we actually thought. I mean, Jeff Kelly's groundbreaking report about the $50 billion market is now being talked about as too low. So big data has absolutely opened up to a huge market, and it's changed some of the tactics around strategies: your strategy, Hortonworks' strategy, even Cloudera's. And it's still evolving. So what's changed for the folks out there from a year and a half ago, a year ago, to today? And then look out for the next 12 months: what's on your agenda? >>Well, if you look back, I think we've been fairly consistent. I'm not going to take credit for the vision of our CEO and CTO, but they recognized early on that Hadoop was a strategic platform, and that to be a strategic platform that applied to the broadest number of use cases and organizations, it required some areas of innovation. In particular, how it scaled, how it was managed, and how you stored and protected the information needed a rearchitecture. And I think that, you know, architecture matters when you're going through a paradigm shift. Having the right one in place creates this ability to speed innovation. 
And I think, if there's anything that's changed, it's that the speed of innovation has even increased in the Hadoop community. I think it's created a focus on these enterprise-grade features, on how do we store this valuable information and continue to explore. >>And one of the observations I'll make on that note is that it really focuses everyone to just mind your own business and get the products out. You know what I'm saying? We've seen the product focus be the number one conversation. >>What we've seen is customers, you know, start, and they expand rapidly. Some of that is due to data growth, but a lot of it is due to more and more applications being delivered, and the value is extracted from the Hadoop platform, and success breeds success. >>Well, congratulations on all your success. Great win with Amazon Web Services, making that a little bit easier, more robust, with more features for them, and more revenue for MapR. And I want to personally thank you for your support of theCUBE. We've expanded with a new studio B software for extra, extra interviews, and want to expand the conversation. Thanks to your generous support, we can bring the independent coverage out to the market and a great community. Thanks for helping us out; we appreciate it. So thank you. Okay, Jack Norris with MapR. We'll be right back to wrap up day one; Jeff and I will give our analysis right after the short break.

Published Date : Jun 14 2012


ENTITIES

Entity	Category	Confidence
Jeff Kelly	PERSON	0.99+
Jeff	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Jack Norris	PERSON	0.99+
Jack Dorsey	PERSON	0.99+
Netflix	ORGANIZATION	0.99+
$50 billion	QUANTITY	0.99+
Silicon valley	LOCATION	0.99+
30 billion	QUANTITY	0.99+
today	DATE	0.99+
Informatica	ORGANIZATION	0.99+
a year ago	DATE	0.99+
next year	DATE	0.99+
comScore	ORGANIZATION	0.99+
a year and a half ago	DATE	0.99+
Kelly	PERSON	0.99+
last year	DATE	0.99+
Amazons	ORGANIZATION	0.99+
Linux	TITLE	0.99+
MapR	ORGANIZATION	0.99+
San Jose, California	LOCATION	0.99+
one example	QUANTITY	0.98+
one area	QUANTITY	0.97+
third application	QUANTITY	0.97+
Matt	PERSON	0.97+
one	QUANTITY	0.97+
Hadoop	TITLE	0.97+
this week	DATE	0.96+
2012	DATE	0.95+
hundreds of nodes	QUANTITY	0.94+
Hortonworks	ORGANIZATION	0.94+
Jack	PERSON	0.93+
both edition	QUANTITY	0.93+
a day	QUANTITY	0.93+
two big announcements	QUANTITY	0.92+
second	QUANTITY	0.9+
next 12 months	DATE	0.88+
day one	QUANTITY	0.86+
two dot	QUANTITY	0.85+
M three	OTHER	0.85+
M three	TITLE	0.84+
MapReduce	ORGANIZATION	0.82+
Hadoop Summit 2012	EVENT	0.79+
first response	QUANTITY	0.79+
every eight hours	QUANTITY	0.78+
SLA	TITLE	0.77+
June	DATE	0.77+
first comment	QUANTITY	0.77+
Lastic MapReduce	TITLE	0.69+
M five	OTHER	0.69+
Boeing	ORGANIZATION	0.68+
M five	TITLE	0.67+
siliconangle.tv	OTHER	0.67+
ground zero	QUANTITY	0.67+
Wiki bond.org	ORGANIZATION	0.62+
Apache	ORGANIZATION	0.61+
4th of	EVENT	0.6+

Jack Norris - Strata Conference 2012 - theCUBE

>>Hi everybody. We're back. This is Dave Vellante from Wikibon.org. We're live at Strata in Santa Clara, California. This is SiliconANGLE TV's continuous coverage of the Strata conference. O'Reilly Media is a great partner of ours, and thanks to them for allowing us to be here. We've been going all week, and it's day three for us. I'm here with Jeff Kelly, Wikibon's lead big data analyst, and we're here with Jack Norris, who's the VP of marketing at MapR. Jack, welcome to theCUBE. Thank you, Dave. Thanks very much for coming on. And you know, we've been going all week. You guys are a great sponsor of ours. Thank you for the support; we really appreciate it. How's the show going for you? >>Great. A lot of attention, a lot of focus, a lot of discussion about Hadoop and big data. >>Yeah. So you guys are getting a lot of traffic. I mean, I hear there are 2,500 people here, up from 1,400 last year. So that's >>Yeah, we've had like five, six people deep in the booth. So I think there's a lot of interest. >>It's interesting. You know, when we were here last year, when you looked at the infrastructure and the competitive landscape, there wasn't a lot going on, and in just a very short time that's completely changed. And you guys have had your hand in that. So that's good. Competition is a good thing, right? And obviously customers want choice. So we want to talk about that a little bit. We want to talk about MapR, the kind of problems you're solving. So why don't we start there? What is MapR all about? You've got your own distribution of enterprise Hadoop. You make Hadoop enterprise ready. Let's start there. >>Okay. 
Yeah, I mean, we invested heavily in creating an alternative distribution, one that took the best of the open source community with the best of the MapR innovations. And really, it's about making Hadoop more applicable: broader use cases, more mission-critical support, you know, being able to sit in and work in a lights-out data center environment. >>Okay. So what was the problem that you set out to solve? Why do we need another distribution of Hadoop? Let me ask it that way. Get nice and close to. >>So there are some just big issues with Hadoop. >>One of those issues, let's talk about that. >>There are some ease-of-use issues. There are some deep dependability issues. There are some performance issues. So, you know, let's take those in order. Right now, if you look at some of the distributions, Apache Hadoop is great technology, but it requires a programmer, right? To get access to the data, it's through the Hadoop API; you can't really see the data. So there's a lot of focus on, you know, what do I do once the data's in there? Opening that up, providing full file-based access, right, so I can look at it and treat it like enterprise storage: see the data, use my standard tools and standard commands, you know, drag and drop from a file browser. You can do that with MapR. You can't do that with other distributions. >>You're talking about mounting HDFS as NFS, for example? >>Correct. And then there are the underlying storage services. The fact that it's append only, instead of full random read-write, causes some issues. So, you know, those are some of the ease-of-use features. There's a whole lot we could discuss there. Big picture for reliability and dependability: there's a single point of failure, multiple single points of failure, within Hadoop, so you risk data loss. So people have looked at Hadoop traditionally as batch oriented, a scratchpad, right? We set out to solve that, right? 
We want to make sure that you can use it for mission-critical data, that you don't have a risk of data loss, that you've got full high availability, and that you've got the full data protection, in terms of snapshots and mirroring, that you would expect with enterprise products. >>Let's go back to when you guys were, you know, thinking about doing this. I'm not even sure you were at the company at the time, but your DNA was there and you're familiar with it. So you guys saw this big data movement. You saw this Hadoop movement and you said, okay, this is cool. It's going to be big, and it's going to take a long time for the community to fix all these problems. We can fix them now; let's go do that. Is that the general discussion? Yeah. >>You know, I think what's different about this is that it's the first open source project that's created a market. If you look at the other open source, you know, Linux, MySQL, et cetera, it was really late in the life cycle of a product. Everyone knew what the features were. It was about, you know, giving an alternative choice, a better Unix. Here the focus is on innovation, and our founders, you know, have deep enterprise backgrounds. Our CTO was at Google, in charge of BigTable, understands MapReduce at scale, and spent time as chief software architect at Spinnaker, which was kind of the fastest clustered NAS on the planet. So they recognized that the underlying layers of Hadoop needed some rearchitecture and some deep investment, and that doing that effectively and quickly required a whole lot of focus. And we thought that was the best way to go to market. >>Talk about the early validation from customers. Obviously you guys didn't just do this in a vacuum, I presume. So you went out and talked to some customers. Yeah. >>What sorts of conversations with customers, while we were in stealth mode? We were probably the loudest stealth >>As you were nodding. 
And I mean, what were they telling you at the time? Yeah, please go do this? >>What we addressed weren't secrets. There have been JIRAs open for four or five years on these issues. >>Yeah. But at the same time, Jack, you've got this purist community out there that says, I don't want to rip out HDFS. You know, I want it to be pure. What do you say to those guys? Do you just say, okay, thank you, we understand you're not a prospect? >>I think that, you know, Hadoop has a huge amount of momentum, and I think a lot of that momentum is because there isn't any risk to adopting Hadoop, right? It's not like the fractured NoSQL market, where there are 122 different entrants and which one's going to win? Hadoop's got the ecosystem. So when you say pure, it's about the APIs. It's about making sure that if I create a MapReduce job, it's going to run on Apache, it's going to run on MapR, it's going to run on the other distributions. That's where I think the heat and the focus is. Now, to do that, you also have to have innovation occurring up and down the stack that provides choice and alternatives. >>So when I'm talking about purists, I agree with you. The whole lock-in thing is the elephant in the room here. People will worry about lock-in. >>Pun intended? >>No, no, but good one, good catch. But you're basically saying, hey, we're no more locked in than Cloudera, right? I mean, they've got their own >>Actually, I think we're less, because it's so easy to get data in and out with our NFS that there's probably less. >>So, and I'm going to come back to that. But for instance, when I say purists, I mean some users and ISVs, some guys we've had on here. We had Abhi Mehta from Tresata on the other day, for instance. He's one who said, I just don't have time to mess with that stuff and figure out all that API integration. 
I mean, there are people out there that just don't want to go that route. Okay. But you're saying, I'm inferring, there's plenty who do, right? >>And by the API route, I want to make sure I understand what you're saying. >>You talked about, hey, it's all about the API integration. It's not >>It's about the APIs being consistent, a hundred percent compatible, right? So if I, you know, write a program that's going after HDFS and the HDFS API, I want to make sure that it'll run on other distributions. Right. >>And that's your promise. Yeah. Okay. All right. So where I was going with this was, again, there are some purists who say, oh, I just don't want to mess with all that. Now let's talk about what it means to mess with all that. comScore was a big, high-profile case study for you guys. They were a Cloudera customer. My understanding is that in a couple of days they migrated from Cloudera to MapR. And the impetus was, let's talk about that. Why'd they do that? >>Performance, data protection, ease of use. >>License fee issues? There were some license issues there as well, right? Your maintenance pricing was more attractive. Is that true? >>It was more about price-performance and reliability. You know, they tested our stuff and it worked really well in a test environment, so they put it in the production environment. They didn't actually tell all their users, and they had one guy debug the software for half a day because he thought something was wrong: it finished so quickly. >>So it took them a couple of days to migrate, and then boom. >>Boom. And they handle about 30 billion objects a day. So there, you know, the really high-performance support for streaming data flows matters. They're doing forecasts and insights into web behavior, and the earlier they can do that, the better off they are. 
So >>Greg, >>So talk about the implications of your approach in terms of the customer base. I'm imagining that your customers are perhaps more advanced than a lot of your typical Hadoop users who are just getting started tinkering with Hadoop. Is it fair to say, you know, your customers know what they want, they want performance, and they want it now, and they're a little more advanced than perhaps some of the typical early adopters? >>We've got people who go to our website and download the free version, and some of them are just starting off and getting used to Hadoop. But we did specifically target those very experienced Hadoop users that, you know, were kind of stubbing their toes on the issues. And so they're very receptive to the message: we've made it faster, we've made it more reliable, and we've added a lot of ease of use to Hadoop. >>So let me interrupt and go back to what I was saying before. I found this comment online from Mike Brown, comScore's CTO, I presume. He cited MapR's Direct Access NFS feature, which exposes Hadoop Distributed File System data as NFS files that can then be easily mounted, modified, or overwritten. So that's a data access simplification. He also said, we could capitalize on the purchase of MapR with an annual maintenance charge versus a yearly cost per node. NFS allowed our enterprise systems to easily access the data in the cluster. So does that make sense to you, that annual maintenance charge versus yearly cost per node? I didn't get that. >>Oh, I think he's talking about how some organizations prefer to do a perpetual license versus a subscription model. >>Oh, okay. 
So the traditional way of licensing software. >>And that basically reinforces the fact that we've really invested in having kind of a product orientation rather than just services on top of some open source. >>Okay. So you go in, you license it, and then, yeah, perpetual license. >>And you can also start with the free edition that does all the performance, the NFS support. Kick the tires >>before you buy it. Sorry. Sorry, Jeff. Sorry to interrupt. >>No, no problem at all. So another topic of a lot of interest is security, making Hadoop enterprise ready. One of the pillars there is security, making sure of access controls, for instance. Let's talk about how you guys approach that and maybe how you differentiate from some of the other vendors out there. >>Full Kerberos support. We link in to enterprise standards for access, LDAP, et cetera. We leverage Linux PAM security, and we also provide volume control. So right now in Apache Hadoop and other distributions, you put policies at the file level or on the entire cluster, and we see many organizations having separate physical clusters because of that limitation. We provide volumes. So you can define a volume, and in that volume control access, administrative privileges, and data protection class, and in a sense kind of segregate that content. That provides a lot of control and a lot more security, protection, and separation of data. >>Is that scenario, the comScore scenario, common, where somebody's moving off an existing distribution onto MapR? Or are you more seeing demand from new customers that are saying, hey, what's this big data thing, I really want to get into it? How's it shake out there? >>Right now there's this huge pent-up demand for these features, and we're seeing a lot of people that have run on other distributions switch to MapR. >>A little bit of everything. 
How about, can you talk a little bit about your channel, your go-to-market strategy, maybe even some of your ecosystem and partnerships in the little time we have? >>Sure. So EMC is a big partner. The EMC Greenplum MR edition is basically MapR. You can start with any of our editions and upgrade to Greenplum MR with just a license key. That gives us worldwide service and support. It's been a great partnership. >>We hear about a lot of proofs of concept out there. >>Yeah. And then it just hit the news today about EMC's distribution, the MR distribution, being available with Cisco's UCS gear. So that's further expanded the footprint that we have. >>Okay. So there's the EMC relationship. Anything else that you can share with us? >>We have other announcements coming out and... >>That you want to pre-announce on theCUBE? >>Oops. Did I let that slip? >>It's live, so be careful. And so, in terms of your channel strategy, are you guys mostly selling direct, indirect, or a combination? >>It's kind of an indirect model through these large partners with a direct assist. >>Yeah. Okay. So you guys come in and help evangelize. >>Yep. >>Excellent. All right. Do you have anything else before we've got to roll here? >>Yeah, I did wonder if you could talk a little bit about, you mentioned EMC Greenplum, so there's a lot of talk about the data warehouse market, the MPP data warehouses, versus Hadoop. Based on that relationship, I'm assuming that MapR thinks they're certainly complementary. Can you just touch on that? As opposed to some who think Hadoop is going to be the platform where we go. >>Well, if you look at the typical organization, they're just really trying to get their, excuse me, their arms around a lot of this machine-generated content, this unstructured data that's just growing like wildfire. So there's a lot of Hadoop-specific use cases that are being rolled out. 
There are also kind of data lakes, data oceans, whatever you want to call it, large pools where that information is then being extracted and loaded into data warehouses for further analysis. And I think the big pivot there is, if it's well understood what the issue is and you can define the schema, then there's a whole host of data warehouse applications out there that can be deployed. But there are many things where you don't really understand that yet, and having Hadoop, where you don't need to define a schema, is a big value. >>Jack, I'm sorry, we have to go. We're running a couple of minutes behind. Thank you very much for coming on theCUBE. Great story. Good luck with everything. It sounds like things are really going well, the market's heating up, and you're in the right place at the right time. So thank you again. Thank you to Jeff. And we'll be right back, everybody, at the Strata Conference, live in Santa Clara, California, right after this word from our.

Published Date : Apr 27 2012


