Jack Norris, MapR - Spark Summit East 2016 #SparkSummit #theCUBE

>>From New York, extracting the signal from the noise, it's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Dave Vellante and George Gilbert.
>>Right here in Midtown at the Hilton hotel. This is Spark Summit and this is theCUBE. theCUBE goes out to the events. We extract the signal from the noise. Jack Norris is here. He's the CMO of MapR, long-time CUBE alum. Jack, it's great to see you again. Hey, you've been here since the beginning of this whole big data meme.
>>And it might've started here, I don't know. I think we've, yeah,
>>I think you're right. I mean, it really did start, I think, in this building. It was our first big data show at the original, you know, uh, Hadoop World. And, uh, you guys, like I say, have been there from the start. Uh, you were kind of impatient early on. You said, you know, we're just going to go build solutions and, uh, ignore the noise, and you built a really nice business. Um, you guys have been growing, you're growing your sales force and, uh, things are good, and all of a sudden, boom, the Spark thing comes in. So we're seeing the evolution. I remember saying to George in the early days of Hadoop, we were geeking out talking about all the bits and bytes, and then it turned into a business discussion. It's like we're back to the hardcore bits and bytes. So give us the update from MapR's point of view: where are we in the whole big data space?
>>Well, I think, um, I think it has transitioned. I mean, uh, if you look at the typical large Fortune company, the web 2.0s, it's really, how do we best leverage our data, and how do we leverage our data so that we can make decisions much faster, right? That high-frequency decision-making process. Um, and typically that involves taking production data and analytics and joining them together so that you're actually impacting business as it happens, and to do that effectively requires, um, innovations.
So the exciting thing about Spark is taking and, uh, having a distributed compute engine; it's much easier to develop and, uh, much faster.
>>So I remember in the early days we'd be at these shows and the big question was, you know, can you take the humans out of the equation? It's like, no, no, humans are the last mile. Um, is that changing, or do we still need that human interaction, or,
>>Um, humans are an important part of the process, but increasingly, if you can adjust and make, you know, small algorithmic decisions, um, and make those decisions at that kind of moment of truth, you've got big impact, and I'll give you a few examples. So, um, ad platforms: you know, Rubicon Project, over a hundred billion ad auctions a day. You know, humans are part of that process in terms of setting that up and reviewing the process, but for each, you know, each supply and demand decision, there is an automated decision, and optimizing that has a huge impact on the bottom line. Um, fraud: uh, you know, a credit card swipe, that transaction, and deciding is this fraudulent or not, avoiding false positives, et cetera, a big leveraged item. So we're seeing things like that across manufacturing, across retail, healthcare. And, um, it isn't about asking bigger questions or doing reports and looking back at, you know, what happened last week. It's more, how can I have an infrastructure in place that allows this organization to be agile? Because it's not the companies with the most data that are going to win. It's the companies that are the most agile and making intelligent.
>>So there's so much data. Humans can't ingest it any faster. I mean, we just can't keep up. So the world needs data scientists, it needs trained developers. You've got some news I want to talk about on the training side, but even then we can only throw so many bodies at the problem. So it's really software that's going to allow us to scale it. Software's hard. Software takes time.
So we've seen a lot of the spend in the analytics, big data world on services. And obviously you guys and others have been working hard to shift it towards software. I want to come back to that training issue. We heard this morning about, uh, Databricks launched a MOOC. They trained 20,000 people. That's a lot, but still a long way to go. You guys are putting some investment into training. Talk about that news. Yeah.
>>Yeah. Um, well, it starts at the underlying software. If you can do things in the platform to make it much easier, and do things that are hard to surround with services, like, uh, data protection, right? If you've lost data, it doesn't matter how many people you throw at it, you can't recover it. Right. So that's kind of the starting point. You're gonna get fired.
>>The, uh, the approach we've taken is to take, uh, a software product approach to the training as well. So we rolled out on-demand training. So it's free, it's on demand. You work at your own pace. It's got different modules, there's some training associated with that, or some hands-on labs, if you will. Um, we launched that last January. So it's basically coming up on the year anniversary. We recently celebrated: we've trained 50,000 people, uh, on Hadoop and big data. Um, today we're announcing an expansion on Spark classes. We've got a full curriculum around Spark, including a certification. So you can get Spark certification through this MapR on-demand training. Okay.
>>Gotcha. You said something really, really intriguing that I want to dive into a little bit, where we were talking about the small decisions that can be made really, really fast, where a human in the loop might have to train them, but not at runtime. Now, where you said it's not about asking bigger questions, it's finding faster answers, um, what had to change in your platform or in the underlying technology to make that possible?
>>You know, um, there's a lot that goes into it.
It's typically a series of functions, uh, a kind of breadth that needs to be brought to the problem, as well as squeezing out latencies. So instead of, um, the traditional approach, which is that different applications and different analytic techniques dictate a separate silo, a separate, you know, schema of data, and you've got those all around the organization, and data kind of travels, and you get an answer at the end of some period of time. Uh, it's converging that all together into a single platform, squeezing out those latencies so that you can have an informed action at the speed of business, if you will.
>>And, um, let's say Spark never came along. Would that be possible?
>>Yes. Yes. Would you, how would you? So if you look at kind of the different architectures that are out there, there's typically deep analytics in terms of, you know, let's go look at the trends, you know, the last seven years, what happened. And then let's look at, um, doing actions on a streaming set, say, for instance, Storm, and then let's do real-time database operations. So you could do that with HBase or MapR-DB and all of that together. What Spark has really done is made that whole development process just much easier and much more streamlined. And that's where a lot of the excitement's happened.
>>So you mentioned earlier, um, two use cases, ad tech and fraud detection. Um, and I want to ask you about the state of those. So ad tech obviously has come a long way, but it's still got a ways to go. I mean, you look at, I mean, who's making money on ads? Obviously Google will make tons of money. Everybody else is sort of chasing them. Facebook's making money. It's probably because they didn't let Google in. Okay. So how will Spark affect that business? Uh, and what's MapR's role in evolving that, you know, to the next level?
>>So, um, there's different kinds of compute and types of things you can do, um, on the data.
I think increasingly we're seeing the kind of streaming analytics and making those decisions as the data arrives, right? And then there's the whole ecosystem in terms of how do you coordinate those flows of data. It's not just a simple here's the origin, here's the destination. There's typically a complex data flow. Um, that's where we've kind of focused on MapR Streams, this huge publish-and-subscribe infrastructure, so that you can get real-time data to the appropriate location and then do the right operations, a lot of that involved with Spark, but not exclusively.
>>Okay. And then on fraud detection, um, it's obviously come a long way. Sampling kind of died. Yes. And now, but now we're getting too many false positives. You get the call and, you know, I mean, I get a lot of calls because we buy so much equipment, but, um, now what about the next level? What are you guys doing to take fraud detection to the next level, so that when I get on the plane in Boston and I land in London, it knows? Um, is that a database problem? Is it an integration problem, a systems problem? And what role do you guys play in solving that?
>>Well, um, you know, there's a lot of details and techniques that probably go, um, beyond, you know, what we'll share publicly or what our customers talk about publicly. I think in general, it's the more data that you can apply to a problem, the more context, the better off you are. That's the way I'd kind of summarize it. So that instead of a sampling, or instead of a "boy, that's a strange purchase over there", it's understanding, well, this is Dave Vellante, and this is the full body of, uh, expenditures he's done, and the types of things, and here's who he frequently purchases from. And here's kind of a transaction trend: started in San Francisco, went to New York, et cetera. So in context it would make more sense.
>>So part of that is more data.
And the other part of that is just better algorithms and better learnings, and applying that on a continuous basis. How are your customers dealing with that constraint? I mean, if they've got a hundred dollars to spend, yeah, they can only spend so much on each of those: gathering more data, cleaning the data, they spend so much time getting it ready, versus making their machine learning algorithms or whatever the other techniques to do. What are you seeing there as sort of best practice? It probably varies, I'm sure, but give us some color on it.
>>Um, I'll actually go back to Google and Google a letter last round, um, you know, excellent, excellent insights coming from Google. They wrote a paper called "The Unreasonable Effectiveness of Data", and in it they basically squarely addressed that problem. And given the choice to invest in either a complex model and algorithm or put more data at it, putting more data at it had a huge impact. And, um, you know, my simple explanation is: if you're sampling the data, you have to have a model that tries to recreate reality. If you're looking at all of the data, then the anomalies can pop up and be more apparent. And, um, the more context you can bring, the more data from other sources, so you get a rounder, you know, a better picture of what's happening, the better off you are. And so that requires scale. It requires speed, and it requires different techniques that can be brought to bear, right? Here's a database operation, here's a streaming operation, here's a deep, you know, file machine learning algorithm.
>>So there's a lot of vendors in the sort of big data ecosystem coming at Spark from different angles and, um, trying to add value to it and sort of bathe themselves in sort of the halo. Yep. Now you guys took some time upfront to build a converged platform so that you weren't trying to wrap your arms around 37 different projects.
Can you tell us how, having perhaps not anticipated Spark, this converged platform allows you to add more value to it than other approaches?
>>So, to simplify: if you look at the Hadoop ecosystem, it's basically separated into the components for compute and management on top of the data layer, right? The Hadoop Distributed File System. So how do you scale data? How do you protect it? It's very simply what's going on. Spark really does a great job at that top layer. It doesn't do anything about defining the underlying storage layer. In the Hadoop community, that underlying storage layer is a batch system. So you're trying to do, you know, micro-batch kind of streaming operations on top of batch-oriented data. What we addressed was to take that whole data layer, make it real time, make it random read-write, converge enterprise storage together with Hadoop support and Spark support on a single platform. And that's basically the difference.
>>And to make it enterprise grade. You guys were really the first to lead the lecture. Everybody started talking about enterprise grade after you were kind of delivering it. So you've had a lead there. Do you feel like you still have a lead there, or is that the kind of thing where you sort of hit the top of the S-curve and start innovating elsewhere?
>>NC State did a study, uh, just this past year, a recent study, that identified that only 25% of data corruption issues are identified and properly handled by the Hadoop Distributed File System. 42% of those are silent. So there's a huge gap in terms of quote-unquote enterprise grade features and what we think.
>>Yes, silent data corruption has been a problem for decades now. And you're saying it's no different in the Hadoop ecosystem, especially as mainstream businesses start to, uh, adopt this. What's happening in the Valley? Uh, we're seeing, you know, in the Wall Street Journal every day you read about down rounds, flat rounds, people can't get B rounds.
Uh, you guys are funded, you know, you're growing, you're talking about investments, you know, what do you see? Do you feel like you're achieving escape velocity? Um, maybe give us sort of an update on, uh, the state of the business.
>>Yeah. I think the state of the business is best represented by the customers, right? And the customers kind of vote, right? They vote in terms of, you know, how well is this technology driving their business. So we've got a recent study, um, that kind of shows the returns that customers, um, are getting. Uh, we've got a 1% churn, a 99% retention rate with our customers. We've got, uh, an expansion rate that's unbelievable. We've got multi-million dollar customers in, uh, seven of the top verticals and nine out of the top $10 million customers. So we're seeing significant investments and, more importantly, significant returns on the part of customers, where they're not just doing a single application on the platform, but multiple applications.
>>Jack Norris, MapR, always focused. Always a pleasure having you on theCUBE. Thanks very much for coming on. Appreciate it. Keep right there, buddy. We'll be back with our next guest. This is theCUBE, we're live from Spark Summit. Right back. Okay.

Published Date : Feb 17 2016


ENTITIES

Entity	Category	Confidence
Dave Valenti	PERSON	0.99+
Jack Norris	PERSON	0.99+
Dave Volante	PERSON	0.99+
New York	LOCATION	0.99+
London	LOCATION	0.99+
George	PERSON	0.99+
San Francisco	LOCATION	0.99+
Boston	LOCATION	0.99+
George Gilbert	PERSON	0.99+
99%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
42%	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
50,000 people	QUANTITY	0.99+
nine	QUANTITY	0.99+
20,000 people	QUANTITY	0.99+
last week	DATE	0.99+
Datto	ORGANIZATION	0.99+
last January	DATE	0.99+
$10 million	QUANTITY	0.98+
seven	QUANTITY	0.98+
each	QUANTITY	0.98+
first	QUANTITY	0.98+
Mapbox	ORGANIZATION	0.98+
today	DATE	0.97+
1%	QUANTITY	0.97+
Hadoop	TITLE	0.97+
Matt	PERSON	0.96+
single platform	QUANTITY	0.96+
NC	ORGANIZATION	0.95+
this morning	DATE	0.95+
single application	QUANTITY	0.94+
25%	QUANTITY	0.94+
Midtown	LOCATION	0.93+
first big	QUANTITY	0.92+
Rubicon	ORGANIZATION	0.92+
37 different projects	QUANTITY	0.92+
last seven years	DATE	0.89+
over a hundred billion ad auctions a day	QUANTITY	0.88+
this past year	DATE	0.86+
spark	ORGANIZATION	0.85+
multi-million dollar	QUANTITY	0.84+
decades	QUANTITY	0.83+
a hundred dollars	QUANTITY	0.79+
data corruption	QUANTITY	0.7+
HBase	TITLE	0.67+
Hilton	ORGANIZATION	0.67+
RDB	TITLE	0.64+
Spark	ORGANIZATION	0.57+
MapR	ORGANIZATION	0.57+
map	TITLE	0.57+
Salesforce	ORGANIZATION	0.53+
2016	EVENT	0.51+
- Spark Summit	EVENT	0.46+
East	LOCATION	0.42+

Jack Norris - Hadoop on the Hudson - theCUBE

>>Live from New York City, it's theCUBE. Here's your host, Jeff Frick.
>>Hi, Jeff Frick here with theCUBE. We're on the ground at the USS Intrepid at the Hadoop on the Hudson party put on by MapR. It's, uh, I think it's the party of the night tonight here in big data week, New York City, with Strata Conf, Hadoop World, Big Data NYC. So Jack, a great venue.
>>Yeah, it's excellent here.
>>The place is filled. I'm just struck by the technology. There's a Gemini capsule over there, about 50 years old. It's about the size of a Volkswagen; I'd think it would be much bigger. And to think that those guys went up into space with probably less technology than is on your four-year-old flip phone. Amazing. Yeah.
>>Not, not much data at all. No.
>>If you look at it, just kind of get that bounce on the gravity thing, which I never quite understood. So talk about, you guys had some big news today. Why don't you give us a rundown on some of the announcements?
>>We had two big announcements. One was incorporating MapR-DB in our Community Edition, which came out. We also reported results from our customers, where the majority of customers reported less than a 12-month payback, uh, 65% a five-X or greater return, and 40% a 10X or greater. And that included a subset of those customers that had experience with other distributions. So kind of a testament to: when you get serious about Hadoop, you get serious with MapR.
>>And when they're getting those returns on investment, we're always trying to explore where's the big ROI, because it's really in value that's released for the customer. It's not necessarily because it's a cheaper way to do it.
>>Right. So there are some costs: 63% was cost reduction that was driving it, about 41% were top-line revenue projects, and about 23% were related to risk reduction and risk mitigation. And if you add those up, it's greater than a hundred percent, because many customers are doing multiple applications.
>>Great.
So you've been coming to Hadoop World for longer than you would admit to me before we came on camera, and the baseball playoffs are going on right now. I mean, we like to talk in sports analogies. So kind of where are we, what inning are we in, in this adoption of big data, and Hadoop specifically?
>>Early, early innings. Um, but, uh, what we've seen is the bases are loaded and we're up.
>>And it seems to be we're way past the POC stage now. Now we're really getting in there for that.
>>And the customer announcement we did kind of shows how people are hitting it out of the park with Hadoop. And a lot of that is by impacting the operations, impacting the business as it happens. And that's coupling analytics plus this higher arrival rate data from a variety of sources and making adjustments so that you can impact revenue as business is happening. You can mitigate risk as it's happening. It's not just a reporting, looking-back function.
>>Right, right. It's being able to react in real time, which is defined by: in time to do something about it. Right. Exactly. All right. Well, thanks for hosting a great party, Jack Norris. Here we are on the ground, uh, at the USS Intrepid at the Hadoop on the Hudson. Uh, if you take a nice picture, tweet that in. I think they've got some prizes. Hadoop Hudson is the hashtag. Jeff Frick on the ground. You're watching theCUBE. Thanks. Big ship.

Published Date : Oct 22 2014


ENTITIES

Entity	Category	Confidence
Jeff Frick	PERSON	0.99+
40%	QUANTITY	0.99+
Jack Norris	PERSON	0.99+
Matt BARR	PERSON	0.99+
65%	QUANTITY	0.99+
63%	QUANTITY	0.99+
One	QUANTITY	0.99+
10 X	QUANTITY	0.99+
New York city	LOCATION	0.99+
NYC	LOCATION	0.99+
today	DATE	0.99+
greater than a hundred percent	QUANTITY	0.99+
about 23%	QUANTITY	0.99+
Volkswagen	ORGANIZATION	0.98+
two big announcements	QUANTITY	0.98+
Jack	PERSON	0.98+
about 41%	QUANTITY	0.98+
five X	QUANTITY	0.98+
about 50 years old	QUANTITY	0.94+
Mapbox	ORGANIZATION	0.93+
Hadoop	TITLE	0.93+
tonight	DATE	0.91+
less than a 12 month	QUANTITY	0.91+
Hudson	LOCATION	0.87+
Hadoop	LOCATION	0.86+
four year old	QUANTITY	0.83+
Hadoop on	LOCATION	0.78+
USS Intrepid	ORGANIZATION	0.76+
map RDB	TITLE	0.68+
Hadoop Hudson	TITLE	0.68+
Gemini	COMMERCIAL_ITEM	0.53+
some	QUANTITY	0.5+
Hadoop on the	TITLE	0.5+

Jack Norris - Hadoop Summit 2014 - theCUBE - #HadoopSummit

>>theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks, "We do Hadoop", and headline sponsor WANdisco, "We make Hadoop invincible".
>>Okay. Welcome back, everyone, live here in Silicon Valley in San Jose. This is Hadoop Summit. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my cohost, Jeff Kelly, top big data analyst in the community. Our next guest, Jack Norris, CMO of MapR. Security, enterprise: that's the buzz of the show, and it was the buzz of OpenStack Summit, another open source show. And here this year, you're just seeing move after move, talking about a couple of critical issues. Enterprise grade Hadoop: Hortonworks announced a big acquisition, went all in, as they said, and now Cloudera follows suit with their news today. Are you sitting back saying, they're catching up to you guys? I mean, how do you look at that? I mean, 'cause you guys have the security stuff nailed down. So how do you feel about that now?
>>I think, um, if you look at the kind of Hadoop market, it's definitely moving from a test, experimental phase into a production phase. We've got tremendous customers across verticals that are doing some really interesting production use cases. And we recognized very early on that to really meet the needs of customers required some architectural innovation. So combining the open source ecosystem packages with some innovations underneath to really deliver high availability, data protection, disaster recovery features. Security is part of that. But if you can't protect the data, if you can't have multitenancy and separate workflows across the cluster, then it doesn't matter how secure it is. You know, you need those.
>>I've got to ask you a direct question since we're here at Hadoop Summit, because we get this question all the time.
SiliconANGLE, Wikibon is so successful, but I just don't understand your business model. Well, it's free content and they have some underwriters. So you guys have been very successful, yet people aren't looking at MapR as, kind of, the quiet leader, like, you're doing your business, you're making money. Jeff, you had some numbers with us that in the Hadoop community, about 20% are paying subscriptions. That's unlike your business model. So explain to the folks out there the business model, and specifically the traction, because you have customers.
>>Yeah. Oh no, we've got over 500 paying customers. We've got at least a $1 million customer in seven different verticals. So we've got breadth and depth, and our business model is simple. We're an enterprise software company that's looking at how to provide the best of open source as well as innovations underneath.
>>The most open distribution of Hadoop. But you add that value separately to that, right? So it's not so much that you're proprietary at all. Right. Okay.
>>You clarify that. Right. So if you look at this exciting ecosystem, Hadoop is fairly early in its life cycle. If it's in a commoditization phase, like Linux, or relational databases with MySQL, open source kind of equates to the whole technology. Here, at the beginning of this life cycle, the early stages of the life cycle, there's some architectural innovations that are really required. If you look at Hadoop, it's an append-only file system relying on Linux, and that really limits the types of operations, the types of use cases that you can do. What MapR's done is provide some deep architectural innovations: provide a complete read-write file system, integrate data protection with snapshots and mirroring, et cetera. So there's a whole host of capabilities that make it easy to integrate, enterprise secure, and scale much better.
>>Do you think,
Do you think, >>I feel like you were maybe a little early to the market in the sense that we heard Merv Adrian and his keynote this morning. Talk about, you know, it's about 10 years when you start to get these questions about security and governance and we're about nine years into Hadoop. Do you feel like maybe you guys were a little early and now you're at a tipping point, whereas these more, as more and more deployments get ready to go to production, this is going to be an area that's going to become increasingly important. >>I think, I think our timing has been spectacular because we, we kind of came out at a time when there was some customers that were really serious about Hadoop. We were able to work closely with them and prove our technology. And now as the market is just ramping, we're here with all of those features that they need. And what's a, what's an issue. Is that an incremental improvement to provide those kind of key features is not really possible if the underlying architecture isn't there and it's hard to provide, you know, online real-time capabilities in a underlying platform that's append only. So the, the HDFS layer written in Java, relying on the Linux file system is kind of the, the weak underbelly, if you will, of, of the ecosystem. There's a lot of, a lot of important developments happening yarn on top of it, a lot of really kind of exciting things. So we're actively participating in including Apache drill and on top of a complete read-write file system and integrated Hindu database. It just makes it all come to life. >>Yeah. I mean, those things on top are critical, but you know, it's, it's the underlying infrastructure that, you know, we asked, we keep on community about that. And what's the, what are the things that are really holding you back from Paducah and production and the, and the biggest challenge is they cited worth high availability, backup, and recovery and maintaining performance at scale. 
Those are the top three, and that's kind of where MapR has been focused, you know, since day one.
>>So if you look at a major retailer: 2,000 nodes on MapR, 50 unique applications running on a single cluster, and 10,000 jobs a day running on top of that. If you look at the Rubicon Project, they recently went public: a hundred million ad auctions, a hundred billion ad auctions a day. And on top of that platform, Beats Music, which just got acquired for $3 billion. Basically it's the underlying MapR engine that allowed them to scale and personalize that music service. So there's a lot of proof points in terms of how quickly we scale, the enterprise grade features that we provide, and kind of the blending of deep predictive analytics in a batch environment with online capabilities.
>>So I've got to ask you about your go-to-market. Obviously Cloudera and Hortonworks have different business models. Just talk about that, but Cloudera got the massive funding. So you get this question all the time. How do you counter that army and the arms race? I think
>>I just wrote an article in Forbes and he says cash is not a strategy. And I think that was an excellent, excellent article. And he goes in and, you know, in this fast-growing market, you know, an amount of money doesn't necessarily translate to architectural innovations or speeding the development of that. This is a fairly fragmented ecosystem in terms of the stack that runs on top of it. There's no single application or single vendor that kind of drives value. So an acquisition strategy is
>>So your field sales force is direct or indirect, both, mixed? How do you handle that? Because Cloudera has got feet on the street, and every squirrel will find a nut: if they're parking sales reps and SEs in all the enterprise accounts, you know, the squirrel's going to find a nut once in a while. Yeah. And they're going to actually try to engage the clients.
So, you know, I guess it is a strategy if they're deploying sales and marketing, right?
>>So the beauty about that, and in fact, we're all in this together in terms of sharing an API and driving an ecosystem: it's not a fragmented market. You can start with one distribution and move to another without recompiling or without doing any sort of changes. So it's a fairly open community. If this were a vendor lock-in, you know, then spending money on brand, et cetera, would be important. Our focus is on the, so the sales execution: direct sales, yes, we have direct sales. We also have partners, and it depends on the geographies as to what that percentage is.
>>And John Schroeder was on with HP at Big Data NYC, has updated the HP relationship.
>>Oh, excellent. In fact, we just launched our application gallery, App Gallery, to make it very easy for administrators and developers and analysts to get access and understand what's available in the ecosystem. That's available directly on our website. And one of the featured applications there today is an integration with the MapR Sandbox and HP Vertica. So you can get early access, try it, and get the best of kind of enterprise grade SQL.
>>First Hadoop app store, basically. Yeah. If you want to call it that way. Right. So like
>>Sure. Available, we launched with close to 30, with, you know, a whole wave kind of following that.
>>So talk a little bit about, you know, speaking of Vertica and kind of the SQL on Hadoop. So, you know, there's a lot of talk about that, some confusion about the different methods for applying SQL on Hadoop. MapR takes an open approach. I know you'll support things like Impala from a competitor, Cloudera. Talk about that approach from MapR's perspective.
>>So I guess our perspective is kind of unbiased open source.
We don't try to pick and choose and dictate what's the right open source based on either our participation or some community involvement. And the reality is, with multiple applications being run on the platform, there are different use cases that, you know, make different sense. So whether it's a Hive solution or, you know, Drill is available, or HP Vertica, people have the choice. And it's part of a broad range of capabilities that you want to be able to run on the platform for your workflows, whether it's SQL access or MapReduce or a Spark framework, Shark, et cetera. >>So, yeah, I mean, because there's so many different options: there's Spark, you know, you can run HP Vertica, you've got Impala, you've got Hive and the Stinger initiative. Is that whole kind of SQL-on-Hadoop ecosystem still working itself out? Are we going to have this many options a year or two years from now? Or are they complementary and potentially, you know, each has its role? >>I think the major difference is kind of how it deals with the new data formats. Can it deal with self-describing data sources? Can it leverage JSON files? Does it require centralized metadata? Those are some of the perspectives and advantages that, say, Apache Drill has: to expand the data sets that are possible and enable data exploration without dependency on an IT administrator to define that metadata. >>So another one, maybe not always as exciting, but taking workloads from existing systems and moving them to Hadoop is one of the ways that a lot of people get started with Hadoop, whether it's associated transformation workloads or something in that vein. So I know you've announced a partnership with Syncsort, and one of the things that they focus on is really making it as easy as possible to migrate those.
Well, talk a little bit about that partnership, why that makes sense for you and your customers. >>I think it's a great proof point, because we announced that partnership around mainframe offload, and we had comScore and Experian in that press release. And if you look at a workload on a mainframe going to Hadoop, that seems like that's really an oxymoron. But by having the capabilities that MapR has and making that a system of record with that full high availability and that data protection, we're actually an option to offload from mainframe, offload from SAN processing, and provide a really cost-effective, scalable alternative. And we've got customers that had tried to offload from the mainframe multiple times in the past unsuccessfully and have done it successfully with MapR. >>So talk a little bit more about kind of the broader partnership strategy. I mean, we're here at Hadoop Summit. Of course, Hortonworks talks a lot about their partnerships and kind of their reseller arrangements. Cloudera seems to take a little bit more of a direct approach. What's MapR's approach to kind of partnering, as that relates to kind of resell arrangements and things like that? >>I think the App Gallery is probably a great proof point there. The strategy is an ecosystem approach. It's having a collection of tools and applications and management facilities as well as applications on top. So it's a very open strategy. We focus on making sure that we have open APIs at that application layer, so that it's very easy to get data in and out. And part of that architecture is presenting a standard file system format, allowing non-Java applications to run directly on our platform, and supporting standard database connections, ODBC and JDBC, to provide database functionality. In addition to kind of this deep predictive analytics, really it's about supporting the broadest set of applications on top of a single platform.
What we're seeing in this kind of modern architecture is that data gravity matters. And the more processing you can do on a single platform, the better off you are: the more agile, the more competitive, right? >>So you're partnering with people like SAS, for example, to kind of bring some of the analytic capabilities into the platform. Can you kind of tell us a little bit about any >>Companies like SAS and Revolution Analytics and Skytree, and I mean, just a whole host of companies on the analytics side, as well as on the tools and visualization, et cetera. Yeah. >>Well, I mean, I bring up SAS because I think they get the whole data gravity situation. They've got to go to where the data is and not have the data come to them. So, you know, I give them credit for kind of acknowledging that kind of big data truism, that it's >>All going to the data, not bringing the data >>To the compute. Jack, talk about the success you've had with customers. You had some pretty impressive numbers, talking about 500 customers. Merv Adrian of Gartner was on with us earlier, essentially reiterating, not mentioning MapR, he was just saying, that what you guys are doing is right where the puck is going. And some think the puck is not even in the same rink for some other vendors. So I've got to give you props on that. So I want you to talk about the success you have, specifically around where you're winning and where you're successful, and what you guys have struggled with and need to improve on. >>Yeah, there's a whole class of applications that I think Hadoop is enabling, which is about operations and analytics. It's taking this high-arrival-rate, machine-generated data and doing analytics as it happens and then impacting the business. So whether it's fraud detection or recommendation engines, or, you know, supply chain applications using sensor data, it's happening very, very quickly.
So a system that can tolerate and accept streaming data sources, that has real-time operations, that is 24 by seven and highly available, is what really moves the needle. And that's the examples I used with, you know, ads with the Rubicon Project and, you know, cable TV. >>And the outcome. What are the primary outcomes your clients want with your product? Is it stability, and the platform enabling development? Is there an outcome that's consistent across all your wins? >>Well, in the big picture, some of them are focused on revenues. Like, how do we optimize revenue? Either it's a new data source, or it's a new application, or it's an existing application where we're exploding the dataset. Some of it's reducing costs. So they want to do things like a mainframe offload or data warehouse offload. And then there's some that are focused on risk mitigation. And if there's anything that they have in common, it's that as they move from kind of test to production, it's the key capabilities that they have in enterprise systems today that they want to make sure are in Hadoop. So it's not anything new. It's just like, hey, we've got SLAs, and I've got data protection policies, and I've got a disaster recovery procedure. And why can't I expect the same level of capabilities in Hadoop that I have today in those other systems? >>Final question. Where are you guys heading this year? What are your key objectives? Obviously, you've got this flurry of announcements, good success. State of the company: how many employees are you guys at? Give us a quick update on the numbers. >>So, you know, we just reported this incredible momentum where we've tripled core growth year over year, and we've added a tremendous amount of customers. We're over 500 now. So we're basically sticking to our knitting, focusing on the customers, elevating the proof points here.
Some of the most significant customers we have in the telco and financial services and healthcare and retail areas, you know, view this as a strategic weapon, view this as a huge competitive advantage, and it's helping them impact their business. That's really spurring our success. You know, we're growing at an incredible clip here, and it's a great time to have made those calls and those investments early on and kind of be reaping the benefits. >>Now, I've always said, since the first Hadoop Summit, when Hortonworks came out of Yahoo and this whole community kind of burst open: you had Hadoop World, now O'Reilly runs it, and it's a whole different vibe of its own. This one, look at the developer vibe. So I've got to ask you, and we've been a big fan. I mean, everyone has enough beachhead to be successful. It's not about MapR versus Hortonworks or Cloudera. And this is why I always kind of smile when everyone goes, oh, Cloudera or Hortonworks. I mean, they're two different animals at this point. They do different things, and you guys are over here. Everyone has their, quote, swim lanes or beachhead. There's not a lot of super competition, do you think? Or is it going to be this way for a while? What's your forecast? At what point do you see more competition? 10 years out? I mean, Merv was talking about a 10-year horizon for innovation. >>I think that the more people learn and understand about Hadoop, the more they'll appreciate this kind of set of capabilities that matter in production and post-production, and it'll migrate earlier. And as we, you know, focus on more developer tools like our sandbox, so people can easily get experience and understand kind of what MapR is, I think we'll start to see a lot more understanding and momentum. >>Awesome. Jack Norris here, inside theCUBE, CMO of MapR, a very successful enterprise-grade Hadoop player, a leader in the space. Thanks for coming on. We really appreciate it.
We'll be right back after this short break. We're live in Silicon Valley at Hadoop Summit 2014. We'll be right back.

Published Date : Jun 4 2014


Jack Norris - BigDataNYC 2013 - theCUBE - #BigDataNYC

>>Live from Midtown Manhattan, it's theCUBE's live coverage of Big Data NYC, a SiliconANGLE and Wikibon production, made possible by Hortonworks, "We do Hadoop," and WANdisco, "We make Hadoop invincible." And now your hosts, John Furrier and Dave Vellante. >>Hi everybody, we're back. This is Dave Vellante with Jeff Kelly of Wikibon. And this is theCUBE, SiliconANGLE's continuous production. We're here at Big Data NYC, right across the street from the Hilton where Strata Conference and Hadoop World is going on. We've got a multi-time CUBE guest: Jack Norris, the CMO of MapR, is here. Jack, welcome back to theCUBE. First of all, by the way, thank you so much for the support. As you know, we're across the street here at the Warwick hotel. MapR, you guys have always been so generous supporting theCUBE. We can't thank you enough for that. So really appreciate it. Thank you. So we were able to listen to your keynote yesterday. We weren't broadcasting, you know, head to head yesterday, and had an opportunity to hear your keynote. So, first of all, how did that go? I want to ask you some questions about it. >>It was really well received, and I think people were kind of clamoring to try to separate the myths from reality on Hadoop. >>You had three myths that you talked about, you know, one related to the distributions. I'd like to get into some of those. So the first myth was around the distribution battle. Take us through that. >>So, you know, the impression that it's a knock-down, drag-out competitive battle across Hadoop distributions was the first myth. And the reality is that all of the distributions share the same open source Apache code. And this is one of the first markets that's really been created, or one of the first open-source technologies that's really created a market.
I mean, look what's happened here with this whole big data and Hadoop. But given that early stage, there's the requirement to really combine that open source code with additional innovations to meet customer needs. And so what you see is you see those aggregators that have taken open source; you see others that are taking the open source and then adding maybe a management utility, a couple of, you know, different applications on top. And then our approach at MapR is we're taking the open source with those management innovations, doing some development in the open source community with things like Apache Drill, and then really focusing on the underlying architecture, the data platform, and providing innovations at that layer. So >>Actually, sort of the three major distros that we talk about all the time. You know, you guys, Hortonworks and Cloudera. You guys have been consistent the whole time, as has Hortonworks, right? Cloudera basically put out a post recently saying, hey, we're kind of going in a different direction, sort of what I call tapped out of the Hadoop distro, you know, piece of it. So there's a lot of discussion around it. You're putting forth the, hey, it's not an internecine war. But does it matter, is my question? >>Well, I think if you take a step back, the Hadoop ecosystem is incredibly strong, growing very, very quickly: the fastest-growing big data technology, one of the top 10 technologies overall. And I think it's because we are sharing the same API. It is possible for customers to learn on one, develop, and move seamlessly to another. And, you know, in the keynote, I talked about the difference with the NoSQL market, where, you know, there is no consensus, and customers have to figure out not only what's the right workload, but what's the technology that's actually going to have some staying power, right? >>That's a powerful comment.
Amazon turned the data center into an API; you, as the Hadoop community, are essentially turning data access into an API. And that is a very powerful and leverageable concept. Okay. Your second myth was around the whole NoSQL piece of it. You put up a slide. I thought, I read Jeff Kelly's reports, and I thought I knew them all, but there were a couple in there that I didn't recognize, though you probably knew them all. But so take us through myth number two. >>I'm sure we missed some. >>There wasn't room on the slide for any more. >>Yeah, it's basically about the consensus. There is no real consensus. There's no common API. There's no ability to move applications seamlessly across NoSQL solutions. If you look at one NoSQL solution, and that's HBase, it has a big inherent advantage because it's integrated with Hadoop. You know, this whole trend is about compute and data together. So if you've got a NoSQL solution that's on that same, you know, massive data store, that's a big leg up. And then we got into the, well, if you've got HBase, and it's included in all the distributions, and all the distributions share the same open source, then obviously it must run the same across all distributions. And there we shared some pretty interesting data to show the difference. When you do architectural differences and innovations underneath, you can dramatically change the performance of not only MapReduce, but of NoSQL. >>Okay. So not all NoSQL is created equally, and not all HBase is created equally, is essentially what you're saying there. Now the third piece was, Hadoop is enterprise ready, right? >>Yeah. >>So you guys were first to say, well, we have a Hadoop platform that's enterprise ready, way ahead on that. You got criticized a lot for going down that path, shrugged and said, okay, we'll just keep doing business with customers. And you've been, again, very clear and consistent on that.
So talk about the third myth. >>And that's, you know, is Hadoop ready for prime time? And I think the way to combat that myth is by customer examples and showing the tremendous success that customers are enjoying with Hadoop. And, you know, we don't have time on theCUBE here to go through all of them, but I like to point out 90 billion auctions a day with Rubicon; they've surpassed Google in terms of ad reach. They're doing that on MapR. 1.7 trillion events a month with comScore; that's on MapR. You look in traditional enterprise, you know, a single retailer with over 2,000 nodes of Hadoop. I mean, it's a key part of their merchandising and retail operations, combining all sorts of data feeds and all sorts of use cases there. Financial services: over a thousand nodes for risk mitigation, personalized offers, streamlining their operations. I mean, it's dramatic. And then, you know, we shared some of the more interesting ones, esoteric ones like garbage and whiskey and weather prediction. >>And even as diverse and eclectic as they are, they consider these mission-critical applications? >>Oh, absolutely. And I think that's the difference, because what we're talking about is not Hadoop as this cache, right? This temporary processing, where we can do, you know, some interesting batch analytics and then take that and put that someplace else. And yes, there are applications like that, but companies soon realize that if I'm going to use this as a key part of my operations, and it's about data and compute, then I want a consistent permanent store. I want a system of record. So all of the SLAs and high availability and data protection features that they expect in their enterprise applications should be present in Hadoop, right? That's where we focus. >>Let's run down a couple of those. What are some of the key capabilities that you need in an enterprise-grade platform?
That MapR is >>Well, let's take business continuity, 'cause that's important if you're really going to trust data there. And you know, one of the big drivers as you expand data is, how much am I going to spend on it? And if you look at a large investment bank, with $270 million of their budget, not total, but incremental, to address the additional capacity, there's a big emphasis on, let's look at a better way to do that. So instead of spending $15,000 a terabyte, if you can spend a few hundred dollars a terabyte, that's a huge, huge advantage. And that's the focus of Hadoop. But to do that well, the features that are in enterprise storage have to be present. And we're talking about, you know, mirroring, and not a copy table function, but replication. That's how organizations do it, right? If you're going to do backup and recovery, you know, you can't back up a petabyte of information through a copy function, right? You have to do a snapshot, and the snapshots have to be consistent, right? And we're not saying anything that, you know, an enterprise administrator doesn't know. There is some confusion when you're more on the developer side as to what these features are and the difference between a fuzzy snapshot and a point-in-time, consistent snapshot. >>Got it. So let's talk a little bit about the enterprise data hub, this concept that Mike Olson with Cloudera introduced yesterday. Tell us a little bit about your take on Mike's, I guess, definition, and essentially, I think, trying to name the category of kind of what Hadoop can do and where it sits in the architecture. Did you agree with his >>Yeah. I mean, if you look at that description, it's about, I'm taking important data and I'm putting it in Hadoop and I'm combining a lot of different data sources. And it's been referred to as a data lake and a data reservoir and a data ocean. I mean, we've heard a lot of terms.
We worked with an outside consultant that was originally an architect at Teradata. It's been about eight months, almost a year ago now, where he defined it as an enterprise data hub. And he went through kind of the list of requirements. And once you move from a transitory to a permanent store, then that becomes an enterprise data hub. And an enterprise data hub can be used to select and process information, maybe it's ETL, and serve some downstream applications. It can also be useful to do analysis directly on it, you know, to serve different business functions. But the system requirements that he established for that, I think, are absolutely true. And it's, you have to have the full data protection. You have to have the full disaster recovery. You have to have the full high availability, because this is going to be important data serving the organization. If it's data that you can lose, if it's data that you don't really care about having highly available, then it's a very narrow use case that that data hub serves. >>So you're saying the enterprise data hub isn't ready for prime time. >>No, I'm saying that there are requirements. And we have companies today that have deployed an enterprise data hub, and they are quite successful with it. And, you know, the quotes are that the ETL functions that they're doing on that hub are 10 times faster and 10 times cheaper than what they were seeing. >>Soundbite, Dave. >>I agree, but it's nuanced, right? And so, you know, the customers, 'cause there's a lot of vendors, right? They're all saying the same thing to the customers, right? So you've got your messaging that, you know, you've proven out over the last several years, and then the entire market starts to use the same terminology. So this is why I, like, I think, what are those >>Things? We're in a little bit of this kind of marketing fog here in the relatively early stages. I think the best response there is customer proof points.
And I think some education. In the very beginning, you know, when they're in development and test, it's really important to understand, you know, what is Hadoop, and what can I use it for, and what data source am I going to leverage? I think the features that we're talking about really start to show up as you deploy in production and as you expand its use in production, and there we've enjoyed tremendous success. >>But few would argue with the idea that you have a lead in this space. I wouldn't, and I don't think you would either, the space being robustness, enterprise readiness, mission criticality. Is your lead increasing, decreasing, staying the same? What's your sense? >>Well, it's hard, 'cause, you know, there's no external service that's out there, you know, interviewing every customer and giving numbers. I do know that we passed 500 paying customers. I do know that we've got significant deployments, and you can measure those in terms of number of nodes, you know, in the thousands of nodes. You can measure those in terms of use cases. So we've got, you know, one company that's passed 20 different use cases on the same cluster. I think that's an interesting proof point. We're scaling in terms of the number of people in an organization that are trained in leveraging the data in MapR, again, in the thousands. So, you know, I think this market is so big and so dynamic that this isn't about, you know, one company's success at the expense of everyone else, a zero-sum game. I think, you know, we're all here kind of raising this boat and focusing on this paradigm shift. But when it comes to production success, that's our focus. And I think that's where we've proven that. >>One thing I really want to get your opinion on, you know, as Hadoop matures, with some of the innovations you guys are doing and making the platform, you know, basically a multi-application platform, you can do more things with Hadoop.
And we've been talking about this on theCUBE: as that happens, you as an industry are going to start bumping up against the EDW vendors and some of the other database vendors in the traditional world. And now you're doing some of the things that those tools can do. Now, you know, two years ago, it was very much just, this is all very complementary, Hadoop and your EDW. There's no overlap. We're gonna all play nice. But increasingly we're seeing that there is an overlap. How do you view that? And what is your relationship with those EDW vendors, and what are you hearing from customers when you go into a customer? >>Okay. So, I mean, there's a lot in that question. I think the first comment, though, is don't look at Hadoop through this single data warehouse lens. And if you look at trying to use Hadoop to completely replace an enterprise data warehouse, well, there's a few decades of experience there. There are many organizations that have a lot of activities that are based in that data warehouse. And that's where we're seeing a data warehouse offload that is complementary, but it gives organizations this lever to say, well, I'm going to control the fill rate, and I'm going to take some of the data that's no longer, you know, really active and put that on Hadoop and really change my ability to manage the costs in a data warehouse environment. The other thing that's interesting is that the types of applications that Hadoop is doing, I think, are creating a new class. It's about operations and analytics, kind of combined together: taking high-arrival-rate data and making very quick micro changes to optimize, whether that's fraud detection or recommendation engines, or taking sensor data and predictive analytics for maintenance, et cetera. There is just a tremendous number of applications.
In some cases, leveraging a new data source, in some cases, doing new applications, but it's just opening things up. And I think organizations are moving to be very data-driven, and Hadoop is at the center of that. >>And you control the fill rate, right? That's another really good soundbite. And you mentioned this high-arrival-rate data, this fraud detection, predictive analytics, maintenance. These are things that you're doing today with MapR, right? >>Yeah, absolutely. >>Great. All right, Jack. Well, listen, always a pleasure. Thanks very much for coming by. Great to see you again. All right. Keep it right there, everybody. We'll be right back with our next guest. This is theCUBE; we're live from the Big Apple.

Published Date : Oct 30 2013

ENTITIES

Entity	Category	Confidence
Jeff Kelly	PERSON	0.99+
Michael Wilson	PERSON	0.99+
10 times	QUANTITY	0.99+
Jack	PERSON	0.99+
Jack Norris	PERSON	0.99+
10 times	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
$270 million	QUANTITY	0.99+
Mike	PERSON	0.99+
yesterday	DATE	0.99+
Dave Volante	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
third piece	QUANTITY	0.99+
Dave	PERSON	0.99+
Hadoop	TITLE	0.99+
Midtown Manhattan	LOCATION	0.99+
Uber	ORGANIZATION	0.99+
Volante	PERSON	0.99+
thousands	QUANTITY	0.99+
first	QUANTITY	0.99+
20 different use cases	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
second	QUANTITY	0.99+
John furrier	PERSON	0.98+
NYC	LOCATION	0.98+
two years ago	DATE	0.98+
Hadoop	ORGANIZATION	0.98+
first comment	QUANTITY	0.98+
Rubicon	ORGANIZATION	0.98+
SQL	TITLE	0.97+
Terre data	ORGANIZATION	0.97+
One	QUANTITY	0.97+
1.7 trillion events	QUANTITY	0.97+
third	QUANTITY	0.97+
today	DATE	0.97+
one	QUANTITY	0.96+
single	QUANTITY	0.96+
a year ago	DATE	0.95+
one company	QUANTITY	0.94+
HBase	TITLE	0.94+
Navarre	PERSON	0.93+
EDW	ORGANIZATION	0.92+
over 2000 nodes	QUANTITY	0.91+
big apple	ORGANIZATION	0.91+
first markets	QUANTITY	0.9+
nodes	QUANTITY	0.89+
about eight months	QUANTITY	0.88+
2013	DATE	0.88+
Soundbite	ORGANIZATION	0.87+
three myths	QUANTITY	0.87+
Hindu	ORGANIZATION	0.87+
first open-source	QUANTITY	0.86+
Wiki bond	ORGANIZATION	0.85+
BigDataNYC	EVENT	0.85+
$15,000 a terabyte	QUANTITY	0.85+
three major	QUANTITY	0.82+
90 billion auctions a day	QUANTITY	0.81+
500 paying customers	QUANTITY	0.79+
comScore	ORGANIZATION	0.79+
map R	ORGANIZATION	0.78+
over a thousand nodes	QUANTITY	0.77+
Hilton	LOCATION	0.77+
few hundred dollars a terabyte	QUANTITY	0.76+
Number two	QUANTITY	0.76+
10 technologies	QUANTITY	0.74+

Jack Norris - Hadoop Summit 2013 - theCUBE - #HadoopSummit

>>Ash it's, you know, what will that mean to my investment? And the announcement with Fusion-io is that we're 25 times faster on read-intensive HBase applications with the combination. So as organizations are deploying Hadoop, and they're looking at technology changes coming down the pike, they can rest assured that they'll be able to take advantage of those in a much more aggressive fashion with MapR than with other distributions. >>Jack, I've got to ask you. We were talking last night at the Hadoop Summit kickoff party, and everyone was there. All the top execs were there, and all the developers. I think either Dave or myself coined the term the big three of big data: Cloudera, MapR, and Hortonworks, really the key players from early on. And Charles from Cloudera was just recently on, and he's like, oh no, this enterprise-grade stuff has been kicked around, it's been there from the beginning. You guys have been there from the beginning, and MapR has never, ever waffled on your messaging. You've always been very clear: hey, we're going to take open source Hadoop and turn it into an enterprise-grade product. Right? So that's clear. So what's your take on this? Because now enterprise grade is kind of there, I guess, the buzz around the folks that have crossed the chasm and implemented. So can you comment on enterprise grade, the reality of it, certainly from your perspective versus others? And then for those folks that are now rolling it out for the first time, what can you share with them about what it means to be enterprise grade? >>So enterprise grade is more about the customer experience than a marketing claim.
And by enterprise grade, what we're talking about are some of the capabilities and features that they've grown to expect in their other enterprise applications. So, the ability to meet full SLAs, full HA, recovery from multiple failures, rolling upgrades, data protection with consistent snapshots, business continuity with mirroring, the ability to share a cluster across multiple groups and have volumes. There's a host of features that fall under the umbrella of enterprise grade. And when you move from no support for any of those features to support for a few of them, I don't think that's going to HA. It's more like moving to low availability. And there are a lot of differences between what we mean when we say enterprise grade and what we view as kind of an incomplete story. >>What do you mean by low availability? I mean, it's tongue in cheek. It's nice. It's a good term. It's really saying it's just available sometimes. Is that what you mean? Is this not true availability? I mean, availability is 99.9%, right? >>Right. So if you've got an HA solution that can't recover from multiple failures, that's downtime. If you've got an HBase application that's running online and you have data that goes down and it takes 10 to 30 minutes to have the region servers recover it from another place in the distribution, that's downtime. If you have snapshots that aren't consistent across the cluster, that doesn't provide data protection. There's no point-in-time recovery for a cluster. So there are a lot of details underneath that, but what it amounts to is: do you have interruptions? Do you have downtime? Do you have the potential for losing data? And our answer is you need a series of features that are hardened and proven to deliver that. >>What about recoverability?
You mentioned that you guys have done a lot of work in that area with snapshotting. That's kind of being kicked around. What's your competition doing in those areas? You just mentioned availability. Okay, got that. Recoverability, security, compliance, and usability: those are the areas that seem to be the hot focus areas. What's going on in the industry? How would you give them a grade, the letter grade if you will, candidly, compared to what you guys offer? Well, >>First of all, let's take recoverability. One of the tenets is you have point-in-time recovery, the ability to restore to a previous point that's consistent across the cluster. And right now there's no point-in-time recovery for HDFS, for the files, and there's no point-in-time recovery for HBase tables. So there's snapshot support being talked about in the open source community, but it's being referred to in the JIRAs as fuzzy snapshots and really compared to copy table. >>So, Jack, I want to turn the conversation to the topic we've talked about before, the open versus proprietary debate. We've heard about that. We've talked about that before here on theCUBE. So just reiterate for us your take. We hear, perhaps because of the show we're at, a lot of talk about the open source nature of Hadoop, and some of the purists, as you might call them, are saying it's got to be open, a hundred percent Apache compatible, et cetera. And then there are others that are taking a different approach. Explain your approach and why you think that's the key way to really spur adoption of Hadoop and make it >>We're a part of the community. We've got commitment going on. We've pioneered and pushed Apache Drill, but we have done innovations as well.
And I think that those innovations are really required to support and extend the whole ecosystem. So Canonical distributes our M3 distribution. All our packages are available on GitHub and open source. So it's not a binary debate. And I think the point being made is that there are companies that have jumped ahead, and now the peloton is pedaling faster and will catch up, will streamline. I think the difference is we rearchitected. So we're basically in a race car, and we're racing ahead with the enterprise-grade features that are required. And there's a lot of work that still needs to be accomplished before that full rearchitecture is in place. >>Well, I think for me, the proof is really in the pudding when it comes to customers that are doing real things, real production-grade, mission-critical applications that they're running. And to me that shows the success or relative success of a given approach. So I know you guys are working with companies like Ancestry.com, Live Nation, and Quicken Loans. Could you walk us through a couple of those scenarios? Let's take Ancestry.com. Obviously they've got a huge amount of data based on the kind of genealogical information. What do you guys do >>With them? Yeah, so they've got the world's largest family genealogy service available on the web. So there's a massive amount of data that they make accessible, and the ability for analysis. And then they've rolled out new features and new applications, one of which is to ship a kit out, have people spit in a tube and return it, and they do DNA matching and reveal additional details. So there are some really fabulous leading-edge things being done with the use of Hadoop. >>Interesting.
So talk about when you went to work with them. What were some of their key requirements? Was it more around the enterprise-grade security and uptime kind of equation, or was it more around some of the analytics? What's the killer use case for them? >>It's hard with a specific company, or even to generalize across companies, because there are really three main areas: ease of use and administration; dependability, which includes the full HA; and performance. In some cases it's just one of those that drives it and is used to justify it. In other cases it's a collection. The ease of use is being able to use a cluster not only as Hadoop, but to access it and treat it like enterprise storage. So it's a complete POSIX-compliant file system underneath that allows mounting and access and updates and using it in dynamic read-write. What that means at an application level is it's faster, it's much easier to administer, and it's much easier and more reliable for developers to utilize. >>I've got to ask you the marketing question, because MapR, you guys have done a good job of marketing. Certainly we're thankful to you guys for supporting theCUBE in the past, and you guys have been great supporters of our mission. But now the ecosystem's evolving, with a lot more competition. Cloudera mentioned the eight companies they're tracking in, quote, Hadoop, and certainly Jeff and I and SiliconANGLE see a lot more, because Hadoop washing has been going on, the term Hadoop washing meaning jumping in and doing Hadoop, slapping that onto an existing solution. It's now been happening full bore for a year at least. What's next for you guys to break above the noise? Obviously the communities are very active, and projects are coming online.
You guys have your mission in the enterprise. What's the strategy for you guys going forward? Is it more of the same, or anything new you can share? >>Yeah, I think as far as breaking above the noise, it will be our customers, their success, and their use cases that really put the spotlight on what the differences are in terms of using a big data platform. And I think what companies will start to realize is, I'd draw an analogy with supply chain. The big revolution in supply chain was focusing on inventory at each stage in the supply chain: how do you reduce that inventory level, and how do you speed the flow of goods and the agility of a company for competitive advantage? And I think we're going to view data the same way. So instead of raw data that they're copying and moving across different silos, if companies are able to process data in place and send small result sets, they're going to be faster, more agile, and more competitive. >>And that puts the spotlight on what data platform is out there that can support a broad set of applications and have the broadest set of functionality. So what we're delivering is an enterprise-grade, mission-critical platform that supports MapReduce and does that with high performance; provides NFS and POSIX access, so you can use it like a file system; and integrates enterprise-grade NoSQL applications, so now you can do high-speed, consistent-performance, real-time operations in addition to batch, streaming, integrated search, et cetera. So it's really exciting to provide that platform and have organizations transform what they're doing. >>How's the feedback with Ted Dunning? I've seen a lot of buzz on the Twittersphere; he's getting positive feedback here. He's a tech athlete. He's a guru, he's an expert. He's got his hands in all the pies. He's a scientist type. What's he up to?
What's his role within MapR? He's obviously playing in the open source community. What's he up to these days? >>Chief application architect. He's on the leading edge of Mahout, so machine learning, sharing insights there. He was speaking at the Storm meetup two nights ago and sharing how you can integrate long-running batch predictive analytics with real-time streaming, and how the use of snapshots really makes that easy and possible. He travels the world and is helping organizations understand how they can take some very complex, long-running processes and really simplify and shorten those. >>I had a chance to meet him in New York City at the last Hadoop World, at a party. Great guy, fantastic geek, and certainly doing great work. Shout out to Ted. Congratulations, keep up that support. How's everyone else doing? How are John and Srivas doing? How's the team at MapR? You're pedaling as best as you can, growing >>Really quickly. No, we're just shifting gears. It wouldn't be pedaling. >>Engine? >>Yeah. Give us an update on the company in terms of the growth and where you guys are moving. >>Yeah. We're expanding worldwide. Just in the last few months we've opened up offices in London and Munich and Paris, and we're expanding in Asia, Japan and Korea. So our sales and services and engineering, basically the whole company, continues to expand rapidly. Some really great, interesting partnerships and a lot of growth, not only as we add customers, but it's nice to see customers that continue to really grow their use of MapR within their organization, both in terms of the amount of data that they're analyzing and the number of applications that they're bringing to bear on the platform.
>>Talk about that a little bit, because I think one of the trends we do see is when a company brings in a big data platform, they might start experimenting with it and build an application, maybe in the marketing department, and then the sales guys see it and they say, well, maybe we can do something with that. Is that typically the kind of experience you're seeing? And how do you support companies that want to start expanding beyond those initial use cases to support other departments, potentially even other physical locations around the world? How do you kind of >>That's been the beauty of it, if you have a platform that can support those new applications. So if mission-critical workloads are not an issue, and if you support volumes so that you can logically separate, which we have, it makes it much easier. So one of our customers, Zions Bank, brought in MapR to do fraud detection. And pretty soon, given the fact that they were able to collect all of that data, they had other departments coming to them and saying, hey, we'd like to use that to do analysis, because we're not getting that data from our existing system. >>Yeah. They come in and you're sitting on a goldmine; there are use cases. And you also mentioned that you're expanding internationally. What's your take on the international market for big data and Hadoop specifically? Is the US kind of leaps and bounds ahead of the rest of the world in terms of adoption of the technology? What are you seeing out there in terms of where the rest of the >>I wouldn't say leaps and bounds. And I think internationally, they're able to maybe skip some of the experimental steps. So we're seeing deployments across financial services and telecom, and it's fairly broad. Recruit Technologies is there.
The largest provider of recruiting services, indeed.com, is one of their subsidiaries. They're doing a lot with Hadoop and MapR specifically. So it's been expanding rapidly. Fantastic. >>Also, when you think about Europe, what's going on with Google and some of the privacy concerns, even here, I should say: are there different regulatory environments you've got to navigate when you're talking about data and how you use data, when you're starting to expand to other locales? >>Yeah. Typically by vertical there are different requirements: HIPAA in healthcare, Basel II in financial services. And basically it's the same theme: when you're bringing Hadoop into an organization and into a data center, the same sorts of concerns and requirements and privacy that you're applying in other areas will be applied on Hadoop. >>Now, kind of turning back to the technology. You mentioned Apache Drill. I'd love to get an update on where that stands, and then put that into context for people. We hear a lot about the SQL-on-Hadoop question here. Where does Drill fit into that equation? >>Well, there are a lot of different approaches to provide SQL access. A lot of that is driven by how do you leverage some of the talent in the organization that speaks SQL. So there are developments with respect to Hive, and there are other projects out there. Apache Drill is an open source project, getting a lot of community involvement. And the design center there is pretty interesting. It started from the beginning as an open source project, with two main differences. One was in looking at supporting SQL: let's do full ANSI SQL. So it's full 2003 ANSI SQL, not SQL-like, and that'll support the greatest number of applications and avoid a lot of support issues.
And the second design center is let's support a broad set of data sources. So nested sources like JSON, schema discovery, and basically fitting it into an enterprise environment, which sometimes is kind of messy and can get messy as acquisitions happen, et cetera. So it's complementary; it's about enabling interactive, low-latency queries. >>Jack, I want to give you the final word. We are out of time. Thanks for coming on theCUBE. Really appreciate it. Great to see you again, CUBE alumni. But final word, and we'll end the segment here on theCUBE, is your quick thoughts on what's happening here at Hadoop World. What is this show about? Share with the audience. What's the vibe? The summary, a quick soundbite on Hadoop. >>I think I'll go back to how we started. It's not if you use Hadoop, it's how you use Hadoop. Look at not only the first application, but what it's going to look like in multiple applications, and pay attention to what enterprise grade means. >>Okay. We've got more coverage coming. Jack Norris with MapR, I'll say one of the original big three, still on the list in our mind and the market's mind, with a unique approach to Hadoop and enterprise grade. This is theCUBE. I'm John Furrier with Jeff Kelly. We'll be right back after this short break.

Published Date : Jun 27 2013


ENTITIES

Entity	Category	Confidence
Ted	PERSON	0.99+
London	LOCATION	0.99+
Claudia	PERSON	0.99+
Jeff Kelly	PERSON	0.99+
Asia	LOCATION	0.99+
Ted Dunning	PERSON	0.99+
Jack Norris	PERSON	0.99+
Dave	PERSON	0.99+
John	PERSON	0.99+
Jack	PERSON	0.99+
10	QUANTITY	0.99+
Paris	LOCATION	0.99+
Korea	LOCATION	0.99+
Matt BARR	PERSON	0.99+
Munich	LOCATION	0.99+
New York	LOCATION	0.99+
99.9%	QUANTITY	0.99+
Jennifer	PERSON	0.99+
Treevis	PERSON	0.99+
25 times	QUANTITY	0.99+
Japan	LOCATION	0.99+
Google	ORGANIZATION	0.99+
both	QUANTITY	0.99+
one	QUANTITY	0.99+
Jeff	PERSON	0.99+
eight companies	QUANTITY	0.99+
first time	QUANTITY	0.99+
mid-June	DATE	0.99+
Charles	PERSON	0.98+
Europe	LOCATION	0.98+
30 minutes	QUANTITY	0.98+
One	QUANTITY	0.98+
first application	QUANTITY	0.98+
Ash	PERSON	0.98+
two nights ago	DATE	0.98+
Hortonworks	ORGANIZATION	0.98+
each stage	QUANTITY	0.97+
SQL	TITLE	0.97+
SiliconANGLE	ORGANIZATION	0.97+
Natalie	PERSON	0.97+
ancestry.com	ORGANIZATION	0.96+
Hadoop	TITLE	0.96+
Patrick	PERSON	0.96+
last night	DATE	0.95+
Jason	PERSON	0.95+
2003	DATE	0.95+
Hadoop	EVENT	0.94+
Apache	ORGANIZATION	0.94+
Hadoop	PERSON	0.93+
indeed.com	ORGANIZATION	0.93+
hundred percent	QUANTITY	0.92+
HBase	TITLE	0.92+
Hadoop Summit 2013	EVENT	0.92+
Quicken loans	ORGANIZATION	0.92+
two main differences	QUANTITY	0.89+
HIPAA	TITLE	0.89+
#HadoopSummit	EVENT	0.89+
S SLA	TITLE	0.89+
Hadoop	ORGANIZATION	0.88+
Cloudera	ORGANIZATION	0.85+
map R	TITLE	0.85+
a year	QUANTITY	0.83+
Zions bank	ORGANIZATION	0.83+
Peloton	LOCATION	0.78+
NFS	TITLE	0.78+
MapReduce	TITLE	0.77+
Cloudera map R	ORGANIZATION	0.75+
live	ORGANIZATION	0.74+
second design center	QUANTITY	0.73+
Hindu	ORGANIZATION	0.7+
theCUBE	ORGANIZATION	0.7+
three main areas	QUANTITY	0.68+
one enterprise grade	QUANTITY	0.65+

Jack Norris | Strata Data Conference 2013

>>Okay. We're back here inside theCUBE, our flagship program where we go out to the events and extract the signal from the noise. This is Strata Conference, O'Reilly Media's big data event. We're talking about Hadoop, analytics, and data platforms, and big data has come into the enterprise through the front door, as we heard yesterday. I'm John Furrier with Dave Vellante of Wikibon.org. And we're here with Jack Norris, a CUBE alumni and a favorite guest here. You're an executive at MapR, and you guys are leading the charge with this use of Hadoop. Welcome back to theCUBE. Thank you. Okay, so let's chat about what's going on. What's your take on all the big news out here for the distributions? All the big power moves. You guys have a relationship with EMC. Okay. Exclusive relationship with those guys. Intel's got a distribution. Hortonworks is with Microsoft. A lot of things going on. So this is your wheelhouse. So what's your take on the Hadoop action here? >>Well, I think there's an article in Forbes where I think they said it best: this is showing that MapR has had the right strategy all along. And what we're seeing is there's basically a fairly low bar to taking Apache Hadoop and providing a distribution. And so we're seeing a lot of new entrants in the market, and there are a lot of options if you want to try Hadoop and experiment and get started. And then there's production-class Hadoop, which includes enterprise data protection, snapshots, mirrors, and the ability to integrate. And that's basically MapR. So start in test and dev with a lot of options, and then move into production class >>MapR. So break it down for the folks out there who are dipping a toe in the water and hearing all the noise. Because right now the noise level is very high, right, with the recent announcements. But you guys have been doing business, obviously, for many years in this area. So when people say, hey, I want to get a Hadoop distribution for the enterprise.
What should they be looking for? Because it's not that easy to cut through the noise. So could you share with the folks out there what to look for, like the table stakes, the check boxes? Because there are a lot of claims, a lot of noise, and a lot of different options. Some teams have more committers than others, or no committers, so that's all noise. But what are the key things that customers need to know? So I think there's >>There's three areas. All right. One is how it integrates into your enterprise. With Hadoop, you have the Hadoop Distributed File System API. That's how you interact. Well, if you're also able to use standard tools that use standard file and database access, it makes it much, much easier. So MapR is unique in supporting NFS and making that happen. That's a big difference. The second is dependability, and there are high-availability capabilities and then there's data protection. So I'll focus on snapshots as an example. You've got data replicated in Hadoop. That's great. But if you have a user error or an application error, that's replicated just as quickly. So having the ability to recover to a point in time, yeah. So if I can say, hey, I made a mistake, can I go back two minutes earlier? With snapshots, that's possible. MapR is unique in snapshot support. And then finally, there's disaster recovery: mirroring, where you can go across clusters, mirror what's going on across the WAN, and be able to recover in the case of a disaster where you lose a whole cluster or lose a whole >>Section and that's not available in >>Other, those aren't available either. That's >>NFS, >>Snapshots has been on the JIRA list for over five years. >>Yeah. Okay. So I wonder >>If I could find that and then there's third.
Because I said three and almost said two. The third is performance and scale. >>So that'd be >>Integration, dependability, and speed. >>Okay. So dependability: HA, DR, snapshots. Okay. So let's talk about the performance, because Google's a big partner of you guys, and we just had them on theCUBE at Strata. So you have a record-setting result. EMC, take that. Well, you work with EMC. So let me talk about the performance real quick, then we'll talk about some of the EMC conversations. On performance, you have a variety of diverse performance benchmarks, with Google and within the enterprise. Can you talk about those? >>So what we announced this week was the MinuteSort world record. MinuteSort runs across technologies; it's just how much data can you sort in 60 seconds. And if you look back at the previous record, that was done in the labs at Microsoft with special-purpose software, and they did 1.4 terabytes. Hadoop hasn't been used for the record since 2009. It's been several years, because it's got features in there that work against performance, things like checkpointing and logging, because it assumes you've got long-running MapReduce jobs. So we set the record with our distribution of Hadoop, so we have kind of one hand tied behind our back, given that technology. Secondly, we set it in the cloud, which is the other hand tied behind our back, because it's a virtualized environment. So we set the record with just our legs: 1.5 terabytes in 60 seconds. Very proud of that. >>Well, that's interesting, because we've been doing a lot of labs testing, Dave and I and our teams, on cost. Right. And it's an interesting benchmark, because people don't always look at the nuance of cost when comparing cloud performance versus bare metal. Most people don't factor in setup, the cost of deployment. Exactly.
So can you just quickly talk about that, and how significant, what order of magnitude, that is for your customer? >>So the previous Hadoop record took 3,400 servers, about 27,000 cores, and almost 14,000 disks, and did 600 gigs, actually a little less than that, at 578. And on Google, we did it with 2,100 virtual instances and 8,000 cores, and did 1.5 terabytes. >>And costs? You spin up on Google versus >>Basically, if you look at that and you assume conservatively $4,000 per server, it's $13.8 million worth of hardware previously. And the cost to do that run on Google was $20.33. >>Well, you got a discount. I mean, come on, you're a partner. Did it really cost that much? I mean, is that what they would charge for it? >>Actually, that's the cost for that minute. If you look at the actual charges, it'd be $1,200. >>Okay. So it's not millions; it's millions to thousands. Yep. Okay. That's impressive. We'll have to go look at the numbers. We're going to look at Greenplum's numbers in the next couple of weeks. So talk about the Google relationship. What's the uptake with that? >>Very excited about it. We're actually deployed throughout the cloud. We've got multiple partners. Google's in limited preview, so we've got a number of customers testing that and doing some really interesting things. >>So we monitor the data center market, obviously, with our proprietary tools that you know about, the ViewFinder and CrowdSpots, and the thing is that the data center vertical is interesting, right? If you look at the sentiment analysis of what the conversation is on just the Twitter data, it's Facebook, Apple, these companies. And when we dig into the numbers, it's not so much the companies, it's the fact that their data center operations are being looked at as the leading indicator for where CEOs are going.
So I want to ask you, in your conversations with your customers, what are the conversations around moving to the cloud, and where are they on that transition? Because we hear, yeah, we want the cloud for all the benefits you were mentioning, and Google and Facebook are the gold standards as architectures, not necessarily a cut-and-paste architecture, but customers see the benefits of what they're doing. So what are your conversations with your enterprise customers around the cloud and cloud architecture, and what other features, besides replication and disaster recovery, are they looking at? >>Well, it's basically workload driven and dataset driven. For data that's already in the cloud, a natural first step is, well, why don't I do the analysis there as well? So things like Google Earth and digital advertising data are really interesting candidates for that. Also periodic workloads: if they have workloads that need to spin up and spin down, the cloud works really well for that. And in some cases it's driven by their own environments. They've got data centers that are approaching capacity and they need to do offloads, and they're looking at the cloud because it's easy to get up and running quickly and use as an alternative. >>I do want to come back to one of your three value props here, particularly the dependability piece and specifically the snapshot. Somebody asked me one time, a couple of years ago, how do you back up a petabyte? And then his answer was, well, you don't. So I want to ask you how your customers are protecting their data and what you guys are bringing to the table. >>So snapshots are not a bolt-on feature. It's basically a low-level feature based on the underlying data architecture. So when we architected that from the beginning, snapshots were a core feature. 
And if you use a technique called redirect-on-write, you're not copying the data, right? So you can do a petabyte snapshot basically almost instantaneously, because you're just tracking the pointers to the latest blocks that have been written. So if the data's not changing, you can snapshot every minute and not have any additional storage overhead. >>Right. Okay. And so you can set that. So MapR's technology will allow them to set that, dial it up, dial it down, with switches. >>So we support logical volumes. You can set policies at that volume level and say, well, this volume is critical data. And then I can set policies: critical data is every minute. And then I can change what the definition of critical data is; maybe it's every five minutes, et cetera. So you can set up these different policies at volumes and have snapshots happen independently for each. >>Can you do that by workload or dataset or by application or whatever, so I get it essentially provided as a service, as opposed to kind of a one-size-fits-all approach? >>Exactly. And that also corresponds to user access, administrative privileges, and other features and policies within the cluster. >>How about this whole trend toward bringing SQL into Hadoop? What's your take on that? And what's your angle? >>So interactive SQL is an important aspect, because you've got so many people in the organization who are trained on and leverage SQL, but it's one of many use cases that need to run across a big data platform. So there's a range of big data analytics: batch analytics, interactive capabilities with SQL, database operations, NoSQL, search, streaming. All those are functions that need to run across a platform. 
So it's a piece, but it's not the big driver, because what we've seen is that there's a higher arrival rate of machine-generated data, and machine-generated responses to that data, for digital advertising, for recommendation engines, for fraud detection, can really move the needle for an organization and produce huge swings in profitability. >>And move the ball down the field big time. Yeah. And >>Having an interactive piece with a human element involved doesn't really scale and work on a 24-by-7 basis. >>Jack, final question, we're over now by a minute. But I want to ask one parting question. Obviously it's a very competitive landscape right now, and the stakes are higher because the market opportunity is massive. What's MapR's business strategy going forward? No change in direction? Is it going to be same old, same old? Do you guys have any new things going on as you see the marketplace? >>We've got a huge lead when it comes to mission-critical, enterprise-grade features. And our focus is one platform: the ability to support enterprise Hadoop and enterprise HBase, and provide those full capabilities for ease of use, for dependability, for performance. And we've seen a lot of companies test on one distribution and switch to MapR, and we'll continue to help with that in the future. >>Well, we will say we've been covering this big data space going on four years now, Dave and I, and we've watched all the players pivot a few times. You guys have not. You guys have been true to your mission from day one, and we know where you stand. Everyone knows where you stand: enterprise grade. It's a good strategy. I think everyone's putting that on their label now. Enterprise-grade washing, we call it. Congratulations, MapR, inside theCUBE. We'll be right back with our next guest here on day three of wall-to-wall coverage at O'Reilly Media. 
We'll do our news hour next, from 12 to one. We'll be right back after this short break.
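
The MinuteSort cost comparison quoted in the interview can be checked with a few lines of arithmetic. The sketch below is only a sanity check under stated assumptions: every constant is a rounded figure as spoken in the conversation, and the function names are illustrative, not part of any benchmark tooling. Note that it compares a hardware purchase price with the rental charge for a single run, so it is not a like-for-like total cost of ownership.

```python
# Rounded figures as quoted in the interview (assumptions, not audited data).
PREV_SERVERS = 3_400              # servers behind the previous Hadoop record
PREV_CORES = 27_000               # "about 27,000 cores"
PREV_SORTED_GB = 578              # "a little less than" 600 GB
ASSUMED_PRICE_PER_SERVER = 4_000  # USD, the "conservative" per-server price
QUOTED_HARDWARE_COST = 13_800_000 # USD, the figure quoted for the old setup

CLOUD_CORES = 8_000               # virtual cores used for the cloud run
CLOUD_SORTED_GB = 1_500           # 1.5 TB sorted in 60 seconds
CLOUD_RUN_COST = 20.33            # USD quoted for the single record run


def hardware_cost(servers: int, price_per_server: float) -> float:
    """Purchase cost of a cluster at a flat per-server price."""
    return servers * price_per_server


def implied_servers(total_cost: float, price_per_server: float) -> float:
    """Server count that a quoted total cost implies at a flat price."""
    return total_cost / price_per_server


def gb_per_core(sorted_gb: float, cores: int) -> float:
    """Data sorted per core within the one-minute window."""
    return sorted_gb / cores


estimated = hardware_cost(PREV_SERVERS, ASSUMED_PRICE_PER_SERVER)
print(f"Hardware at 3,400 servers: ${estimated:,.0f}")
print(f"Servers implied by $13.8M: "
      f"{implied_servers(QUOTED_HARDWARE_COST, ASSUMED_PRICE_PER_SERVER):,.0f}")
print(f"Hardware cost / single-run cost: {estimated / CLOUD_RUN_COST:,.0f}x")

per_core_gain = (gb_per_core(CLOUD_SORTED_GB, CLOUD_CORES)
                 / gb_per_core(PREV_SORTED_GB, PREV_CORES))
print(f"GB sorted per core, cloud vs. previous: {per_core_gain:.1f}x")
```

With the rounded server count the hardware estimate comes to $13.6 million rather than the quoted $13.8 million, which suggests the spoken "3,400 servers" was itself rounded down from a slightly larger number.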

Published Date : Mar 4 2013

SUMMARY :

So what's your take on the Hadoop If you want to try Hadoop So could you share with the folks out there, what, what to look for in like the, the table stakes, And with Hadoop, you have the Hadoop That's If I could find that and then there's third. So let's talk about the performance because you And if you look back at, at the previous record that was done in the labs with So can you just quickly talk about that and how significant And on Google, we did it with 2020 100 virtual instances, And costs. And the cost to do that run on Google was $20 Well, you got to discount. If you look at the Asheville charges to be 1200, We'll have to go look at the numbers. So we've got a number of customers kind of, you know, testing that and, So I want to ask you in your conversations with your customers, So data that's already in the cloud are kind of a natural first step is, well, So I want to, I want to ask you how your customers are protecting and, and, So you can do efficient, you can do a petabyte snapshot, So you, you map, So you can set policies at that volume and you can say, Can you do that by workload or dataset or by application or whatever I get essentially provided as a service, you know, other features and policies within the, within the cluster. How about the, you know, this whole trend toward bringing SQL into, into Hadoop. you know, sequel, but it's one of many use cases that needs to run And the ball down the field big time. Having an interactive piece with a kind of a human element involved, and you see the marketplace. So the ability to support enterprise Hadoop, You guys have not, you guys have been true to your mission from day

ENTITIES

Entity	Category	Confidence
Dave Volante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
$20	QUANTITY	0.99+
Jack Norris	PERSON	0.99+
John Frey	PERSON	0.99+
apple	ORGANIZATION	0.99+
$13.8 million	QUANTITY	0.99+
Dave	PERSON	0.99+
600 gigs	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
60 seconds	QUANTITY	0.99+
1.5 terabytes	QUANTITY	0.99+
33 cents	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
3,400 servers	QUANTITY	0.99+
six millions	QUANTITY	0.99+
8,000 cores	QUANTITY	0.99+
EMC	ORGANIZATION	0.99+
O'Reilly	ORGANIZATION	0.99+
1200	QUANTITY	0.99+
third	QUANTITY	0.99+
thousands	QUANTITY	0.99+
Asheville	LOCATION	0.99+
millions	QUANTITY	0.99+
two	QUANTITY	0.99+
Twitter	ORGANIZATION	0.99+
2009	DATE	0.99+
1.4 terabytes	QUANTITY	0.99+
SQL	TITLE	0.99+
three	QUANTITY	0.99+
yesterday	DATE	0.99+
24	QUANTITY	0.99+
this week	DATE	0.99+
four years	QUANTITY	0.99+
one party	QUANTITY	0.99+
over five years	QUANTITY	0.99+
three areas	QUANTITY	0.99+
Hadoop	TITLE	0.99+
One	QUANTITY	0.98+
2020	DATE	0.98+
one	QUANTITY	0.98+
100 virtual instances	QUANTITY	0.97+
second	QUANTITY	0.97+
one platform	QUANTITY	0.97+
first step	QUANTITY	0.97+
Jack	PERSON	0.97+
one time	QUANTITY	0.97+
Secondly	QUANTITY	0.95+
about 27,000 cores	QUANTITY	0.94+
HBase	TITLE	0.93+
13, 13,000	QUANTITY	0.93+
GreenPlum	ORGANIZATION	0.92+
day three	QUANTITY	0.92+
DMC	ORGANIZATION	0.91+
Intel	ORGANIZATION	0.9+
a minute	QUANTITY	0.9+
day one	QUANTITY	0.89+
Strata Data Conference	EVENT	0.89+
4,000 per server	QUANTITY	0.89+
14,000 discs	QUANTITY	0.87+
five minutes	QUANTITY	0.85+
Washington	LOCATION	0.84+
one distribution	QUANTITY	0.83+
wiki.org	OTHER	0.83+
seven	QUANTITY	0.83+
couple of years ago	DATE	0.83+
5 78	QUANTITY	0.82+
each	QUANTITY	0.81+
Jr	PERSON	0.79+
12	QUANTITY	0.77+

Jack Norris | Strata-Hadoop World 2012

>>Okay. We're back here, live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Strata plus Hadoop World, a big event during Big Data Week. And we just wrote a blog post on siliconangle.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, it's packed, and it's an amazing event. And we're excited. This is siliconangle.com. I'm the founder, John Furrier. I'm joined by my co-host, Dave >>Vellante of Wikibon.org, where people go for free research and peers collaborate to solve problems. And we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to theCUBE. Thank you, Dave. I'm going to hand it to you. You know, we met quite a while ago now. It was well over a year ago, and we were pushing at you guys and saying, well, you know, open source, and you said, look, we're solving problems for customers. We've got the right model, we think. This is our strategy. We're sticking to it. Watch what happens. And like I said, I have to hand it to you. You guys really have some great traction in the market and you're doing what you said. So congratulations on that. I know you've got a lot more work to do, but >>Yeah, and actually the topic of openness is pretty interesting. If you look at the different options out there, all of them are combining open source with some proprietary. Now, in the case of some distributions, it's very small, like an ODBC driver with a proprietary driver. But I think it represents that any solution combining to make it more open is important. So what we've done is make innovations, but where we've made those innovations, we've opened up and provided APIs. 
Like NFS for standard access, like REST, like ODBC drivers, et cetera. >>So it's a spectrum. I mean, actually we were at Oracle OpenWorld a few weeks ago, and you listen to Larry Ellison talk about the Oracle public cloud, and he makes actually a very strong case that it's open: you can move data, it's all Java, so it's all about standards. Yeah. But it was really all about the business value. That's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I both were very impressed with essentially what he described as your philosophy of, we have customers when we announce a product. And that's impressive. >>He was also giving some good feedback to startup entrepreneurs out there, since obviously there's a lot of action going on in the startup community. And he basically said the same thing: get customers. Yeah. And that's it, that's all. Use your tech, but don't be so locked into the tech. Get the customers, understand the needs and then deliver that. So you guys have done great. And I want to talk about the show here, because you guys have a big booth and a big presence here at the show. What are you guys learning? How's the positioning, how's the news hitting? Give us a quick update. So, >>A lot of news. It first started on Tuesday, when we announced the M7 edition. And I brought a demo here for you all, because the big thing about M7 is what we don't have. So we're not demoing region servers, we're not demoing compactions, we're not demoing a lot of manual administrative tasks. What that really means is that we took this stack. And if you look at HBase, HBase today has about half of Hadoop users adopting HBase. 
So there's a lot of momentum in the market, and it's used for everything from real-time analytics to kind of lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM that stores its data in the Hadoop Distributed File System, which sits on a JVM that stores its data in a Linux file system that writes to disk. >>And so a lot of the complexity is that stack. As an administrator, you have to worry about how data gets basically written across that. You've got region servers to keep up. When you're doing writes, you have things called compactions, which increase response time. So it's a complex environment, and we've spent quite a bit of time collapsing that infrastructure. With the M7 edition, you've got files and tables together in the same layer, writing directly to disk. So there are no region servers, there are no compactions to deal with, there's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler. >>Let's talk about some of your customers in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, hey, we can solve some of those problems. Is that right? Can you talk about that a little bit? >>That's a fair characterization. I think part of it is, when you're in the evaluation process and you first hear about Hadoop, it's kind of like the Gartner hype curve, right? This stuff does everything. And of course you've got data protection, because you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes, and so forth. 
Well, once you start using it, you realize that yes, I've got data replicated across the cluster, but if I accidentally delete something, or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to what it was five minutes before; performance, where you can get the most out of your hardware; and ease of administration, where I can cut this up into logical volumes and have policies at that volume level instead of at an individual file. >>So there's a bunch of features that really resonate with users after they've had some experience, and those tend to be our key customers. There's another phase too, which is, when you're testing Hadoop, you're looking at what's possible with this platform: what type of analytics can I do? When you go into production, now all of a sudden you're looking at, how does this fit in with my SLAs? How does this fit in with my data protection policies? How do I integrate with my different data sources? And can I leverage existing code? We had one customer, a large systems integrator for the federal government. They have a million lines of code that they were told to rewrite to run with other distributions, and that they could use just out of the box with MapR. >>So let's talk about some of those customers. Can you name some names and get... >>Sure. So actually, we had a keynote today, and we had this beautiful customer video. They had to cut it because of time, so it's running in our booth and it's streaming on our website. And I think we've got, actually, some of it in the bumper here, kind of inserted. But I want to shout out to those customers, because they ended up on the cutting room floor. Running it here. Yeah. 
So one was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number one ad reach, as measured by comScore, and there was a lot of press on that. I particularly liked the headline that mentioned those three companies, because it was measured by comScore, and comScore's a customer too, a MapR customer. And Google's a key partner. >>And yesterday we announced a world record for the Hadoop TeraSort, running on Google. So M7 for Rubicon allows them to address and replace different point solutions that were running alongside of Hadoop. And it potentially simplifies their architecture, because now they have more things done with a single platform; it increases performance and simplifies administration. Another customer is Ancestry.com. Maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing to help with family services and genealogy and figure out family backgrounds. One of the things they do is DNA testing. For an internet service to do that, it's pretty impressive advanced technology. It's $99, I believe, and they'll send you a DNA kit. You spit in the tube, you send it back, and then they process that and match and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. In Melinda Graham's words, it's simpler because they're just not there, those components. >>Jack, I want to ask you about enterprise-grade Hadoop, and then Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company. 
And in the management team, we had your CEO on, we've interviewed M.C. Srivas at Google I/O, and we were on a panel together. So I know your team; it's a solid team. So let's talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? I mean, obviously you guys were very successful at first. Again, we were skeptics at first, but now your traction and your performance have proven there is a market for that kind of platform. What does that mean now, at this event today, as the Hadoop ecosystem evolves and is not just Hadoop anymore? It's other things. >>Yeah, there are three dimensions to enterprise grade. The first is ease of use, and ease of use from an administrator standpoint: how easily does it integrate into an existing environment? How easily does it fit into my IT policies? Do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension. A key to that is complete NFS support, so it functions like standard storage. A second dimension is dependability and reliability. So it's not just, do you have a checkbox HA feature; it's, do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures and automated recovery? So in a lights-out data center, can you actually go there once a week and then just replace drives? A great example of that is one of our customers who had a test cluster with MapR. It was a POC; they went on and did other things. They had a power failure, they came back a week later, and the cluster was up and running, and they hadn't done any manual tasks there. 
And they were just blown away. The recovery process for the other distributions is a long laundry list of... >>So I've got to ask you this, the third >>One. What's the third one? The third one is performance, and performance is, you know, kind of raw speed. It's also, how do you leverage the infrastructure? Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? And it's really about sharing a cluster for different use cases and different users. And there are a lot of features there. It's not just raw >>The existing IT infrastructure and policies, that whole piece; what happens when something goes wrong, can you automate that? And then, >>And it's easy, dependable, fast. And it's the same thing with making HBase easy, dependable, fast with M7. >>So the talk of the show right now, you had the keynote this morning, is that MapR marketing has dropped the big data term and is going with datacosm. Is that true? Joe Hellerstein just had a tweet. Joe's a famous Cal Berkeley computer science professor, now CEO of a startup, Trifacta, and he had a couple of epic tweets this week. So shout out to Joe Hellerstein. But Joe Hellerstein's tweet says MapR marketing has decided to drop the term big data and go with datacosm, with a shout out to George Gilder. So it's kind of intellectual humor. So what's your response to that? Is it true? What's happening? You're the VP of marketing. >>Well, if you look at the big data term, I think there's a lot of big data washing going on, where architectures that have been out there for 30 years are, you know, all about big data. So I think there's the need for a more descriptive term. 
The purpose of datacosm was not to try to coin something or try to change a big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And, with a shout out to George Gilder, acknowledging that he recognized what the impact of making compute available meant, and he recognized with Telecosm what bandwidth would mean. And if you look at the combination, we've got all this compute efficiency and bandwidth; now datacosm is basically taking those resources and unleashing them and changing the way we do things. >>And I think one of the ways to look at that is the new things that will be possible. There's been a lot of focus on SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly and having automated operational analytics that can respond very fast to change how you do business: either how you're communicating with customers, how you're responding to different risk factors in the environment for fraud, et cetera, or just improving your response time to cost events. What was earlier called >>Actionable insight. Then he said, assigning intent, so you're able to respond. It's interesting that you talk about George Gilder, because we like to kind of riff and get into the abstract concepts, but he also was very big in supply-side economics. And so if you look at the business value conversation, one of the things we pointed out yesterday and this morning, in our opening review, was the top conversations: insight and analytics as a killer app. Right now, the app market has not developed. 
And that's why we like companies like Continuuity. And what you guys are doing under the hood is being worked on at many levels, performance and those three things. Analytics is a no-brainer, insight, but the other one's business value. So when you look at that kind of datacosm, I can see where you're going with that. >>And that's kind of what people want. It's not so much like I'm a Republican because he's a Republican; George Gilder bought The American Spectator, everyone knows that. So obviously he's a Republican, but politics aside, the business side of what big data is implementing is massive. Now, I guess that's a Republican concept, but not really. I mean, business is all parties. So, relative to datacosm: no one talks about e-business anymore. We were talking to IBM at the IBM conference and they were saying, hey, that was a great marketing campaign, but no one says, hey, are you an e-business today? So we think that big data is going to have the same effect, which is, hey, do you have big data? No, it's just assumed. So that's what you're basically trying to establish, that it's not just about big. >>Yeah. Let me give you one small example from a business value standpoint. Ted Dunning, you mentioned Ted earlier, is our chief application architect and one of the coauthors of the book on Mahout, which deals with machine learning. He dealt with one of our large financial services companies. One of the techniques on Hadoop is clustering, k-nearest neighbors, different algorithms. And they looked at a particular process and they sped up that process by 30,000 times. So there's a blog post on our website where you can find additional information on that. 
And I... >>That's one... >>One point on this, but I think, to your point about business value and what datacosm really means: that's an incredible speed-up in terms of performance, and it changes how companies can react in real time. It changes how they can do pattern recognition. And Google did a really interesting paper called The Unreasonable Effectiveness of Data. In there they say simple algorithms on big data, on massive amounts of data, beat a complex model every time. And so I think what we'll see is a movement away from data sampling and trying to do an 80/20, to looking at all your data and identifying where the exceptions are: the ones we want to increase, because they're revenue exceptions, or the ones we want to address, because it's a cost or a fraud. >>Well, I would give a shout out to the guys at Digital Reasoning. Tim Estes plugged Ted; he idolized him in terms of his work. Obviously his work is awesome, but he also brought up this concept of an understanding gap, and he showed an interesting chart in his keynote, which was the data explosion. It's up and, you know, straight up, right? It's a massive amount of data, 64% unstructured by his calculation. Then he showed a flat line called attention. So as data's been exploding over time, going up, attention, meaning user attention, is flat, with some uptick maybe. So users and humans can't expand their minds fast enough, and machine learning technologies have to bridge that gap. That's analytics, that's insight. >>Yeah. There's a big conversation now going on about more data versus better models, with people trying to squint through some of the comments that Google made and say, all right, does that mean we just throw out >>The models, and data trumps algorithms? Data >>Trumps algorithms. But the question I have is, do you think, and are your customers talking about, okay, well, now they have more data. 
Can I actually develop better algorithms that are simpler? And is it a virtuous cycle? >>Yeah, I think, I mean, there is a lot of debate here, a lot of information, but I think one of the interesting things is, given the compute efficiency that we have and given the bandwidth, you can take a model and then iterate very quickly on it and arrive at insight. In the past, with that amount of data and that amount of time to process, it could take you 40 days to get to the point you can now reach in hours. Right. >>Right. So, I mean, the great example is fraud detection, right? We used to sample, and six months later: hey, your credit card might've been hacked. And now you get a phone call, or you can't use your credit card, or whatever it is. But there are still a lot of use cases, weather is an example, where modeling and better modeling would be very helpful. Excellent. So, datacosm: are you planning other marketing initiatives around that? Or is this sort of tongue-in-cheek fun? Throw it out there, a little red meat, chum in the water? >>You know, what really motivated us was, theCUBE's here talking for the whole day; what could we possibly do to help give them a topic of conversation? >>Okay. Datacosm. Now, of course, we found that on our proprietary HBase tools. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great. We've been following you and continue to follow. You've been a great supporter of theCUBE. I want to thank you personally while we're here. MapR has been a generous underwriter and supporter of our great independent editorial. We want to recognize you guys; thanks for your support. And we continue to look forward to watching you guys grow and kick ass. So thanks for all your support. 
And we'll be right back with our next guest after this short break. >>Thank you. >>10 years ago, the video news business believed the internet was a fad. The science is settled. We all know the internet is here to stay. Bubbles and busts come and go, but the industry deserves a news team that goes the distance. Coming up on SiliconANGLE are some interesting new metrics for measuring the worth of a customer on the web. Every morning, we're on the air to bring you the most up-to-date information on the tech industry, with scrutiny on releases of the day and news of industry-wide trends. We're here daily with breaking analysis from the best minds in the business. Join me, Kristin Filetti, daily at the news desk on SiliconANGLE TV, your reference point for tech innovation 18 months.
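
The snapshot argument made in these interviews (replication copies an accidental delete to every replica, while a snapshot lets you return to the state of five minutes ago, and a redirect-on-write snapshot only tracks pointers to blocks) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not MapR's implementation: the `Volume` class and its method names are hypothetical, and a real system would share the pointer structure between snapshots rather than copy a dictionary.

```python
class Volume:
    """Toy logical volume: a pointer table over an append-only block store."""

    def __init__(self):
        self._blocks = []     # physical blocks, never overwritten in place
        self._live = {}       # logical block number -> index into _blocks
        self._snapshots = {}  # snapshot name -> frozen pointer table

    def write(self, block_no, data):
        # Redirect-on-write: new data lands in a fresh physical block and
        # only the live pointer moves; older blocks stay intact.
        self._blocks.append(data)
        self._live[block_no] = len(self._blocks) - 1

    def delete(self, block_no):
        # Removes the live pointer only; snapshots may still reference the block.
        self._live.pop(block_no, None)

    def snapshot(self, name):
        # Copies pointers, never data, so the cost is independent of block size.
        self._snapshots[name] = dict(self._live)

    def read(self, block_no, snapshot=None):
        table = self._live if snapshot is None else self._snapshots[snapshot]
        return self._blocks[table[block_no]]

    def physical_blocks(self):
        return len(self._blocks)


vol = Volume()
vol.write(0, b"orders-v1")
vol.write(1, b"customers-v1")

vol.snapshot("minute-1")
vol.snapshot("minute-2")          # nothing changed, so no new blocks are stored
print(vol.physical_blocks())      # still 2

vol.write(0, b"orders-v2")        # one new physical block; the old one is kept
vol.delete(1)                     # simulated accidental delete
print(vol.read(0))                      # current data
print(vol.read(1, snapshot="minute-1")) # deleted data is still reachable
```

The design point the sketch tries to show is that storage overhead grows with the amount of changed data, not with the number of snapshots, which is why per-volume policies such as "every minute" or "every five minutes" are plausible.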

Published Date : Oct 25 2012

ENTITIES

Entity	Category	Confidence
Joe Hellerstein	PERSON	0.99+
George Gilder	PERSON	0.99+
Ted Dunning	PERSON	0.99+
Kristin Filetti	PERSON	0.99+
Joel Hellison	PERSON	0.99+
John Schroeder	PERSON	0.99+
Joe	PERSON	0.99+
Jack	PERSON	0.99+
Larry Ellison	PERSON	0.99+
Jack Norris	PERSON	0.99+
John	PERSON	0.99+
40 days	QUANTITY	0.99+
Melinda Graham	PERSON	0.99+
64%	QUANTITY	0.99+
$99	QUANTITY	0.99+
comScore	ORGANIZATION	0.99+
Tim	PERSON	0.99+
Dave	PERSON	0.99+
Tuesday	DATE	0.99+
Matt BARR	PERSON	0.99+
Hellerstein	PERSON	0.99+
Google	ORGANIZATION	0.99+
George Gilder	PERSON	0.99+
Ted	PERSON	0.99+
John ferry	PERSON	0.99+
30 years	QUANTITY	0.99+
30,000 times	QUANTITY	0.99+
today	DATE	0.99+
IBM	ORGANIZATION	0.99+
a week later	DATE	0.99+
yesterday	DATE	0.99+
two	QUANTITY	0.99+
three companies	QUANTITY	0.99+
Dana	PERSON	0.99+
Tim SDS	PERSON	0.99+
one point	QUANTITY	0.99+
Java	TITLE	0.99+
first	QUANTITY	0.99+
six months later	DATE	0.99+
one	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
one customer	QUANTITY	0.99+
Linux	TITLE	0.98+
once a week	QUANTITY	0.98+
18 months	QUANTITY	0.98+
Rubicon	ORGANIZATION	0.98+
HBase	TITLE	0.98+
Kozum	PERSON	0.98+
Gartner	ORGANIZATION	0.98+
this morning	DATE	0.97+
Telekom	ORGANIZATION	0.97+
this week	DATE	0.97+
10 years ago	DATE	0.97+
second dimension	QUANTITY	0.97+
both	QUANTITY	0.97+
Kozum	ORGANIZATION	0.95+
third one	QUANTITY	0.95+
One	QUANTITY	0.94+
three things	QUANTITY	0.94+
a year ago	DATE	0.94+
Hadoop	TITLE	0.93+
siliconangle.com	OTHER	0.93+
Knicks	ORGANIZATION	0.93+
Regents	ORGANIZATION	0.92+

Jack Norris | Hadoop Summit 2012

>>Okay. We're back live in Silicon Valley, in San Jose, California, for siliconangle.tv's continuous coverage of Hadoop Summit 2012. This is ground zero for the alpha geeks in big data. Uh, just the tech elite. We call them tech athletes and, uh, we're excited to cover it on the ground, extract the signal from the noise here. This is theCUBE, our flagship telecast. I'm joined by my co-host Jeff Kelly from Wikibon.org, the best analyst in the business. Jeff, welcome back for another segment. End of the day, day one, loving every minute. Okay. We're here with our guest. Jack Norris is the CMO of MapR. Jack, welcome back to theCUBE. You've been on a few times. Um, so you guys have some news. Yes. So let's get right to the news. So you guys are a player in the business, so share your news with the folks. Excellent. Jump right in. >>So, uh, two big announcements today. We announced that Amazon is integrating MapR as part of their Elastic MapReduce service, and both editions, our, our free edition, M3, is available as well as M5, directly with Amazon, Amazon in the cloud. >>So what's the value proposition? Why would a customer say, all right, I want to do this in the cloud, MapR in the Amazon cloud, rather than doing it on premise? >>Okay. So let's start with, I mean, there's a lot of value propositions all balled up into one here. Uh, first of all, in the cloud, it allows them to spin up very quickly. Within a couple minutes, you can get, uh, you know, hundreds of nodes available. Um, and, uh, and depending on where you're processing the data, if you've got a lot of data in the cloud already, it makes a lot of sense to do the Hadoop processing directly there. So that's, that's one area. A second is you might have an on-premise cloud deployment and need to have disaster recovery. So MapR provides point-in-time snapshots, uh, as well as, as wide-area replication. So you can use mirroring; having Amazon available as a target is a huge advantage. 
And then there's also a third application area where you can do processing of the data in the cloud and then synchronize those results to an on-premise cluster. So basically, process where the data is, combine the results into a cluster on premise. So you >>Don't have to move the raw data. Uh, >>On-premise. Actually, it's all about, let's do the processing on the data. Well, you know, the whole, >>The value proposition of big data in general is, let's not move, move data as little as possible. Yep. Uh, you know, so you bring the computation to the data, if you can. Uh, so what's your take on this event? I mean, we've got, uh, this is a, you know, the 4th of June summit, uh, you know, Hortonworks has now fully taken over the show, and talk about what you see out here in terms of, uh, the other vendors that play. And, uh, just kind of the attendees, the vibe you're seeing. >>Uh, it's a lot of excitement. I think a big difference from last year, which seemed to be very developer focused: we're seeing a lot of, a lot of presentations by customers. A lot of information was shared by our customers today. It was fun to see that, uh, comScore shared, uh, shared their success. Boeing gap map is, uh, it was great for us. >>Fantastic. We look at Amazon. Amazon, first of all, is the gold standard for public cloud, right? They've knocked it out of the park. Everyone knows Amazon. Um, but they've been criticized on the big data front because of the cycle times involved. Um, and some developers, I mean, for web services, spinning up and down, no problem. Um, and we're seeing businesses like Netflix run on Amazon. So Amazon is not a stranger to running at scale for cloud, but Hadoop has kind of been a kludgy thing for Amazon. So I think, you know, talk about why Amazon and you guys is a good fit out to the market. The market reach is great. So you guys now have a huge addressable market. Are you guys helping solve some of that complexity with the, uh, with the MapReduce side? 
What's, >>What's the core, I guess the first comment, first response would be, I think every customer should have that type of kluge, uh, uh, if they could have the success that Amazon has in Hadoop. They have a huge number of, of, uh, of Hadoop deployments and have been very, very successful. I think, >>I mean, you know what I mean by it's natural, it's kludgy everywhere right now. That's the problem. But Amazon has huge scale, um, and Hadoop's not a natural fit there. >>Is not a natural fit? >>For the data, for the data component. And, uh, uh, the HBase, for example, >>Component. So where Amazon's, you know, made it very frictionless is the ability to spin up Hadoop to do the analysis. The gap that was missing is some of the, the HA capabilities, the data protection features, the disaster recovery, and, you know, with MapR now, it gives options to those customers. You know, if they want those kinds of enterprise, enterprise-grade features, now they have an option within EMR. They can select M5 and, and get moving. If they want performance and NFS, they've got the M3 option. >>Well, congratulations. I think it's a great deal for you guys and for Amazon customers. My question for you is, as you guys explore the enterprise-ready equation, which has been a big topic this week, um, what does that mean to you guys? Cause it means different things to different people. Depends on where, how high up to OLTP do you go, right? I mean, how far from batch to real-time transactional, um, levels you go. I mean, low batch, no problem. But as you start to get more near real time, it's going to be a little bit different gray in this house used security HDFS. Yeah. >>Yeah. So, so Hadoop represents the strategic platform, right? Deploying that in an organization, um, you know, moving from kind of an experimental, kind of lab-based, to production environment creates a different set of feature requirements. How available is it? How easy is it to integrate, right? 
How do I kind of protect that information and how do I share it? So when we say enterprise grade, we mean you can have SLAs, you can put the data there and, and be confident that the data will remain there, that you can have a point-in-time recovery for an application error or user mistake. Uh, you can have disaster recovery features in place. And then the integration is about not recreating the wheel to get access to the information. So Hadoop is very powerful, but it requires interacting through an HDFS API. If you can leverage it, like through MapR, with NFS, standard file-based access, standard ODBC access, open it up. >>So I can use standard file browser applications to see and manipulate the data, really opens up the use cases. And then finally, what we announced in 2.0 was multitenancy features. So as you share that information, all of a sudden the SLAs of different groups, and well, these guys need it immediately. And if you've got some low-grade batch jobs, they're going to impact that. So you want the ability to protect, to isolate, to secure information, and basically have virtual clusters within a cluster. And those features are important to cloud, but they're also important to on-premise. >>So great for the hybrid cloud environments out there. I mean, the multitenancy, cracking the code on that. Exactly. Huge. I mean, that is basically, I mean, right now most enterprises are like private cloud because it's like, they're basically an extension of their data center, and you're seeing a lot more activity in the hybrid cloud as a gateway to the public cloud. So, >>And, and, you know, frankly, people are kind of struggling with in an experimental with Apache Hadoop and the other distributions. The policies are either at the individual file level or the whole cluster. And it almost forced the creation of separate physical clusters, which kind of goes against the whole Hadoop concept. 
So the ability to manage it at a logical layer, have separate volumes where you can apply policies that apply to all the content underneath, really kind of makes it much, much easier for administrators to kind of deal with these multiple use cases. >>Amazon, Amazon has always been one of those cases for the enterprise where it's been one of those, and they've, this has been talked about for years, put the credit card down, go play on Amazon, but then bring it back into the IT group for certification. And so I think this is a nice product for you guys to bring that comfort. You know, we're very >>Excited. The enterprise saying, Hey, >>Come play in Amazon. It's bulletproof, enterprise ready. So congratulations. >>I wonder, can we talk, uh, talk use cases. So what are you seeing in terms of, uh, evolving use cases as, as, uh, Hadoop continues to become more enterprise grade, uh, depending on your definition, uh, but how is that impacting what you're seeing in terms of, even if it's just, uh, you know, the, the, um, the mindset, even people think now, okay, now it's enterprise grade, well, maybe, you know, in, in, depending on who you talk to, it's been that way for a bit, but what kind of, uh, use cases are you seeing develop now that it's kind of starting to gain acceptance? It's like, okay, we can trust our data is going to be there, et cetera. >>So th there's a huge range of use cases that, uh, differ by industry, differ by kind of dataset that's being used, everything from really a deep store where you can do analytics on it, so you're selecting the content, to something that's very, very analytic, machine-learning intensive, where you're doing sophisticated clustering algorithms, uh, et cetera. Um, where we've seen kind of an expansion of use cases are around real-time streaming, and you get streaming data sets that are kind of entering into the cloud. 
And, um, some of the more mission-critical data, moving beyond just maybe clickstream data or things that if you happen to drop a few, you know, not a big deal, right, versus the kind of trust-the-business type of content. >>Talk a little bit about the streaming, uh, aspects, uh, because of course, you know, we think of Hadoop, we think of a batch system. In terms of streaming data into Hadoop, you know, that's, that's a different, uh, that's something we don't, we haven't heard a lot about. So how do you guys approach that? >>So, uh, one of the artifacts of, of HDFS, which is a, is a distributed file system that stores in the underlying Linux file system, is that it's append only. So as an administrator, you decide, how frequently do I close the file? Am I going to do that on an hourly basis, or every eight hours? Because you have to close the file for other applications to see the data that's been written, right? So one of the innovations that, uh, that we pursued was to rewrite that, create this dynamic read-write layer. So you can continue to write data, and any application is seeing the latest data that's written. So you can mount the cluster as if it's storage and just continue to write data there. Really opens up what's, uh, what's possible. Companies like Informatica, their Ultra Messaging product integrates directly in with, with MapR and provides. >>So what kind of advantage does that provide to the end user? What w w translate that into real business value. Why, why is that important? >>Well, so one example is comScore. comScore handles 30 billion, uh, objects a day, uh, as they go out and try to measure the use of, of the web, and being able to continually write and stream that information at scale and handle that in real time and do analytics and turn around data faster has tremendous business value to them. 
If they're stuck in a batch environment where the load times lengthen to the point where all of a sudden they can't keep up, they're actually reporting on, you know, old news. And I think the analogy is, forecasting rain a day after it's wet isn't exactly valuable. >>Yeah. So you guys, obviously a great deal, the enterprise ready for Amazon, big story, big coup for the company. What's next for you? I want to ask that and make sure you get that out there on your agenda for the next year, but then I want you to take a step back a year, maybe a year and a half ago. Look back at how much has changed in this landscape. Um, share your perspective, because the market has gone through an evolution where there's been a market opportunity, and then everyone goes, oh my God, it's bigger than we actually thought. I mean, Jeff Kelly's groundbreaking report about the $50 billion market is now being talked about as too low. So big data has absolutely opened up to a huge, and it's changed some of the tactics around strategies. So your strategy, Hortonworks' strategy, even Cloudera's. So, and it's still evolving. So what's changed for the folks out there from a year and a half ago, a year ago, to today, and then look out for the next 12 months. What's on your agenda? >>Well, if, if you look back, I think we've been fairly consistent. Um, uh, I'm, I'm not going to take credit for the vision of our CEO and CTO. Uh, but they recognized early on that Hadoop was, uh, was a strategic platform, and to be a strategic platform that applied to the broadest number of use cases and organizations required some, some areas, uh, of innovation, and particularly the how it, how it scaled, how it was managed, how you stored and protected the information needed a rearchitecture. And I think that, you know, architecture matters when you're going through a paradigm shift. Having the right one in place creates this, this ability, you know, to speed innovation. 
And I think that's, if there's anything that's changed, I think it's that the speed of innovation has even increased in the Hadoop community. I think it's, it's created a focus on these enterprise-grade features, on how do we store this valuable information and, and continue to explore. >>And I think one of the observations I'll make on that note is that it really focuses everyone to just mind your own business and get the products out. You know what I'm saying? We've seen everyone, the product focus be the number one conversation. >>What we've seen is customers, you know, start and they expand rapidly. Some of that is due to data growth, but a lot of it is due to more and more applications being delivered and, and, uh, and, and the value's kind of extracted from the Hadoop platform, and success breeds success. Well, >>Congratulations for all your success, great win with Amazon Web Services, and make that a little bit more easier, more robust, and more, more features for them and you, uh, more revenue for MapR. Um, and I want to personally thank you for your support to theCUBE. Uh, we've expanded with a new studio B software for extra extra interviews, um, and wanna expand the conversation. Thanks to your generous support, we can bring the independent coverage out to the market and, um, great community, thanks for helping us out. And we appreciate it. So thank you. Okay. Jack Norris with MapR. We'll be right back to wrap up day one with that. Jeff and I will give our analysis right after the short break.

Published Date : Jun 14 2012

ENTITIES

Entity	Category	Confidence
Jeff Kelly	PERSON	0.99+
Jeff	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Jack Norris	PERSON	0.99+
Jack Dorsey	PERSON	0.99+
Netflix	ORGANIZATION	0.99+
$50 billion	QUANTITY	0.99+
Silicon valley	LOCATION	0.99+
30 billion	QUANTITY	0.99+
today	DATE	0.99+
Informatica	ORGANIZATION	0.99+
a year ago	DATE	0.99+
next year	DATE	0.99+
comScore	ORGANIZATION	0.99+
a year and a half ago	DATE	0.99+
Kelly	PERSON	0.99+
last year	DATE	0.99+
Amazons	ORGANIZATION	0.99+
Linux	TITLE	0.99+
Matt BARR	PERSON	0.99+
San Jose, California	LOCATION	0.99+
one example	QUANTITY	0.98+
one area	QUANTITY	0.97+
third application	QUANTITY	0.97+
Matt	PERSON	0.97+
one	QUANTITY	0.97+
Hadoop	TITLE	0.97+
this week	DATE	0.96+
2012	DATE	0.95+
hundreds of nodes	QUANTITY	0.94+
Hortonworks	ORGANIZATION	0.94+
Jack	PERSON	0.93+
both edition	QUANTITY	0.93+
a day	QUANTITY	0.93+
two big announcements	QUANTITY	0.92+
second	QUANTITY	0.9+
next 12 months	DATE	0.88+
day one	QUANTITY	0.86+
two dot	QUANTITY	0.85+
M three	OTHER	0.85+
M three	TITLE	0.84+
MapReduce	ORGANIZATION	0.82+
Hadoop Summit 2012	EVENT	0.79+
first response	QUANTITY	0.79+
every eight hours	QUANTITY	0.78+
SLA	TITLE	0.77+
June	DATE	0.77+
first comment	QUANTITY	0.77+
Lastic MapReduce	TITLE	0.69+
M five	OTHER	0.69+
Boeing	ORGANIZATION	0.68+
M five	TITLE	0.67+
siliconangle.tv	OTHER	0.67+
ground zero	QUANTITY	0.67+
Wiki bond.org	ORGANIZATION	0.62+
Apache	ORGANIZATION	0.61+
4th of	EVENT	0.6+

Jack Norris - Strata Conference 2012 - theCUBE

>>Hi everybody. We're back. This is Dave Vellante from Wikibon.org. We're live at Strata in Santa Clara, California. This is SiliconANGLE TV's continuous coverage of the Strata conference. So O'Reilly Media is a great partner of ours. And thanks to them for allowing us to be here. We've been going all week cause it's day three for us. I'm here with Jeff Kelly, Wikibon's lead big data analyst. And we're here with Jack Norris, who's the VP of marketing at MapR. Jack, welcome to theCUBE. Thank you, Dave. Thanks very much for coming on. And you know, we've been going all week. You guys are a great sponsor of ours. Thank you for the support. We really appreciate it. How's the show going for you? >>Great. A lot of attention, a lot of focus, a lot of discussion about Hadoop and big data. >>Yeah. So you guys getting a lot of traffic? I mean, I hear there's 2,500 people here, up from 1,400 last year. So that's >>Yeah, we've had like five, six people deep in the, in the booth. So I think there's a lot of, a lot of interest. There's interesting. >>You know, when we were here last year, when you looked at the, the infrastructure and the competitive landscape, there wasn't a lot going on, and in just a very short time, that's completely changed. And you guys have had your hand in that. So, so that's good. Competition is a good thing, right? And, and obviously customers want choice, but so we want to talk about that a little bit. We want to talk about MapR, the kind of problems you're solving. So why don't we start there? What is MapR all about? And you've got your own distribution of, of, of enterprise Hadoop. You make Hadoop enterprise ready. Let's start there. >>Okay. 
Yeah, I mean, we invested heavily in creating an alternative distribution, one that took the best of the open source community with the best of the MapR innovations, and really it's, it's about making Hadoop more applicable, broader use cases, more mission-critical support, you know, being able to sit in and work in a lights-out data center environment. >>Okay. So what was the problem that you set out to solve? Why, why do, why do we need another distribution of Hadoop? Let me ask it that way. Get nice and close to. >>So there, there are some just big issues with, with Hadoop. >>What are those issues? Let's talk about that. There's >>Some ease of use issues. There's some deep dependability issues. There's some, some performance. So, you know, let's take those in order. Right now, if you look at some of the distributions, Apache Hadoop, great technology, but it requires a programmer, right? To get access to the data, it's through the Hadoop API. You can't really see the data. So there's a lot of focus of, you know, what do I do once the data's in there? Opening that up, providing a full file-based access, right? So I can look at it and treat it like enterprise storage, see the data, use my standard tools, standard commands, you know, drag and drop from a file browser. You can do that with MapR. You can't do that with other distros. >>Talking about mounting HDFS as NFS, correct? >>Example. Correct. And then, and then just the underlying storage services. The fact that it's append only instead of full random read-write, you know, causes some, some issues. So, you know, that's some of the, the ease of use features. There's a whole lot we could discuss there. Big picture for reliability, dependability is there's a single point of failure, multiple single points of failure within Hadoop. So you risk data loss. So people have looked at Hadoop traditionally as, as batch-oriented scratchpad, right? We were out to solve that, right? 
We want to make sure that you can use it for mission-critical data, that you don't have a risk of data loss, that you've got full high availability. You've got the full data protection in terms of snapshots and mirroring that you would expect with enterprise products. >>It gets back to when you guys were, you know, thinking about doing this. I'm not even sure you were at the company at the time, but you, your DNA was there and you're familiar with it. So you guys saw this big data movement. You saw this Hadoop movement and you said, okay, this is cool. It's going to be big. And it's gonna take a long time for the community to fix all these problems. We can fix them now. Let's go do that. Is that the general discussion? Yeah. >>You know, I think, I think the, what's different about this: this is the first open source package, the first open source project, that's created a market. If you look at the other open source, you know, Linux, MySQL, et cetera, it was really late in the life cycle of a product. Everyone knew what the features were. It was about, you know, giving an alternative choice, a better Unix. Here, the focus is on innovation. And our founders, you know, have deep enterprise background. Our CTO was at Google, in charge of BigTable, understands MapReduce at scale, spent time as chief software architect at Spinnaker, which was kind of the fastest clustered NAS on the planet. So they recognized that the underlying layers of Hadoop needed some rearchitecture and needed some deep investment, and to do that effectively and do that quickly required a whole lot of focus. And we thought that was the best way to go to market. >>Talk about the early validation from customers. Obviously you guys didn't just do this in a vacuum, I presume. So you went out and talked to some customers. Yeah. >>We had all sorts of conversations with customers while we were in stealth mode. We were probably the loudest stealth >>As you were nodding. 
And I mean, what were they telling you at the time? Yeah, please go do this. >>The, what we addressed weren't secrets. There've been JIRAs open for four or five years on, on these issues. >>Yeah. But at the same time, Jack, you've got this, you got this purist community out there that says, I don't want to, I don't want to rip out HDFS. You know, I want it to be pure. What'd you, what'd you say to those guys? You just say, okay, thank you. We, we understand you're not a prospect. >>And I think, I think that, you know, Hadoop has a huge amount of momentum. And I think a lot of that momentum is that there isn't any risk to adopting Hadoop, right? It's not like the fractured NoSQL market where there's 122 different entrants, which one's going to win. Hadoop's got the ecosystem. So when you say pure, it's about the APIs. It's about making sure that if I create a MapReduce job, it's going to run on Apache. It's going to run on MapR. It's going to run on the other distributions. That's where I think that the heat and the focus is. Now, to do that, you also have to have innovation occurring up and down the stack that, that provides choice and alternatives. >>So when I'm talking about purists, I don't, I agree with you. The whole lock-in thing, which is the elephant in the room here. People will worry about lock-in. >>Pun intended. >>No, no, but good one, good catch. But so, but you're basically saying, Hey, we're no more locked in than Cloudera, right? I mean, they've got their own >>Actually, I think we're less, because it's so easy to get data in and out with our NFS. There's probably less so. >>So, and I'm gonna come back to that. But so for instance, many, when I, when I say purists, I mean some users and ISVs, some guys we've had on here. We had Abhi Mehta from Tresata on the other day, for instance. He's one who said, I just don't have time to mess with that stuff and figure out all that API integration. 
I mean, there are people out there that just don't want to go that route. Okay. But, but you're saying, I'm, I'm inferring, there's plenty who do, right? >>And the, and by the API route, I want to make sure I understand what you're saying. You >>Talked about, Hey, it's all about the API integration. It's not >>About, it's not the, it, it's about the APIs being consistent, a hundred percent compatible, right? So if I, you know, write a program that's, that's going after HDFS and the HDFS API, I want to make sure that that'll run on other distributions. Right. >>And that's your promise. Yeah. Okay. All right. So now where I was going with this was, th again, there are some purists who say, oh, I just don't want to mess with all that. Now let's talk about what that means, to mess with all that. So comScore was a big, high-profile case study for you guys. They, they were a Cloudera customer. They basically, in my understanding, in a couple of days migrated from Cloudera to MapR. And the impetus was, let's talk about that. Why'd they do that? >>Performance, data protection, ease of use. >>License fee issues. There was some license issues there as well, right? The, the, your, your maintenance pricing was more attractive. Is that true? Or >>It was more mainly about price performance and reliability, and, you know, they tested our stuff, it worked real well in a test environment, they put it in a production environment, didn't actually tell all their users. They had one guy debug the software for half a day because something was wrong: it finished so quickly. >>So, so it took them a couple of days to migrate and then boom. >>Boom. And they've, they handle about 30 billion objects a day. So their, you know, the use of that really high-performance support for, for streaming data flows, you know, they're talking about, they're doing forecasts and insights into web behavior, and, you know, they w the earlier they can do that, the better off they are. 
So >>Great. >>So talk about the implications of, of your approach in terms of the customer base. So I'm, I'm imagining that your customers are more, perhaps, advanced than a lot of your typical Hadoop users who are just getting started tinkering with Hadoop. Is it fair to say, you know, your customers know what they want and they want performance and they want it now? And they're a little more advanced than perhaps some of the typical early adopters? >>We've got people who go to our website and download the free version. And some of them are just starting off and getting used to Hadoop, but we did specifically target those very experienced Hadoop users that, you know, were kind of, you know, stubbing their toes on, on the issues. And so they're very receptive to the message of, we've made it faster. We've made it more reliable. You know, we've, we've added a lot of ease of use to the, to Hadoop. >>So I found this, let me interrupt, go back to what I was saying before, is I found this comment that I found online from Mike Brown, comScore CTO, I presume. I mean, he said MapR's Direct Access NFS feature, which exposes Hadoop Distributed File System data as NFS files, can then be easily mounted, modified, or overwritten. So that's a data access simplification. He also said we could capitalize the purchase of MapR with an annual maintenance charge versus a yearly cost per node. NFS allowed our enterprise systems to easily access the data in the cluster. So does that make sense to you, that, that enterprise of that annual maintenance charge versus yearly cost per node? I didn't get that. >>Oh, I think he's talking about some, some organizations prefer to do a perpetual license versus a subscription model. That's >>Oh, okay. 
So the traditional way of licensing software. >>And that, that you have to do it basically reinforces the fact that we've really invested and have kind of a, a product, you know, orientation rather than just services on top of, of some open source. >>Okay. So you go in, you license it and then, yeah. Perpetual license. >>Then you can also start with the free edition that does all the performance, NFS support, kick the tires >>Before you buy it. Sorry. Sorry, Jeff. Sorry to interrupt. No, no problem >>At all. So another topic of a lot of interest is security, making Hadoop enterprise ready. One of the pillars there is security, making sure access controls, for instance, making sure, let's talk about how you guys approach that and maybe how you differentiate from some of the other vendors out there, or the other >>Full Kerberos support. We link to enterprise standards for access, LDAP, et cetera. We leverage the Linux PAM security, and we also provide volume control. So, you know, right now in Hadoop, in Apache Hadoop, other distributions, you put policies at the file level or the entire cluster. And we see many organizations having separate physical clusters because of that limitation, right? And we provide volumes. So you can define a volume, and in that volume control access control, administrative privileges, data protection class, and, you know, in a sense kind of segregate that content. And that provides a lot of, a lot of control and a lot more, you know, security and protection and separation of data. >>Is that scenario, the comScore scenario, common, where somebody's moving off an existing distribution onto MapR, or, or are you more going, going, seeing demand from new customers that are saying, Hey, what's this big data thing, I really want to get into it? How's it shake out there >>Right now? There's this huge pent-up demand for these features. And we're seeing a lot of people that have run on other distributions switch to MapR. >>A little bit of everything. 
How about, can you talk a little bit about your channel, your go-to-market strategy, maybe even some of your ecosystem and partnerships, in the little time? >>Sure. So EMC is a big partner. The EMC Greenplum MR Edition is basically MapR. You can start with any of our editions and upgrade to that Greenplum edition with just a license key. That gives us worldwide service and support. It's been a great partnership. >>We hear a lot of proof of concepts out there >>For, yeah. And then it just hit the news today about EMC's distribution, the MR distribution, being available with Cisco's UCS gear. So now that's further expanded the footprint that we have. >>Okay. So, the EMC relationship. Anything else that you can share with us? >>We have other announcements coming out and >>Then you want to pre-announce on theCUBE? >>Oops. Did I let that slip? >>It's live, so be careful. And so, in terms of your channel strategy, are you guys mostly selling direct, indirect, a combination? >>It's kind of an indirect model through these large partners, with a direct assist. >>Yeah. Okay. So you guys come in and help evangelize. Yep. Excellent. All right. Do you have anything else before we've got to roll here? >>Yeah, I did wonder if you could talk a little bit about, you mentioned EMC Greenplum, so there's a lot of talk about the data warehouse market, the MPP data warehouses, versus Hadoop. Based on that relationship, I'm assuming that MapR thinks, well, they're certainly complementary. Can you just touch on that? You know, as opposed to some who think, well, Hadoop is going to be the platform where we go. >>Well, I mean, if you look at the typical organization, they're just really trying to get their, excuse me, their arms around a lot of this machine-generated content, this, you know, unstructured data that's just growing like wildfire. So there's a lot of Hadoop-specific use cases that are being rolled out. 
There are also kind of data lakes, data oceans, whatever you want to call it, large pools where that information is then being extracted and loaded into data warehouses for further analysis. And I think the big pivot there is, if it's well understood what the issue is and you define the schema, then there's a whole host of data warehouse applications out there that can be deployed. But there are many things where you don't really understand that yet, and having Hadoop, where you don't need to define a schema, is a big value. >>Jack, I'm sorry. We have to go; we're running a couple of minutes behind. Thank you very much for coming on theCUBE. Great story. Good luck with everything. And it sounds like things are really going well, the market's heating up, and you're in the right place at the right time. So thank you again. Thank you to Jeff. And we'll be right back, everybody, to the Strata Conference, live in Santa Clara, California, right after this word from our.

Published Date : Apr 27 2012

SUMMARY :

And you know, we've been going all week. A lot of attention, a lot of focus, a lot of discussion about Hadoop So that's So I think there's a lot of, And you guys have had your hand in that. broader use cases, more mission, critical support, you know, being able to sit in and work Let me ask it that way. So there, there are some just big issues with, One of those issues, let's talk about that. So there's a lot of focus of, you know, what do I do once the data's in So you risk data loss. It gets back to when you guys were, you know, thinking about doing this. It was about, you know, giving an alternative choice, better Unix. So you went out and talked to some customers. And I mean, what were they telling you at the time? I there've been gyrus for open for four or five You know, I want it to be And I think, I think that, you know, duke has a huge amount of momentum. So when I'm talking about purists, I don't, I agree with you the whole lock-in thing, I mean, they've got their own I think we're less because it's so easy to get data in and out with our NFS. So, and I'm gonna come back to that. And the, and by the API route, I want to make sure I understand what you're saying. Talked about, Hey, it's all about the API integration. So if I, you know, write a program, that's, that's going after for you guys. Is that true? and, you know, they tested our stuff at work real well in a test environment, they put it in production environment. you know, the use of that really high performance support for, to say, you know, your customers know what they want and they want performance and they want it now. experienced Hadoop users that, you know, we're kind of, you know, So does that make sense to you that, So the traditional way of licensing software And that, that you have to do it basically reinforces the fact that we've really invested in have kind Before you buy it. 
for instance, making sure let's talk about how you guys approach that and maybe how you differentiate from a lot of control and a lot more, you know, security and protection and separation of data. off an existing distribution onto a map are, or, or you more going, And we're seeing a lot of people that have run on other distributions switched to map our How about, can you talk a little bit about your, your channel? Mr. Edition is basically a map R you can start with any of our additions So now that's further Anything else that you can share with us? you guys mostly selling direct indirect combination, It, it's kind of an indirect model through these, these large partners with Do you have anything else before And, you know, as opposed to some who think, excuse me, their arms around a lot of this machine generated content, this, you know, So thank you again.

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Jeff	PERSON	0.99+
Jack Norris	PERSON	0.99+
five	QUANTITY	0.99+
Dave Volante	PERSON	0.99+
Jack	PERSON	0.99+
EMC	ORGANIZATION	0.99+
last year	DATE	0.99+
Matt BARR	PERSON	0.99+
four	QUANTITY	0.99+
UCS	ORGANIZATION	0.99+
2,500 people	QUANTITY	0.99+
Santa Clara, California	LOCATION	0.99+
Greg	PERSON	0.99+
Google	ORGANIZATION	0.99+
Mike Brown	PERSON	0.99+
half a day	QUANTITY	0.99+
Spinnaker	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
comScore	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Riley	ORGANIZATION	0.98+
EMC Greenplum	ORGANIZATION	0.98+
Abby Mehta	PERSON	0.98+
Linux	TITLE	0.97+
strata conference	EVENT	0.97+
SQL	TITLE	0.97+
One	QUANTITY	0.97+
one guys	QUANTITY	0.97+
today	DATE	0.97+
Raleigh	ORGANIZATION	0.97+
122 different entrance	QUANTITY	0.97+
six people	QUANTITY	0.97+
Skipio	PERSON	0.96+
Jeff Kelly	PERSON	0.95+
single point	QUANTITY	0.95+
about 30 billion objects a day	QUANTITY	0.94+
Strata Conference 2012	EVENT	0.93+
ECS	ORGANIZATION	0.93+
hundred percent	QUANTITY	0.91+
Triceda	ORGANIZATION	0.9+
Apache	TITLE	0.9+
firs	QUANTITY	0.9+
Paducah	LOCATION	0.89+
Greenplum	ORGANIZATION	0.89+
single points	QUANTITY	0.88+
day three	QUANTITY	0.88+
NFS	TITLE	0.87+
Wiki bond.org	OTHER	0.87+
1400	QUANTITY	0.85+
Unix	TITLE	0.85+
Wiki bonds	ORGANIZATION	0.84+
Silicon angle	ORGANIZATION	0.83+
Mapbox	ORGANIZATION	0.78+
Apache	ORGANIZATION	0.76+
MapReduce	ORGANIZATION	0.75+
Kerberos	ORGANIZATION	0.75+
first open	QUANTITY	0.74+
Pam	TITLE	0.73+
Matt bar	ORGANIZATION	0.73+
Nazanin	ORGANIZATION	0.61+
Cloudera	TITLE	0.59+
moon	LOCATION	0.58+
Cisco	ORGANIZATION	0.54+
one	QUANTITY	0.53+
days	QUANTITY	0.52+
MapReduce	TITLE	0.47+

Steve Wooledge - HP Discover Las Vegas 2014 - theCUBE - #HPDiscover

>>Live from Las Vegas, Nevada. It's theCUBE at HP Discover 2014, brought to you by HP. >>Welcome back, everyone, live here in Las Vegas for HP Discover 2014. This is theCUBE. We go out where the action is. We're on the ground here at HP Discover, getting all the signals, sharing them with you, extracting the signal from the noise. I'm John Furrier, founder of SiliconANGLE. I'm joined by Steve Wooledge, VP of product marketing at MapR Technologies. Great to see you, welcome to theCUBE. Thank you. I know you've got a plane to catch, but I really wanted to squeeze you in because you guys are a leader in the big data space. You guys are in the top three, the three big whales: MapR, Hortonworks, Cloudera. Um, you know, part of the original big data industry, which, you know, when we did theCUBE, when we first started, the industry had like 30, 34 employees total, combined, with three, one company, Cloudera, and then MapR announced, and then Hortonworks. You guys have been part of that Holy Trinity of early pioneers. Give us the update. You guys are doing very, very well. Uh, we talked to you guys at the Hadoop Summit last week. So Jack Norris for the party, give us the update on what's going on with the momentum and the traction. And then I want to talk about some of the things with the product. >>Yeah. So we've seen a tremendous uptick in sales at MapR. We tripled revenue. We announced that publicly about a month ago. So we went up 300% in sales over Q3, I'm sorry, Q1 of 2013. And I think it's really, you know, the maturity of the market. As people move more towards production, they appreciate the enterprise features we built into the MapR Distribution for Hadoop. 
So, um, you know, the stats I would share are that 80% of our customers triple the size of their cluster within the first 12 months and 50% of them doubled the size of the cluster, because, you know, they had that first production success use case and they find other applications and start rolling out more and more. So it's been great for us. >>You know, I always joke with Jack Norris, who's the VP of marketing over there, and John Schroeder, the CEO, about MapR's humbleness. You don't have the fanfare of all the hype. The press loves Cloudera. Now, they have done some pretty amazing things. They've had a liquidity event, so essentially kind of an IPO, if you will, that huge, uh, financing from Intel, and they're doing great, big sales force. Hortonworks has got their open source play. You guys have got your heads down as well. So talk about that. How many employees do you guys have, and what's going on with the product? How many products do you guys actually, >>We have, well, we have one product. So we have the MapR Distribution for Hadoop, and it's got all the open source packages directly within it, but where we really innovate is in the core. So that's where we spent our time early on, really innovating that data platform to give everything within the Hadoop ecosystem more reliability, better availability, performance, security, scale. >>It's open source contributions to the core. And you guys put stuff on top of that, uh, >>And how it works. Yeah. And we even lead some projects, like with Apache Mahout and Apache Drill, which is coming into beta shortly. On other projects, we commit and contribute back. But, um, so we take in the distribution, we're distributing all those projects, but where we really innovate is at that data platform level. So >>HP is a big data leader, obviously. They bought, uh, Autonomy. They have HP Vertica. You guys are here. Hey, what are you doing here? 
Obviously we covered, on theCUBE, uh, the announcement with HP Vertica. Are you here for that reason? Is there other biz dev, other activity going on, other integration opportunities? >>Yeah, a few things. So, um, obviously the HP Vertica news was big. We went into general availability with that solution the first week of May. So, um, what we have is the HP Vertica database integrated directly on top of our data platform. So it's this hybrid solution where you have a full SQL database directly within your Hadoop distribution. Um, so I had a couple of sessions on that. We had, uh, a nice panel discussion with our friends from Cloudera and Hortonworks. So, really good discussion with HP about just the ecosystem and how it's evolving. The other thing we're doing with HP now is, you know, we've got reference architectures on their hardware lines. So, um, you know, people can deploy MapR on the hardware of HP. But then also we're talking with the, um, the Autonomy group about enterprise search, and looking at a similar type of integration where you could have the search integrated directly into your Hadoop distro. And we've got some joint accounts we're piloting that with now. >>You guys are integrating with HP pretty significantly; that deal is working well. Absolutely. What's the coolest thing that you've seen at HP that you can share? I ask because in the big data landscape, everyone's, you know, hunkering down, working on their features, but outside in the real world, big data is not top of mind for the CIO 24/7. It's probably an item that they're addressing. What have you seen, and what have you been most impressed with at HP here? >>Yeah. Say, you know, this is my first HP event like this. I think the strategy they have is really good. I think in certain areas, like the cloud in particular with Helion, I think they made a lot of early investments there and placed some bets. And I think that's going to pay off well for them. 
And that marries pretty nicely with our strategy as well, in terms of, you know, we have on-premise deployments, but we're also an OEM, if you will, within Amazon Web Services. So we have a lot of agility in the cloud, if you will. And I think as those products and the partnerships with HP evolve, we'll be playing a lot more with them in the cloud as well. >>Let me ask you a question. I want you to share with the folks out there, in your own words, what is it about MapR that they may or may not understand or might not know about? Um, a little humble brag out there, and share some, uh, insight into MapR for folks that don't know you guys as a company, and for the folks that may have a misperception of what you guys do. Share with them what MapR is all about. >>Yeah. I mean, for me, I was in this space with Aster Data, and kind of the whole Hadoop and MapReduce area, since 2008, and I'm pretty familiar with everybody in the space. I really look at MapR as the best technology, hands down. You look at the Forrester Wave, and they rank us as having the best technology today, as well as product roadmap. I think the misperception is people think, oh, it's proprietary and closed. It's actually the opposite of that. We have an unbiased open-source approach where we'll ship and support, in our distribution, the entire Apache Spark stack. We're not selective over which projects within Apache Spark we support. Um, for things like SQL on Hadoop, we support Impala as well as Hive and other SQL-on-Hadoop technologies, including the ability to integrate HP Vertica directly in the system. And it's because of the openness of our platform. I'd say it's actually more open because of the standards we've integrated into the data platform to support a lot of third-party tools directly within it. So there is no lock-in. The storage formats are all the same. The code that runs on top of the distribution from the projects is exactly the same. 
So you can build a project in Hive or some other system, and you can port it between any of the distributions. So there isn't a lock-in. >>At the end of the day, what the customers want is ease of integration. They want reliability. That's right. And so what are you guys working on next? What's the big, uh, product marketing roadmap that you can share with us? >>Yeah, I think for us, the innovations we did in the data platform allow us to support not only more applications, but more types of operational systems. So, integrating things like fraud detection and recommendation engines directly with the analytical systems to really speed up that, um, accuracy, uh, in targeting and detecting risk and things like that. So I think, over time, you know, Hadoop has sort of been this batch analytic type of platform, but the ability to converge operations and analytics in one system is really going to be enabled by technology like MapR. >>How many employees do you guys have now? Uh, >>I'm not sure what our CFO would let me say, but I can say we're over 200 at this point >>As well. And over five, the customers which got the data, you guys do summit graduations, we covered your relationship with HP during our Big Data SV. That was exciting. Good to see John Schroeder, big, very impressive team. I'm impressed with MapR. I always have been. You guys have definitely kept to your knitting, stayed on what you're going to do, and again, leading the big data space. Um, and again, not proprietary is a very key word, and that's really cool. So thanks for coming on. Really appreciate it, Steve. We'll be right back. This is theCUBE, live in Las Vegas, extracting the signal from the noise with MapR here at HP Discover 2014. We'll be right back after this short break.

Published Date : Jun 12 2014

SUMMARY :

Discover 2014 brought to you by HP. Uh, we talked to you guys at the dupe summit last week. So, um, you know, the stats You guys got, you got your heads down as well. and it's got all the open source packages directly within it, but where we really innovate is in the course. And you guys put stuff on top of that, But, um, so we take in the distribution, we're distributing all those projects, but where we really innovate is uh, the announcement with, uh, with, with HP Vertica, you here for that reason, is there other biz dev other activity So it's this hybrid solution where you have full SQL How so I asked you in the big data landscape, everyone's Bucher, So we have a lot of agility in the cloud if you will. into map bar for folks that don't know you guys as a company and for the folks that may have a misperception of what you So you can build a project in hive or some What's the big, uh, product marketing roadmap that you can So I think now over time, you know, Hadoop has sort of been this batch analytic Let me say that before. And over five, the customers which got the data, you guys do summit graduations,

ENTITIES

Entity	Category	Confidence
John Schroeder	PERSON	0.99+
Steve Woolwich	PERSON	0.99+
Steve	PERSON	0.99+
Jack Norris	PERSON	0.99+
HP	ORGANIZATION	0.99+
John Frodo	PERSON	0.99+
three	QUANTITY	0.99+
80%	QUANTITY	0.99+
Steve Wooledge	PERSON	0.99+
50%	QUANTITY	0.99+
John furrier	PERSON	0.99+
Las Vegas	LOCATION	0.99+
Matt BARR	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
Stephanie	PERSON	0.99+
30	QUANTITY	0.99+
300%	QUANTITY	0.99+
first	QUANTITY	0.99+
last week	DATE	0.99+
Aster	ORGANIZATION	0.99+
2008	DATE	0.98+
Q1	DATE	0.98+
Las Vegas, Nevada	LOCATION	0.98+
one product	QUANTITY	0.98+
34 employees	QUANTITY	0.98+
one system	QUANTITY	0.98+
evolvable	ORGANIZATION	0.98+
over five	QUANTITY	0.97+
SQL	TITLE	0.97+
three big whales	QUANTITY	0.97+
MapReduce	ORGANIZATION	0.96+
SiliconANGLE	ORGANIZATION	0.96+
first 12 months	QUANTITY	0.95+
Apache Mahal	ORGANIZATION	0.95+
map map	ORGANIZATION	0.95+
over 200	QUANTITY	0.95+
24	OTHER	0.94+
today	DATE	0.94+
Intel	ORGANIZATION	0.92+
Matt	PERSON	0.92+
Salesforce	ORGANIZATION	0.91+
2014	DATE	0.9+
Impala	TITLE	0.9+
Hadoop	ORGANIZATION	0.89+
HP Vertica	ORGANIZATION	0.89+
map bar	ORGANIZATION	0.89+
Hadoop	TITLE	0.86+
one company	QUANTITY	0.85+
dupe summit	EVENT	0.84+
about a month ago	DATE	0.83+
Bucher	PERSON	0.81+
Discover 2014	EVENT	0.78+
first week of may	DATE	0.77+
Apache drill	ORGANIZATION	0.74+
#HPDiscover	ORGANIZATION	0.73+
Mapbox	TITLE	0.73+
2013	DATE	0.72+
SQL on	TITLE	0.7+
art technologies	ORGANIZATION	0.63+
Apache	ORGANIZATION	0.61+


Search Results for Jack Norris: