Wendy Pfeiffer, Nutanix | Qualys Security Conference 2019

>>From Las Vegas, it's theCUBE, covering Qualys Security Conference 2019. Brought to you by Qualys.
>>Hey, welcome back, everybody. Jeff Frick here with theCUBE. We're at the Bellagio in Las Vegas. It's actually raining outside, which is pretty odd, but I'm sure the desert is happy. We're here at the Qualys Security Conference. It's been going on for 19 years. It's our first time here. We're excited to be here, but we've got a really familiar guest. She's been on a number of times at Nutanix .NEXT conferences and Girls Who Code conferences, etcetera. So we're happy to have back Wendy Pfeiffer. She's the CIO of Nutanix and, as of August, earlier this year, a board member for Qualys. So great to see you.
>>Nice to see you again, too. So it's raining outside. I'll have to get out.
>>I know, it's pretty cool, actually. Cool coming in on the plane. But let's jump into a little bit from your CIO role. We're talking a lot about security, and the age-old thing came up in the keynote: you know, there are companies that have been hacked, and then there are companies that have been hacked and don't know it yet. But we're introducing a third type of company here as one of the themes, which is that you actually can prevent, you know, not necessarily getting hacked, but kind of the damage and destruction and the duration once people get in. I'm just curious, from your CIO hat, how do you look at this problem? The space is evolving so quickly. How do you kind of organize your thoughts around it?
>>Yeah, for me, first of all, it starts with good architecture. So whether it's our own products running or third-party products running, we need to ensure that those products are architected for resilience. And that third kind of company, the resilient company, is one that has built in architecture and a set of tools and services that are focused on knowing that we will be hacked. But how can we minimize or even eliminate the damage from those hacks?
And in this case, having the ability to detect those hacks when they're incoming and to stop them autonomously is the key to Qualys's play and the key to what I do as CIO at Nutanix.
>>Right. So one of the other things that keeps coming up here is kind of budget allocation to security within the CIO budget. And I think Mr. Clark said that, you know, if you're doing 3% or less, you're losing, and you've got to be spending at least 8%. But I'm curious, because to me it's kind of like an insurance story. How much do you spend? How much do you allocate? Because potentially the downside is enormous, but you can't spend 100% of your budget just on security. So how do you think about kind of allocating budget as a percentage of spend versus the risk?
>>Well, I love that question. That's part of the art of being a CIO or a CISO. So, you know, first of all, we have a mixed portfolio of opportunities to spend, to hold, to divest at any one time, and IT portfolio management has been around for 30 years, 40 years, almost as long as some of the people that I know. However, we always have that choice, right? We're aware of risk, and then we have the ability to spend. Now, of course, perfect security is to not operate at all, but that's, you know, swinging too far the wrong way. And then we also have that ability, maybe, to not protect against anything and just take out a big old cybersecurity policy. And whereas that policy might help us with lawsuits, it wouldn't necessarily help us with ongoing operations. And so it's somewhere in the middle, and I liked some of the statistics that they shared today. One of the big ones for me was that companies that tend to build resilient worlds of cybersecurity tend to spend about 10% of their total IT operating budgets on cybersecurity. That makes sense to me, and that reflects my track record at Nutanix and elsewhere: roughly that amount of spending.
Now, you know, checking the box and saying, "Well, we're spending 10% on cybersecurity" doesn't really buy us that much, and also we have to think about how we're defining that spend on cybersecurity. Part of that spend is in building resilient architectures and building resilient code. And that's sort of a dual-purpose spend, because that also makes for performant code; it makes for scalable, supportable code, et cetera. So, you know, we can do well by doing good in this case.
>>So again, just to stay on that theme for a minute: when you walk the floor at RSA, there's 50,000 people and I don't even know how many vendors, and I imagine even your IT portfolio now around security is probably tens of products, if not hundreds, and certainly tens of vendors. Again, how do you kind of approach it? Do you have trusted advisors around certain point solutions? Are you leveraging, you know, system integrators or other types of specialists to help you kind of sort through and get some clarity around this just kind of mess?
>>Well, all of us actually are looking for that magic discernment algorithm. Wouldn't it be great if you could just walk up to a vendor and apply the algorithm, and aha, there's one who's fantastic? We don't have that, and so we've got a lot of layers of ingest. I try to leave room in my portfolio for stealth and emerging technologies, because generally the more modern the technology is, the more it's keeping pace with the hackers out there and the bad guys out there. We do have sort of that middle layer that surrounds the ability for us to operate at scale, because we also have to operate these technologies. Even the most cutting-edge technologies sometimes lack some of the abilities for us to ingest them into our operations. And then there's sort of the tried-and-true bedrock that hopefully is built into products we consume, everything from public cloud services to, you know, hardware and so on.
And so there's this range of choices. What we have to do ultimately is use that lens of operations and operational capability. And first of all, we also ensure that anything we ingest meets our design standards, and our design standards include some things that I think are fascinating. I won't go into too much detail, because I know how much you love this detail. But, you know, things like: Are the APIs open? What does integration look like? What does the interaction design look like? And so those things matter, right? Ultimately, we have to be able to consume the data from those things, and then they have to work with our automation, our machine learning tools. Today at Nutanix, for example, I'm happy to say we catch, you know, most if not all of the threats against us, and we deal with well over 95% of them autonomously. And so we're a living example of that resilient organization that is, of course, being attacked, but at the same time hopefully responding in a resilient way. We're not perfect, knock on wood, but we're actively engaged.
>>So shifting gears a little bit now to your board hat, which, again, congratulations. I'm curious about your perspective on kind of breaking through the clutter from the board seat. The company's been doing this for 19 years. Still a relatively small company. But, you know, Philippe talked a lot about kind of company versus industry security initiatives that have to go through. What are some of the challenges and opportunities you see sitting in the board seat instead of down in the nitty-gritty as the CIO?
>>Well, first of all, Qualys is financially a well-run, responsible organization, and one of Philippe and the leadership team's goals has always been to operate profitably and to have that hedge. And so what that means is that, as consumers, we can count on the longevity of the organization and the company's ability to execute on its road map.
It's the road map that I think is particularly attractive about Qualys. You know, I am who I am. I'm an operator. I'm a technologist. And so although I'm a board member and I care about all dimensions of the company, the most attractive component is that this road map and those 19 years of execution are now coming to fruition at exactly the right time for those of us who need these tools and these technologies to operate. This is a different kind of platform, and it's instrumented with machine learning, with AI, at a time when the attackers and the attacks are instrumented that way as well. As you mentioned, we have a lot of noise in the market today, and these point solutions, they're going to be around for a while, right? We operate a messy and complex and wonderful ecosystem. But at the same time, the more that we can streamline, simplify, and sort of raise that bar, and the more we can depend on the collected data from all of these point tools to instrument our automated responses, the better off we'll be. And so this is a platform whose time has come, and as we see all of the road map items sort of coming to fruition, it's really, really exciting. And, you know, just speaking for a moment as someone who's been a leader in various technology companies in the security and technology space for some time, one of the most disappointing things about many technology startups is that they don't build in that business strength to have enough longevity and enough of a hedge to execute on that brilliant vision. And so many brilliant ideas have just not seen the light of day because of a failure to execute. In this case, we have a company with a track record of execution that's monetized the build-out of the platform, and now these game-changing technologies are also coming to fruition. It's really, really exciting to be a part of it.
>>So Wendy, you've mentioned AI and machine learning. If we check the transcript, it's probably a number of times, 85 times, in this interview. So it's really interesting. You know, there's always a lot of chatter in the marketplace, but you talked about so many threats coming in, and we heard in the keynote that it's not really for somebody sitting in front of a screen anymore to pay attention to this stuff. So when you look at the opportunity of machine learning and artificial intelligence and how that's going to change the role of the CIO, and specifically in security, I wonder if you can share your thoughts on what that opens up.
>>Absolutely. So there are kind of two streams here I'd love to talk about. The first is that we've had this concern, as we've moved to public cloud in IT, that IT people would be left behind. But in fact, after sort of a little DevOps blip where non-IT people were writing code that was then consumed by enterprises, we're now seeing the growth of IT again. And what this relates to is this: in the past, when we wanted to deploy something in public cloud, we had to be able to compose and express infrastructure as code. And folks who are great at infrastructure are actually pretty lousy at writing code, and so that was a challenge. But today we have low-code and no-code tools, things like Workato, for example, that my team uses, that allow us to express the operational processes that we follow, sort of the best practices and the accumulated knowledge of these IT professionals. And then we turn the machine on that inefficient code, and the machine improves and refines the code. So now, adding machine learning to the mix enables us to have these IT professionals, who know more than you'd ever imagine about storage and compute and scaling and data and cybersecurity and so on, and they're able to transform that knowledge into code that a machine can read, refine, and execute against. And so we're seeing this leap forward in terms of the ability of some of these tools to transform how we address the scale and the scope and the complexity of these challenges. And so on the one side, I think there's new opportunity for IT professionals and for those who have that operational expertise to thrive because of these tools. On the other side, there's also the opportunity for the bad guys in cyberspace to also engage with the use of these tools. And so the use of these tools at sort of a baseline level isn't enough. Now we need to train the systems, and the systems need to be responsive, performant, resilient. And also, they need to have the ability to be augmented by, to be integrated with, these tools. And so suddenly we go from having this utopian AI future where, you know, the good-looking male or female robot is the nanny for our kids, to something much more practical that's already in place, which is that the machine itself, the computer itself, is refining and augmenting the things that human beings are doing, and therefore able to be, first of all, more responsive, more performant, but also to do that layer of work that is not unique to human discernment.
>>Right. We hear that over and over, because the press loves to jump on the general AI. I think it's much more fun to show robots than really the applied AI, which is, just kind of like DevOps, lots of little improvements in lots of little places.
>>Exactly. Exactly. You know, I mean, I kind of like the stories of our robot overlords, you know, taking over, too. But the fact is, at the end of the day, with these machines it's just math. It's just mathematics. That's all it is. It's compute.
>>So Wendy, before I let you go, I want to touch on women in tech. You know, you're a huge proponent of women in tech. You're very active on lots of boards, and you're with Adriana on the Girls in Tech board, where we last sat down. And you're making moves now. Obviously, you've already got a C title.
Now you're doing more board work. I just wonder if you can share your thoughts on how this kind of movement is progressing. It seems to have a lot of weight behind it, but I don't know if the numbers are really reflecting that. But you're on the front lines. What can you share as you're trying to help women not so much get into tech, but stay in tech, which I think is what most of the stats talk about?
>>Yeah, I've got a lot of thoughts on this. I think I'll try to bring all the vectors together. So I recently was awarded CIO of the Year by the Fisher Center for Data and Analytics (and thank you very much), and the focus there is on inclusive analytics and inclusive AI. And I think this is sort of a story that makes the point. So if we think about all of the data that is training these technology tools and systems, and we think about the people who are creating these systems and the leaders who are building these systems and so on, for the most part, the groups of people who are working on these things, technologists, particularly in Silicon Valley, are not a diverse set of people. They're mostly male. They're overwhelmingly male. Many are from just a handful of, you know, countries and groups, right? It's mainly, you know, Caucasian males, Indian males, and Asian males. And because of that, this lack of diverse thinking and diverse development is being reflected in the tools in ways that eventually will build barriers for folks who don't share those characteristics. As an example, natural language processing tooling is trained on non-diverse data sets, and so we have challenges with that. For example, people who are older speak a little bit more slowly and have different inflections in general in how they speak, and the voice recognition tools don't recognize them as often. People who have heavy accents, for example, are just not recognized.
You know, I always have a phone, and this is my iPhone, and I have had an iPhone for 10 years. Siri, my, you know, helpful agent, has been on the phone in all those years. And in all of those years, I have had a daughter named Holly, H-O-L-L-Y. And every time that I dictate to Siri to send a message and I use my daughter's name, Holly, Siri always responds with the spelling H-O-L-I, the Hindu holiday. Now, in 10 years, Siri has never learned that when I say Holly, I most likely mean my daughter.
>>Even in the context of the sentence.
>>Exactly. Never, ever, ever. Because, you know, Siri is an AI, if you will, that was built without allowing for true user input through training at the point of conversation. And so that's bad architecture. There are a lot of other challenges with that architecture that reflect on cybersecurity and so on. One tiny example. But I think that now more than ever, we need diverse voices in the mix. We need diverse training data. We need, you know, folks who have different perspectives and who understand different interaction design to be tech entrepreneurs, builders, and leaders of companies. You know, Girls in Tech supports educating women and supporting women entrepreneurs. I'm also on the board of another group called Tech Wald that's all about bringing US combat veterans into the technology workforce. There's another diverse group of people who again can have a voice in this technology space. There are organizations that I work with that go into the permanent refugee camps and find technically qualified folks who can actually build some of this training data for, you know, analytics and AI. We need much, much more of that. So, you know, my heart is full of the opportunity for this.
My head's on fire, you know, just trying to figure out how we can get the attention of technology companies, of government leaders, before it's too late. Our training data sets are growing exponentially year over year, and they're being built in a way that doesn't reflect the potential usage. I was actually thinking about this the other day. I had an elderly neighbor who spoke with me about how excited he was... he no longer could drive. He wasn't excited about that. He no longer could drive; he couldn't see very well and couldn't operate a car. And he was looking forward to autonomous vehicles because he was going to have mobility and freedom again, right? But he had asked me to help him set up something that he had on his computer, and it was actually on his phone. There were voice commands, but it didn't understand him. He was frustrated, so he said, "Could you help me?" And I thought, man, if his mobile phone doesn't understand him, how's the autonomous vehicle going to understand him? So the very population who needs these technologies the most will be left out: another digital divide. And now is the moment, while these tools and technologies are being developed. A word about Qualys: you know, when I was recruited for the board, they already had 50/50 gender parity on the board. It wasn't even a thing in my interviews. We didn't talk about the fact that I am female at all. We talked about the fact that I'm an operator, that I'm a technologist. And so, you know, that divide was already conquered on Qualys's board. That's so not true for many, many other organizations and leadership teams, particularly in California, in Silicon Valley. And so I think there's a great opportunity for us to make a difference.
First of all, people like me who have made it, you know, by representing ourselves. And then people of every gender, every color, every ethnicity, immigrants, et cetera: I'm begging you guys, stick with it, stay engaged. Don't let the mean people, the naysayers, force you to drop out. You know, reconnect with your original values and stay strong, because that's what it's going to take.
>>It's a great message. And thank you for your passion and all your hard work in the space. And at the end of the day, it drives better outcomes: it's not only the right thing to do and a good thing to do, it actually drives better outcomes.
>>We see that.
>>All right, Wendy, again, always great to catch up. And congratulations on the award and the board seat, and we look forward to seeing you next time. Thank you. All right, she's Wendy, I'm Jeff. You're watching theCUBE. We're at the Qualys Security Conference at the Bellagio in Las Vegas. Thanks for watching. We'll see you next time.

Published Date : Nov 21 2019

Jack Norris - Hadoop Summit 2014 - theCUBE - #HadoopSummit

>>theCUBE at Hadoop Summit 2014 is brought to you by anchor sponsor Hortonworks ("We do Hadoop") and headline sponsor WANdisco ("We make Hadoop invincible").
>>Okay, welcome back, everyone. We're live here in Silicon Valley, in San Jose. This is Hadoop Summit. This is SiliconANGLE and Wikibon's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host Jeff Kelly, top big data analyst in the community. Our next guest is Jack Norris, CMO of MapR. Security, enterprise: that's the buzz of the show, and it was the buzz of OpenStack Summit, another open source show. And here this year, you're just seeing move after move, talking about a couple of critical issues: enterprise-grade Hadoop. Hortonworks announced a big acquisition, went all in, as they said, and now Cloudera follows suit with their news today. Are you sitting back saying they're catching up to you guys? I mean, how do you look at that? Because you guys have the security stuff nailed down. So how do you feel about that?
>>Now I think, if you look at the Hadoop market, it's definitely moving from a test, experimental phase into a production phase. We've got tremendous customers across verticals that are doing some really interesting production use cases. And we recognized very early on that to really meet the needs of customers required some architectural innovation. So, combining the open source ecosystem packages with some innovations underneath to really deliver high availability, data protection, disaster recovery features. Security is part of that. But if you can't protect the data, if you can't have multitenancy and separate workflows across the cluster, then it doesn't matter how secure it is. You know, you need those.
Silicon lucky bond is so successful, but I just don't understand your business model without plates were free content and they have some underwriters. So you guys have been very successful yet. People aren't looking at map are as good at the quiet leader, like you doing your business, you're making money. Jeff. He had some numbers with us that in the Hindu community, about 20% are paying subscriptions. That's unlike your business model. So explain to the folks out there, the business model and specifically the traction because you have >>Customers. Yeah. Oh no, we've got, we've got over 500 paying customers. We've got at least $1 million customer in seven different verticals. So we've got breadth and depth and our business model is simple. We're an enterprise software company. That's looking at how to provide the best of open source as well as innovations underneath >>The most open distribution of Hadoop. But you add that value separately to that, right? So you're, it's not so much that you're proprietary at all. Right. Okay. >>You clarify that. Right. So if you look at, at this exciting ecosystem, Hadoop is fairly early in its life cycle. If it's a commoditization phase like Linux or, or relational database with my SQL open source, kind of equates the whole technology here at the beginning of this life cycle, early stages of the life cycle. There's some architectural innovations that are really required. If you look at Hadoop, it's an append only file system relying on Linux. And that really limits the types of operations. That types of use cases that you can do. What map ours done is provide some deep architectural innovations, provide complete read-write file systems to integrate data protection with snapshots and mirroring, et cetera. So there's a whole host of capabilities that make it easy to integrate enterprise secure and, and scale much better. 
>>Do you think... I feel like you were maybe a little early to the market, in the sense that we heard Merv Adrian in his keynote this morning talk about how, you know, it's about 10 years in when you start to get these questions about security and governance, and we're about nine years into Hadoop. Do you feel like maybe you guys were a little early and now you're at a tipping point where, as more and more deployments get ready to go to production, this is going to be an area that's going to become increasingly important?
>>I think our timing has been spectacular, because we kind of came out at a time when there were some customers that were really serious about Hadoop. We were able to work closely with them and prove our technology. And now, as the market is just ramping, we're here with all of those features that they need. And what's an issue is that an incremental improvement to provide those kinds of key features is not really possible if the underlying architecture isn't there, and it's hard to provide, you know, online real-time capabilities in an underlying platform that's append-only. So the HDFS layer, written in Java, relying on the Linux file system, is kind of the weak underbelly, if you will, of the ecosystem. There are a lot of important developments happening, YARN on top of it, a lot of really exciting things. So we're actively participating, including in Apache Drill, and on top of a complete read-write file system and integrated Hadoop database, it just makes it all come to life.
>>Yeah. I mean, those things on top are critical, but, you know, it's the underlying infrastructure. We asked the Wikibon community about that: what are the things that are really holding you back from Hadoop in production? And the biggest challenges they cited were high availability, backup and recovery, and maintaining performance at scale.
Those are the top three, and that's kind of where MapR has been focused, you know, since day one.
>>So if you look at a major retailer: 2,000 nodes on MapR, 50 unique applications running on a single cluster, and 10,000 jobs a day running on top of that. If you look at the Rubicon Project, which recently went public: a hundred billion ad auctions a day on top of that platform. Beats Music, which just got acquired for $3 billion: basically it's the underlying MapR engine that allowed them to scale and personalize that music service. So there are a lot of proof points in terms of how quickly we scale, the enterprise-grade features that we provide, and kind of the blending of deep predictive analytics in a batch environment with online capabilities.
>>So I've got to ask you about your go-to-market. Obviously Cloudera and Hortonworks have different business models. Just talk about that. But Cloudera got the massive funding, so you get this question all the time: how do you counter that army and the arms race?
>>I think... I just read an article in Forbes, and he says cash is not a strategy. And I think that was an excellent article. And he goes in and, you know, in this fast-growing market, an amount of money doesn't necessarily translate to architectural innovations or speeding the development of that. This is a fairly fragmented ecosystem in terms of the stack that runs on top of it. There's no single application or single vendor that kind of drives value. So an acquisition strategy is...
>>So your field sales force is direct or indirect? Both, a mix? How do you handle that? Because Cloudera has got feet on the street, and every squirrel will find a nut if they're parking sales reps and SEs in all the enterprise accounts. You know, the squirrel's going to find a nut once in a while. And they're going to actually try to engage the clients.
So, you know, I guess it is a strategy if they're deploying sales and marketing, right?
>>The beauty about that... in fact, we're all in this together in terms of sharing an API and driving an ecosystem. It's not a fragmented market. You can start with one distribution and move to another without recompiling or without doing any sort of changes. So it's a fairly open community. If this were vendor lock-in, you know, then spending money on brand, et cetera, would be important. Our focus is on the... so, the sales execution: direct sales, yes, we have direct sales. We also have partners, and it depends on the geography as to what that percentage is.
>>And John Schroeder was on with HP at Big Data NYC. What's the update on the HP relationship?
>>Oh, excellent. In fact, we just launched our application gallery, App Gallery, to make it very easy for administrators and developers and analysts to get access and understand what's available in the ecosystem. That's available directly on our website. And one of the featured applications there today is an integration with the MapR Sandbox and HP Vertica. So you can get early access, try it, and get the best of kind of enterprise-grade SQL.
>>First Hadoop app store, basically, if you want to call it that way, right?
>>Sure. It's available; we launched with close to 30, with, you know, a whole wave kind of following that.
>>So talk a little bit about, you know, speaking of Vertica, kind of the SQL-on-Hadoop. So, you know, there's a lot of talk about that, some confusion about the different methods for applying SQL on Hadoop. MapR takes an open approach. I know you'll support things like Impala from a competitor, Cloudera. Talk about that approach from MapR's perspective.
>>So I guess our perspective is kind of unbiased open source.
We don't try to pick and choose and dictate what's the right open source based on either our participation or some community involvement. And the reality is, with multiple applications being run on the platform, there are different use cases where different tools make sense. So whether it's a Hive solution, or Drill, which is available, or HP Vertica, people have the choice. And it's part of a broad range of capabilities that you want to be able to run on the platform for your workflows, whether it's SQL access, MapReduce, or a Spark framework, Shark, et cetera. >>So, yeah, I mean, there are so many different options: there's Spark, you can run HP Vertica, you've got Impala, you've got Hive and the Stinger initiative. Is that whole SQL-on-Hadoop ecosystem still working itself out? Are we going to have this many options a year or two from now? Or are they complementary, and potentially each has its role? >>I think the major difference is how each deals with the new data formats. Can it deal with self-describing data sources? Can it leverage JSON files? Does it require centralized metadata? Those are some of the perspectives, and the advantages that, say, Apache Drill has: to expand the data sets that are possible and enable data exploration without dependency on an IT administrator to define that metadata. >>So another area, maybe not always as exciting: taking workloads from existing systems and moving them to Hadoop is one of the ways that a lot of people get started with Hadoop, whether it's transformation workloads or something in that vein. So I know you've announced a partnership with Syncsort, and one of the things they focus on is really making it as easy as possible to move those.
Talk a little bit about that partnership, why it makes sense for you and... >>And for the customer. I think it's a great proof point, because we announced that partnership around mainframe offload, and we had comScore and Experian in that press release. And if you look at a workload on a mainframe going to Hadoop, that seems like an oxymoron. But by having the capabilities that MapR has, making it a system of record with full high availability and data protection, we're actually an option to offload from the mainframe, offload from SAN processing, and provide a really cost-effective, scalable alternative. And we've got customers that had tried to offload from the mainframe multiple times in the past unsuccessfully and have done it successfully with MapR. >>So talk a little bit more about the broader partnership strategy. I mean, we're here at Hadoop Summit. Of course, Hortonworks talks a lot about their partnerships and their reseller arrangements. Cloudera seems to take a bit more of a direct approach. What's MapR's approach to partnering, as it relates to resell arrangements and things like that? >>I think the App Gallery is probably a great proof point there. The strategy is an ecosystem approach. It's having a collection of tools and management facilities as well as applications on top. So it's a very open strategy. We focus on making sure that we have open APIs at that application layer, so that it's very easy to get data in and out. And part of that architecture is presenting a standard file system format, allowing non-Java applications to run directly on our platform, supporting standard database connections, ODBC and JDBC, and providing database functionality in addition to this deep predictive analytics. Really, it's about supporting the broadest set of applications on top of a single platform.
What we're seeing in this modern architecture is that data gravity matters. And the more processing you can do on a single platform, the better off you are: the more agile, the more competitive, right? >>So you're partnering with people like SAS, for example, to bring some of the analytic capabilities into the platform. Can you tell us a little bit about any... >>Companies like SAS, Revolution Analytics, and Skytree; I mean, just a whole host of companies on the analytics side, as well as on the tools and visualization side, et cetera. Yeah. >>Well, I bring up SAS because I think they get the whole data gravity situation. They've got to go to where the data is and not have the data come to them. So, you know, I give them credit for acknowledging that big data truism, that it's... >>All about going to the data, not bringing the data... >>To the compute. Jack, talk about the success you've had with customers. You had some pretty impressive numbers, talking about 500 customers. Merv Adrian of Gartner was on with us earlier, essentially reiterating, without mentioning MapR, that what you guys are doing is right where the puck is going, and that for some other vendors the puck isn't even in the same rink. So I've got to give you props on that. So what I want you to talk about is the success you have, specifically where you're winning and where you're successful, and where you guys have struggled. >>And need to improve on, yeah. There's a whole class of applications that I think Hadoop is enabling, which is about operations and analytics. It's taking this high-arrival-rate, machine-generated data, doing analytics as it happens, and then impacting the business. So whether it's fraud detection or recommendation engines or, you know, supply chain applications using sensor data, it's happening very, very quickly.
So a system that can tolerate and accept streaming data sources, that has real-time operations, and that is 24 by 7 and highly available is what really moves the needle. And those are the examples I used with, you know, the Rubicon Project and cable TV. >>So, the outcome: what are the primary outcomes your clients want with your product? Is it stability, a platform that enables development? Is there an outcome that's consistent across all your wins? >>Well, big picture, some of them are focused on revenue: how do we optimize revenue? Either it's a new data source, or it's a new application, or it's an existing application where we're exploding the data set. Some of it is reducing costs, so they want to do things like a mainframe offload or a data warehouse offload. And then there are some that are focused on risk mitigation. And if there's anything they have in common, it's that, as they move from test to production, they want to make sure the key capabilities they have in enterprise systems today are in Hadoop. So it's not anything new. It's just, hey, I've got SLAs, and I've got data protection policies, and I've got a disaster recovery procedure; why can't I expect the same level of capabilities in Hadoop that I have today in those other systems? >>Final question. Where are you guys heading this year? What are your key objectives? Obviously you've got a flurry of announcements and good success. What's the state of the company? How many employees are you guys at? Give us a quick update on the numbers. >>So, you know, we just reported this incredible momentum, where we've tripled core growth year over year and added a tremendous number of customers. We're over 500 now. So we're basically sticking to our knitting, focusing on the customers, and elevating the proof points here.
Some of the most significant customers we have in telco, financial services, healthcare, and retail view this as a strategic weapon; they view this as a huge competitive advantage, and it's helping them impact their business. That's really spurring our success. You know, we're growing at an incredible clip here, and it's a great time to have made those calls and those investments early on and to be reaping the benefits. >>Now, I've always said, since the first Hadoop Summit, when Hortonworks came out of Yahoo and this whole community kind of burst open: you had Hadoop World, and now O'Reilly runs it, and it has a whole different vibe of its own. This one has a developer vibe. So I've got to ask you, and we've been big fans: I mean, everyone has enough of a beachhead to be successful. It's not about MapR versus Hortonworks or Cloudera. And this is why I always kind of smile when everyone goes, oh, Cloudera or Hortonworks. I mean, they're two different animals at this point. They do different things. You guys are over here. Everyone has their, quote, swim lanes or beachheads, and there's not a lot of head-on competition. Do you think it's going to be this way for a while? What's your forecast? At what point do you see more competition? Ten years out? I mean, Merv was talking about a 10-year horizon for innovation. >>I think that the more people learn and understand about Hadoop, the more they'll appreciate the set of capabilities that matter in production and post-production, and that will migrate earlier. And as we, you know, focus on more developer tools like our Sandbox, so people can easily get experience and understand what MapR is, I think we'll start to see a lot more understanding and momentum. >>Awesome. Jack Norris here, inside theCUBE: the CMO of MapR, a very successful enterprise-grade Hadoop player, a leader in the space. Thanks for coming on. We really appreciate it.
We'll be right back after this short break. We're live in Silicon Valley at Hadoop Summit 2014. We'll be right back.

Published Date : Jun 4 2014

ENTITIES

Entity	Category	Confidence
Jeff Kelly	PERSON	0.99+
Jack Norris	PERSON	0.99+
John Schroeder	PERSON	0.99+
HP	ORGANIZATION	0.99+
Jeff	PERSON	0.99+
$3 billion	QUANTITY	0.99+
December, 2014	DATE	0.99+
Jason	PERSON	0.99+
Matt BARR	PERSON	0.99+
10,000 jobs	QUANTITY	0.99+
Today	DATE	0.99+
10 year	QUANTITY	0.99+
Syncsort	ORGANIZATION	0.99+
Dan	PERSON	0.99+
Silicon valley	LOCATION	0.99+
John barrier	PERSON	0.99+
Java	TITLE	0.99+
Yahoo	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
24	QUANTITY	0.99+
Hadoop	TITLE	0.99+
Cloudera	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
this year	DATE	0.99+
Jack	PERSON	0.99+
fifth	QUANTITY	0.99+
Linux	TITLE	0.99+
Skytree	ORGANIZATION	0.99+
each	QUANTITY	0.99+
both	QUANTITY	0.99+
today	DATE	0.98+
one	QUANTITY	0.98+
Merv	PERSON	0.98+
about 10 years	QUANTITY	0.98+
San Jose	LOCATION	0.98+
Hadoop	EVENT	0.98+
about 20%	QUANTITY	0.97+
seven	QUANTITY	0.97+
over 500	QUANTITY	0.97+
a year	QUANTITY	0.97+
about 500 customers	QUANTITY	0.97+
SQL	TITLE	0.97+
seven different verticals	QUANTITY	0.97+
two years	QUANTITY	0.97+
single platform	QUANTITY	0.96+
2014	DATE	0.96+
Apache	ORGANIZATION	0.96+
Hadoop	LOCATION	0.95+
SiliconANGLE	ORGANIZATION	0.94+
comScore	ORGANIZATION	0.94+
single vendor	QUANTITY	0.94+
day one	QUANTITY	0.94+
Salesforce	ORGANIZATION	0.93+
about nine years	QUANTITY	0.93+
Hadoop Summit 2014	EVENT	0.93+
Merv	ORGANIZATION	0.93+
two different animals	QUANTITY	0.92+
single application	QUANTITY	0.92+
top three	QUANTITY	0.89+
SAS	ORGANIZATION	0.89+
Riley	PERSON	0.88+
First	QUANTITY	0.87+
Forbes	TITLE	0.87+
single cluster	QUANTITY	0.87+
Mapbox	ORGANIZATION	0.87+
map R	ORGANIZATION	0.86+
map	ORGANIZATION	0.86+

Jack Norris - BigDataNYC 2013 - theCUBE - #BigDataNYC

>>Live from Midtown Manhattan, it's theCUBE's coverage of Big Data NYC, a SiliconANGLE and Wikibon production, made possible by Hortonworks ("We do Hadoop") and WANdisco ("Hadoop made invincible"). And now your hosts, John Furrier and Dave Vellante. >>Hi, everybody. We're back. This is Dave Vellante with Jeff Kelly of Wikibon, and this is theCUBE, SiliconANGLE's continuous production. We're here at Big Data NYC, right across the street from the Hilton, where Strata Conference and Hadoop World are going on. We've got a multi-time CUBE guest: Jack Norris, the CMO of MapR, is here. Jack, welcome back to theCUBE. First, by the way, thank you so much for the support. As you know, we're across the street here at the Warwick Hotel. MapR, you guys have always been so generous in supporting theCUBE. We can't thank you enough for that, so we really appreciate it. Thank you. So we were able to listen to your keynote yesterday. We weren't broadcasting head to head yesterday, so we had an opportunity to hear your keynote. So, first of all, how did that go? I want to ask you some questions about it. >>It was really well received, and I think people were kind of clamoring to try to separate the myths from reality on Hadoop. >>You had three myths that you talked about, you know, one related to the distributions. I'd like to get into some of those. So the first myth was around the distribution battle. Take us through that. >>So, you know, the impression that it's a knock-down, drag-out competitive battle across Hadoop distributions was the first myth. And the reality is that all of the distributions share the same open source Apache code. And this is one of the first open-source technologies that's really created a market.
I mean, look at what's happened here with this whole big data and Hadoop space. But given that early stage, there's the requirement to really combine that open source code with additional innovations to meet customer needs. And so what you see is those aggregators that are taking open source; you see others that are taking the open source and then adding maybe a management utility and a couple of, you know, different applications on top. And then our approach at MapR is that we're taking the open source with those management innovations, doing some development in the open source community with things like Apache Drill, and then really focusing on the underlying architecture, the data platform, and providing innovations at that layer. >>So there are sort of the three major distros that we talk about all the time: you know, you guys, Hortonworks, and Cloudera. You guys have been consistent the whole time, as has Hortonworks, right? Cloudera basically put out a post recently saying, hey, we're kind of going in a different direction, sort of what I call tapping out of the Hadoop distro piece of it. So there's a lot of discussion around it. You're putting forth the view that, hey, it's not an internecine war. But does it matter, is my question. >>Well, I think if you take a step back, the Hadoop ecosystem is incredibly strong and growing very, very quickly: the fastest-growing big data technology, one of the top 10 technologies overall. And I think it's because we are sharing the same API. It is possible for customers to learn on one, develop, and move seamlessly to another. And, you know, in the keynote I talked about the difference with the NoSQL market, where, you know, there is no consensus, and customers have to figure out not only what's the right workload, but what's the technology that's actually going to have some staying power, right? >>That's a powerful comment.
Amazon turned the data center into an API, and you, the Hadoop community, are essentially turning data access into an API. And that is a very powerful and leverageable concept. Okay. Your second myth was around the whole NoSQL piece of it. You put up a slide. I've read Jeff Kelly's reports, and I thought I knew them all, but there were a couple in there that I didn't recognize; you probably knew them all. So take us through myth number two. >>I'm sure we missed some. >>There wasn't room on the slide for any more. >>Yeah, it's basically about the consensus. There is no real consensus. There's no common API. There's no ability to move applications seamlessly across NoSQL solutions. If you look at one NoSQL solution, and that's HBase, it has a big inherent advantage because it's integrated with Hadoop. You know, this whole trend is about compute and data together. So if you've got a NoSQL solution that's on that same, you know, massive data store, that's a big leg up. And then we got into this: well, if HBase is included in all the distributions, and all the distributions share the same open source, then obviously it must run the same across all distributions. And there we shared some pretty interesting data to show the difference. When you make architectural changes and innovations underneath, you can dramatically change the performance of not only MapReduce but of NoSQL. >>Okay. So not all NoSQL is created equal, and not all HBase is created equal, is essentially what you're saying there. Now, the third piece was whether Hadoop is enterprise ready, right? So you guys were first to say, well, we have a Hadoop platform that's enterprise ready, way ahead on that. You got criticized a lot for going down that path, shrugged, and said, okay, we'll just keep doing business with customers. And you've been, again, very clear and consistent on that.
So talk about the third myth. >>And that's, you know, is Hadoop ready for prime time? And I think the way to combat that myth is by customer examples, showing the tremendous success that customers are enjoying with Hadoop. And, you know, we don't have time on theCUBE here to go through all of them, but I like to point out 90 billion auctions a day with Rubicon; they've surpassed Google in terms of ad reach, and they're doing that on MapR. 1.7 trillion events a month with comScore; that's on MapR. You look in traditional enterprise: a single retailer with over 2,000 nodes of Hadoop. I mean, it's a key part of their merchandising and retail operations, combining all sorts of data feeds and all sorts of use cases. In financial services, over a thousand nodes for risk mitigation, personalized offers, and streamlining their operations. I mean, it's dramatic. And then, you know, we shared some of the more interesting, esoteric ones, like garbage and whiskey and weather prediction. >>And they consider these, even as diverse and eclectic as they are, mission-critical applications? >>Oh, absolutely. And I think that's the difference, because what we're talking about is not Hadoop as this cache, right? This temporary processing, where we can do, you know, some interesting batch analytics and then take that and put it someplace else. And yes, there are applications like that, but companies soon realize that if I'm going to use this as a key part of my operations, and it's about data and compute together, then I want a consistent, permanent store. I want a system of record. So all of the SLAs, high availability, and data protection features that they expect in their enterprise applications should be present in Hadoop, right? That's where we focus. Let's run down a couple of those. >>What are some of the key capabilities that you need in an enterprise-grade platform?
That MapR is... >>Well, let's take business continuity, because that's important if you're really going to trust data there. And, you know, one of the big drivers as you expand data is how much am I going to spend on it? If you look at a large investment bank facing $270 million of budget, not total but incremental, to address the additional capacity, there's a big emphasis on finding a better way to do that. So instead of spending $15,000 a terabyte, if you can spend a few hundred dollars a terabyte, that's a huge advantage. And that's the focus of Hadoop. But to do that well, the features that are in enterprise storage have to be present. And we're talking about, you know, mirroring, and not a copy-table function but replication; that's how organizations do it, right? If you're going to do recovery, you know, you can't back up a petabyte of information through a copy function, right? You have to do a snapshot, and the snapshots have to be consistent, right? And we're not saying anything that an enterprise administrator doesn't know. There is some confusion when you're more on the developer side as to what these features are, and the difference between a fuzzy snapshot and a point-in-time, consistent snapshot. >>Got it. So let's talk a little bit about the enterprise data hub, this concept that Mike Olson with Cloudera introduced yesterday. Tell us a little bit about your take on Mike's, I guess, definition, essentially trying to name the category of what Hadoop can do and where it sits in the architecture. Did you agree with his... >>Yeah. I mean, if you look at that description, it's about taking important data, putting it in Hadoop, and combining a lot of different data sources. And it's been referred to as a data lake and a data reservoir and a data ocean. I mean, we've heard a lot of terms.
We worked with an outside consultant who was originally an architect at Teradata. It's been about eight months, almost a year ago now, since he defined it as an enterprise data hub. And he went through the list of requirements: once you move from a transitory to a permanent store, then that becomes an enterprise data hub. An enterprise data hub can be used to select and process information, maybe do ETL, and serve some downstream applications. It can also be useful to do analysis directly on it, you know, to serve different business functions. But the system requirements that he established for it, I think, are absolutely true: you have to have the full data protection, you have to have the full disaster recovery, and you have to have the full high availability, because this is going to be important data serving the organization. If it's data that you can lose, if it's data that you don't really care about having highly available, then it's a very narrow use case that that data hub serves. >>So you're saying the enterprise data hub isn't ready for prime time. >>No, I'm saying that there are requirements. And we have companies today that have deployed an enterprise data hub, and they are quite successful with it. And, you know, the quotes are that the ETL functions they're doing on that hub are 10 times faster and 10 times cheaper than what they were seeing. >>Soundbite, Dave. >>I agree, but it's nuanced, right? And so, you know, the customers call a lot of vendors, right? They're all saying the same thing to the customers. So you've got your messaging that you've, you know, proven out over the last several years, and then the entire market starts to use the same terminology. So this is why I think... what are those... >>Things? We're in a little bit of a marketing fog here in the relatively early stages. I think the best response there is customer proof points.
And I think some education. In the very beginning, you know, when they're in development and test, it's really important to understand what Hadoop is, what can I use it for, and what data source am I going to leverage. I think the features that we're talking about really start to show up as you deploy in production and as you expand its use in production, and there we've enjoyed tremendous success. >>But you would argue that you have a lead in this space; I wouldn't argue with that, and I don't think you would either, the space being robustness, enterprise readiness, mission criticality. Is your lead increasing, decreasing, or staying the same? What's your sense? >>Well, it's hard, because there's no external service that's out there, you know, interviewing every customer and giving numbers. I do know that we passed 500 paying customers. I do know that we've got significant deployments, and you can measure those in terms of number of nodes, you know, in the thousands of nodes; you can measure those in terms of use cases. So we've got, you know, one company that has passed 20 different use cases on the same cluster. I think that's an interesting proof point. We're scaling in terms of the number of people in an organization who are trained in leveraging the data in MapR, again in the thousands. So, you know, I think this market is so big and so dynamic that this isn't about one company's success at the expense of everyone else, a zero-sum game. I think, you know, we're all here raising this boat and focusing on this paradigm shift. But when it comes to production success, that's our focus, and I think that's where we've proven that. >>One thing I really want to get your opinion on: you know, as Hadoop matures, with some of the innovations you guys are doing and making the platform, you know, basically a multi-application platform, you can do more things with Hadoop.
And we've been talking about this on theCUBE: as that happens, you as an industry are going to start bumping up against the EDW vendors and some of the other database vendors in the traditional world. You're now doing some of the things that those tools can do. You know, two years ago, it was very much, this is all very complementary, Hadoop and your EDW; there's no overlap, we're all going to play nice. But increasingly we're seeing that there is an overlap. How do you view that? What is your relationship with those EDW vendors, and what are you hearing from customers when you go into a customer? >>Okay. So, I mean, there's a lot in that question. I think the first comment, though, is don't look at Hadoop through this single data warehouse lens. If you look at trying to use Hadoop to completely replace an enterprise data warehouse, well, there are a few decades of experience there, and there are many organizations that have a lot of activities based in that data warehouse. And that's where we're seeing a data warehouse offload that is complementary, but it gives organizations this lever to say, well, I'm going to control the fill rate, and I'm going to take some of the data that's no longer, you know, really active and put that on Hadoop, and really change my ability to manage the costs in a data warehouse environment. The other thing that's interesting is that the types of applications that Hadoop is doing, I think, are creating a new class. It's about operations and analytics kind of combined together: taking high-arrival-rate data and making very quick micro-changes to optimize, whether that's fraud detection or recommendation engines, or taking sensor data and doing predictive analytics for maintenance, et cetera. There is just a tremendous number of applications.
In some cases they're leveraging a new data source, in some cases doing new applications, but it's just opening things up. And I think organizations are moving to be very data-driven, and Hadoop is at the center of that. >>And you control the fill rate, right? That's another really good soundbite. And these things that you mentioned, this high-arrival-rate data, fraud detection, predictive analytics, maintenance: these are things that you're doing today with MapR, right? >>Yeah, absolutely. >>Great. All right, Jack. Well, listen, always a pleasure. Thanks very much for coming by. Great to see you again. All right, keep it right there, everybody. We'll be right back with our next guest. This is theCUBE; we're live from the Big Apple.

Published Date : Oct 30 2013

Scott Howser, Hadapt - MIT Information Quality 2013 - #MIT #CDOIQ #theCUBE

>> Okay, we're back. We are in Cambridge, Massachusetts. This is Dave Vellante. I'm here with Jeff Kelly. We're with Wikibon. This is the Cube, SiliconANGLE's production. We're here at the MIT Information Quality Symposium, in the heart of database design and development. We've had some great guests on. Scott Howser is here. He's the head of marketing at Hadapt, a company that we introduced to our community quite some time ago, really bringing multiple channels into the Hadoop ecosystem and helping make sense of all this data, bringing insights to this data. Scott, welcome back to the Cube. >> Thanks for having me. It's good to be here. >> So, this notion of data quality. The reason we asked you to be on here today is because, first of all, you're a practitioner. You've been in the data warehousing world for a long, long time, so you've struggled with this issue. People here today are really from the world of, hey, we've been doing big data for a long time; this whole big data theme is nothing new to us. Sure, but there's a lot that's new. So take us back to your days as a data practitioner in data warehousing and business intelligence. What were some of the data quality issues that you faced, and how did you deal with them? >> So I think there are a couple of points to raise in that area. One of the things that we liked to do was try to triangulate on a user to engage them. And every channel we wanted to bring into the fold created a unique dimension of how we validate that this is the same person, right? Because each channel that you engage with has potentially different requirements for user accreditation, or a guarantee of, you know, a single user view. I think the Holy Grail used to be, in a lot of ways, something like single sign-on, a way to triangulate across these disparate systems on one common identity for a person, to make that world simple.
I don't think that's a reality, in the sense that when you look at a product provider or solution provider and a customer that's external, right, those two worlds are very disparate, and there are a lot of channels, potentially even third-party means, that I might want to engage this individual by. And every time I want to bring another one of those channels online, it further complicates validating who that person is. >> Okay, so when you were doing your data warehouse thing, again as an IT practitioner, you tried to expand the channels, but every time you did that it complexified the data sources. So how did you deal with that problem? Just create another database and stovepipe everything? >> Well, unfortunately, that creates this notion of islands of information throughout the enterprise. Because, as you mentioned, you define a schema and effectively place data elements into that schema: how you identify a person, how you engage them, how you rate that person's behaviors or engagement, et cetera. And what you'd see is, as you'd bring on new sources, the time to actually merge those things together wasn't on the order of days or weeks. It was months and years. And so with every new channel that became interesting, you further complicated the problem. Effectively, what you do is create these pools of information, and you take extracts and try to do something to munge the data and put it in a place where you give access to an analyst to say, okay, here's another sample set of data; try to figure out whether these things align. And you try to create, effectively, a new schema that includes all the additional data that we just added. >> So it's interesting, because one of the themes that we've been hearing a lot at this conference, and we hear it at many conferences, is that it's not the technology. It's the people and process around the technology.
Certainly any practitioner would agree with that. But at the same time, the technology historically has been problematic; data warehouse technology in particular has been challenging. So you've had to keep databases relatively small and disparate, and you've had to build business processes around those databases. So you've not only got deficient technology, if you will, no offense to our data warehousing friends, but you've also got a process creep that's occurred. >> That's fair. And I think it's one of the things that's led to the revolution that's occurring in the market right now, whether it's the new ecosystem or all the tangential technologies around it. Because what has bound some of the technology in the past has been the schema, right? As important as that is, because it gives people a very easy way to interact with the data, it also creates significant challenges when you want to bring on these unique sources of information. As you look at what has happened over the last decade, the engagement process for a consumer, a prospect, or a customer has changed pretty dramatically, and the channels don't all have the same stringent requirements about providing information to become engaged that way. So while the schema obviously has value in the enterprise, it also brings a lot of historical challenges along with it. >> So this Hadoop movement is very disruptive to the traditional market spaces. Many folks, the traditional guys, say it isn't, but it clearly is, particularly as you go omnichannel. I threw that word out earlier; omnichannel came up in a discussion we had at Hadoop Summit, myself, John Furrier, Abhi Mehta [inaudible], and this is something that you guys are doing, bringing in data to allow your customers to go omnichannel. As you do that, though, you start again.
You increase the complexity of the corpus of data. At the same time, a lot of times in Hadoop you hear about schema-light or schema-less. So how do you reconcile omnichannel, schema-less or schema-light, and the data quality problems? >> Yes. Speaking particularly about Hadapt, one of the things that we do is give customers the ability to effectively dump all that data into one common repository, which is HDFS in Hadoop, and leverage some of the open source tools and even their own inventions, if you will, with MapReduce code, Pig, whatever, to effectively normalize the data through iterations in Hadoop, and then push that into tables that we can now give access to through a SQL interface. So I think you're absolutely right about the more channels you can give access to. With this concept of omnichannel, irrespective of what way we engage with a customer or what way they touch us, being able to provide those dimensions of data in one common repository gives the marketer, if you will, incredible flexibility and insights that weren't previously discoverable. >> Assuming the data quality is the same across all of these. >> Right. >> So that was going to be my question. What are the data quality implications of using something like HDFS? You're essentially schema-less; you're just dumping data in, essentially in its raw format. So now you've got to reconcile all these different types of data from different sources and build out that kind of single view of a customer, of a product, whatever it is. >> You're right. >> So how do you go
So I think the repository in Hindu breach defense himself gives you that one common ground toa workin because you've got, you know, no implications of schema or any other preconceived notions about how you're going toe to toe massage weight if you will, And it's about applying logic and looking for those universal ides. There are a bunch of tools around that are focused on this, but applying those tools and it means that doesn't, um, handy captain from the start by predisposing them to some structure. And you want them to decipher or call out that through whether it's began homegrown type scripts, tools that might be upstairs here and then effectively normalizing the data and moving it into some structure where you can interact with it on in a meaningful way. So that really the kind the old way of trying to bring, you know, snippets of the data from different sources into ah, yet another database where you've got a play structure that takes time, months and years in some cases. And so Duke really allows you to speed up that process significantly by basically eliminating that that part of the equation. Yeah, I think there's and there's a bunch of dimensions we could talk about things like even like pricing exercises, right quality of triangulating on what that pricing should be per product for geography, for engagement, etcetera. I think you see that a lot of those types of work. Let's have transitioned from, you know, mainframe type environments, environments of legacy to the Duke ecosystem. And we've seen cases where people talk about they're going from eight month, you know, exercises to a week. And I think that's where the value of this ecosystem in you know, the commodity scalability really provides you with flexibility. That was just previously you unachievable. >> So could you provide some examples either >> you know, your own from your own career or from some customers you're seeing in terms of the data quality implications of the type of work they're doing. 
So one of our kind of *** is that you know the data quality measures required for any given, uh, use case various, in some cases, depending on the type of case. You know, in depending on the speed that you need, the analysis done, uh, the type of data quality or the level data qualities going is going to marry. Are you seeing that? And if >> so, can you give some examples of the different >> types of way data quality Gonna manifest itself in a big data were close. Sure. So I think that's absolutely fair. And you know. Obviously there's there's gonna be some trade off between accuracy and performance, right? And so you have to create some sort of confidence coefficient part, if you will, that you know, within some degree of probability this is good enough, right? And there's got to be some sort of balance between that actor Jerseyan time Um, some of the things that you know I've seen a lot of customers being interested in is it is a sort of market emerging around providing tools for authenticity of engagement. So it's an example. You know, I may be a large brand, and I have very, um, open channels that I engage somebody with my B e mail might be some Web portal, etcetera, and there's a lot of fishing that goes on out there, right? And so people fishing for whether it's brands and misrepresenting themselves etcetera. And there's a lot of, you know, desire to try and triangulate on data quality of who is effectively positioned themselves as me, who's really not me and being able to sort of, you know, take a cybersecurity spin and started to block those things down and alleviate those sort of nefarious activities. So We've seen a lot of people using our tool to effectively understand and be able to pinpoint those activities based upon behavior's based upon, um, out liars and looking at examples of where the engagement's coming from that aren't authentic if that >> makes you feel any somewhat nebulous but right. 
So using >> analytics, essentially, to determine the authenticity of a person, the authenticity of an engagement, rather than looking at the data itself: using pattern detection to determine that. >> But it's also taking, you know, there's a bunch of raw data that exists out there that you need to put together. Again, back to this notion of a landing zone, if you will, or data lake, or whatever you want to call it: putting all of this data into one repository where I can now start to do analytics against it without any sort of predetermined schema, and start to understand, are these people who are purporting to be from firm XYZ really from XYZ? And if they're not, where are these things originating, and how and when do we start to put filters or other things in place to alleviate them? >> And that could apply, it sounds like, certainly to private industry. But, I mean, >> it sounds like >> something government would be very interested in, in terms of, you know, what's in the news about different foreign countries potentially being the source of attacks on U.S. corporations or parts of our infrastructure, and trying to determine where that's coming from and who these people are. And >> of course, that gets >> complicated, because they're trying to cover up their tracks, right? >> Certainly. But I think the most important thing in this context is that it's not necessarily about being able to look at it after the fact; it's being able to look at a set of conditions that occur before these things happen, identify those conditions, and put controls in place to keep the action from taking place.
I think that's where, when you look at what is happening now from an acceleration of these models and an acceleration of the quality of the data gathering, being able to put those things into place, and put effective controls in place beforehand, is changing the loss prevention side of the business, in this one example. But you're absolutely right. From what I see and from what our customers are doing, it's multidimensional: cybersecurity is one example; pricing could be another; engagement, from a funnel analysis or conversion ratio perspective, could be yet another. So I think you're right that it is ubiquitous. >> So when you think about the historical role of the, well, historical: we had Stuart on earlier, and he was saying the first known chief data officer we could find was in 2003. So I guess that gives us a decade of history. But data quality, we've been talking about that for many, many decades. So if you think about the traditional role of an organization trying to achieve data quality, a single version of the truth, information quality, information value, and you inject it with this disruption of Hadoop, to me, anyway, that whole notion of data quality is changing, because in certain use cases inference is just fine, and false positives are okay. Who cares? >> That's right. >> Analyzing Twitter data, for example. In other cases, like healthcare and financial services, it's critical. So how do you see the notion of data quality evolving and adapting to this >> new world? >> Well, you mentioned this single version of the truth. That was something that, when I was on the other side of the table, >> they were beating you over the head with: we can do this, we >> can do this. And it's something that sounds great on paper.
But when you look at the practical implications of trying to do it in a very finite or stringently controlled way, it's not practical for the business. >> Because you're saying that the portion of your data that you can give a single version of the truth on is so small, because of the elapsed time? >> That's right. I think there's that >> dimension. But there's also this element of time, right, the time that it takes to define something that could be that rigid in structure. It's months, and by that time a lot of the innovations the business is trying to >> accomplish, the ideas have changed, the initiatives have changed. Yeah, you lost the sale. Hey, but we got the data! Look here. >> Yeah, I think you're >> right. And I think that's what's evolving. There's this idea of, you know what, let's fail fast and let's do a lot of iterations. The flexibility that's being provided in that ecosystem today gives people an opportunity to iterate and fail fast, and you're right that you set some sort of confidence level, so that for this particular application we're happy with a given percent confidence. Go fish. You're off a little >> bit, but it's good enough. So, having said that, what can we learn from the traditional data quality and chief data officer practitioners, those who've been very dogmatic, particularly in certain industries? What can we learn from them and take into this >> new world? >> I think from my point of view, and what my experience has always been, is that those individuals have an unparalleled command of the business and an appreciation for the end goal that the business is trying to accomplish. It's about taking that instinct, that knowledge, and applying it to the emergence of what's happening in the technology world, and bringing those two things together.
It's not so much, you know, a practical application in the sense of, okay, here are the technology options that we have to do these desired things, whether it's the pricing, the engagement, the cybersecurity, or whatever. It's more: how can we accelerate what the business is trying to accomplish by applying this technology that's out there to the business problem? In a lot of ways, in the past it's always been: here's this really neat technology, how can I make it fit somewhere? And now I think those folks bring a lot of relevance to the technology, to say: hey, here's a problem we're trying to solve; legacy methodologies haven't been effective, haven't been timely, haven't been scalable, whatever; how can we apply what's happening in the market today to these problems? >> You guys, Hadapt in particular, are to me, anyway, a good signal of the maturity model. And with the maturity of Hadoop, it's starting to grow up pretty rapidly, you know, with Hadoop 2.0. So where are we at? What do you see as the progression, and where are we going? >> So, you know, I mentioned it on the Cube the last time I was on, and I said I believe that Hadoop is the operating system of big data, and I believe that there's a huge transition taking place. There were some interesting responses to that on Twitter and all the other channels, but I stand behind it. I think that's really what's happening. Look at what people are engaging us to do: they're really starting to transition away from the legacy methodologies, and they're looking not just at lower-cost alternatives but also at more flexibility. And we talked at the summit about the notion of that revenue curve, right? Cost takeout is great on one side of the coin, or one side of the fence here.
But I think equally, and even more importantly, there's the change in the revenue curve and the insights that people are finding because of these unique channels, the omnichannel you describe. Being able to look at all these dimensions of data in one unified place is really changing the way that they can go to market, the way they can engage consumers, and the access they can provide to the analyst. >> Yeah. I mean, ultimately, that's the most... >> We had Stuart Madnick on, who's written textbooks on operating systems. We probably used them. I know I did. Maybe they were gone by the time you got there, you're young, but the point being, you know, Hadoop as an operating system, the notion of a platform, it's really changing dramatically. So I think you're right on that. Okay, so what's next for you guys? We talked about customer traction and proof points; you're working hard on that, I know. You guys have got great tech and an amazing team. What's next for >> you? >> So I think it's continuing to look at the market and being flexible with the market as the use cases develop. Obviously, as a startup, we're focused on a couple of key areas where we see a lot of early adoption and a lot of pain around the problem that we can solve. But it's really about continuing to develop those use cases and expand in the market to become more of a holistic provider of analytic solutions on top of >> Hadoop. >> How's Cambridge working out for you, right? I mean, the founders moved the company up from New Haven and chose the East Coast, chose Cambridge. We're obviously really happy about that as East Coast people. You don't live here full time, but you might as well. So how's that working out: the talent pool, the vibrancy of the community, the young people that you're able to tap? >> So I see there's a bunch of dimensions around that. One,
It's hot. It's really, really hot. >> And humid, yes. >> But it's actually been fantastic. And if you look at not just the talent inside the team, but around the team: look at our board, right, Jit Saxena and Chris Lynch have been very successful in the database community over decades of experience, and getting folks like that onto the board, and Felda Hardymon, who has been in this space as well for a long time; having folks like that as advisors, providing guidance to the team, is absolutely incredible. Hack/reduce is a great facility where we do things like hackathons and meetups and get the community together. So I think there's been a lot of positive inertia around the company just from being here in Cambridge. And from a development or recruiting point of view, it's also been great, because you've got some really exceptional database companies in this area, and history will show you there's been a lot of success here, not only in incubating technology but in building real database companies. We're the new startup on the block that people are very interested in, and I think we reflect a lot of the dynamics that are changing in the market and the way the market's moving. So the ability for us to recruit talent is exceptional, right? We've got a lot of great people to pick from. We've had a lot of people join from other previously very successful database companies. The team's growing significantly in the engineering space right now. I just can't say enough good things about the community, hack/reduce, and all the resources that we get access to because we're here in Cambridge. >> Hack/reduce is cool. So you guys are obviously leveraging that to bring people in. Hack/reduce is essentially, it's not an incubator; it's really more of an idea cloud, a resource cloud, started by Fred Lalonde and Chris Lynch.
Essentially, people come in and they share ideas. You guys, I know, have hosted a number of how-tos, and it's basically open. We've done some stuff there. It's very cool. >> Yeah, you know, for us it's also a great place to recruit, right? We meet a lot of talented people there, and with the university participation as well, we get a lot of talent coming in to participate in these activities. And we do things that aren't just Hadapt-related; we've had people teach Hadoop sessions and just sort of evangelize what's happening in the ecosystem around us. Like I said, it's been a great resource pool to engage with, and I think it's been as beneficial to the community as it has been to us. So we're very grateful for that. >> All right. Scott, it's always awesome to see you. I knew you were going to have some good practitioner perspectives on data quality. Really appreciate you stopping by. >> My pleasure. Thanks for having me. >> Good to see you. Take care. Keep it right there, everybody; we'll be right back with our next guest. This is Dave Vellante with Jeff Kelly. This is the Cube. We're live here at the MIT Information Quality Symposium. We'll be right back.

Published Date : Jul 17 2013

Jack Norris - Hadoop Summit 2013 - theCUBE - #HadoopSummit

>>...flash, it's, you know, what will that mean to my investment? And the announcement with Fusion-io is that, you know, we're 25 times faster on read-intensive HBase applications with the combination. So as organizations are deploying Hadoop and looking at technology changes coming down the pike, they can rest assured that they'll be able to take advantage of those in a much more aggressive fashion with MapR than with other distributions. >>Jack, I've got to ask you. We were talking last night at the Hadoop Summit kickoff party, and everyone was there. All the top execs were there, and all the developers. You know, we were on the Cube, and I think either Dave or myself coined the term "the big three of big data": you guys, Cloudera, MapR, and Hortonworks, really the key players early on. And Charles from Cloudera was just recently on, and he's like, oh no, this enterprise-grade stuff has been kicked around; it's been there from the beginning. You guys have been there from the beginning, and MapR has never, ever waffled on your messaging. You've always been very clear: hey, we're going to take Hadoop, open source Hadoop, and turn it into an enterprise-grade product. So that's clear, right? So what's your take on this? Because now enterprise grade is kind of there, I guess, the buzz around it, with the folks that have crossed the chasm and implemented. So what can you comment on that: one, enterprise grade, the reality of it, certainly from your perspective, but also others; and then for those folks that are now rolling it out for the first time, what can you share with them? What does it mean to be enterprise grade? >>So enterprise grade is more about the customer experience than a marketing claim.
And, you know, by enterprise grade, what we're talking about are some of the capabilities and features that they've grown to expect in their other enterprise applications. So, the ability to meet full SLAs, full HA, recovery from multiple failures, rolling upgrades, data protection with consistent snapshots, business continuity with mirroring, the ability to share a cluster across multiple groups and have volumes. I mean, there's a host of features that fall under the umbrella of enterprise grade. And when you move from no support for any of those features to support for a few of them, I don't think that's moving to HA; it's more like moving to low availability. And there are just a lot of differences between what those features mean when we say enterprise grade and what we view as kind of an incomplete story. >>So what do you mean by low availability? Well, I mean, it's tongue in cheek. It's nice. It's a good term. It's really saying it's just available sometimes; is that what you mean? That it's not true availability? I mean, availability is 99.9%, right? >>Right. So if you've got an HA solution that can't recover from multiple failures, that's downtime. If you've got an HBase application that's running online, and you have data that goes down and it takes 10 to 30 minutes to have the region servers recover it from another place in the distribution, that's downtime. If you have snapshots that aren't consistent across the cluster, that doesn't provide data protection; there's no point-in-time recovery for the cluster. So there are a lot of details underneath that, but what it amounts to is: do you have interruptions? Do you have downtime? Do you have the potential for losing data? And our answer is you need a series of features that are hardened and proven to deliver that. >>What about recoverability?
You mentioned that you guys have done a lot of work in that area with snapshotting. That's kind of being kicked around. Are folks addressing it? What's your competition doing in those areas? You just mentioned availability; okay, got that. Recoverability, security, compliance, and usability: those are the areas that seem to be the hot focus areas. What's going on in the industry? How would you grade them, a letter grade if you will, candidly, compared to what you guys offer? >>Well, first of all, let's take recoverability. One of the tenets is that you have point-in-time recovery, the ability to restore to a previous point that's consistent across the cluster. And right now there's no point-in-time recovery for HDFS, for the files, and there's no point-in-time recovery for HBase tables. So snapshot support is being talked about in the open source community, but it's being referred to in the JIRAs as fuzzy snapshots and really compared to CopyTable. >>So, Jack, I want to turn the conversation to a topic we've talked about before, the open versus proprietary debate. We've talked about that before here on theCUBE, so just reiterate your take for us. Perhaps because of the show we're at, there's a lot of talk about the open source nature of Hadoop, and some of the purists, as you might call them, are saying it's got to be open, a hundred percent Apache compatible, et cetera. And then there are others that are taking a different approach. Explain your approach and why you think that's the key way to really spur adoption of Hadoop and make it >>We're a part of the community. We've got, you know, commitment going on. We've pioneered and pushed Apache Drill, but we have done innovations as well.
And I think that those innovations are really required to support and extend the whole ecosystem. So Canonical distributes our M3 distribution. All our packages are available on GitHub as open source. So it's not a binary debate. And I think the point being made is that there are companies that have jumped ahead, and now the peloton is pedaling faster and will catch up. I think the difference is we rearchitected. So we're basically in a race car, racing ahead with the enterprise-grade features that are required. And there's a lot of work that still needs to be accomplished before that full rearchitecture is in place. >>Well, I think for me the proof is really in the pudding when it comes to customers that are doing real things with real production-grade, mission-critical applications. To me that shows the success, or relative success, of a given approach. So I know you guys are working with companies like Ancestry.com, Live Nation, and Quicken Loans. Could you walk us through a couple of those scenarios? Let's take Ancestry.com. Obviously they've got a huge amount of data based on genealogical information. What do you guys do >>With them? Yeah, so they've got the world's largest family genealogy service available on the web. So there's a massive amount of data that they make accessible, along with the ability to analyze it. And then they've rolled out new features and new applications, one of which is to ship a kit out, have people spit in a tube and return it, and they do DNA matching and reveal additional details. So some really fabulous, leading-edge things are being done with the use of Hadoop. >>Interesting.
So talk about when you went to work with them. What were some of their key requirements? Was it more around the enterprise-grade security and uptime equation, or was it more around some of the analytics? What's the killer use case for them? >>It's hard with a specific company, or even to generalize across companies, because there are really three main areas: ease of use and administration; dependability, which includes the full HA; and performance. In some cases just one of those drives it and is used to justify the decision; in other cases it's a collection. The ease of use is being able to use a cluster not only as Hadoop but to access it and treat it like enterprise storage. So there's a completely POSIX-compliant file system underneath that allows mounting, access, updates, and dynamic read-write use. What that means at the application level is that it's faster, it's much easier to administer, and it's much easier and more reliable for developers to utilize. >>I've got to ask you the marketing question, because MapR, you guys have done a good job of marketing. Certainly we want to thank you guys for supporting theCUBE in the past; you've been great supporters of our mission. But now the ecosystem's evolving, with a lot more competition. Cloudera mentioned there are eight companies they're tracking in, quote, Hadoop, and certainly by Jeff's and my count, and SiliconANGLE's, there are a lot more, because Hadoop washing, the term for jumping in and slapping Hadoop onto an existing solution, has now been happening full bore for a year at least. What's next for you guys to break above the noise? Obviously the communities are very active, and projects are coming online.
You guys have your mission in the enterprise. What's the strategy for you guys going forward? Is it more of the same, or is there anything new you can share? >>Yeah, I think as far as breaking above the noise, it will be our customers, their success, and their use cases that really put the spotlight on what the differences are in using a big data platform. And I think what companies will start to realize is, I'd draw an analogy to the supply chain. The big revolution in supply chain was focusing on inventory at each stage: how do you reduce that inventory level, and how do you speed the flow of goods and the agility of a company for competitive advantage? I think we're going to view data the same way. So instead of copying raw data and moving it across different silos, if companies are able to process data in place and send small result sets, they're going to be faster, more agile, and more competitive. >>And that puts the spotlight on which data platform out there can support a broad set of applications and has the broadest set of functionality. So what we're delivering is an enterprise-grade, mission-critical platform that supports MapReduce with high performance, provides NFS and POSIX access so you can use it like a file system, and integrates enterprise-grade NoSQL applications. So now you can do high-speed, consistent-performance, real-time operations in addition to batch, streaming, integrated search, et cetera. So it's really exciting to provide that platform and have organizations transform what they're doing. >>How's the feedback with Ted Dunning? I've seen a lot of buzz on the Twittersphere; he's getting positive feedback here. He's a tech athlete. He's a guru, he's an expert. He's got his hands in all the pies. He's a scientist type. What's he up to?
What's his role within MapR? He's obviously playing in the open source community. What's he up to these days? >>He's chief application architect. He's on the leading edge of Mahout, so machine learning, and he's sharing insights there. He was speaking at the Storm meetup two nights ago, sharing how you can integrate long-running batch predictive analytics with real-time streaming, and how the use of snapshots really makes that easy and possible. He travels the world and is helping organizations understand how they can take some very complex, long-running processes and really simplify and shorten those. >>I had a chance to meet him in New York City at the last Hadoop World, at a party. Great guy, fantastic geek, and he's certainly doing great work. Shout-out to Ted: congratulations, keep up that support. How's everyone else doing? How are John and Srivas doing? How's the team at MapR? You're pedaling as fast as you can, growing? >>Really quickly. No, we're just shifting gears. We're beyond pedaling. >>Engine? >>Yeah. Give us an update on the company, in terms of the growth and where you guys are moving. >>Yeah, we're expanding worldwide. Just in the last few months we've opened offices in London, Munich, and Paris, and we're expanding in Asia, in Japan and Korea. So our sales, services, and engineering, basically the whole company, continue to expand rapidly. There are some really great, interesting partnerships and a lot of growth, not only as we add customers; it's nice to see customers that continue to grow their use of MapR within their organization, both in terms of the amount of data that they're analyzing and the number of applications that they're bringing to bear on the platform.
>>Dwell on that a little bit, because I think one of the trends we do see is that a company brings in a big data platform, starts experimenting with it, and builds an application, maybe in the marketing department. Then the sales guys see it and say, well, maybe we can do something with that. Is that typically the experience you're seeing? And how do you support companies that want to start expanding beyond those initial use cases to support other departments, potentially even other physical locations around the world? How do you kind of >>That's been the beauty of it, if you have a platform that can support those new applications. So mission-critical workloads are not an issue, and if you support volumes so that you can logically separate things, which we do, it makes it much easier. One of our customers, Zions Bank, brought in MapR to do fraud detection. And pretty soon, because they were able to collect all of that data, they had other departments coming to them and saying, hey, we'd like to use that to do analysis, because we're not getting that data from our existing system. >>Yeah. They come in and you're sitting on a goldmine of use cases. You also mentioned you're expanding internationally. What's your take on the international market for big data, and Hadoop specifically? Is the U.S. leaps and bounds ahead of the rest of the world in terms of adoption of the technology? What are you seeing out there in terms of where the rest of the >>I wouldn't say leaps and bounds, and I think internationally they're able to maybe skip some of the experimental steps. So we're seeing deployments across financial services and telecom, and it's fairly broad. Recruit Technologies is there.
It's the largest provider of recruiting services, and Indeed.com is one of its subsidiaries. They're doing a lot with Hadoop, and MapR specifically. So it's been expanding rapidly. Fantastic. >>I also, you know, when you think about Europe and what's going on with Google and some of the privacy concerns, even here, I should say: are there different regulatory environments you've got to navigate when you're talking about data and how you use it, as you expand to other locales? >>Yeah. Typically the requirements differ by vertical: HIPAA in healthcare, Basel II in financial services, and so on. And it's basically the same theme: when you're bringing Hadoop into an organization and into a data center, the same sorts of concerns, requirements, and privacy rules that you're applying in other areas will be applied to Hadoop. >>Now, turning back to the technology: you mentioned Apache Drill. I'd love to get an update on where that stands, and put it into context for people. We hear a lot about the SQL-on-Hadoop question here. Where does Drill fit into that equation? >>Well, there are a lot of different approaches to providing SQL access. A lot of that is driven by how you leverage the talent in the organization that speaks SQL. So there are developments with respect to Hive, and there are other projects out there. Apache Drill is an open source project that's getting a lot of community involvement, and the design center there is pretty interesting. It started from the beginning as an open source project, with two main differences. One was, in looking at supporting SQL, let's do full ANSI SQL. So it's full 2003 ANSI SQL, not SQL-like, and that'll support the greatest number of applications and avoid a lot of support issues.
And the second design center is, let's support a broad set of data sources: nested sources like JSON, schema discovery, and basically fitting into an enterprise environment, which is sometimes kind of messy and can get messier as acquisitions happen, et cetera. So it's complementary; it's about enabling interactive, low-latency queries. >>Jack, I want to give you the final word. We are out of time. Thanks for coming on theCUBE. Really appreciate it. Great to see you again, Cube alumnus. But the final word, and we'll end the segment here on theCUBE, is your quick thoughts on what's happening here at Hadoop World. What is this show about? Share with the audience: what's the vibe, the summary, a quick soundbite on Hadoop? >>I think I'll go back to how we started. It's not if you use Hadoop, it's how you use Hadoop. Look at not only the first application but what it's going to look like with multiple applications, and pay attention to what enterprise grade means. >>Okay. We've got more coverage coming. Jack Norris with MapR, I'll say one of the big three, the original big three, still on the list in our mind and the market's mind, with a unique approach to Hadoop and enterprise grade. This is theCUBE. I'm John Furrier with Jeff Kelly. We'll be right back after this short break.

Published Date : Jun 27 2013

ENTITIES

Entity	Category	Confidence
Ted	PERSON	0.99+
London	LOCATION	0.99+
Claudia	PERSON	0.99+
Jeff Kelly	PERSON	0.99+
Asia	LOCATION	0.99+
Ted Dunning	PERSON	0.99+
Jack Norris	PERSON	0.99+
Dave	PERSON	0.99+
John	PERSON	0.99+
Jack	PERSON	0.99+
10	QUANTITY	0.99+
Paris	LOCATION	0.99+
Korea	LOCATION	0.99+
Matt BARR	PERSON	0.99+
Munich	LOCATION	0.99+
New York	LOCATION	0.99+
99.9%	QUANTITY	0.99+
Jennifer	PERSON	0.99+
Treevis	PERSON	0.99+
25 times	QUANTITY	0.99+
Japan	LOCATION	0.99+
Google	ORGANIZATION	0.99+
both	QUANTITY	0.99+
one	QUANTITY	0.99+
Jeff	PERSON	0.99+
eight companies	QUANTITY	0.99+
first time	QUANTITY	0.99+
mid-June	DATE	0.99+
Charles	PERSON	0.98+
Europe	LOCATION	0.98+
30 minutes	QUANTITY	0.98+
One	QUANTITY	0.98+
first application	QUANTITY	0.98+
Ash	PERSON	0.98+
two nights ago	DATE	0.98+
Hortonworks	ORGANIZATION	0.98+
each stage	QUANTITY	0.97+
SQL	TITLE	0.97+
SiliconANGLE	ORGANIZATION	0.97+
Natalie	PERSON	0.97+
ancestry.com	ORGANIZATION	0.96+
Hadoop	TITLE	0.96+
Patrick	PERSON	0.96+
last night	DATE	0.95+
Jason	PERSON	0.95+
2003	DATE	0.95+
Hadoop	EVENT	0.94+
Apache	ORGANIZATION	0.94+
Hadoop	PERSON	0.93+
indeed.com	ORGANIZATION	0.93+
hundred percent	QUANTITY	0.92+
HBase	TITLE	0.92+
Hadoop Summit 2013	EVENT	0.92+
Quicken loans	ORGANIZATION	0.92+
two main differences	QUANTITY	0.89+
HIPAA	TITLE	0.89+
#HadoopSummit	EVENT	0.89+
S SLA	TITLE	0.89+
Hadoop	ORGANIZATION	0.88+
Cloudera	ORGANIZATION	0.85+
map R	TITLE	0.85+
a year	QUANTITY	0.83+
Zions bank	ORGANIZATION	0.83+
Peloton	LOCATION	0.78+
NFS	TITLE	0.78+
MapReduce	TITLE	0.77+
Cloudera map R	ORGANIZATION	0.75+
live	ORGANIZATION	0.74+
second design center	QUANTITY	0.73+
Hindu	ORGANIZATION	0.7+
theCUBE	ORGANIZATION	0.7+
three main areas	QUANTITY	0.68+
one enterprise grade	QUANTITY	0.65+

Jack Norris | Strata Data Conference 2013

>>Okay. We're back here inside theCUBE, our flagship program where we go out to the events and extract the signal from the noise. This is Strata Conference, O'Reilly Media's big data event. We're talking about Hadoop, analytics, data platforms, and how big data has come into the enterprise through the front door, as we heard yesterday. I'm John Furrier with Dave Vellante of Wikibon.org, and we're here with Jack Norris, a Cube alumnus and a favorite guest here. You're an executive at MapR, and you guys are leading the charge with this use of Hadoop. Welcome back to theCUBE. Thank you. Okay, so let's chat about what's going on. What's your take on all the big news out here for the distributions, all the big power moves? You guys have a relationship with EMC, an exclusive relationship with those guys. Intel's got a distribution, Hortonworks is with Microsoft; there are a lot of things going on. So this is your wheelhouse. What's your take on the Hadoop action here? >>Well, I think there's an article in Forbes where they said it best: this is showing that MapR has had the right strategy all along. What we're seeing is that there's basically a fairly low bar to taking Apache Hadoop and providing a distribution. So we're seeing a lot of new entrants in the market, and there are a lot of options if you want to try Hadoop, experiment, and get started. And then there's production-class Hadoop, which includes enterprise data protection, snapshots, mirrors, and the ability to integrate, and that's basically MapR. So start in test and dev with a lot of options, and then move into production class >>MapR. So break it down for the folks out there who are dipping a toe in the water and hearing all the noise, because right now the noise level is very high with the recent announcements. But you guys have been doing business in this area for many years. So when people say, hey, I want to get a Hadoop distribution for the enterprise,
what should they be looking for? Because it's not that easy to sift through the noise. So could you share with the folks out there what to look for, the table stakes, the checkboxes? There are a lot of claims, a lot of noise, and a lot of different options. Some teams have more committers than others, or no committers, but that's all noise. So what are the key things that customers need to know? >>So I think there are three areas. All right. One is how it integrates into your enterprise. With Hadoop, you have the Hadoop Distributed File System API; that's how you interact. If you're also able to use standard tools that rely on standard file and database access, it makes it much, much easier. So MapR is unique in supporting NFS and making that happen. That's a big difference. The second is dependability: there are high-availability capabilities, and then there's data protection. I'll focus on snapshots as an example. You've got data replicated in Hadoop. That's great. But if you have a user error or an application error, that's replicated just as quickly. So it's about having the ability to recover and go back in time. Yeah. So if I can say, hey, I made a mistake, can I go back two minutes earlier? Snapshots make that possible. MapR is unique in snapshot support. And then finally, there's disaster recovery: mirroring, where you can go across clusters, mirror what's going on across the WAN, and recover in the case of a disaster where you lose a whole cluster or lose a whole >>Section. And that's not available in >>Others. Those aren't available either. That's >>NFS, >>Snapshots have been on the JIRA list for over five years. >>Yeah. Okay. So I wonder >>If I could finish that, there's a third.
Because I said three and only said two. The third is performance and scale. >>That'd be four. >>Integration, dependability, and speed. >>Okay. So dependability: DR is part of that, snapshots and DR. Okay. So let's talk about the performance, because Google's a big partner of you guys, and we just had them on theCUBE at Strata. So you have a record-setting benchmark? EMC, take that. Well, you work with EMC. So let me talk about the performance real quick, then we'll talk about some of the EMC conversations. On performance, you have a variety of diverse benchmarks, on Google and within the enterprise. Can you talk about those? >>So what we announced this week was the MinuteSort world record. MinuteSort runs across technologies; it's just how much data you can sort in 60 seconds. If you look back, the previous record was done in the labs at Microsoft with special-purpose software, and they did 1.4 terabytes. Hadoop hasn't been used for the record since 2009. It's been several years, because it's got features in there that work against performance, things like checkpointing and logging, because it assumes you've got long-running MapReduce jobs. So we set the record with our distribution of Hadoop, which means we had one hand tied behind our back, given that technology. Secondly, we set it in the cloud, which is the other hand tied behind our back, because it's a virtualized environment. So we set the record with just our legs, at 1.5 terabytes in 60 seconds. Very proud of that. >>Well, that's interesting, because Dave and I and our teams have been doing a lot of lab testing on cost. And it's an interesting benchmark, because people don't always look at the nuance of the cost when comparing cloud performance versus bare metal. Most people don't factor in setup and the cost of deployment. Exactly.
So can you just quickly talk about that and how significant an order of magnitude that is for your customers? >>So the previous Hadoop record took 3,400 servers, about 27,000 cores, and almost 14,000 disks, and did 600 gigs, actually a little less than that, at 578. And on Google, we did it with 2,100 virtual instances and 8,000 cores, and did 1.5 terabytes. >>And the cost? You spin it up on Google versus >>Basically, if you look at that and you assume, conservatively, 4,000 per server, it's $13.8 million worth of hardware previously. And the cost to do that run on Google was $20.33. >>Well, you got a discount. I mean, come on, you're a partner. Does it really cost that much? I mean, that's what they would charge for it? Actually >>That's our cost for just that minute. If you look at the actual charges, it'd be 1,200. >>Okay. It's not millions. So, millions to thousands. Yep. Okay. That's impressive. We'll have to go look at the numbers. We're going to look at Greenplum's numbers in the next couple of weeks. Talk about the Google relationship and where you are with that. >>We're very excited about it. We're actually deployed throughout the cloud. We've got multiple partners. Google's in limited preview, so we've got a number of customers testing that and doing some really interesting things. >>So we monitor the data center market, obviously, with our proprietary tools that you know about, Viewfinder and CrowdSpots, and the thing is that the data center vertical is interesting, right? If you look at the sentiment analysis of what the conversation is, just on the Twitter data, it's Facebook, Apple, these companies. And when we dig into the numbers, it's not so much the companies; it's the fact that their data center operations are being looked at as the leading indicator for where CEOs are going.
So I want to ask you, in your conversations with your customers, what are the conversations around moving to the cloud, and where are they in that transition? Because we hear, yeah, we want the cloud for all the benefits you were mentioning, and Google and Facebook are the gold standards for architecture. It's not necessarily a cut-and-paste architecture, but they see the benefits of what those companies are doing. So what are your conversations with your enterprise customers around cloud architecture, and what other features besides replication and disaster recovery are they looking at? >>Well, it's basically workload driven and dataset driven. Data that's already in the cloud is kind of a natural first step: well, why don't I do the analysis there as well? So things like Google Earth and digital advertising data are really interesting candidates for that. Also periodic workloads: if they have workloads that need to spin up and spin down, the cloud works really well for that. And in some cases it's driven by their own environments. They've got data centers that are approaching capacity, they need to do offloads, and they're looking at the cloud as an alternative because it's easy to get up and running quickly. >>I want to come back to one of your three value props here, the dependability piece, and specifically the snapshots. Somebody asked me one time, a couple of years ago, how do you back up a petabyte? And the answer was, well, you don't. So I want to ask you how your customers are protecting their data and what you guys are bringing to the table. >>So snapshots are not a bolt-on feature. It's basically a low-level feature based on the underlying data architecture. So when we architected that from the beginning, snapshots were a core feature.
And if you use a technique called redirect-on-write, you're not copying the data, right? So you can do a petabyte snapshot efficiently, basically almost instantaneously, because you're just tracking the pointers to the latest blocks that have been written. So if the data isn't changing, you can snapshot every minute and not have any additional storage overhead. >>Right. Okay. And so MapR's technology will allow them to set that, dial it up, dial it down, and so on? >>So we support logical volumes. You can set policies at the volume level and say, well, this volume is critical data, and critical data gets a snapshot every minute. And then I can change what the policy for critical data is; maybe it's every five minutes, et cetera. So you can set up these different policies on volumes and have snapshots happen independently for each. >>So you can do that by workload or dataset or application or whatever, essentially provided as a service, as opposed to a one-size-fits-all approach? >>Exactly. And that also corresponds to user access, administrative privileges, and other features and policies within the cluster. >>How about this whole trend toward bringing SQL into Hadoop? What's your take on that, and what's your angle? >>So interactive SQL is an important aspect, because you've got so many people in the organization who are trained in and leverage SQL, but it's one of many use cases that need to run across a big data platform. So there's a range: big data analytics, batch analytics, interactive capabilities with SQL, database operations, NoSQL, search, streaming. All of those are functions that need to run across a platform.
So it's a piece, but it's not the big driver, because what we've seen is that there's a higher arrival rate of machine-generated data, and machine-generated responses to that data, for digital advertising, for recommendation engines, for fraud detection, can really move the needle for an organization and drive huge swings in profitability. >>Move the ball down the field big time. Yeah. >>And having an interactive piece with kind of a human element involved, it doesn't really scale and work on a 24-by-7 basis. >>Jack, final question; we're over now by a minute. But I want to ask a one-part question. Obviously, it's a very competitive landscape right now, and the stakes are higher because the demand in the market, the market opportunity, is massive. What's MapR's business strategy going forward? No change in direction? Is it going to be same old, same old? Do you guys have any new things going on as you see the marketplace? >>We've got a huge lead when it comes to kind of mission-critical, enterprise-grade features. And our focus is one platform. So the ability to support enterprise Hadoop and enterprise HBase, and provide those full capabilities for ease of use, for dependability, for performance. And, you know, we've seen a lot of companies test on one distribution and switch to MapR, and we'll continue to help with that in the future. >>Well, we will say we've been covering this big data space now going on four years, Dave and I, and we've watched all the players pivot a few times. You guys have not; you guys have been true to your mission from day one, and we know where you stand. Everyone knows where you stand: enterprise grade. It's a good strategy. I think everyone's putting that on their label now. So enterprise-grade washing, we call it. Congratulations, MapR. This is theCUBE. We'll be right back with our next guest here on day three of wall-to-wall coverage at O'Reilly Media. 
Then our news hour is next, from 12 to one. We'll be right back after this short break.
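
Editor's sketch: the redirect-on-write snapshots and per-volume schedules described in this interview can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not MapR's implementation: the `Volume` class, its methods, and the `snapshot_interval_s` field are hypothetical names invented for illustration. It shows why a snapshot only needs to record pointers, and why an unchanged volume adds no data blocks however often it is snapshotted.

```python
class Volume:
    """Toy redirect-on-write volume (hypothetical, for illustration only)."""

    def __init__(self, name, snapshot_interval_s):
        self.name = name
        # Per-volume policy: how often a scheduler would snapshot this volume.
        self.snapshot_interval_s = snapshot_interval_s
        self._store = []       # physical blocks; never overwritten in place
        self._block_map = {}   # logical block number -> index into _store
        self._snapshots = []   # each snapshot is a frozen copy of the block map

    def write(self, block_no, data):
        # Redirect the write to a fresh physical block, then repoint the map.
        self._store.append(data)
        self._block_map[block_no] = len(self._store) - 1

    def read(self, block_no, snapshot=None):
        # Read the live volume, or the volume as it was at a given snapshot.
        block_map = self._block_map if snapshot is None else self._snapshots[snapshot]
        return self._store[block_map[block_no]]

    def snapshot(self):
        # Only pointers are recorded; no data block is copied.
        self._snapshots.append(dict(self._block_map))
        return len(self._snapshots) - 1

    def physical_blocks(self):
        return len(self._store)


if __name__ == "__main__":
    critical = Volume("critical", snapshot_interval_s=60)
    critical.write(0, b"ledger-v1")
    first = critical.snapshot()
    critical.write(0, b"ledger-v2")
    print(critical.read(0), critical.read(0, snapshot=first))
```

In this toy the snapshot copies a flat map of pointers; a real system would more likely share a tree of block pointers so that taking the snapshot stays cheap even at petabyte scale. Changing `snapshot_interval_s` on one volume leaves every other volume's schedule untouched, which is the "policy per volume" idea from the conversation.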

Published Date : Mar 4 2013

ENTITIES

Entity	Category	Confidence
Dave Volante	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
$20	QUANTITY	0.99+
Jack Norris	PERSON	0.99+
John Frey	PERSON	0.99+
apple	ORGANIZATION	0.99+
$13.8 million	QUANTITY	0.99+
Dave	PERSON	0.99+
600 gigs	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
60 seconds	QUANTITY	0.99+
1.5 terabytes	QUANTITY	0.99+
33 cents	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
3,400 servers	QUANTITY	0.99+
six millions	QUANTITY	0.99+
8,000 cores	QUANTITY	0.99+
EMC	ORGANIZATION	0.99+
O'Reilly	ORGANIZATION	0.99+
1200	QUANTITY	0.99+
third	QUANTITY	0.99+
thousands	QUANTITY	0.99+
Asheville	LOCATION	0.99+
millions	QUANTITY	0.99+
two	QUANTITY	0.99+
Twitter	ORGANIZATION	0.99+
2009	DATE	0.99+
1.4 terabytes	QUANTITY	0.99+
SQL	TITLE	0.99+
three	QUANTITY	0.99+
yesterday	DATE	0.99+
24	QUANTITY	0.99+
this week	DATE	0.99+
four years	QUANTITY	0.99+
one party	QUANTITY	0.99+
over five years	QUANTITY	0.99+
three areas	QUANTITY	0.99+
Hadoop	TITLE	0.99+
One	QUANTITY	0.98+
2020	DATE	0.98+
one	QUANTITY	0.98+
100 virtual instances	QUANTITY	0.97+
second	QUANTITY	0.97+
one platform	QUANTITY	0.97+
first step	QUANTITY	0.97+
Jack	PERSON	0.97+
one time	QUANTITY	0.97+
Secondly	QUANTITY	0.95+
about 27,000 cores	QUANTITY	0.94+
HBase	TITLE	0.93+
13, 13,000	QUANTITY	0.93+
GreenPlum	ORGANIZATION	0.92+
day three	QUANTITY	0.92+
DMC	ORGANIZATION	0.91+
Intel	ORGANIZATION	0.9+
a minute	QUANTITY	0.9+
day one	QUANTITY	0.89+
Strata Data Conference	EVENT	0.89+
4,000 per server	QUANTITY	0.89+
14,000 discs	QUANTITY	0.87+
five minutes	QUANTITY	0.85+
Washington	LOCATION	0.84+
one distribution	QUANTITY	0.83+
wiki.org	OTHER	0.83+
seven	QUANTITY	0.83+
couple of years ago	DATE	0.83+
5 78	QUANTITY	0.82+
each	QUANTITY	0.81+
Jr	PERSON	0.79+
12	QUANTITY	0.77+

Jack Norris - Strata Conference 2012 - theCUBE

>>Hi everybody. We're back. This is Dave Vellante from Wikibon.org. We're live at Strata in Santa Clara, California. This is SiliconANGLE TV's continuous coverage of the Strata conference. O'Reilly Media is a great partner of ours, and thanks to them for allowing us to be here. We've been going all week; it's day three for us. I'm here with Jeff Kelly, Wikibon's lead big data analyst. And we're here with Jack Norris, who's the VP of marketing at MapR. Jack, welcome to theCUBE. >>Thank you, Dave. >>Thanks very much for coming on. And you know, we've been going all week. You guys are a great sponsor of ours. Thank you for the support. We really appreciate it. How's the show going for you? >>Great. A lot of attention, a lot of focus, a lot of discussion about Hadoop and big data. >>Yeah. So you guys are getting a lot of traffic. I mean, I hear there's 2,500 people here, up from 1,400 last year. So that's >>Yeah, we've had like five, six people deep in the booth. So I think there's a lot of interest. >>You know, when we were here last year, when you looked at the infrastructure and the competitive landscape, there wasn't a lot going on, and in just a very short time that's completely changed. And you guys have had your hand in that. So that's good. Competition is a good thing, right? And obviously customers want choice. So we want to talk about that a little bit. We want to talk about MapR and the kind of problems you're solving. So why don't we start there? What is MapR all about? You've got your own distribution of enterprise Hadoop. You make Hadoop enterprise ready. Let's start there. >>Okay. 
Yeah, I mean, we invested heavily in creating an alternative distribution, one that took the best of the open source community with the best of the MapR innovations. And really it's about making Hadoop more applicable: broader use cases, more mission-critical support, you know, being able to sit in and work in a lights-out data center environment. >>Okay. So what was the problem that you set out to solve? Why do we need another distribution of Hadoop? Let me ask it that way. >>So there are some just big issues with Hadoop. >>What are those issues? Let's talk about that. >>There are some ease-of-use issues. There are some deep dependability issues. There are some performance issues. So, you know, let's take those in order. Right now, if you look at some of the distributions, Apache Hadoop is great technology, but it requires a programmer, right, to get access to the data. It's through the Hadoop API; you can't really see the data. So there's a lot of focus on, you know, what do I do once the data's in there: opening that up, providing full file-based access, right? So I can look at it and treat it like enterprise storage: see the data, use my standard tools, standard commands, you know, drag and drop from a file browser. You can do that with MapR. You can't do that with other distributions. >>You're talking about mounting HDFS as NFS, for example? >>Correct. And then there are just the underlying storage services. The fact that it's append-only instead of full random read-write, you know, causes some issues. So, you know, those are some of the ease-of-use features. There's a whole lot we could discuss there. Big picture for reliability and dependability: there's a single point of failure, multiple single points of failure, within Hadoop. So you risk data loss. So people have looked at Hadoop traditionally as batch oriented, a scratchpad, right? We set out to solve that, right? 
We want to make sure that you can use it for mission-critical data, that you don't have a risk of data loss, that you've got full high availability. You've got the full data protection, in terms of snapshots and mirroring, that you would expect with enterprise products. >>Let's go back to when you guys were, you know, thinking about doing this. I'm not even sure you were at the company at the time, but the DNA was there and you're familiar with it. So you guys saw this big data movement. You saw this Hadoop movement and you said, okay, this is cool. It's going to be big. And it's going to take a long time for the community to fix all these problems. We can fix them now. Let's go do that. Was that the general discussion? >>Yeah. You know, I think what's different about this is that it's the first open source project that's created a market. If you look at the other open source projects, you know, Linux, MySQL, et cetera, they came really late in the life cycle of a product. Everyone knew what the features were. It was about, you know, giving an alternative choice, a better Unix. Here the focus is on innovation. And our founders, you know, have deep enterprise backgrounds. Our CTO was at Google in charge of BigTable, understands MapReduce at scale, and spent time as chief software architect at Spinnaker, which was kind of the fastest clustered NAS on the planet. So they recognized that the underlying layers of Hadoop needed some rearchitecture and needed some deep investment, and to do that effectively and do that quickly required a whole lot of focus. And we thought that was the best way to go to market. >>Talk about the early validation from customers. Obviously you guys didn't just do this in a vacuum, I presume. So you went out and talked to some customers. >>Yeah. >>What sorts of conversations did you have with customers while you were in stealth mode? >>We were probably the loudest stealth >>As you were nodding. 
And I mean, what were they telling you at the time? Yeah, please go do this? >>What we addressed weren't secrets. There've been JIRAs open for four or five years on these issues. >>Yeah. But at the same time, Jack, you've got this purist community out there that says, I don't want to rip out HDFS. You know, I want it to be pure. What do you say to those guys? Do you just say, okay, thank you, we understand you're not a prospect? >>And I think that, you know, Hadoop has a huge amount of momentum. And I think a lot of that momentum is because there isn't any risk to adopting Hadoop, right? It's not like the fractured NoSQL market, where there are 122 different entrants and which one's going to win? Hadoop's got the ecosystem. So when you say pure, it's about the APIs. It's about making sure that if I create a MapReduce job, it's going to run on Apache, it's going to run on MapR, it's going to run on the other distributions. That's where I think the heat and the focus is. Now, to do that, you also have to have innovation occurring up and down the stack that provides choice and alternatives. >>So when I'm talking about purists, I agree with you on the whole lock-in thing, which is the elephant in the room here. People will worry about lock-in >>Pun intended. >>No, no, but good one, good catch. But you're basically saying, hey, we're no more locked in than Cloudera, right? I mean, they've got their own >>Actually, I think we're less, because it's so easy to get data in and out with our NFS. There's probably less lock-in. >>So, and I'm going to come back to that. But for instance, when I say purists, I mean some users and ISVs, some guys we've had on here. We had Abhi Mehta from Tresata on the other day, for instance; he's one who said, I just don't have time to mess with that stuff and figure out all that API integration. 
I mean, there are people out there that just don't want to go that route. Okay. But you're saying, I'm inferring, there are plenty who do, right? >>And by the API route, I want to make sure I understand what you're saying. >>You talked about, hey, it's all about the API integration. >>It's about the APIs being consistent, a hundred percent compatible. Right. So if I, you know, write a program that's going after HDFS and the HDFS API, I want to make sure that it'll run on other distributions. Right. >>And that's your promise? >>Yeah. >>Okay. All right. So now where I was going with this was, again, there are some purists who say, oh, I just don't want to mess with all that. Now let's talk about what it means to mess with all that. So comScore was a big, high-profile case study for you guys. They were a Cloudera customer. Basically, my understanding is that in a couple of days they migrated from Cloudera to MapR. And the impetus was, let's talk about that. Why'd they do that? >>Performance, data protection, ease of use. >>License fee issues? There were some license issues there as well, right? Your maintenance pricing was more attractive. Is that true? Or >>I'd read it more as mainly about price performance and reliability. And, you know, they tested our stuff and it worked real well in a test environment, so they put it in the production environment. They didn't actually tell all their users. They had one guy debug the software for half a day because he thought something was wrong: it finished so quickly. >>So it took them a couple of days to migrate and then boom. >>Boom. And they handle about 30 billion objects a day. So, you know, there's the use of that really high-performance support for streaming data flows. You know, they're doing forecasts and insights into web behavior, and, you know, the earlier they can do that, the better off they are. 
So >>Great. >>So talk about the implications of your approach in terms of the customer base. I'm imagining that your customers are perhaps more advanced than a lot of typical Hadoop users who are just getting started tinkering with Hadoop. Is it fair to say, you know, your customers know what they want, and they want performance and they want it now, and they're a little more advanced than perhaps some of the typical early adopters? >>We've got people who go to our website and download the free version, and some of them are just starting off and getting used to Hadoop. But we did specifically target those very experienced Hadoop users that, you know, were kind of stubbing their toes on the issues. And so they're very receptive to the message that we've made it faster, we've made it more reliable, you know, we've added a lot of ease of use to Hadoop. >>So let me interrupt and go back to what I was saying before. I found this comment online from Mike Brown, comScore's CTO, I presume. He said MapR's Direct Access NFS feature exposes Hadoop Distributed File System data as NFS files, which can then be easily mounted, modified, or overwritten. So that's a data access simplification. He also said, we could capitalize on the purchase of MapR with an annual maintenance charge versus a yearly cost per node. NFS allowed our enterprise systems to easily access the data in the cluster. So does that make sense to you, that annual maintenance charge versus yearly cost per node? I didn't get that. >>Oh, I think he's talking about how some organizations prefer to do a perpetual license versus a subscription model. >>Oh, okay. 
So the traditional way of licensing software. >>And the fact that you can do it that way basically reinforces the fact that we've really invested in and have kind of a product, you know, orientation rather than just services on top of some open source. >>Okay. So you go in, you license it, and then, yeah, perpetual license. >>And you can also start with the free edition, which has all the performance and NFS support, and kick the tires >>Before you buy it. Sorry, Jeff. Sorry to interrupt. >>No, no problem at all. So another topic of a lot of interest is security, making Hadoop enterprise ready. One of the pillars there is security, making sure of access controls, for instance. Let's talk about how you guys approach that and maybe how you differentiate from some of the other vendors out there. >>Full Kerberos support. We link into enterprise standards for access, LDAP, et cetera. We leverage Linux PAM security, and we also provide volume control. So, you know, right now in Apache Hadoop and other distributions, you put policies at the file level or on the entire cluster. And we see many organizations having separate physical clusters because of that limitation, right? And we provide volumes. So you can define a volume, and on that volume control access, administrative privileges, and data protection class, and, you know, in a sense kind of segregate that content. And that provides a lot of control and a lot more, you know, security and protection and separation of data. >>Is that scenario, the comScore scenario, common, where somebody's moving off an existing distribution onto MapR? Or are you more seeing demand from new customers that are saying, hey, what's this big data thing, I really want to get into it? How's it shake out there? >>Right now there's this huge pent-up demand for these features. And we're seeing a lot of people that have run on other distributions switch to MapR. >>A little bit of everything. 
How about, can you talk a little bit about your channel, your go-to-market strategy, maybe even some of your ecosystem and partnerships, in the little time we have? >>Sure. So EMC is a big partner of ours. The EMC Greenplum MR edition is basically MapR; you can start with any of our editions and upgrade to Greenplum with just a license key. That gives us worldwide service and support. It's been a great partnership. >>We hear about a lot of proofs of concept out there. >>Yeah. And then it just hit the news today about EMC's distribution, the MR distribution, being available with Cisco's UCS gear. So now that's further expanded the footprint that we have. >>Okay. So there's the EMC relationship. Anything else that you can share with us? >>We have other announcements coming out and >>Anything you want to pre-announce on theCUBE? >>Oops. Did I let that slip? >>It's live, so be careful. And so, in terms of your channel strategy, are you guys mostly selling direct, indirect, a combination? >>It's kind of an indirect model through these large partners with a direct assist. >>Yeah. Okay. So you guys come in and help evangelize. >>Yep. >>Excellent. All right. Do you have anything else before we've got to roll here? >>Yeah, I did wonder if you could talk a little bit about, you mentioned EMC Greenplum, so there's a lot of talk about the data warehouse market, the MPP data warehouses versus Hadoop. Based on that relationship, I'm assuming that MapR thinks, well, they're certainly complementary. Can you just touch on that? And, you know, as opposed to some who think, well, Hadoop is going to be the platform where we go. >>Well, I mean, if you look at the typical organization, they're just really trying to get their, excuse me, their arms around a lot of this machine-generated content, this, you know, unstructured data that's just growing like wildfire. So there are a lot of Hadoop-specific use cases that are being rolled out. 
There are also kind of data lakes, data oceans, whatever you want to call it, large pools where that information is then being extracted and loaded into data warehouses for further analysis. And I think the big pivot there is, if it's well understood what the issue is, you define the schema, and then there's a whole host of data warehouse applications out there that can be deployed. But there are many things where you don't really understand that yet, and having Hadoop, where you don't need to define a schema a priori, is a big value. >>Jack, I'm sorry. We have to go; we're running a couple of minutes behind. Thank you very much for coming on theCUBE. Great story. Good luck with everything. It sounds like things are really going well, the market's heating up, and you're in the right place at the right time. So thank you again. Thank you to Jeff. And we'll be right back, everybody, at the Strata conference, live in Santa Clara, California, right after this word from our.
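
Editor's sketch: the file-based access discussed in this interview (mounting the cluster over NFS and using ordinary tools, including in-place overwrites that an append-only store does not allow) amounts to plain file I/O from the client's point of view. This is a minimal Python sketch, assuming the cluster is already NFS-mounted at some local path; the helper names and the example path in the comment are hypothetical and are not a MapR API.

```python
from pathlib import Path


def list_files(root):
    """Return the names of regular files directly under a mounted directory."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_file())


def overwrite_at(path, offset, data):
    """Overwrite bytes in place at `offset` using ordinary random-access I/O.

    This works on any file system that supports random read-write;
    an append-only store could not service the seek-then-write below.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


if __name__ == "__main__":
    # Hypothetical mount point, used only if it exists on this machine.
    mount = Path("/mnt/cluster/data")
    if mount.is_dir():
        print(list_files(mount))
```

Nothing here is Hadoop-specific, which is the point being made: once the data is reachable through a standard mount, existing scripts and tools can read and modify it without going through the Hadoop API.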

Published Date : Apr 27 2012

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Jeff	PERSON	0.99+
Jack Norris	PERSON	0.99+
five	QUANTITY	0.99+
Dave Volante	PERSON	0.99+
Jack	PERSON	0.99+
EMC	ORGANIZATION	0.99+
last year	DATE	0.99+
Matt BARR	PERSON	0.99+
four	QUANTITY	0.99+
UCS	ORGANIZATION	0.99+
2,500 people	QUANTITY	0.99+
Santa Clara, California	LOCATION	0.99+
Greg	PERSON	0.99+
Google	ORGANIZATION	0.99+
Mike Brown	PERSON	0.99+
half a day	QUANTITY	0.99+
Spinnaker	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
comScore	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Riley	ORGANIZATION	0.98+
EMC Greenplum	ORGANIZATION	0.98+
Abby Mehta	PERSON	0.98+
Linux	TITLE	0.97+
strata conference	EVENT	0.97+
SQL	TITLE	0.97+
One	QUANTITY	0.97+
one guys	QUANTITY	0.97+
today	DATE	0.97+
Raleigh	ORGANIZATION	0.97+
122 different entrance	QUANTITY	0.97+
six people	QUANTITY	0.97+
Skipio	PERSON	0.96+
Jeff Kelly	PERSON	0.95+
single point	QUANTITY	0.95+
about 30 billion objects a day	QUANTITY	0.94+
Strata Conference 2012	EVENT	0.93+
ECS	ORGANIZATION	0.93+
hundred percent	QUANTITY	0.91+
Triceda	ORGANIZATION	0.9+
Apache	TITLE	0.9+
firs	QUANTITY	0.9+
Paducah	LOCATION	0.89+
Greenplum	ORGANIZATION	0.89+
single points	QUANTITY	0.88+
day three	QUANTITY	0.88+
NFS	TITLE	0.87+
Wiki bond.org	OTHER	0.87+
1400	QUANTITY	0.85+
Unix	TITLE	0.85+
Wiki bonds	ORGANIZATION	0.84+
Silicon angle	ORGANIZATION	0.83+
Mapbox	ORGANIZATION	0.78+
Apache	ORGANIZATION	0.76+
MapReduce	ORGANIZATION	0.75+
Kerberos	ORGANIZATION	0.75+
first open	QUANTITY	0.74+
Pam	TITLE	0.73+
Matt bar	ORGANIZATION	0.73+
Nazanin	ORGANIZATION	0.61+
Cloudera	TITLE	0.59+
moon	LOCATION	0.58+
Cisco	ORGANIZATION	0.54+
one	QUANTITY	0.53+
days	QUANTITY	0.52+
MapReduce	TITLE	0.47+

Aaron T. Myers Cloudera Software Engineer Talking Cloudera & Hadoop

>>So Aaron, you're a techie at Cloudera, you're a whiz kid from Brown. How many Brown people are engineers here at Cloudera? >>As of Monday, we have five full-timers and two interns at the moment, and we're trying to hire more all the time. >>Mhm. So how many interns? >>Uh, two interns from Brown this summer, a few more from other schools. >>Cool. I'm John Furrier with SiliconANGLE.com, SiliconANGLE.tv. We're here in the Cloudera office in my little mini studio. It hasn't been built out yet; it was a studio, but we had to break it down for Dr. Ralph Kimball, not Richard Kimble, as I called him on Twitter, but the data warehouse guru was in here. And you guys are attracting a lot of talent, Aaron. So tell us a little bit about, you know, how Cloudera is making it happen and what's the big deal here. People are smart here, it's mature, it's not the first time around for this company. This company has some senior execs. And a lot of people in the market have been talking about, you know, a lot of first-time entrepreneurs doing their startups, and I've been hearing from some folks in the trenches that there's been a frustration in startups out there, that there are a lot of first-time entrepreneurs and everyone wants to be the next Twitter, and there are some companies that are straddling failure out there. And I was having that conversation with someone just today, and they said, what's it like at Cloudera? And I said, this is not a first-time crew here at Cloudera. So share with the folks out there what you're seeing at Cloudera and in the management team. >>Sure. 
Well, one of the most attractive parts about working at Cloudera for me, one of the reasons I really came here, was the incredibly experienced management team: Mike, Charles, they're all there at the top of this org. They have all done this before: they've founded startups, grown startups, sold startups. And especially in contrast with the place where I worked previously, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would. >>And I mean, Mike Olson is a veteran. I mean, he's an adviser to startups. I know he's been in some as an investor. Amr was obviously a PhD candidate, bolted out to do the startup, sold it to Yahoo, worked at Yahoo, came back and finished his PhD at Stanford under Mendel over there in the PhD program. He came back as entrepreneur in residence at Accel Partners. Now he does Cloudera. When did you join the company? Just take us through who you are and when you joined Cloudera. I want your background. >>Sure. So I joined a little over a year ago; it was about 30 people at the time. I came from a small startup, an online music store in New York City, which doesn't really exist all that much anymore. But, you know, I sort of followed my other colleagues from Brown who worked here. I was really sold by the management team and also by the tremendous market opportunity that Hadoop has right now. Cloudera was very much the first commercial player there, which is really a unique experience, and I think you've covered this pretty well before. I think we all around here believe that the market's only growing. And we're going to see the market, and the big data market in general, get bigger and bigger in the next few years. >>So obviously computer science is all the rage, and I'm particularly proud to hang out here; we've had conversations in the hallway while you're tweeting about this and that. 
Um, but you know, SiliconANGLE's home is here. I've had a chance to watch you and the other guys here grow, you know, from your other office, which was in San Mateo or San Bruno, somewhere in there. >>It was originally in Burlingame, then we relocated the headquarters to Palo Alto, and now we have a satellite up in San Francisco. >>So you guys bolted out. You know, you have a full-blown San Francisco office. So there was a big bursting at the seams here in Palo Alto, people commuting down, even building their Burning Man >>Oh yeah, sure >>skits here, and they're constructing their homes here for Burning Man. So are you doing that in San Francisco? What's the vibe like in San Francisco? Tell us what's going on. >>San Francisco is great. I live in San Francisco, as do a lot of us. About half the engineering team works up there now. You know, we're running out of space there, certainly. >>Already? >>Oh yeah, oh yeah, we're hiring as fast as we absolutely can. So there's definitely not space to build the Burning Man huts there like there is down in Palo Alto, but it's great up there. >>What are you working on right now for projects, Aaron? Computer science is one of the hot topics we've been covering on SiliconANGLE, taking more of a social angle. Social media has, you know, moved from this PR kind of, you know, check-in, Facebook fan page hype to kind of a real-deal social marketplace where, you know, social data, gestural data, mobile data, geo data: data is the center of the value proposition. So you live that every day. So talk about your view on the computer science landscape around data and why it's such a big deal. >>Oh sure. 
I think data is sort of one of those fundamental things that can be mined for value across every industry. There's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing, et cetera. And that's sort of the unique value proposition of, you know, stuff like Hadoop. Truly, we see interest from every sector that exists, which is great. As for the project that I'm specifically working on right now, I primarily work on HDFS, which is the Hadoop Distributed File System; it underlies pretty much all the other projects in the Hadoop ecosystem. And I'm particularly working with other colleagues at Cloudera and at other companies, Yahoo and Facebook, on high availability for HDFS, which in some deployments is a serious concern. Hadoop is primarily a batch processing system, so it's less of a concern there than in others. But when you start talking about running HBase, which needs to be up all the time serving live traffic, then having highly available HDFS is a necessity, and we're looking forward to delivering that. >>Talk about the criticism that HDFS has been getting. Well, I wouldn't say criticism. I mean, it's been a great product; HDFS is a core part of Hadoop. You guys have been contributing to the standard at Apache; it's no secret to the folks out there that Cloudera leads that effort. But there are new companies out there kind of trying a new approach, and they're saying they're doing it better. What are they saying, and what's really happening? So, you know, there's some argument like, oh, we can do it better. Why are they doing it? Is that just to make money, to do a new venture? What's your opinion on that? >>Yeah, sure. 
I mean, I think it's natural to want to go after, uh, parts of the core Hadoop system and say, you know, Hadoop is a great ecosystem, but what if we just swapped out this part or swapped out that part, couldn't we get some really easy gains? Um, and you know, sometimes that will be true. I have confidence that that will simply not be true in the very near future. One of the great benefits of Apache Hadoop being open source is that we have a huge worldwide network of developers working at some of the best engineering organizations in the world who are all collaborating on this stuff. Um, and, you know, I firmly believe that the collaborative open source process produces the best software, and that's what Hadoop is at its very core. >>What about the argument that says, oh, I need to commercialize it differently for my installed base, bolt on a few proprietary extensions? Um, that's a legitimate argument. EMC might take that approach, or, um, you know, MapR is trying to rewrite, uh, HDFS. To me, is it legitimate? I mean, is there fighting going on in the standards? Maybe that's a political question you might want to answer. But give it a shot. >>I mean, Hadoop, uh, there's no open standard for Hadoop. You can't say, like, this is, uh, Hadoop-compatible or anything like that. But, you know, what you can say is, like, this is Apache Hadoop. Uh, and so in that sense there's no fighting to be had there. Um, yeah. >>So, yeah, Yahoo, um, is struggling as a company. But, you know, there's strong Hadoop DNA at Yahoo, certainly. I talked with the founder of the startup Hortonworks. They just announced today that they have a new board member. He's the guy who's the CEO of Hortonworks, and now on bluster, I'm sorry, cluster announced they have, um, Rob from Benchmark on the board.
Uh, he's the CEO of Hortonworks, and one of my, not criticisms, but points about Hortonworks was that this guy's an engineer who has never run a company before. He's no Mike Olson. Okay, so, you know, Mike Olson has long experience. So this guy comes in to run it, and he's obviously in open source. Is that good for Yahoo and open source? They say they're going to continue to invest in Hadoop. They clearly are still using a lot of Hadoop, certainly. Um, how is that changing Apache? Is that causing more, um, consolidation? Is that causing more energy? What's your view on the whole Hortonworks thing? >>Um, you know, Yahoo has been and will continue to be a huge contributor to Hadoop. Uh, I can't say for sure, but I feel pretty confident that they have more data under management under Hadoop than anyone else in the world, and there's no question in my mind that they'll continue to invest huge amounts of both QA effort and engineering effort and, uh, all of the things that Hadoop needs to advance. Um, I'm sure that Hortonworks will continue to work very closely with Yahoo. Um, and you know, we're excited to see, um, more and more contributors to Hadoop, um, both from Hortonworks and from Yahoo proper. >>Cool. Well, I just want to clarify for the folks out there who don't understand what this whole Yahoo thing is: it was not a spin-out. These were key Hadoop core guys who left the company to form a startup, which Yahoo financed with Benchmark Capital. So Yahoo clearly told me, and reaffirmed with me, that they are investing more in Hadoop internally as well. So there are more people inside Yahoo who work on Hadoop than there are in the entire Hortonworks company. So that's very clear. So just to clear that up out there. Um, Aaron, so you're a young gun, right?
You're a young whiz like Todd, who's been on here. Explain to the folks out there who are a little bit older, maybe guys in their thirties, or CIOs. A lot of people are, you know, kicking the tires on big data; they're hearing about real-time analytics, they're hearing about benefits they've never heard of before. Uh, Dave Vellante and I on theCUBE talk about, you know, the transformations that are going on. You're seeing EMC getting into big data; everyone's transforming at the enterprise level and the service provider level. Explain to the folks why Hadoop is so important. Why is Hadoop, if not the fastest, one of the fastest-growing projects in Apache ever? Even faster than the web server project, which is one of the >>better, bigger ones. >>Why is Hadoop so important, and explain to them what it is. >>Well, you know, it's pretty well covered that there's been an explosion of data, that more data is produced every year, over and over. We talk about exabytes, which is a quantity of data so large that pretty much no one can really comprehend it. Um, and more and more, uh, organizations want to store and process and learn from, you know, get insights from that data. Um, in addition to just the explosion of data, um, you know, that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is truly that it's so very inexpensive per terabyte to store data that you don't have to think up front about what you want to store and what you want to discard. Store it all and figure out later which are the most useful bits. We call that sort of schema on read, um, as opposed to, you know, figuring out the schema a priori. Um, and that is a very powerful shift in the dynamics of data storage in general. And I think that's very attractive to all sorts of organizations.
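The "schema on read" idea in the answer above can be sketched in a few lines. This is only an illustration in plain Python, not Hadoop code: the event records, the `read_with_schema` helper, and the two schemas are all hypothetical names invented for the example. The point it tries to show is that raw records are stored untouched, and each analysis imposes its own schema when it reads them.

```python
import json

# Hypothetical raw event log: stored as-is, with no schema enforced at write time.
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ad_id": 17}',
    '{"user": "bob", "action": "purchase", "amount": "42.50"}',
    '{"user": "alice", "action": "purchase", "amount": "10.00", "coupon": "FALL"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and coerce their types.

    Records that lack a required field, or whose value cannot be coerced,
    are skipped rather than rejected when they were first stored.
    """
    for line in lines:
        record = json.loads(line)
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError):
            continue


# Two different "views" over the same raw data, decided after the fact.
PURCHASE_SCHEMA = {"user": str, "amount": float}
CLICK_SCHEMA = {"user": str, "ad_id": int}

purchases = list(read_with_schema(RAW_EVENTS, PURCHASE_SCHEMA))
clicks = list(read_with_schema(RAW_EVENTS, CLICK_SCHEMA))
```

Under a schema-on-write design, the `coupon` field or the click records might have been dropped at load time; here nothing is discarded, and a later question can define a new schema over the same stored lines.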
>>You're, I see, a Brown graduate, and you have some interns from Brown too. Brown, um, a premier computer science program, almost as good as when I went to school at Northeastern University, >>Um. >>you know, the unsung heroes of computer science. Only kidding. Brown's a great program, but, you know, cutting-edge computer science, known as obviously leading in a lot of the computer science areas. Hadoop in general is known as something where you've got to be pretty savvy, at either master's or PhD level, to kind of play in this area. There's not a lot of adoption among what I call the grassroots developers. What's your vision, and how do you see the younger computer science generation, even younger than you, kind of growing up into this? Because those tools aren't yet developed. You've still got to be pretty strong from a computer science perspective. And also explain to the folks who aren't necessarily at the Browns of the world or getting into computer science: what is this revolution about and where is it going? What are some of the things you see happening around the corner that might not be obvious? >>Sure, there's a few questions there. Um, part of it is, how do people coming out of college get into this thing? It's not, uh, taught all that much in school. How do you sort of make the leap from, uh, the standard computer science curriculum into this sort of thing? And, um, you know, part of it is that really we're seeing more and more schools offering distributed computing classes, or they have grids available, um, to do this stuff. There is some research coming out of Brown, actually, and lots of other schools, about Hadoop proper and the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting.
Google, uh, actually has classes that they teach, I believe in conjunction with the University of Washington, um, where they teach undergraduates and, you know, master's-level graduate students about MapReduce and distributed computing, and they actually use Hadoop to do it, because the architecture of Hadoop is modeled after, um, >>Uh. >>Google's internal infrastructure. Um, so, you know, that's one way we're seeing more and more people who are just coming out of college who have distributed systems, uh, knowledge like this. Um, the other part of the question you asked is, how does, um, the ordinary developer get into this stuff? And the answer is we're working hard, you know, we and others in the Hadoop community are working hard on making Hadoop just much easier to consume. We released, and you covered this a fair bit, the SCM Express project that lets you install Hadoop with just minimal effort, as close to one click as possible. Um, and there's lots of, um, sort of layers built on top of Hadoop to make it more easily consumed by developers: Hive is, uh, sort of a SQL-like interface on top of MapReduce, and Pig has its own DSL for programming against MapReduce. Um, so you don't have to write straight MapReduce code, anything like that. Uh, and it's getting easier for operators every day. >>Well, I mean, evolution, I mean, you guys are actually working on that at Cloudera. Um, what about some of the abstractions you're seeing, those are all the rage. You know, look back a year ago, VMworld is coming up, and, uh, a little plug: SiliconANGLE.tv will be broadcasting live at VMworld. Um, you know, he has been on theCUBE at VMworld, where, um, SpringSource was a big announcement that they made.
Um, Heroku bought by Salesforce. Cloud software frameworks are big. What does that look like, and how does it relate to Hadoop and the ecosystem around Hadoop, where, you know, the rage is the software frameworks, and networks kind of collide, and you've got kind of the intersection of, you know, software frameworks and networks? Obviously, you know, with the big players, we talk about EMC and these guys, it's clear that they realize that software is going to be their key differentiator. So it's got to get to a framework stand. What are Hadoop and Apache talking about with this kind of, uh, evolution for Hadoop? >>Sure. Well, you know, I think we're seeing very much the commoditization of hardware. Um, you just can't buy bigger and bigger computers anymore. They just don't exist. So you're going to need something that can take a lot of little computers and make them look like one big computer. And that's what Hadoop is especially good at. Um, we talk about scaling out instead of scaling up: you can just buy more relatively inexpensive computers. Uh, and that's great. And sort of the beauty of Hadoop, um, is that it will grow linearly as your data set, your scale, your traffic, whatever, grows. Um, and you don't have to have this exponential price increase of buying bigger and bigger computers. You can just buy more. Um, and that's sort of the beauty of it: it's a software framework that, if you write against it, um, you don't have to think about the scaling anymore. It will do that for you. >>Okay. A question for you. It's kind of a weird question, but try to tackle it. You're at a party having a few cocktails, having a few beers with your buddies, and your buddy who works at a big enterprise says, man, we've got all these legacy structured data systems, I need to implement some big data strategy, all this stuff. What do I do? >>Sure, sure. Um, not the question I thought you were going to ask me. >>We're a G-rated program here. >>Okay.
I thought you were gonna ask me, how do I explain what I do to, you know, people. >>We'll get to that next. >>Okay. Um, yeah, I mean, I would say that the first thing to do is to start small: implement a proof of concept. Get a subset of the data that you would like to analyze, put Hadoop on a few machines, four or five, something like that, and start writing some Hive queries, start writing some Pig scripts, and I think you'll, you know, pretty quickly and easily see the value that you can get out of it. And you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to trivially scale to that size. >>Okay. So now the question that I want to ask is, you're at a party and I say, what do you do? >>I usually tell people I'm a hedge fund manager. No, but seriously, um, I tell people I work on distributed supercomputers. Software for distributed supercomputers. And people have some idea what distributed means, and supercomputers, and they figure that out. >>So, final question, since I know you've got to go get back to programming, uh, some code here. Um, what's the future of Hadoop from a developer standpoint? I was having a conversation with a developer who's a big data jockey, who gets anything he can get his hands on, geo data, text data, because he's a data junkie, and he says, I just don't know what to build. Um, what are some of the enabling apps that you may see out there, or that you're just conceiving, just brainstorming? What's possible with data? Can you envision the next five years? What are you going to see evolve, and what are some of the coolest things you've seen that are happening right now? >>Sure. Sure.
I mean, I think you're going to see, uh, the front ends to these things getting just easier and easier and easier to interact with, and at some point you won't even know that you're interacting with a Hadoop cluster. That will be the engine underneath the hood, but, you know, from your perspective you'll be driving a Ferrari, and by that I mean, you know, standard BI tools and the standard SQL query language will all be implemented on top of this stuff, and, you know, from that perspective you could implement really anything you want. Um, we're seeing a lot of great work coming out of just identifying trends amongst masses of data that, you know, if you tried to analyze it with any other tool, you'd either have to distill it down so far that you would question your results, or you could only run the very simplest sort of queries over it, um, and not really get those, like, powerful deep insights, those sort of correlative insights, um, that we're seeing people get. So I think you'll continue to see, uh, great recommendation systems coming out of this stuff. You'll see, um, root cause analysis. You'll see great work coming out of the advertising industry, um, to, you know, really say which ad was responsible for this purchase. Was it really the last ad they clicked on, or was it the ad they saw five weeks ago that put the thought in mind? That sort of correlative analysis is being empowered by big data systems like Hadoop. >>Well, I'm bullish on big data. I think it's gonna be even bigger than people think. I think you're gonna have some kids come out of college and say, I could use big data to create a differentiation and build an airline based on one differentiation. These are cool new ways, and, uh, data we've never seen before. So, Aaron, uh, thanks for coming on theCUBE. Um, you're inside the Palo Alto studio and we're going to.

Published Date : Sep 28 2011


ENTITIES

Entity	Category	Confidence
Mike Olson	PERSON	0.99+
yahoo	ORGANIZATION	0.99+
Mike Charles	PERSON	0.99+
san Francisco	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Yahoo	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Aaron T. Myers	PERSON	0.99+
University of Washington	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
facebook	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
richard Kimble	PERSON	0.99+
Michaelson	PERSON	0.99+
two interns	QUANTITY	0.99+
Oregon	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Todd	PERSON	0.99+
Claudia	PERSON	0.99+
AMC	ORGANIZATION	0.99+
five weeks ago	DATE	0.99+
Northeastern University	ORGANIZATION	0.99+
monday	DATE	0.99+
first time	QUANTITY	0.99+
both	QUANTITY	0.99+
Dave	PERSON	0.99+
TMC	ORGANIZATION	0.99+
ralph kimball	PERSON	0.99+
burlingame	LOCATION	0.99+
Ferrari	ORGANIZATION	0.98+
today	DATE	0.98+
five	QUANTITY	0.98+
Brown	ORGANIZATION	0.98+
thirties	QUANTITY	0.98+
one	QUANTITY	0.98+
Horton	ORGANIZATION	0.98+
Apache	ORGANIZATION	0.98+
Hadoop	ORGANIZATION	0.98+
erin	PERSON	0.98+
google	ORGANIZATION	0.97+
One	QUANTITY	0.97+
twitter	ORGANIZATION	0.97+
Brown	PERSON	0.97+
a year ago	DATE	0.97+
Salesforce	ORGANIZATION	0.97+
john furry	PERSON	0.96+
one big computer	QUANTITY	0.95+
new york city	LOCATION	0.95+
Mendel	PERSON	0.94+
