Ash Munshi, Pepperdata - #SparkSummit - #theCUBE

(upbeat music) >> Announcer: Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. >> Welcome back to theCUBE, it's day two at the Spark Summit 2017. I'm David Goad and here with George Gilbert from Wikibon, George. >> George: Good to be here. >> Alright and the guest of honor of course, is Ash Munshi, who is the CEO of Pepperdata. Ash, welcome to the show. >> Thank you very much, thank you. >> Well you have an interesting background, I want you to just tell us real quick here, not give the whole bio, but you got a great background in machine learning, you were an early user of Spark, tell us a little bit about your experience. >> So I'm actually a mathematician originally, a theoretician who worked for IBM Research, and then subsequently Larry Ellison at Oracle, and a number of other places. But most recently I was CTO at Yahoo, and then subsequent to that I did a bunch of startups, that involved different types of machine learning, and also just in general, sort of a lot of big data infrastructure stuff. >> And go back to 2012 with Spark right? You had an interesting development. >> Right, so 2011, 2012, when Spark was still early, we were actually building a recommendation system, based on user-generated reviews. That was a project that was done with Nando de Freitas, who is now at DeepMind, and Peter Cnudde, who's one of the key guys that runs infrastructure at Yahoo. We started that company, and we were one of the early users of Spark, and what we found was, that we were analyzing all the reviews at Amazon. So Amazon allows you to crawl all of their reviews, and we basically had natural language processing, that would allow us to analyze all those reviews. When we were doing sort of MapReduce stuff, it was taking us a huge number of nodes, and 24 hours to actually go do analysis. And then we had this little project called Spark, out of AMPLab, and we decided to spin it up, and see what we could do. 
It had lots of issues at that time, but we were able to actually spin it up on to, I think it was in the order of 100,000 nodes, and we were able to take our times for running our algorithms from you know, sort of tens of hours, down to sort of an hour or two, so it was a significant improvement in performance. And that's when we realized that, you know, this is going to be something that's going to be really important once this set of issues, where it, once it was going to get mature enough to make happen, and I'm glad to see that it's actually happened now, and it's actually taken over the world. >> Yeah that little project became a big deal, didn't it? >> It became a big deal, and now everybody's taking advantage of the same thing. >> Well bring us to the present here. We'll talk about Pepperdata and what you do, and then George is going to ask a little bit more about some of the solutions that you have. >> Perfect, so Pepperdata was a company founded by two gentlemen, Sean Suchter and Chad Carson. Sean used to run Yahoo Search, and was one of the first guys who actually helped develop Hadoop next to Eric14 and that team. And then Chad was one of the first guys who actually figured out how to monetize clicks, and was the data science guy around the whole thing. So those are the two guys that actually started the company. I joined the company last July as CEO, and you know, what we've done recently, is we've sort of expanded our focus of the company to addressing DevOps for big data. And the reason why DevOps for big data is important, is because what's happened in the last few years, is people have gone from experimenting with big data, to taking big data into production, and now they're actually starting to figure out how to actually make it so that it actually runs properly, and scales, and does all the other kinds of things that are there, right? 
So, it's that transition that's actually happened, so, "Hey, we ran it in production, and it didn't quite work the way we wanted to, now we actually have to make it work correctly." That's where we sort of fit in, and that's where DevOps comes in, right? DevOps comes in when you're actually trying to make production systems that are going to perform in the right way. And the reason for DevOps is it shortens the cycle between developers and operators, right? So the tighter the loop, the faster you can get solutions out, because business users are actually wanting that to happen. That's where we're squarely focused, is how do we make that work? How do we make that work correctly for big data? And the difference between, sort of classic DevOps and DevOps for big data, is that you're now dealing with not just, you know, a set of computers solving an isolated sort of problem. You're dealing with thousands of machines that are solving one problem, and the amount of data is significantly larger. So the classical methodologies that you have, while, you know, agile and all that still works, the tools don't work to actually figure out what you can do with DevOps, and that's where we come in. We've got a set of tools that are focused on performance effectively, 'cause that's the big difference between distributed systems performance I should say, that's the big difference between that, and sort of classic even scaled out computing, right? So if you've got web servers, yes performance is important, and you need data for those, but that can actually be sharded nicely. This is one system working on one problem, right? Or a set of systems working on one problem. That's much harder, it's a different set of problems, and we help solve those problems. >> Yeah, and George you look like you're itching to dig into this, feel free. 
(exclaims loudly) >> Well so, it was, so one of the big announcements at the show, and the sort of the headline announcement today, was Spark serverless, like so it's not just someone running Spark in the cloud sort of as a managed service, it's up there as a, you know, sort of SaaS application. And you could call it platform as a service, but it's basically a service where, you know, the infrastructure is invisible. Now, for all those customers who are running their own clusters, which is pretty much everyone I would imagine at this point, how far can you take them in hiding much of the overhead of running those clusters? And by the overhead I mean, you know, primarily performance and maximizing, you know, sort of maximizing resource efficiency. >> So, you have to actually sort of double-click on to the kind of resources that we're talking about here, right? So there's the number of nodes that you're going to need to actually do the computation. There is, you know, the amount of disk storage and stuff that you're going to need, what type of CPUs you're going to need. All of that stuff is sort of part of the costing if you will, of running an infrastructure. If somebody hides all that stuff, and makes it so that it's economical, then you know, that's a great thing, right? And if it can actually be made so that it works for huge installations, and hides it appropriately so I don't pay too much of a tax, that's a wonderful thing to do. But we have, our customers are enterprises, typically Fortune 200 enterprises, and they have both a mixture of cloud-based stuff, where they actually want to control everything about what's going on, and then they have infrastructure internally, which by definition they control everything that's going on, and for them we're very, very applicable. I don't know how we'd be applicable in this, sort of new world as a service that grows and shrinks. 
I can certainly imagine that whoever provides that service would embed us, to be able to use the stuff more efficiently. >> No, you answered my question, which is, for the people who aren't getting the turnkey you know, sort of SaaS solution, and they need help managing, you know, what's a fairly involved stack, they would turn to you? >> Ash: Yes. >> Okay. >> Can I ask you about the specific products? >> George: Oh yes. >> I saw you at the booth, and I saw you were announcing a couple of things. Well what is new-- >> Ash: Correct. >> With the show? >> Correct, so at the show we announced Code Analyzer for Apache Spark, and what that allows people to do, is really understand where performance issues are actually happening in their code. So, one of the wonderful things about Spark, compared to MapReduce, is that it abstracts the paradigm that you actually write against, right? So that's a wonderful thing, 'cause it makes it easier to write code. The problem when we abstract, is what does that abstraction do down in the hardware, and where am I losing performance? And being able to give that information back to the user. So you know, in Spark, you have jobs that can run in parallel. So an app consists of jobs, jobs can run in parallel, and each one of these things can consume resources, CPU, memory, and you see that through sort of garbage collection, or a disk or a network, and what you want to find out, is which one of these parallel tasks was dominating the CPU? Why was it dominating the CPU? Which one actually caused the garbage collector to actually go crazy at some point? While the Spark UI provides some of that information, what it doesn't do, is give you a time series view of what's going on. So it's sort of a blow-by-blow view of what's going on. By imposing the time series view on sort of an enhanced version of the Spark UI, you now have much better visibility about which offending stages are causing the issue. 
And the nice thing about that is, once you know that, you know exactly which piece of code that you actually want to go and look at. So classic example would be, you might have two stages that are running in parallel. The Spark UI will tell you that it's stage three that's causing the problem, but if you look at the time series, you'll find out that stage two actually runs longer, and that's the one that's pegging the CPU. And you can see that because we have the time series, but you couldn't see that any other way. >> So you have a code analyzer and also the app profiler. >> So the app profiler is the other product that we announced a few months ago. We announced that I guess about three months ago or so. And the app profiler, what it does, is it actually looks after the run is done, it actually looks at all the data that the run produces, so the Spark history server produces, and then it actually goes back and analyzes that and says, "Well you know what? Your executors here, are not working as efficiently, these are the executors that aren't working as efficiently." It might be using too much memory or whatever, and then it allows the developer to basically be able to click on it and say, "Explain to me why that's happening?" And then it gives you a little, you know, a little fix-it if you will. It's like, if this is happening, you probably want to do these things, in order to improve performance. So, what's happening with our customers, is our customers are asking developers to run the application profiler first, before they actually put stuff on production. Because if the application profiler comes back and says, "Everything is green." That there's no critical issues there. Then they're saying, "Okay fine, put it on my cluster, on the production cluster, but don't do it ahead of time." The application profiler, to be clear, is actually based on some work that, on an open source project called Dr. Elephant, which comes out of LinkedIn. 
And now we're working very closely together to make sure that we actually can advance the set of heuristics that we have, that will allow developers to understand and diagnose more and more complex problems. >> The Spark community has the best code names ever. Dr. Elephant, I've never heard of that one before. (laughter) >> Well Dr. Elephant, actually, is not just the Spark community, it's actually also part of the MapReduce community, right? >> David: Ah, okay. >> So yeah, I mean remember Hadoop? >> David: Yes. >> The elephant thing, so Dr. Elephant, and you know. >> Well let's talk about where things are going next, George? >> So, you know, one of the things we hear all the time from customers and vendors, is, "How are we going to deal with this new era of distributed computing?" You know, where we've got the cloud, on-prem, edge, and like so, for the first question, let's leave out the edge and say, you've got your Fortune 200 client, they have, you know, production clusters or even if it's just one on-prem, but they also want to work in the cloud, whether it's for elastic stuff, or just for, they're gathering a lot of data there. How can you help them manage both, you know, environments? >> Right, so I think there's a bunch of times still, before we get into most customers actually facing that problem. What we see today is, that a lot of the Fortune 200, or our customers, I shouldn't say a lot of the Fortune 200, a lot of our customers have significant, you know, deployments internally on-prem. They do experimentation on the cloud, right? The current infrastructure for managing all these, and sort of orchestrating all this stuff, is typically YARN. What we're seeing, is that more than likely they're going to wind up, or at least our intelligence tells us that it's going to wind up being Kubernetes that's actually going to wind up managing that. So, what will happen is-- >> George: Both on-prem and-- >> Well let me get to that, alright? >> George: Okay. 
>> So, I think YARN will be replaced certainly on-prem with Kubernetes, because then you can do multi data center, and things of that sort. The nice thing about Kubernetes, is it in fact can span the cloud as well. So, Kubernetes as an infrastructure, is certainly capable of being able to both handle a multi data center deployment on-prem, along with whatever actually happens on the cloud. There is infrastructure available to do that. It's very immature, most of the customers aren't anywhere close to being able to do that, and I would say even before Kubernetes gets accepted within the environment, it's probably 18 months, and there's probably another 18 months to two years, before we start facing this hybrid cloud, on-prem kind of problem. So we're a few years out I think. >> So, would, for those of us including our viewers, you know, who know the acronym, and know that it's a, you know, scheduler slash cluster manager, resource manager, would that give you enough of a control plane and knowledge of sort of the resources out there, for you to be able to either instrument or deploy an instrument to all the clusters (mumbles). >> So we are actually leading the effort right now for big data on Kubernetes. So there is a group of, there's a small group working. It's Google, us, Red Hat, Palantir, Bloomberg now has joined the group as well. We are actually today talking about our effort on getting HDFS working on Kubernetes, so we see the writing on the wall. We clearly are positioning ourselves to be a player in that particular space, so we think we'll be ready and able to take that challenge on. >> Ash this is great stuff, we've just got about a minute before the break, so I wanted to ask you just a final question. You've been in the Spark community for a while, so what of their open source tools should we be keeping our eyes out for? >> Kubernetes. >> David: That's the one? >> To me that is the killer that's coming next. >> David: Alright. 
>> I think that's going to make life, it's going to unify the microservices architecture, plus the sort of multi data center and everything else. I think it's really, really good. Borg works, it's been working for a long time. >> David: Alright, and I want to thank you for that little Pepper pen that I got over at your booth, as the coolest-- >> Come and get more. >> Gadget here. >> We also have Pepper sauce. >> Oh, of course. (laughter) Well there sir-- >> It's our sauce. >> There's the hot news from-- >> Ash: There you go. >> Pepperdata Ash Munshi. Thank you so much for being on the show, we appreciate it. >> Ash: My pleasure, thank you very much. >> And thank you for watching theCUBE. We're going to be back with more guests, including Ali Ghodsi, CEO of Databricks, coming up next. (upbeat music) (ocean roaring)

Published Date : Jun 7 2017

Guy Churchward & Phu Hoang, DataTorrent Inc. | Mobile World Congress 2017

(techno music) >> Announcer: Live, from Silicon Valley, it's "the Cube," covering Mobile World Congress 2017. Brought to you by Intel. >> Okay, welcome back everyone. We're here live in Palo Alto, California, covering Mobile World Congress, which is later in Spain right now, in Barcelona, it's gettin' close to bedtime, or, if you're a night owl, you're out hittin' the town, because Barcelona stays out very late, or just finishing your dinner. Of course, we'll bring in all theCube coverage here. News analysis, commentary, and of course, reaction to all the big mega-trends. And our next two guests are Guy Churchward who is the President and CEO of DataTorrent, formerly of EMC. You probably recognize him from theCube, from the EMC world, the many times he's been on. Cube alumni. And Phu Hoang, who's the co-founder and Chief Strategy Officer of DataTorrent. Co-founder, one of the founders. Also one of the early, early Yahoo engineers. I think he was the fourth engineer at Yahoo. Going way back in the '90s. Built that to a large scale. And Yahoo is credited for the invention of Hadoop, and many other great big data things. And we all know Yahoo was data-full. Guys, welcome to theCube's special coverage. Great to see you. >> Thank you so much. >> So I'm psyched that you guys came in, because, two things. I want to talk about the new opportunity at DataTorrent, and get some stories around the large-scale experience that you guys have dealing with data. 'Cause you're in the middle of where this is intersecting with Mobile World Congress. Right now, Mobile World Congress is on the collision course between cloud-ready, classic enterprise network architectures with consumer, all happening at the same time. And data, with internet of things, is that going to be at the center of all the action? So, (laughing) these are not devices. So, that's the core theme. So, Guy, I want to get your take on, what attracted you to DataTorrent? What was the appeal for the opportunity? 
>> You mean, why am I here, why have I just arrived? >> I've always been data-obsessed. You know this. From the days of running the storage business on their data protection, before that I was doing data analytics and security forensics. And if you look at, as you said, whether it's big data, or cloud, and the emergence of IOT, one thing's for sure, for me. It was never about big data, as in a big blob of stuff. It was all about small data sprawl. And the world's just getting more diverse by the second, and you can see that by Mobile World, right? The challenge then you have is, companies, they need to analyze their business. In other words, data analytics. About 30 years ago, when I was working for BAE Systems, I remember meeting a general of the army. And he said the next war will be won in the data center, not on the battlegrounds. And so you really understand-- >> He's right about that. >> Yeah. And you have to be very, very close. So in other words, companies have started to obsess about what I call the do loop. And that really means, when data is created, and then ingesting the data, and getting insight from the data, and then actioning on that. And it's that do loop. And what you want to do, is you want to squeeze that down into a sub-second. And if you can run your analytics at the pace of your business, then you're in good shape. If you can't, you lose. And that means from a security perspective, or you're not going to win the bids. In any shape or form. That's not a business-- >> John: So speed is critical. >> Yeah, and people say, speed and accuracy. Because what you don't want to do is to run really really fast and fall off a cliff. So you really need to make sure that speed is there and accuracy is there. 
In the good old days, when I was running security forensics, you could either do complex event processing, which was a very small amount of information coming in and then querying it like crazy, or things like log management, where you would store data at rest, and then look at it afterwards. But now with the paradigm of all the technology catching up, so whether that's the disk space that you get, and the storage and the processing, and things like Hadoop with the clustering, you now break that paradigm. Where you can collect all the information from a business and process it before you land the data, and then get the insight out of it, and then action. So that was my thing, of looking and saying, look, this whole thing's going to happen. In last year -- >> And at large scale, too. I mean, what you're talking about in the theoretical side makes a lot of sense, but also putting that into large scale, is even more challenging. >> Yeah, we had, when I was going through the processes, dating, you know, to see whether it was a company that made sense, I chatted with one of our investors. And they're also a customer. And I said, why did you choose DataTorrent? And they said, "We tested everything in production, we tested all the competitive products out there, and we broke everything except DataTorrent. And actually, we tested you in production up to a billion events per second, and you didn't break. And we believe that that quantity is something that you need as a stepping stone to move forward." >> And what use cases does that fit for? Just give me some anecdotal (snaps fingers) billion transactions. At that speed, what's some use cases that really take advantage of that? >> They were mastering in, what I would call, industrialization of IT. So in other words, once you get into things like turbines, wind generation, train parts. We're going to be very very soon, looking out of a window and seeing -- >> John: So is it flow data? Is it the speed of the flow? 
Is it the feed of all the calculations, or both? >> It's a bit of both. And what I'll do, is I'll give Phu a chance, otherwise, we'll end up chatting about it. >> John: Phu, come on, you're the star. (laughing) When you founded this company, you had a background at Yahoo, which you built from scratch, but that was a first-mover opportunity, Web 1.0, as they say. That evolved up and then, everyone used Yahoo Finance. Everyone used Yahoo Search as a directory early on. And then everything just got bigger and bigger and bigger, and then you had to build your own stuff with Hadoop. >> Yeah. >> So you lived it. The telcos don't have the same problem. They actually got backed into the data, from being in the voice business, and then the data business. The data came after the voice. So what's the motivation behind DataTorrent? Tell us a little bit more. >> It's exactly what you say, actually. Going through the 12 years at Yahoo, and really, we learned big data the hard way. Making mistakes month after month, about how to do this thing right. We didn't have the money, and then we found out that, actually, the proprietary, off-the-shelf systems that we thought were available, really couldn't do their jobs. So we had to invent our own technology, to deal with the kind of data processing that we had. At some point, Yahoo had a billion users using Yahoo at any given point in time, right? And the amount of impressions, the amount of clicks, the amount of activity, that a billion users have, onto the system. And all of the log files that you have to process to understand what's going on. On the other side of that, we need to be able to understand all of those activities in order to sell to our advertisers. Slice and dice behaviors and users, and so on. We didn't have the technology to do that. The only thing we knew how to do was, to have these cheap racks of cheap servers, that we were using to serve webpages. 
And we turned to that to say, this is what we're going to need to do, to solve these big data problems. And so, the idea of, okay we need to take this big problem and divide it into smaller pieces, so that we can run on these cheap servers, sort of became the core tenet of how we do distributed processing that became Hadoop, at the end of the day, right? >> You had big data come in because you were, big data-full, as we say. You weren't building software to solve someone else's problem. You had your own problem, you had a lot of data. You were full with data. >> Exactly. >> Had to go on a data diet, to your point. (crosstalk) >> And no one to turn to. >> And no one to turn to. >> All right. So let's spin this around for Mobile World Congress. 'Cause the big theme is, obviously, we all know what device is. In fact, we just released here on theCube early this morning Peter Burris pre-announced our new research initiative called IOTP. Which stands for Internet Of Things And People. And so now you add the complexity of people devices, whether that's going to be some sort of watch, phones, anything around them. That adds to the industrial aspect of turbines and what not. Internet of Things is a new edge architecture. So the data tsunami coming, besides the challenges of telcos to provision these devices, are going to be very challenging. So the question I want to ask you guys is, how do you see this evolving, because you have certainly connectivity. Yeah, you know, low latency, small little data coming from the windmills or whatever. Versus big high-dense bandwidth, mobility. And then you got network core issues, right. So what is this going to look like? Where does the data piece fit in? Because all aspects of this have data. What's your thoughts on this, and architecture. Tell us about your impressions, and the conversations you've had. >> First of all, I think data will exist everywhere. On the fringe, in the middle, at the center. 
And there's going to be data analytics and processing in every path of that. The challenge will be to kind of figure out what part of processing do you put on the fringe, what part do you put at the center. And I think that's a fluid thing that is going to be constantly changing. Going back to the telcos. We've had numbers of conversations with telcos. And, yes we're helping them right now with their current set of issues around capacity management and billing, all those things. But they are also looking to the next step in their business. They're making all this money from provisioning, but they know they sit on top of this massive amount of really valuable data, from their customers. Every cellphone is sending them all of this data. And so there's a huge opportunity for them to monetize, or really produce value, back to their customers. And that could come in form of offers, to customers. But now you're talking about massive analytics targeting. That is also real-time, because if you're sending an offer to someone at a particular location, if you do that slowly, or in batch, and you give them an offer 10 minutes later, they're no longer where they are. They're 10 minutes away, right? >> Well, first two questions to follow up on that. One, do they know they have a data advantage opportunity here? Do they know that data is potentially a competitive advantage? >> From our conversation, they absolutely do. They're just trying to figure out, so what do we do here? It's new to them. >> I want to get both your perspectives. Guy, I want you to weigh in on this one, 'cause this is another theme that's coming out of the reporting and analysis from Mobile World Congress. This has come also from the cloud side as well. Integration now, is more important than ever, because, for instance, they might have an Oracle there, there might be Oracle databases outside their network. That they might want to tap into. So tapping other people's data. 
Not just what they can get, the telcos. It's going to be important. So how do you guys see the integration aspect, how we, top of the first inning, national anthem going on. I mean, where are we in this integration? There's a pregame, or, what inning are we in on this? >> Yeah, we're definitely not on the home run on it. I think our friend, and your friend Steve Manly, I sat down with him, and I gave him a brief, you know, what we were doing, and he was blown away by the technology and the opportunity, but he was certainly saying, but the challenge is the diversity of the data types. And then where they're going to be. Autonomous cars. You know each manufacturer will tell the car behind it, what it just experienced, but the question is, when will a Tesla tell a Range Rover, or tell a BMW? So you have actually -- >> They're different platforms, just different stats, it's a nightmare. >> Right. So in other words, >> And trackability. And whether it's going to be open APIs, whether it's technologies like Kafka. But the integration of that, and making sure that you can do transformation and then normalize it and drive it forward. It's kind of interesting, you know. You mentioned the telco space, and do they understand it. In some respects, what Phu went through with Yahoo, in other words, you go to a webpage, you pull it up, it knows you because of a cookie and it figures out, and then sells advertising to you on that page. Now think about you as a location, and you're walking past a Starbucks, and they want to sell you a coffee for ten cents less than they would normally do. They need to know you're there then. And this is the thing, and this is why real-time is going to be so critical. And similarly, like you said, you look out the window and you see DHL, or UPS, or FedEx drones out the window. You not only have an insight issue. You also have a security issue, you have a compliance issue, you have a locational issue. >> I think you're onto something. 
And I think I actually had this talk today with Steve Manly EMC World last year, around time series data. So this is interesting. Everyone wants to store everything, but it actually might not be worth anything anymore. If the drone is delivering your package, or whatever realtime data is in realtime, it's really important right there in realtime, or near realtime. It might not be worth anything after. But yet a purchase at a store, at a time, might be worth knowing that as a record to pull in. You get what I'm saying? So there's a notion of data that's interesting. >> And I think, and again, Phu's the expert. I'm still running up onto it. It's just a pet hobby, an obsession of mine. But the market has this term ETL. In other words, Extract, Transform, Land. Or load. But in essence, it's always talked about in that (mumbles) batch. In other words, I get the data, transform it, drop it, and then I have a look at it. We're going upside-down. So the idea now is to actually extract, transform, insight, action, then landing. So in other words, get the value at the fresh data, before it's in the data lake. Because if you set it in the data lake, by default, it's actually stale. And actually, then there's the fascination of saying, if you're delivering realtime data to a person, you can't think fast enough to actually make a live decision. So therefore, you've almost got any information that comes to you, has to tier out. So it comes to a process. You get that fresh use of it, and then it drops into a data lake. And so I think there's a use for both, but I think what you will see in the market, and, again, you've experienced the disk flash momentum that happened last year. You're going to see that from a data source from at-rest, advanced, to real-time data streams on our applications next year. So I think the issue is, the formative year, and back to your, you know, get it right, get the integration, but make sure your APIs are there, talking to the right technologies. 
I think everything's going to be exciting this year and new and fresh and people really want to do it. Next year is going to be the year where you're going to see an absolute changing of the guard. >> And then also the SLA requirements, they'll start to get into this when you start looking at integration. >> You're absolutely right. Actually, the SLA part is actually very very important here. Because, as you move analytics from this batch world, where you do it once a day, and if it dies, it's okay, you just do it again, to where it is now continuous, 24 by 7, giving you insight continuously about your business, your people, your services, and so on, then all of a sudden, it has to have the same characteristics as your business. Which is, it's 24 by 7, it can never go down, it can never lose data. So, all of a sudden you're putting tremendous requirements on an analytics system, which has, all the way from the beginning of history 'til now, been a very relaxed batch thing, to all of a sudden being something that is enterprise-grade, 24 by 7. And I think that that's actually where it's going to be the toughest nut to crack. >> So tell us about some of the things that you've learned. And pretend for a second, let's pretend that you, as a co-founder at Data Torrent, and Guy, and you are teamed up. You guys run this telco. Let's just make one up, Verizon. Or AT&T, or pick one. And you sit there saying, okay, you've got the keys to the kingdom. And you can do whatever you want (laughing). You can be Donald Trump, or you can be whoever you want. You can fire everybody, or you can pick it over and run it. What would you do? You know you've got IOT. So this is business model innovation opportunities. I want you to put the technical hat on, plus knowing what you know around the business model opportunities. What do you do? You know IOT's an opportunity. Amazon is going after that heavily. Do you bolt a cloud together? Do you go after Amazon? Do you co-op with Amazon? 
Do you co-integrate? Do you grab the IOT? Do you use the data? I mean, given where we are today, what's the best move if we were consulting with this. >> You know, I will be the last person to be talking about giving advice to a telco. But since we are, we own our own telco here, and then we're pretending, I would say the following. IOT is going to happen, right? Earlier, when I say a billion people, that's just human beings. Once you now talk about sensors, you can program how many times they can send you data per second, then the growth in volume is immense, right? I think there's a huge opportunity, as a telco, in terms of the data that they have available and the insight that they could have about what's going on. That is not easy. I don't think that, as a telco, in the current DNA of a telco, I can go ahead and do all that analytics and really open up my business to the data insight layer. I would partner, and find a way-- >> Well, we're consulting, we're going to sit around and say hey, what do we have? We have a relationship with the consumer, big marketing budgets. We can talk to them directly, we have access to their device. >> But you'll bifurcate the business. We're in the boardroom here, this is nothing more than that. But I would look at it and say look, you've got a consumer business, the same as in IOT. There's really, for me, there's three parts of IOT. There is the bit that I love which, you can geek out, which is basically the consumer market, which, there's no money in for a large-scale tenant, right, enterprise. And then you have the industrialization of IOT, which is I've got a leaky pipe, and I want a hardened device, ruggedized, which is wifi, so, now as a telco, I could create an IOT cloud, that allows me to put these devices out there, and in fact, I use Arlo, the little cameras. And they've got one now, where I can basically float it with its own cellular signal. So it's its own cellphone. That's a great use of IOT for that. 
And then you step to the consumer side of, I've got a cellphone, and then what I'll do is literally, in essence, riff off what Yahoo did in the early days and say, I'm now the new browser. The person's the browser. So in other words, follow the location, follow where he is, and then basically do locational-based advertising. >> By the way, you have to license the patent from our earlier guest, he'll say will he leak, 'cause he's got the patent on personal firewall for personal server. He's built a mobile personal server. >> Yeah. >> But this is the opportunity around wireless. Why I love the confusion, but the opportunity around wireless right now is, you can get bandwidth at high capacity. You have millimeter wave four, that doesn't go through walls, but you have other diverse frequencies and spectrum for instance, you can blend it all together to have that little drip signal, if you will, going into the cloud from the leaky pipe. Or if you need turbine, full-fat pipe, you maybe go somewhere. So, I think this is an interesting opportunity. >> And they're going to end up watching the data centers as well. There's still the gamut of saying our customer is going to continue to support their own data centers, or are there going to be one to a hundred data centers out there? And then how does selling a manufacturer or a telco play into that, and do they want to be that guy or not? >> Guy, Phu, thanks for coming in. I want to give you guys a chance to put a plug in for Data Torrent. Thanks for sharing some great commentary on the industry. So, what's up with you guys? Give us the update. Are you hiring? You growing? What are you guys doing? Customers? What's the update? Technology, innovations? >> So we've got a release coming out tomorrow which is a momentum release. I can't talk too much about the numbers, but in essence, from a fact base, we have a thing called Apache Apex. So it's open sourced, so you can use our product for free. 
But that's growing like gangbusters. From a top-level project, that's actually the fastest-growing one, and it's only been out for seven months. We just broke through 50,000 users on it. From our product, we're doing very well on the back of it. So we actually have subscription for the production side. >> So revenue is a subscription model. >> Yeah, and we meet both sides. So in other words, for the engineer who writes it, you've got the open source. And then when you put it into production, from the operations side, you can then license our products to enable you to manage an easy-- >> So when it gets commercialized, you pay as you go, when you use it. >> And you don't have to, if you don't want to. You've got all the tools to do it. But, we focus for our products group of, time to value, total cost of ownership. We're trying to bring Hadoop and real scale, realtime streaming to the masses. >> So what's the technology innovation? What's the disruptive enabler for you guys? >> I think we talked about it, right? You've got two really competing trends going on here. On one side, data is getting more and more and more massive. So it's going to take longer and longer to process it. Yet at the other side, business wants to be able to get data, have insight, and take action sub-second. So how do you get both at the same time? That's really the magic of the technology. >> Thanks for coming in. Great to meet you, Phu. I'd love to talk about the old Yahoo days, a total throwback, Web 1.0, a great time in history, pre-bubble bursting. Greatness happening in the valley and all around the world, and I remember those days clearly. Guy, great to see you. Congratulations on your new CEO committee. And great to have you on theCube. This is theCube bringing the coverage, and commentary, and reaction of Mobile World Congress here, in California. As everyone goes to bed in Barcelona, we're just gettin' down to the end of our day here in the afternoon in California. 
Be right back with more after this short break. (techno music)

Published Date : Mar 1 2017

