Murthy Mathiprakasam, Informatica - Big Data SV 17 - #BigDataSV - #theCUBE

(electronic music) >> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. We are live in Silicon Valley for Big Data Silicon Valley, our companion show to Big Data NYC, in conjunction with Strata Hadoop, Big Data Week. Our next guest is Murthy Mathiprakasam, the director of product marketing at Informatica. Did I get it right? >> Murthy: Absolutely (laughing)! >> Okay (laughing), welcome back. Good to see you again. >> Good to see you! >> Informatica, you guys had Amit on earlier yesterday, kicking off our event. It is a data lake world out there, and the show theme has been, obviously besides a ton of machine learning-- >> Murthy: Yep. >> Which has been fantastic. We love that because that's a real trend. And IOT has been a subtext to the conversation and almost a forcing function. Every year the big data world is getting more and more pokes and levers off of Hadoop to a variety of different data sources, so a lot of people are taking a step back, and a protracted view of their landscape inside their own companies, and saying, Okay, where are we? So kind of a checkpoint in the industry. You guys do a lot of work with customers, your history with Informatica, and certainly over the past few years, the change in focus, certainly on the product side, has been kind of interesting. You guys have what looks to be a solid approach, an abstraction layer for data and metadata, to be the keys to the kingdom, but yet not locking it down, making it freely available, yet providing the governance and all that stuff. >> Murthy: Exactly. >> And my interview with Amit laid it all out there. But the question is what are the customers doing? I'd like to dig in, if you could share just some of the best practices. What are you seeing? What are the trends? Are they taking a step back? How is IOT affecting it? What's generally happening? >> Yeah, I know, great question.
So it has been really, really exciting. It's been kind of a whirlwind over the last couple years, so many new technologies, and we do get the benefit of working with a lot of very, very innovative organizations. IOT is really interesting because up until now, IOT's always been sort of theoretical, you're like, what's the thing? >> John: Yeah. (laughing) What's this Internet of things? >> But-- >> And IT was always poo-pooing someone else's department (laughing). >> Yeah, exactly. But we actually have customers doing this now, so we've been working with automotive manufacturers on connected vehicle initiatives, pulling sensor data, been working with oil and gas companies, connected meters and connected energy, manufacturing, logistics companies, looking at putting meters on trucks, so they can actually track where all the trucks are going. Huge cost savings and service delivery kind of benefits from all this stuff, so you're absolutely right, IOT, I think, is finally becoming real. And we have a streaming solution that kind of works on top of all the open source streaming platforms, so we try to simplify everything, just like we have always done. We did that with MapReduce, with Spark, now with all the streaming technologies. You get a graphical approach where you can go in and say, Well, here's the kind of processing we want. You'd lay it out visually and it executes in the Hadoop cluster. >> I know you guys have done a great job with the product, it's been very complimentary to you guys, and it's almost as if there's been a transformation within Informatica. And I know you went private and everything, but a lot of good product chops there. You guys got a lot of good product guys, so I got to ask you the question, I don't see IOT sometimes as an operational technology component, usually running their own stacks, not even plugged into IT, so that's a whole other story. I'll get to that in a second.
But the trend here is you have the batch world, companies that have been in this ecosystem here that are on the show floor, at O'Reilly Media, or talking to us on The Cube. Some have been just pure play batch-related! Then the fashionable streaming technologies have come out, but what's happened with Spark, you're starting to see the collision between batch and realtime-- >> Umm-hmm. >> Called streaming or what not. And at the center of that's the deep learning, it's the IOT, and it's the AI, that's going to be at the intersection of these two colliding forces, so you can't have a one-trick pony here and there. You got to kind of have a blended, more of a holistic, horizontal, scalable approach. >> Murthy: Yes. >> So I want to get your reaction to that. And two, what product gaps and organizational gaps and process gaps emerge from this trend? And what do you guys do? So, three-part question. >> Murthy: Yeah (laughing). >> Go ahead. Go ahead. >> I'll try to cover all three. >> So, first, the collision and your reaction to that trend. >> Murthy: Yeah, yeah. >> And then the gaps. >> Absolutely. So basically, you know Informatica, we've supported every kind of variation of these types of environments, and so we're not really a believer in it's this or that. It's not on premise or cloud, it's not realtime or batch. We want to make it simple no matter how you want to process the data, or where you want to process it. So customers who use our platform for their realtime or streaming solutions are using the same interface as if they were doing it in batch. We just run it differently under the hood. And so, that simplifies and makes a lot of these initiatives more practical because you might start with a certain latency, and you think maybe it's okay to do it at one speed. Maybe you decide to change. It could be faster or slower, and you don't have to go through code rewrites and just starting completely from scratch.
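The idea Murthy describes here, defining the processing once and choosing batch or streaming execution separately, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Informatica's implementation: the record fields (`truck_id`, `speed_mph`) and the function names are hypothetical and chosen to echo the truck-sensor example from earlier in the interview.

```python
from typing import Dict, Iterable, Iterator, List

MPH_TO_KMH = 1.609344


def clean_reading(reading: Dict) -> Dict:
    """Hypothetical per-record logic, written once and reused by both runners."""
    return {
        "truck_id": reading["truck_id"],
        "speed_kmh": round(reading["speed_mph"] * MPH_TO_KMH, 1),
    }


def run_batch(records: List[Dict]) -> List[Dict]:
    """Batch execution: process a complete, already-collected data set."""
    return [clean_reading(r) for r in records]


def run_streaming(records: Iterable[Dict]) -> Iterator[Dict]:
    """Streaming execution: process records one at a time as they arrive."""
    for r in records:
        yield clean_reading(r)


if __name__ == "__main__":
    sample = [
        {"truck_id": "t1", "speed_mph": 60},
        {"truck_id": "t2", "speed_mph": 45},
    ]
    # Same logic, two execution styles, identical results.
    print(run_batch(sample))
    print(list(run_streaming(iter(sample))))
```

Because `clean_reading` is the only place the business logic lives, moving from batch to streaming in this sketch changes which runner is called, not the logic itself.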
That's the benefit of the abstraction layer, like you were saying. And so, I think that's one way that organizations can shield themselves from the question because why even pose that question in the first... Why is it either this or that? Why not have a system that you can actually tune and maybe today you want to start batch, and tomorrow you evolve it to be more streaming and more realtime. Help me on the-- >> John: On the gaps-- >> Yes. >> Always product gaps because, again, you mentioned that you're solving it, and that might be an integration challenge for you guys. >> Yep. >> Or an integration solution for you guys, challenge, opportunity, whatever you guys want to call it. >> Absolutely! >> Organizational gaps, maybe not set up for it, and then process. >> Right. I think it was interesting that we actually went out to dinner with a couple of customers last night. And they were talking a lot about the organizational stuff because the technology they're using is Informatica, so that part's easy. So, they're like, Okay, it's always the stuff around budgeting, it's around resourcing, skills gap, and we've been talking about this stuff for a long time, right. >> John: Yeah. >> But it's fascinating, even in 2017, it's still a persistent issue, and part of what their challenge was is even the way IT projects have been funded in the past. You have this kind of waterfall-ish type of governance mechanism where you're supposed to say, Oh, what are you going to do over the next 12 months? We're going to allocate money for that. We'll allocate people for that. Like, what big data project takes 12 months? Twelve months, you're going to have a completely (laughing) different stack that you're going to be working with. And so, their challenge is evolving into a more agile kind of model where they can go justify quick-hit projects that may have very unknown kind of business value, but it's just getting buy-in that... Hey, something might be discovered here.
This is kind of an exploration use case, discovery, a lot of this IOT stuff, too. People are bringing back the sensor data, you don't know what's going to come out of that or (laughing)-- >> John: Yeah. >> What insights you're going to get. >> So there's-- >> Frequency, velocity, could be completely dynamic. >> Umm-hmm. Absolutely! >> So I think part of the best practice is being able to set aside this kind of notion of innovation where you have funding available for... Get a small cross-functional team together, so this is part of the other aspect of your question, which is organizationally, this isn't just IT. You got to have the data architects from IT, you got to have the data engineers from IT. You got to have data stewards from the line of business. You got business analysts from the line of business. Whenever you get these guys together-- >> Yeah. >> Small core team, and people have been talking about this, right. >> John: Yeah. >> Agile development and all that. It totally applies to the data world. >> John: And the cloud's right there, too, so they have to go there. >> Murthy: That's right! Exactly. So you-- >> So is the 12-month project model, the waterfall model, however you want... maybe 24 months more like it. But the problem on the fail side there is that when they wake up and ship, the world's changed, so there's kind of a diminishing return. Is that kind of what you're getting out there on that fail side? >> Exactly. It's all about failing fast forward and succeeding very quickly as well. And so, when you look at most of the successful organizations, they have radically faster project lifecycles, and this is all the more reason to be using something like Informatica, which abstracts all the technology away, so you're not mired in code rewrites and long development cycles. You just want to ship as quickly as possible, get the organization's buy-in that, Hey, we can make this work! Here's some new insights that we never had before.
That gets you the political capital-- >> John: Yeah. >> For the next project, the next project, and you just got to keep doing that over and over again. >> Yeah, yeah. I always call that agile more of a blank check in a safe harbor because, in case you fail forward, (laughing) I'm failing forward. (laughing) You keep your job, but there's some merit to that. But here's the trick question for you: Now let's talk about hybrid. >> Umm-hmm. >> On prem and cloud. Now, that's the real challenge. What are you guys doing there because now I don't want to have a job on prem. I don't want to have a job on the cloud. That's not redundancy, that's inefficient, that's duplicates. >> Yes. >> So that's an issue. So how do you guys tee it up there for the customer? And what's the playbook for them, and people who are scratching their heads saying, I want on prem. And Oracle got this right. Their earnings came out pretty good, same code on prem, off prem, same code base. So workloads can move depending upon the use cases. >> Yep. >> How do you guys compare? >> Actually that's the exact same approach that we're taking because, again, it's all about that customer shouldn't have to make the either or-- >> So for you guys, interfacing code same on prem and cloud. >> That's right. So you can run our big data solutions on Amazon, Microsoft, any kind of cloud Hadoop environment. We can connect to data sources that are in the cloud, so different SaaS apps. >> John: Umm-hmm. >> If you want to suck data out of there. We got all the out-of-the-box connectivity to all the major SaaS applications. And we can also actually leverage a lot of these new cloud processing engines, too. So we're trying to be the abstraction layer, so now it's not just about Spark and Spark streaming, there's all these new platforms that are coming out in the cloud. So we're integrating with that, so you can use our interface and then push down the processing to a cloud data processing system.
So there's a lot of opportunity here to use cloud, but, again, we don't want to be... We want to make things more flexible. It's all about enabling flexibility for the organization. So if they want to go cloud, great. >> John: Yep. >> There's plenty of organizations that if they don't want to go cloud, that's fine, too. >> So if I get this right, standard interface on prem and cloud for the usability, under the hood it's integration points in clouds, so that data sources, whatever they are and through whatever, could be Kinesis coming off Amazon-- >> Exactly! >> Into you guys, or Azure's got some stuff-- >> Exactly! >> Over there. That all works under the hood. >> Exactly! >> Abstracts from the user. >> That's right! >> Okay, so the next question is, okay, to go that way, that means it's a multicloud world. You probably agree with that. Multicloud meaning, I'm a customer. I might have multiple workloads on multiple clouds. >> That's where it is today. I don't know if that's the endgame? And obviously all this is changing very, very quickly. >> Okay (laughing). >> So I mean, Informatica, we're neutral across multiple vendors and everything. So-- >> You guys are Switzerland. >> We're the Switzerland (laughing), so we work with all the major cloud providers, and there are new ones that we're constantly signing up also, but it's unclear how the market will shake out. >> Umm-hmm. >> There's just so much information out there. I think it's unlikely that you're going to see mass consolidation. We all know who the top players are, and I think that's where a lot of large enterprises are investing, but we'll see how things go in the future, too. >> Where should customers spend their focus because you're seeing the clouds. I was just commenting about Google yesterday, with Amit, AI, and others. That they have to be enterprise-ready.
You guys are very savvy in the enterprise, there's a lot of table stakes, SLAs to integration points, and so, there's some clouds that aren't ready for prime time, like Google for the enterprise. Some are getting there fast, like Amazon, Azure, super enterprise-friendly. They have their own problems and opportunities. But they are very strong on the enterprise. What do you guys advise customers? What are they looking at right now? Where should they be spending their time, writing more code, scripts, or tackling the data? How do you guys help them shift their focus? >> Yeah, yeah! >> And where-- >> And definitely not scripts (laughing). >> It's about the worst thing you can do because... And it's for all the reasons we understand. >> Why is that? >> Well, again, we were talking about being agile. There's nothing agile about manually sitting there, writing Java code. Think about all the developers that were writing MapReduce code three or four years ago (laughing). Those guys, well, they're probably looking for new jobs right now. And with the companies who built that code, they're rewriting all of it. So that approach of doing things at the lowest possible level doesn't make engineering sense. That's why the kind of abstraction layer approach makes so much better sense. So where should people be spending their time? It's really... The one thing technology cannot do is it can't substitute for context. So that's business context, understanding if you're in healthcare there's things about the healthcare industry that only that healthcare company could possibly know, and know about their data, and why certain data is structured the way it is. >> John: Yeah. >> Or financial services or retail.
So business context is something that only that organization can possibly bring to the table, and organizational context, as you were alluding to before, roles and responsibilities, who should have access to data, who shouldn't have access to data. That's also something that can't be prescribed from the outside. It's something that organizations have to figure out. Everything else under the hood, there's no reason whatsoever to be mired in these long code cycles. >> John: Yeah. >> And then you got to rewrite it-- >> John: Yeah. >> And you got to maintain it. >> So automation is one level. >> Yep. >> Machine learning is a nice bridge between the taking advantage of either vertical data, or especially, data for that context. >> Yep. >> But then the human has to actually synthesize it. >> Right! >> And apply it. That's the interface. Did I get that right, that progression? >> Yeah, yeah. Absolutely! And the reason machine learning is so cool... And I'm glad you segued into that. Is that, so it's all about having the machine learning assist the human, right. So the humans don't go away. We still have to have people who understand-- >> John: Okay. >> The business context and the organizational context. But what machine learning can do is in the world of big data... Inherently, the whole idea of big data is that there's too much data for any human to mentally comprehend. >> John: Yeah. >> Well, you don't have to mentally comprehend it. Let the machine learning go through, so we've got this unique machine learning technology that will actually scan all the data inside of Hadoop and outside of Hadoop, and it'll identify what the data is-- >> John: Yeah. >> Because it's all just pattern matching and correlations. And most organizations have common patterns to their data. So we figured out all this stuff, and we can say, Oh, you got credit card information here. Maybe you should go look at that, if that's not supposed to be there (laughing).
Maybe there's a potential violation there? So we can focus the manual effort onto the places where it matters, so now you're looking at issues, problems, instead of doing the day-to-day stuff. The day-to-day stuff is fully automated and that's not what organizations-- >> So the guys that are losing their jobs, those Java developers writing scripts, to do the queries, where should they be focusing? Where should they look for jobs? Because I would agree with you that their jobs would be because the MapReduce guys and all the script guys and the Java guys... Java has always been the bulldozer of the programming language, very functional. >> Murthy: Yep. >> But where do those guys go? What's your advice for... We have a lot of friends, I'm sure you do, too. I know a lot of friends who are Java developers who are awesome programmers. >> Yeah. >> Where should they go? >> Well, so first, I'm not saying that Java's going to go away, obviously (laughing). But I think Java-- >> Well, I mean, Java guys who are doing some of the payload stuff around some of the deep-- >> Exactly! >> In the bowels of big data. >> That's right! Well, there's always things that are unique to the organization-- >> Yeah. >> Custom applications, so all that stuff is fine. What we're talking about is like MapReduce coding-- >> Yeah, what should they do? What should those guys be focusing on? >> So it's just like every other industry you see. You go up the value stack, right. >> John: Right. >> So if you can become more of the data governor, the data stewards, look at policy, look at how you should be thinking about organizational context-- >> John: And governance is also a good area. >> And governance, right. Governance jobs are just going to explode here because somebody has to define it, and technology can't do this. Somebody has to tell the technology what data is good, what data is bad, when do you want to get flagged if something is going wrong, when is it okay to send data through.
Whoever decides and builds those rules, that's going to be a place where I think there's a lot of opportunities. >> Murthy, final question. We got to break, we're getting the hook sign here, but we got Informatica World coming up soon in May. What's going to be on the agenda? What should we expect to hear? What's some of the themes that you could tease a little bit, get people excited. >> Yeah, yeah. Well, one thing we want to really provide a lot of content around the journey to the cloud. And we've been talking today, too, there's so many organizations who are exploring the cloud, but it's not easy, for all the reasons we just talked about. Some organizations want to just kind of break away, take out, rip out everything in IT, move all their data and their applications to the cloud. Some of them are taking more of a progressive journey. So we got customers who've been on the leading front of that, so we'll be having a lot of sessions around how they've done this, best practices that they've learned. So hopefully, it's a great opportunity for both our current audience who's always looked to us for interesting insights, but also all these kind of emerging folks-- >> Right. >> Who are really trying to figure out this new world of data. >> Murthy, thanks so much for coming on The Cube. Appreciate it. Informatica World coming up. You guys have a great solution, and again, making it easier (laughing) for people to get the data and put those new processes in place. This is The Cube breaking it down for Big Data SV here in conjunction with Strata Hadoop. I'm John Furrier. More live coverage after this short break. (electronic music)
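Murthy's remark in the interview that spotting credit card information is "all just pattern matching" can be illustrated with a deliberately simple, rule-based Python sketch. It is not the machine learning technology he describes, and nothing here reflects an Informatica API: the regular expression, the function names, and the 13 to 19 digit length range are assumptions made for illustration. The sketch finds digit runs that look like card numbers and keeps only those that pass the Luhn checksum, so a person can review the flagged values.

```python
import re
from typing import List

# Assumption: candidate card numbers are 13-19 digits, optionally
# separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_values(text: str) -> List[str]:
    """Return digit strings in `text` that look like payment card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    note = "order 1234 5678 9012 3456 and card 4111 1111 1111 1111."
    # Only the value that passes the checksum is flagged for review.
    print(find_card_like_values(note))
```

A rule like this only narrows down where to look; deciding whether the data is allowed to be there remains the kind of governance call the interview assigns to people.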

Published Date : Mar 15 2017

Ravi Dharnikota, SnapLogic & Katharine Matsumoto, eero - Big Data SV 17 - #BigDataSV - #theCUBE

>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (light techno music) >> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at Big Data SV, wrapping up with two days of wall-to-wall coverage of Big Data SV which is associated with Strata Conf, which is part of Big Data Week, which always becomes the epicenter of the big data world for a week here in San Jose. We're at the historic Pagoda Lounge, and we're excited to have our next two guests, talking a little bit different twist on big data that maybe you hadn't thought of. We've got Ravi Dharnikota, he is the Chief Enterprise Architect at SnapLogic, welcome. >> Hello. >> Jeff: And he has brought along a customer, Katharine Matsumoto, she is a Data Scientist at eero, welcome. >> Thank you, thanks for having us. >> Jeff: Absolutely, so we had SnapLogic on a little earlier with Gaurav, but tell us a little bit about eero. I've never heard of eero before, for folks that aren't familiar with the company. >> Yeah, so eero is a start-up based in San Francisco. We are sort of driven to increase home connectivity, both the performance and the ease of use, as wifi becomes totally a part of everyday life. We do that. We've created the world's first mesh wifi system. >> Okay. >> So that means you have, for an average home, three different individual units, and you plug one in to replace your router, and then the other three get plugged in throughout the home just to power, and they're able to spread coverage, reliability, speed, throughout your homes. No more buffering, dead zones, in that way back bedroom. >> Jeff: And it's a consumer product-- >> Yes. >> So you got all the fun and challenges of manufacturing, you've got the fun challenges of distribution, consumer marketing, so a lot of challenges for a start-up. But you guys are doing great. Why SnapLogic? >> Yeah, so in addition to the challenges with the hardware, we also are really strong in software.
So, everything is either set up via the app. We are not just the backbone to your home's connectivity, but also part of it, so we're sending a lot of information back from our devices to be able to learn and improve the wifi that we're delivering based on the data we get back. So that's a lot of data, a lot of different teams working on different pieces. So when we were looking at launch, how do we integrate all of that information together to make it accessible to business users across different teams, and also how do we handle the scale. I made a checklist (laughs), and SnapLogic was really the only one that seemed to be able to deliver on both of those promises with a look to the future of like, I don't know what my next SaaS product is, I don't know what our next API point we're going to need to hit is, sort of the flexibility of that as well as the fact that analysts were able to pick it up, engineers were able to pick it up, and I could still manage all the software written by, or the pipelines written by each of those different groups without having to read whatever version of code they're writing. >> Right, so Ravi, we heard you guys are like doubling your customer base every year, and lots of big names, Adobe we talked about earlier today. But I don't know that most people would think of SnapLogic really, as a solution to a start-up mesh network company. >> Yeah, absolutely, so that's a great point though, let me just start off with saying that in this new world, we don't discriminate-- (guest and host laugh) we integrate and we don't discriminate. In this new world that I speak about is social media, you know-- >> Jeff: Do you bus? (all laugh) >> So I will get to that. (all laugh) So, social, mobile, analytics, and cloud. And in this world, people have this thing which we fondly call integrators' dilemma. You want to integrate apps, you go to a different tool set. You integrate data, you start thinking about different tool sets.
So we want to dispel that and really provide a unified platform for both apps and data. So remember, we are seeing all the apps move into the cloud and being provided as services, but the data systems are also moving to the cloud. You got your data warehouses, databases, your BI systems, analytical tools, all are being provided to you as services. So, in this world data is data. If it's apps, it's probably schema mapping. If it's data systems, it's transformations moving from one end to the other. So, we're here to solve both those challenges in this new world with a unified platform. And it also helps that our lineage and the brain trust that brings us here, we did this a couple of decades ago and we're here to reinvent that space. >> Well, we expect you to bring Clayton Christensen on next time you come to visit, because he needs a new book, and I think that's a good one. (all laugh) But I think it was a really interesting part of the story though too, is you have such a dynamic product. Right, if you looked at your boxes, I've got the website pulled up, you wouldn't necessarily think of the dynamic nature that you're constantly tweaking and taking the data from the boxes to change the service that you're delivering. It's not just this thing that you made to a spec that you shipped out the door. >> Yeah, and that's really where the auto connected, we did 20 firmware updates last year. We had problems where customers would have the same box for three years, and the technology changes, the chips change, but their wifi service is the same, and we're constantly innovating and being able to push those out, but if you're going to do that many updates, you need a lot of feedback on the updates because things break when you update sometimes, and we've been able to build systems that catch that, that are able to identify changes that, say, no one person could be able to do by looking at their own things or just with support.
We have leading indicators across all sorts of different stability and performance and different devices, so if Xbox changes their protocols, we can identify that really quickly. And that's sort of the goal of having all the data in one place across customer support and manufacturing. We can easily pinpoint where in the many different complicated factors you can find the problem. >> Have issues. >> Yeah. >> So, I've actually got questions for both of you. Ravi, starting with you, it sounds like you're trying to tackle a challenge that in today's tools would have included Kafka at the data integration level, and there it's very much a hub and spoke approach. And I guess it's also, you would think of the application level integration more like the TIBCO and other EAI vendors in a previous generation-- >> Ravi: Yeah. >> Which I don't think was hub and spoke, it was more point to point, and I'm curious how you resolve that, in other words, how you'd tackle both together in a unified architecture? >> Yeah, that's an excellent question. In fact, one of the integrators' dilemmas that I spoke about, you've got the problem set where you've got the high-latency, high-volume, where you go to ETL tools. And then the low-latency, low-volume, you immediately go to the TIBCOs of the world and that's ESB, EAI sort of tool sets that you look to solve. So what we've done is we've thought about it hard. At one level we've just said, why can integration not be offered as a service? So that's step number one where the design experience is through the cloud, and then execution can just happen anywhere, behind your firewall or in the cloud, or in a big data system, so it caters to all of that. But then also, the data set itself is changing. You're seeing a lot of the document data models that are being offered by the SaaS services. So the old ETL companies that were built before all of this social, mobile sort of stuff came around, it was all row and column oriented.
So how do you deal with the more document oriented JSON sort of stuff? And we built that for, the platform to be able to handle that kind of data. Streaming is an interesting and important question. Pretty much everyone I spoke to last year were, streaming was a big-- let's do streaming, I want everything in real-time. But batch also has its place. So you've got to have a system that does batch as well as real-time, or as near real-time as needed. So we solve for all of those problems. >> Okay, so Katharine, coming to you, each customer has a different, well, every consumer has a different, essentially, install base. To bring all the telemetry back to make sense out of what's working and what's not working, or how their environment is changing. How do you make sense out of all that, considering that it's not B to B, it's B to C so, I don't know how many customers you have, but it must be in the tens or hundreds. >> I'm sure I'm not allowed to say (laughs). >> No. But it's the distinctness of each customer that I gather makes the support challenge for you. >> Yeah, and part of that's exposing as much information to the different sources, and starting to automate the ways in which we do it. There's certainly a lot, we are very early on as a company. We've hit our year mark for public availability the end of last month so-- >> Jeff: Congratulations. >> Thank you, it's been a long year. But with that we learn more, constantly, and different people come to different views as different new questions come up. The special-snowflake aspect of each customer, there's a balance between how much actually is special and how much you can find patterns. And that's really where you get into much more interesting things on the statistics and machine learning side is how do you identify those patterns that you may not even know you're looking for. We are still beginning to understand our customers from a qualitative standpoint. 
It actually came up this week where I was doing an analysis and I was like, this population looks kind of weird, and with two clicks was able to send out a list over to our CX team. They had access to all the same systems because all of our data is connected and they could pull up the tickets based on, because through SnapLogic, we're joining all the data together. We use Looker as our BI tool, they were just able to start going into all the tickets and doing a deep dive, and that's being presented later this week as to like, hey, what is this population doing? >> So, for you to do this, that must mean you have at least some data that's common to every customer. For you to be able to use something like Looker, I imagine. If every customer was a distinct snowflake, it would be very hard to find patterns across them. >> Well I mean, look at how many people have iPhones, have MacBooks, you know, we are looking at a lot of aggregate-level data in terms of how things are behaving, and always the challenge of any data science project is creating those feature extractions, and so that's where the process we're going through as the analytics team is to start extracting those things and adding them to our central data source. That's one of the areas also where having very integrated analytics and ETL has been helpful as we're just feeding that information back in to everyone. So once we figure out, oh hey, this is how you differentiate small businesses from homes, because we do see a couple of small businesses using our product, that goes back into the data and now everyone's consuming it. Each of those common features, it's a slow process to create them, but it's also increases the value every time you add one to the central group. >> One last question-- >> It's an interesting way to think of the wifi service and the connected devices an integration challenge, as opposed to just an appliance that kind of works like an old POTS line, which it isn't, clearly at all. 
(all laugh) With 20 firmware updates a year (laughs). >> Yeah, there's another interesting point, that we were just having the discussion offline, it's that it's a start-up. They obviously don't have the resources or the appetite to have a large IT department to set up these systems. So, as Katharine mentioned, one person team initially when they started, and to be able to integrate, who knows which system is going to be next. Maybe they experiment with one cloud service, it perhaps scales to their liking or not, and then they quickly change and go to another one. You cannot change the integration underneath that. You got to be able to adjust to that. So that flexibility, and the other thing is, what they've done with having their business become self-sufficient is another very fascinating thing. It's like, give them the power. Why should IT or that small team become the bottleneck? Don't come to me, I'll just empower you with the right tool set and the patterns and then from there, you change and put in your business logic and be productive immediately. >> Let me drill down on that, 'cause my understanding, at least in the old world was that ETL was kind of brittle, and if you're constantly ... Part of actually, the genesis of Hadoop, certainly at Yahoo was, we're going to bring all the data we might ever possibly need into the repository so we don't have to keep re-writing the pipeline. And it sounds like you have the capability to evolve the pipeline rather quickly as you want to bring more data into this sort of central resource. Am I getting that about right? >> Yeah, it's a little bit of both. We do have a central, I think, data lake's the fancy term for that, so we're bringing everything into S3, dumping it into those raw JSONs, you know, whatever nested format it comes into, so whatever makes it so that extraction is easy. 
Then there's also, as part of ETL, there's that last mile which is a lot of business logic, and that's where you run into teams starting to diverge very quickly if you don't have a way for them to give feedback into the process. We've really focused on empowering business users to be self-service, in terms of answering their own questions, and that's freed up our analysts to add more value back into the greater group as well as answer harder questions, that both beget more questions, but also feeds back insights into that data source because they have access to their piece of that last business logic. By changing the way that one JSON field maps or combining two, they've suddenly created an entirely new variable that's accessible to everyone. So it's sort of last-leg business logic versus the full transport layer. We have a whole platform that's designed to transport everything and be much more robust to changes. >> Alright, so let me make sure I understand this, it sounds like the less-trained or more self-sufficient, they go after the central repository and then the more highly-trained and scarcer resource, they are responsible for owning one or more of the feeds and that they enrich that or make that more flexible and general-purpose so that those who are more self-sufficient can get at it in the center. >> Yeah, and also you're able to make use of the business. So we have sort of a hybrid model with our analysts that are really closely embedded into the teams, and so they have all that context that you need that if you're relying on, say, a central IT team, that you have to go back and forth of like, why are you doing this, what does this mean? They're able to do all that in logic. 
And then the goal of our platform team is really to focus on building technologies that complement what we have with SnapLogic or others that are custom to our data systems that enable that same sort of level of self-service for creating specific definitions, or are able to do it intelligently based on agreed upon patterns of extraction. >> George: Okay. >> Heavy science. Alright, well unfortunately we are out of time. I really appreciate the story, I love the site, I'll have to check out the boxes, because I know I have a bunch of dead spots in my house. (all laugh) But Ravi, I want to give you the last word, really about how is it working with a small start-up doing some cool, innovative stuff, but it's not your Adobes, it's not a lot of the huge enterprise clients that you have. What have you taken, why does that add value to SnapLogic to work with kind of a cool, fun, small start-up? >> Yeah, so the enterprise is always a retrofit job. You have to sort of go back to the SAPs and the Oracle databases and make sure that we are able to connect the legacy with a new cloud application. Whereas with a start-up, it's all new stuff. But their volumes are constantly changing, they probably have spikes, they have burst volumes, they're thinking about this differently, enabling everyone else, quickly changing and adopting newer technologies. So we have to be able to adjust to that agility along with them. So we're very excited as sort of partnering with them and going along with them on this journey. And as they start looking at other things, the machine learning and the AI and the IoT space, we're very excited to have that partnership and learn from them and evolve our platform as well. >> Clearly. You're smiling ear-to-ear, Katharine's excited, you're solving problems. So thanks again for taking a few minutes and good luck with your talk tomorrow. Alright, I'm Jeff Frick, he's George Gilbert, you're watching theCUBE from Big Data SV. 
We'll be back after this short break. Thanks for watching. (light techno music)

Published Date : Mar 15 2017


ENTITIES

Entity	Category	Confidence
Jeff Frick	PERSON	0.99+
Katharine Matsumoto	PERSON	0.99+
Jeff	PERSON	0.99+
Ravi Dharnikota	PERSON	0.99+
Katharine	PERSON	0.99+
George Gilbert	PERSON	0.99+
Adobe	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
George	PERSON	0.99+
San Jose	LOCATION	0.99+
San Francisco	LOCATION	0.99+
tens	QUANTITY	0.99+
last year	DATE	0.99+
three years	QUANTITY	0.99+
Clayton Christensen	PERSON	0.99+
20	QUANTITY	0.99+
one	QUANTITY	0.99+
Ravi	PERSON	0.99+
San Jose, California	LOCATION	0.99+
SnapLogic	ORGANIZATION	0.99+
iPhones	COMMERCIAL_ITEM	0.99+
Kafka	TITLE	0.99+
two days	QUANTITY	0.99+
hundreds	QUANTITY	0.99+
two	QUANTITY	0.99+
tomorrow	DATE	0.99+
two clicks	QUANTITY	0.99+
TIBCO	ORGANIZATION	0.99+
both	QUANTITY	0.99+
each customer	QUANTITY	0.99+
Xbox	COMMERCIAL_ITEM	0.99+
Big Data Week	EVENT	0.99+
Oracle	ORGANIZATION	0.99+
One last question	QUANTITY	0.98+
eero	ORGANIZATION	0.98+
Pagoda Lounge	LOCATION	0.98+
20 firmware updates	QUANTITY	0.98+
Adobes	ORGANIZATION	0.98+
this week	DATE	0.98+
S3	TITLE	0.98+
Strata Comp	ORGANIZATION	0.98+
MacBooks	COMMERCIAL_ITEM	0.98+
Each	QUANTITY	0.97+
three	QUANTITY	0.97+
each	QUANTITY	0.97+
one person	QUANTITY	0.96+
JSON	TITLE	0.96+
two guests	QUANTITY	0.95+
today	DATE	0.95+
three different individual units	QUANTITY	0.95+
later this week	DATE	0.95+
a week	QUANTITY	0.94+
#BigDataSV	TITLE	0.93+
earlier today	DATE	0.92+
one level	QUANTITY	0.92+
couple of decades ago	DATE	0.9+
CX	ORGANIZATION	0.9+
theCUBE	ORGANIZATION	0.9+
SnapLogic	TITLE	0.87+
end	DATE	0.87+
first mesh	QUANTITY	0.87+
one person team	QUANTITY	0.87+
SaaS	TITLE	0.86+
one cloud	QUANTITY	0.84+
Big Data SV	TITLE	0.84+
last month	DATE	0.83+
one place	QUANTITY	0.83+
Big Data Silicon Valley 2017	EVENT	0.82+

Holden Karau, IBM Big Data SV 17 #BigDataSV #theCUBE

>> Announcer: Big Data Silicon Valley 2017. >> Hey, welcome back, everybody, Jeff Frick here with The Cube. We are live at the historic Pagoda Lounge in San Jose for Big Data SV, which is associated with Strata + Hadoop World, across the street, as well as Big Data week, so everything big data is happening in San Jose, we're happy to be here, love the new venue, if you're around, stop by, back of the Fairmont, Pagoda Lounge. We're excited to be joined in this next segment by, who's now become a regular, any time we're at a Big Data event, a Spark event, Holden always stops by. Holden Karau, she's the principal software engineer at IBM. Holden, great to see you. >> Thank you, it's wonderful to be back yet again. >> Absolutely, so the big data meme just keeps rolling, Google Cloud Next was last week, a lot of talk about AI and ML and of course you're very involved in Spark, so what are you excited about these days? What are you, I'm sure you've got a couple presentations going on across the street. >> Yeah, so my two presentations this week, oh wow, I should remember them. So the one that I'm doing today is with my co-worker Seth Hendrickson, also at IBM, and we're going to be focused on how to use structured streaming for machine learning. And sort of, I think that's really interesting, because streaming machine learning is something a lot of people seem to want to do but aren't yet doing in production, so it's always fun to talk to people before they've built their systems. And then tomorrow I'm going to be talking with Joey on how to debug Spark, which is something that I, you know, a lot of people ask questions about, but I tend to not talk about, because it tends to scare people away, and so I try to keep the happy going. >> Jeff: Bugs are never fun. >> No, no, never fun. 
>> Just picking up on that structured streaming and machine learning, so there's this issue of, as we move more and more towards the industrial internet of things, like having to process events as they come in, make a decision. How, there's a range of latency that's required. Where does structured streaming and ML fit today, and where might that go? >> So structured streaming for today, latency wise, is probably not something I would use for something like that right now. It's in the like sub second range. Which is nice, but it's not what you want for like live serving of decisions for your car, right? That's just not going to be feasible. But I think it certainly has the potential to get a lot faster. We've seen a lot of renewed interest in MLlib local, which is really about making it so that we can take the models that we've trained in Spark and really push them out to the edge and sort of serve them in the edge, and apply our models on end devices. So I'm really excited about where that's going. To be fair, part of my excitement is someone else is doing that work, so I'm very excited that they're doing this work for me. >> Let me clarify on that, just to make sure I understand. So there's a lot of overhead in Spark, because it runs on a cluster, because you have an optimizer, because you have the high availability or the resilience, and so you're saying we can preserve the predict and maybe serve part and carve out all the other overhead for running in a very small environment. >> Right, yeah. So I think for a lot of these IOT devices and stuff like that it actually makes a lot more sense to do the predictions on the device itself, right. These models generally are megabytes in size, and we don't need a cluster to do predictions on these models, right. We really need the cluster to train them, but I think for a lot of cases, pushing the prediction out to the edge node is actually a pretty reasonable use case. 
And so I'm really excited that we've got some work going on there. >> Taking that one step further, we've talked to a bunch of people, both like at GE, and at their Minds and Machines show, and IBM's Genius of Things, where you want to be able to train the models up in the cloud where you're getting data from all the different devices and then push the retrained model out to the edge. Can that happen in Spark, or do we have to have something else orchestrating all that? >> So actually pushing the model out isn't something that I would do in Spark itself, I think that's better served by other tools. Spark is not really well suited to large amounts of internet traffic, right. But it's really well suited to the training, and I think with MLlib local it'll essentially, we'll be able to provide both sides of it, and the copy part will be left up to whoever it is that's doing their work, right, because like if you're copying over a cell network you need to do something very different as if you're broadcasting over a terrestrial XM or something like that, you need to do something very different for satellite. >> If you're at the edge on a device, would you be actually running, like you were saying earlier, structured streaming, with the prediction? >> Right, I don't think you would use structured streaming per se on the edge device, but essentially there would be a lot of code share between structured streaming and the code that you'd be using on the edge device. And it's being factored out now so that we can have this code sharing and Spark machine learning. And you would use structured streaming maybe on the training side, and then on the serving side you would use your custom local code. >> Okay, so tell us a little more about Spark ML today and how we can democratize machine learning, you know, for a bigger audience. >> Right, I think machine learning is great, but right now you really need a strong statistical background to really be able to apply it effectively. 
And we probably can't get rid of that for all problems, but I think for a lot of problems, doing things like hyperparameter tuning can actually give really powerful tools to just like regular engineering folks who, they're smart, but maybe they don't have a strong machine learning background. And Spark's ML pipelines make it really easy to sort of construct multiple stages, and then just be like, okay, I don't know what these parameters should be, I want you to do a search over what these different parameters could be for me, and it makes it really easy to do this as just a regular engineer with less of an ML background. >> Would that be like, just for those of us who are, who don't know what hyperparameter tuning is, that would be the knobs, the variables? >> Yeah, it's going to spin the knobs on like our regularization parameter on like our regression, and it can also spin some knobs on maybe the n-gram sizes that we're using on the inputs to something else, right. And it can compare how these knobs sort of interact with each other, because often you can tune one knob but you actually have six different knobs that you want to tune and you don't know, if you just explore each one individually, you're not going to find the best setting for them working together. >> So this would make it easier for, as you're saying, someone who's not a data scientist to set up a pipeline that lets you predict. >> I think so, very much. I think it does a lot of the, brings a lot of the benefits from sort of the SciPy world to the big data world. And SciPy is really wonderful about making machine learning really accessible, but it's just not ready for big data, and I think this does a good job of bringing these same concepts, if not the code, but the same concepts, to big data. >> The SciPy, if I understand, is it a notebook that would run essentially on one machine? >> SciPy can be put in a notebook environment, and generally it would run on, yeah, a single machine. 
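The "spin the knobs" idea from the hyperparameter tuning exchange above can be illustrated with a small, hedged sketch. It does not use Spark at all: it is plain Python, and the names `grid_search`, `toy_validation_loss`, `reg_param`, and `ngram_size` are hypothetical, chosen only to mirror the regularization and n-gram knobs mentioned in the conversation. The toy loss stands in for "train the pipeline and score it on held-out data" and is built so that the two knobs interact, which is why searching them jointly matters.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination of knob settings and return the best one.

    param_grid: dict mapping a knob name to the list of values to try.
    score_fn:   callable taking one keyword argument per knob and
                returning a loss, where lower is better.
    """
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    # product() enumerates the full cross product, so knobs are tuned jointly.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = score_fn(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(reg_param, ngram_size):
    # Hypothetical stand-in for training and scoring a real pipeline.
    # The best regularization value depends on the n-gram size,
    # so the two knobs cannot be tuned independently.
    return (reg_param - 0.1 * ngram_size) ** 2 + 0.01 * abs(ngram_size - 3)


param_grid = {
    "reg_param": [0.1, 0.2, 0.3],
    "ngram_size": [1, 2, 3],
}
best_params, best_loss = grid_search(param_grid, toy_validation_loss)
print(best_params, best_loss)
```

In this toy setup, holding `ngram_size` at 1 and tuning only `reg_param` would settle on 0.1, while the joint search finds a different and better combination. That is the interaction effect described in the interview, shown here under made-up numbers rather than a real model.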
>> And so to make that sit on Spark means that you could then run it on a cluster-- >> So this isn't actually taking SciPy and distributing it, this is just like stealing the good concepts from SciPy and making them available for big data people. Because SciPy's done a really good job of making a very intuitive machine learning interface. >> So just to put a fine sort of qualifier on one thing, if you're doing the internet of things and you have Spark at the edge and you're running the model there, it's the programming model, so structured streaming is one way of programming Spark, but if you don't have structured streaming at the edge, would you just be using the core batch Spark programming model? >> So at the edge you'd just be using, you wouldn't even be using batch, right, because you're trying to predict individual events, right, so you'd just be calling predict with every new event that you're getting in. And you might have a queue mechanism of some type. But essentially if we had this batch, we would be adding additional latency, and I think at the edge we really, the reason we're moving the models to the edge is to avoid the latency. >> So just to be clear then, is the programming model, so it wouldn't be structured streaming, and we're taking out all the overhead that forced us to use batch with Spark. So the reason I'm trying to clarify is a lot of people had this question for a long time, which is are we going to have a different programming model at the edge from what we have at the center? >> Yeah, that's a great question. And I don't think the answer is finished yet, but I think the work is being done to try and make it look the same. 
Of course, you know, trying to make it look the same, this is Boosh, it's not like actually barking at us right now, even though she looks like a dog, she is, there will always be things which are a little bit different from the edge to your cluster, but I think Spark has done a really good job of making things look very similar on single node cases to multi node cases, and I think we can probably bring the same things to ML. >> Okay, so it's almost time, we're coming back, Spark took us from single machine to cluster, and now we have to essentially bring it back for an edge device that's really light weight. >> Yeah, I think at the end of the day, just from a latency point of view, that's what we have to do for serving. For some models, not for everyone. Like if you're building a website with a recommendation system, you don't need to serve that model like on the edge node, that's fine, but like if you've got a car device we can't depend on cell latency, right, you have to serve that in car. >> So what are some of the things, some of the other things that IBM is contributing to the ecosystem that you see having a big impact over the next couple years? >> So there's a lot of really exciting things coming out of IBM. And I'm obviously pretty biased. I spend a lot of time focused on Python support in Spark, and one of the most exciting things is coming from my co-worker Brian, I'm not going to say his last name in case I get it wrong, but Brian is amazing, and he's been working on integrating Arrow with Spark, and this can make it so that it's going to be a lot easier to sort of interoperate between JVM languages and Python and R, so I'm really optimistic about the sort of Python and R interfaces improving a lot in Spark and getting a lot faster as well. And we're also, in addition to the Arrow work, we've got some work around making it a lot easier for people in R and Python to get started. 
The R stuff is mostly actually the Microsoft people, thanks Felix, you're awesome. I don't actually know which camera I should have done that to but that's okay. >> I think you got it! >> But Felix is amazing, and the other people working on R are too. But I think we've both been pursuing sort of making it so that people who are in the R or Python spaces can just use like pip install, conda install, or whatever tool it is they're used to working with, to just bring Spark into their machine really easily, just like they would sort of any other software package that they're using. Because right now, for someone getting started in Spark, if you're in the Java space it's pretty easy, but if you're in R or Python you have to do sort of a lot of weird setup work, and it's worth it, but like if we can get rid of that friction, I think we can get a lot more people in these communities using Spark. >> Let me see, just as a scenario, the R server is getting fairly well integrated into SQL Server, so would it be, would you be able to use R as the language with a Spark execution engine to somehow integrate it into SQL Server as an execution engine for doing the machine learning and predicting? >> You definitely, well I shouldn't say definitely, you probably could do that. I don't necessarily know if that's a good idea, but that's the kind of stuff that this would enable, right, it'll make it so that people that are making tools in R or Python can just use Spark as another library, right, and it doesn't have to be this really special setup. It can just be this library and they point it at the cluster and they can do whatever work it wants to do. That being said, the SQL Server R integration, if you find yourself using that to do like distributed computing, you should probably take a step back and like rethink what you're doing. >> George: Because it's not really scale out. >> It's not really set up for that. 
And you might be better off doing this with like, connecting your Spark cluster to your SQL Server instance using like JDBC or a special driver and doing it that way, but you definitely could do it in another inverted sort of way. >> So last question from me, if you look out a couple years, how will we make machine learning accessible to a bigger and bigger audience? And I know you touched on the tuning of the knobs, hyperparameter tuning, what will it look like ultimately? >> I think ML pipelines are probably what things are going to end up looking like. But I think the other part that we'll sort of see is we'll see a lot more examples of how to work with certain kinds of data, because right now, like, I know what I need to do when I'm ingesting some textual data, but I know that because I spent like a week trying to figure out what the hell I was doing once, right. And I didn't bother to write it down. And it looks like no one else bothered to write it down. So really I think we'll see a lot of tools that look very similar to the tools we have today, they'll have more options and they'll be a bit easier to use, but I think the main thing that we're really lacking right now is good documentation and sort of good books and just good resources for people to figure out how to use these tools. Now of course, I mean, I'm biased, because I work on these tools, so I'm like, yeah, they're pretty great. So there might be other people who are like, Holden, no, you're wrong, we need to rethink everything. But I think this is, we can go very far with the pipeline concept. >> And then that's good, right? The democratization of these things opens it up to more people, you get more creative people solving more different problems, that makes the whole thing go. 
>> You can like install Spark easily, you can, you know, set up an ML pipeline, you can train your model, you can start doing predictions, you can, people that haven't been able to do machine learning at scale can get started super easily, and build a recommendation system for their small little online shop and be like, hey, you bought this, you might also want to buy Boosh, he's really cute, but you can't have this one. No no no, not this one. >> Such a tease! >> Holden: I'm sorry, I'm sorry. >> Well Holden, that will, we'll say goodbye for now, I'm sure we will see you in June in San Francisco at the Spark Summit, and look forward to the update. >> Holden: I look forward to chatting with you then. >> Absolutely, and break a leg this afternoon at your presentation. >> Holden: Thank you. >> She's Holden Karau, I'm Jeff Frick, he's George Gilbert, you're watching The Cube, we're at Big Data SV, thanks for watching. (upbeat music)

Published Date : Mar 15 2017


ENTITIES

Entity	Category	Confidence
Jeff Frick	PERSON	0.99+
Brian	PERSON	0.99+
Holden Karau	PERSON	0.99+
Holden	PERSON	0.99+
Felix	PERSON	0.99+
George Gilbert	PERSON	0.99+
George	PERSON	0.99+
Joey	PERSON	0.99+
Jeff	PERSON	0.99+
IBM	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
Seth Hendrickson	PERSON	0.99+
Spark	TITLE	0.99+
Python	TITLE	0.99+
last week	DATE	0.99+
Microsoft	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
San Francisco	LOCATION	0.99+
June	DATE	0.99+
six different knobs	QUANTITY	0.99+
GE	ORGANIZATION	0.99+
Boosh	PERSON	0.99+
Pagoda Lounge	LOCATION	0.99+
one knob	QUANTITY	0.99+
both sides	QUANTITY	0.99+
two presentations	QUANTITY	0.99+
this week	DATE	0.98+
today	DATE	0.98+
The Cube	ORGANIZATION	0.98+
Java	TITLE	0.98+
both	QUANTITY	0.97+
one thing	QUANTITY	0.96+
one	QUANTITY	0.96+
Big Data week	EVENT	0.96+
single machine	QUANTITY	0.95+
R	TITLE	0.95+
SciPy	TITLE	0.95+
Big Data	EVENT	0.95+
each one	QUANTITY	0.94+
JDBC	TITLE	0.93+
Spark ML	TITLE	0.89+
JVM	TITLE	0.89+
The Cube	TITLE	0.88+
single	QUANTITY	0.88+
Sequel	TITLE	0.87+
Big Data Silicon Valley 2017	EVENT	0.86+
Spark Summit	LOCATION	0.86+
one machine	QUANTITY	0.86+
a week	QUANTITY	0.84+
Fairmont	LOCATION	0.83+
liblocal	TITLE	0.83+

Gaurav Dhillon | Big Data SV 17

>> Hey, welcome back everybody. Jeff Frick here with the Cube. We are live in downtown San Jose at the historic Pagoda Lounge, part of Big Data SV, which is part of Strata + Hadoop Conference, which is part of Big Data Week because everything big data is pretty much in San Jose this week. So we're excited to be here. We're here with George Gilbert, our big data analyst from Wikibon, and a great guest, Gaurav Dhillon, Chairman and CEO of SnapLogic. Gaurav, great to see you. >> Pleasure to be here, Jeff. Thank you for having me. George, good to see you. >> You guys have been very busy since we last saw you about a year ago. >> We have. We had a pretty epic year. >> Yeah, give us an update, funding, and customers, and you guys have a little momentum. >> It's a good thing. It's a good thing, you know. A friend and a real mentor to us, Dan Warmenhoven, the Founder and CEO of NetApp for a very long time, longtime CEO of NetApp, he always likes to joke that growth cures all startup problems. And you know what, that's the truth. >> Jeff: Yes. >> So we had a scorching year, you know. 2016 was a year of continuing to strengthen our products, getting a bunch more customers. We got about 300 new customers. >> Jeff: 300 new customers? >> Yes, and as you know, we don't sell to small business. We sell to the enterprise. >> Right, right. >> So, this is the who's who of pharmaceuticals, continued strength in high-tech, continued strength in retail. You know, all the way from Subway Sandwich to folks like AstraZeneca and Amgen and Bristol-Myers Squibb. >> Right. >> So, some phenomenal growth for the company. But, you know, we look at it very simply. We want to double our company every year. We want to do it in a responsible way. In other words, we are growing our business in such a way that we can sail over to cash flow break-even at anytime. So responsibly doubling your business is a wonderful thing. 
>> So when you look at it, obviously, you guys are executing, you've got good products, people are buying. But what are some of the macro-trends that you're seeing talking to all these customers that are really helping push you guys along? >> Right, right. So what we see is, and it used to be the majority of our business. It's now getting to be 50/50. But still I would say, historically, the primary driver for 2016 of our business was a digital transformation at a boardroom level causing a rethinking of the appscape and people bringing in cloud applications like Workday. So, one of the big drivers of our growth is helping fit Workday into the new fabric in many enterprises: Vassar College, into Capital One, into finance and various other sectors. Where people bring in Workday, they want to make that work with what they have and what they're going to buy in the future, whether it's more applications or new types of data strategies. And that is the primary driver for growth. In the past, it was probably a secondary driver, this new world of data warehousing. We like to think of it as a post-modern era in the use of data and the use of analytics. But this year, it's trending to be probably 50/50 between apps and data. And that is a shift towards people deploying in the same way that they moved from on-premise apps to SAS apps, a move towards looking at data platforms in the cloud for all the benefits of racking and stacking and having the capability rather than being in the air-conditioning, HVAC, and power consumption business. And that has been phenomenal. We've seen great growth with some of the work from Microsoft Azure with the Insights products, AWS's Redshift is a fantastic growth area for us. And these sorts of technologies, we think are going to be of significant impact to the everyday, the work clothing types of analytics. 
Maybe the more exotic stuff will stay on prem, but a lot of the regular business-like stuff, you know, stuff in suits and ties is moving into the cloud at a rapid pace. >> And we just came off the Google Next show last week. And Google really is helping continue to push kind of ML and AI out front. And so, maybe it's not the blue suit analytics. >> Gaurav: Indeed, yes. >> But it does drive expectations. And you know, the expectations of what we can get, what we should get, what we should be moving towards is rapidly changing. >> Rapidly changing, for example, we saw at The New York Times, which as many of Google's flagship enterprise customers are media-related. >> Jeff: Right. >> No accident, they're so proficient themselves being in the consumer internet space. So as we encountered in places like The New York Times, is there's a shift away from a legacy data warehouse, which people like me and others in the last century, back in my time in Informatica, might have sold them towards a cloud-first strategy of using, in their case, Google products, Bigtable, et cetera. And also, they're doing that because they aspirationally want to get at consumer prices without having to have a campus and the expense of Google's big brain. They want to benefit from some of those things like TensorFlow, et cetera, through the machine learning and other developer capabilities that are now coming along with that in the cloud. And by the way, Microsoft has amazing machine learning capability in its Azure for Microsoft Research as well. >> So Gaurav, it's interesting to hear sort of the two drivers. We know PeopleSoft took off starting with HR first and then would add on financials and stumble a little bit with manufacturing. So, when someone wants to bring in Workday, is it purely an efficiency value prop? And then, how are you helping them tie into the existing fabric of applications? >> Look, I think you have to ask Dave or Aneel or ask them together more about that dynamic. 
What I know, as a friend of the firm and as somebody we collaborate with, and, you know, this is an interesting statistic, 20 percent of Workday's financial customers are using SnapLogic, 20 percent. Now, it's a nascent business for them and you and I were around in the last century of ERP. We saw the evolution of functional winners. Some made it into suites and some didn't. Siebel never did. PeopleSoft at least made a significant impact on a variety of other things. Yes, there was Baan and other things that prevented their domination of manufacturing and, of course, the small company in Walldorf did a very good job on it too. But that said, what we find is it's very typical, in a sense, how people using TIBCO and Informatica in the last century are looking at SnapLogic. And it's no accident because we saw Workday's go-to-market motion, and in a sense, are following, trying to do the same thing Dave and Aneel have done, but we're trying to do the same thing, being a bunch of ex-Informatica guys. So here's what it is. When you look at your legacy installation, and you want to modernize it, what are your choices? You can do a big old upgrade because it's on-premise software. Or you can say, "You know what? For 20% more, I could just get the new thing." And guess what? A lot of people want to get the new thing. And that's what you're going to see all the time. And that's what's happening with companies like SnapLogic and Workday is, you know, someone. Right here locally, Adobe, it's an icon in technology and certainly in San Jose that logo is very big. A few years ago, they decided to make the jump from legacy middleware, TIBCO, Informatica, WebMethods, and they've replaced everything globally with SnapLogic.
So in that same way, instead of trying to upgrade this version and that version and what about what we do in Japan, what do we do in Sweden, why don't you just find a platform as a service that lets you elevate your success and go towards a better product, more of a self-service better UX, millennial-friendly type of product? So that's what's happening out there. >> But even that three-letter company from Walldorf was on-stage last week. You can now get SAP on the Google Cloud Platform which I thought was pretty amazing. And the other piece I just love but there's still a few doubters out there on the SaaS platform is now there's a really visual representation. >> Gaurav: There is. >> Of the dominance of that style going up in downtown San Francisco. It's 60 stories high, and it's taken over the landscape. So if there's ever any a doubt of enterprise adaptation of SaaS, and if anything, I would wonder if kind of the proliferation of apps now within the SaaS environment inside the enterprise starts to become a problem in and of its own self. Because now you have so many different apps that you're working on and working. God help if the internet goes down, right? >> It's true, and you know, and how do you make e pluribus unum, out of many one, right? So it's hilarious. It is almost at proliferation at this point. You know, our CFO tapped me the other day. He said, "Hey, you've got to check this out. They're using a SaaS application which they got from a law firm to track stock options inside the company." I'm like, "Wow, that is a job title and a vertical." So only high growth private venture backed companies need this, and typically it's high tech. And you have very capable SaaS, even in the small grid squares in the enterprise. >> Jeff: Right, right. >> So, a sign, and I think that's probably another way to think about the work that we do at SnapLogic and others. >> Jeff: Right, right. >> Other people in the marketplace like us.
What we do essentially is we give you the ERP of one. Because if you could choose things that make sense for you and they could work together in a very good way to give you very good fabric for your purposes, you've essentially bought a bespoke suit at rack prices. Right? Without that nine times multiplier of the last century of having to have just consultants without end, darkened the sky with consultants to make that happen. You know? So that, yes, SaaS proliferation is happening. That is the opportunity, also the problem. For us, it's an opportunity where that glass is half-full we come in with SnapLogic and knit it together for you to give you fabric back. And people love that because the businesses can buy what they want, and the enterprise gets a comprehensive solution. >> Jeff: Right, right. >> Well, at the risk of taking a very short tangent, that comment about darkening the skies, if I recall, was the battle of the Persians threatening the 300 Greeks at the battle of Thermopylae. >> Gaurav: Yes. >> And they said, "We'll darken the skies with our arrows." And so the Greek. >> Gaurav: Come and get 'em. >> No, no. >> The famous line was, he said, "Give us your weapons." And the guy says, "Come and get 'em." (laughs) >> We got to that point, the Greek general says, "Well, we'll fight in the shade." (all laughing) But I wanted to ask you. >> This is the movie 300 as well, right? >> Yes. >> The famous line is, "Give us your weapons." He said, "Come and get 'em." (all laughing) >> But I'm thinking also of the use case where a customer brings in Workday and you help essentially instrument it so it can be a good citizen. So what does that make, or connect it so it can be a good citizen. How much easier does that mean or does that make fitting in other SaaS apps or any other app into the fabric, application fabric? >> Right, right. Look, George.
As you and I know, we both had some wonderful runs in the last century, and here we are doing version 2.0 in many ways, again, very similar to the Workday management. The enterprise is hip to the fact that there is a Switzerland nature to making things work together. So they want amazing products like Workday. They want amazing products like the SAP Cloud Suite, now with Concur, SuccessFactors in there. Some very cool things happening in the analytics world which you'll see at Sapphire and so on. So some very, very capable products coming from, I mean, Oracle's bought 80 SaaS companies or 87 SaaS companies. And so, what you're seeing is the enterprise understands that there's going to be red versus blue and a couple other stripes and colors and that they want their businesspeople to buy whatever works for them. But they want to make them work together. All right? So there is a natural sort of geographic or structural nature to this business where there is a need for Switzerland and there is a need for amazing technology, some of which can only come from large companies with big balance sheets and vertical understanding and a legacy of success. But if a customer like an AstraZeneca where you have a CIO like Dave Smoley who transformed Flextronics, is now doing the same thing at AstraZeneca bringing cloud apps, is able to use companies like SnapLogic and then deploy Workday appropriately, SAP appropriately, have his own custom development, some domestic, some overseas, all over the world, then you've got the ability again to get something very custom, and you can do that at a fraction of the cost of overconsulting or darkening the skies in the way that things were done in the last century.
>> So, then tell us about maybe the convergence of the new age data warehousing, the data science pipeline, and then this bespoke collection of applications, not bespoke the way Oracle tried it 20 years ago where you had to upgrade every app tied into every other app on prem, but perhaps the integration, more from many to one because they're in the cloud. There's only one version of each. How do you tie those two worlds together? >> You know, it's like that old bromide, "Know when to hold 'em. Know when to fold them." There is a tendency when programming becomes more approachable, you have more millennials who are able to pick up technology in a way. I mean, it's astounding what my children can do. So what you want to do is as an enterprise, you want to very carefully build those things that you want to build, make sure you don't overbuild. Or, say, if you have a development capability, then every problem looks like a development nail and you have a hammer called development. "Let's hire more Java programmers." That's not the answer. Conversely, you don't want to lose sight of the fact that to really be successful in this millennium, you have to have a core competence around technology. So you want to carefully assemble and build your capability. Now, nobody should ever outsource management. That's a bad idea. (chuckles) But what you want to do is you want to think about those things that you want to buy as a package. Is that a core competence? So, there are excellent products for finance, for human capital management, for travel expense management. Coupa just announced today their for managing your spend. Some of the work at Ariba, now the Ariba Cloud at SAP, are excellent products to help you do certain job titles really well. So you really shouldn't be building those things. But what you should be doing is doing the right element of build and buy. So now, what does that mean for the world of analytics?
In my view, people building data platforms or using a lot of open source and a lot of DevOps labor and virtualization engineering and all that stuff may be less valuable over time because where the puck is going is where a lot of people should skate to is there is a nature of developing certain machine learning and certain kind of AI capabilities that I think are going to be transformational for almost every industry. It is hard to imagine anything in a more mechanized back office, moving paper, manufacturing, that cannot go through a quantum of improvement through AI. There are obviously moral and certain humanity dystopia issues around that to be dealt with. But what people should be doing is I think building out the AI capabilities because those are very custom to that business. Those have to do with the business's core competence, its milieu of markets and competitors. But there should be, in a sense, stroking a purchase order in the direction of a SaaS provider, a cloud data provider like Microsoft Azure or Redshift, and shrinking down their lift-and-shift bill and their data center bill by doing that. >> It's fascinating how long it took enterprises to figure out that. Just like they've been leveraging ADP for God knows how many years, you know, there's a lot of other SaaS applications you can use to do your non-differentiated heavy lifting, but they're clearly all in now. So Gaurav, we're running low on time. I just want to say, when we get you here next year, what's top of your plate? What's top of priorities for 2017? 'Cause obviously you guys are knocking down things left and right. >> Thank you, Jeff. Look, priority for us is growth. We're a growth company. We grow responsibly. We've seen a return to quality on the part of investors, on the part of public and private investors. And you know, you'll see us continue to sort of go at that growth opportunity in a manner consistent with our core values of building product with incredible success.
99% of our customers renewed our products last quarter. >> Jeff: Ninety-nine percent? >> Yes sir. >> That says it all. >> And in the world of enterprise software where there's a lot of snake oil, I'm proud to say that we are building new product with old-fashioned values, and that's what you see from us. >> Well 99% customer retention, you can't beat that. >> Gaurav: Hard to beat! There's no way but down from there, right? (laughing) >> Exactly. Alright Gaurav, well, thanks. >> Pleasure. >> For taking a few minutes out of your busy day. >> Thank you, Jeff. >> And I really appreciate the time. >> Thank you, Jeff, thank you, George. >> Alright, he's George Gilbert. I'm Jeff Frick. You're watching the Cube from the historic Pagoda Lounge in downtown San Jose. Thanks for watching.

Published Date : Mar 15 2017


ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
Dave Smoley	PERSON	0.99+
Dan Warmenhoven	PERSON	0.99+
Jeff	PERSON	0.99+
Dave	PERSON	0.99+
Gaurav Dhillon	PERSON	0.99+
George	PERSON	0.99+
2017	DATE	0.99+
Microsoft	ORGANIZATION	0.99+
AstraZeneca	ORGANIZATION	0.99+
Jeff Frick	PERSON	0.99+
Google	ORGANIZATION	0.99+
Amgen	ORGANIZATION	0.99+
NetApp	ORGANIZATION	0.99+
Ariba	ORGANIZATION	0.99+
PeopleSoft	ORGANIZATION	0.99+
Japan	LOCATION	0.99+
Gaurav	PERSON	0.99+
San Jose	LOCATION	0.99+
Vassar College	ORGANIZATION	0.99+
2016	DATE	0.99+
Oracle	ORGANIZATION	0.99+
Sweden	LOCATION	0.99+
20%	QUANTITY	0.99+
20 percent	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
99%	QUANTITY	0.99+
Walldorf	LOCATION	0.99+
80	QUANTITY	0.99+
Aneel	PERSON	0.99+
SnapLogic	ORGANIZATION	0.99+
TIBCO	ORGANIZATION	0.99+
87	QUANTITY	0.99+
next year	DATE	0.99+
Informatica	ORGANIZATION	0.99+
300 new customers	QUANTITY	0.99+
last week	DATE	0.99+
Bristol-Myers Squibb	ORGANIZATION	0.99+
60 stories	QUANTITY	0.99+
Ninety-nine percent	QUANTITY	0.99+
Adobe	ORGANIZATION	0.99+
Switzerland	LOCATION	0.99+
last century	DATE	0.99+
Wikibon	ORGANIZATION	0.99+
SAP	ORGANIZATION	0.99+
Coupa	ORGANIZATION	0.98+
two drivers	QUANTITY	0.98+
WebMethods	ORGANIZATION	0.98+
two worlds	QUANTITY	0.98+
Flextronics	ORGANIZATION	0.98+
Sapphire	ORGANIZATION	0.98+
SAP Cloud Suite	TITLE	0.98+
this year	DATE	0.98+

Frederick Reiss, IBM STC - Big Data SV 2017 - #BigDataSV - #theCUBE

>> Narrator: Live from San Jose, California it's the Cube, covering Big Data Silicon Valley 2017. (upbeat music) >> Big Data SV 2017, day two of our wall to wall coverage of Strata Hadoop Conference, Big Data SV, really what we call Big Data Week because this is where all the action is going on down in San Jose. We're at the historic Pagoda Lounge in the back of the Fairmont, come on by and say hello, we've got a really cool space and we're excited and never been in this space before, so we're excited to be here. So we got George Gilbert here from Wikibon, we're really excited to have our next guest, he's Fred Reiss, he's the chief architect at IBM Spark Technology Center in San Francisco. Fred, great to see you. >> Thank you, Jeff. >> So I remember when Rob Thomas, we went up and met with him in San Francisco when you guys first opened the Spark Technology Center a couple of years ago now. Give us an update on what's going on there, I know IBM's putting a lot of investment in this Spark Technology Center in the San Francisco office specifically. Give us kind of an update of what's going on. >> That's right, Jeff. Now we're in the new Watson West building in San Francisco on 505 Howard Street, colocated, we have about a 50 person development organization. Right next to us we have about 25 designers and on the same floor a lot of developers from Watson doing a lot of data science, from the Weather Underground, doing weather and data analysis, so it's a really exciting place to be, lots of interesting work in data science going on there. >> And it's really great to see how IBM is taking the core Watson, obviously enabled by Spark and other core open source technology and now applying it, we're seeing Watson for Health, Watson for Autonomous Vehicles, Watson for Marketing, Watson for this, and really bringing that type of machine learning power to all the various verticals in which you guys play.
>> Absolutely, that's been what Watson has been about from the very beginning, bringing the power of machine learning, the power of artificial intelligence to real world applications. >> Jeff: Excellent. >> So let's tie it back to the Spark community. Most folks understand how Databricks builds out the core or does most of the core work for, like, the SQL workload, the streaming and machine learning and I guess graph is still immature. We were talking earlier about IBM's contributions in helping to build up the machine learning side. Help us understand what the Databricks core technology for machine learning is and how IBM is building beyond that. >> So the core technology for machine learning in Apache Spark comes out, actually, of the machine learning department at UC Berkeley as well as a lot of different members from the community. Some of those community members also work for Databricks. We actually at the IBM Spark Technology Center have made a number of contributions to the core Apache Spark and the libraries, for example recent contributions in neural nets. In addition to that, we also work on a project called Apache SystemML, which used to be proprietary IBM technology, but the IBM Spark Technology Center has turned SystemML into Apache SystemML, it's now an open Apache incubating project that's been moving forward out in the open. You can now download the latest release online and that provides a piece that we saw was missing from Spark and a lot of other similar environments: an optimizer for machine learning algorithms. So in Spark, you have the Catalyst optimizer for data analysis, DataFrames, SQL, you write your queries in terms of those high level APIs and Catalyst figures out how to make them go fast.
In SystemML, we have an optimizer for high level languages like R and Python where you can write algorithms in terms of linear algebra, in terms of high level operations on matrices and vectors and have the optimizer take care of making those algorithms run in parallel, run at scale, taking account of the data characteristics. Does the data fit in memory, and if so, keep it in memory. Does the data not fit in memory? Stream it from disk. >> Okay, so there was a ton of stuff in there. >> Fred: Yep. >> And if I were to refer to that as so densely packed as to be a black hole, that might come across wrong, so I won't refer to that as a black hole. But let's unpack that, so the, and I meant that in a good way, like high bandwidth, you know. >> Fred: Thanks, George. >> Um, so the traditional Spark, the machine learning that comes with Spark's MLlib, one of its distinguishing characteristics is that the models, the algorithms that are in there, have been built to run on a cluster. >> Fred: That's right. >> And very few have, very few others have built machine learning algorithms to run on a cluster, but as you were saying, you don't really have an optimizer for finding something where a couple of the algorithms would be fit optimally to solve a problem. Help us understand, then, how SystemML solves a more general problem for, say, ensemble models and for scale out, I guess I'm, help us understand how SystemML fits relative to Spark's MLlib and the more general problems it can solve. >> So, MLlib and a lot of other packages such as Sparkling Water from H2O, for example, provide you with a toolbox of algorithms and each of those algorithms has been hand tuned for a particular range of problem sizes and problem characteristics. This works great as long as the particular problem you're facing as a data scientist is a good match to that implementation that you have in your toolbox.
What SystemML provides is less like having a toolbox and more like having a machine shop. You can, you have a lot more flexibility, you have a lot more power, you can write down an algorithm as you would write it down if you were implementing it just to run on your laptop and then let the SystemML optimizer take care of producing a parallel version of that algorithm that is customized to the characteristics of your cluster, customized to the characteristics of your data. >> So let me stop you right there, because I want to use an analogy that others might find easy to relate to for all the people who understand SQL and scale out SQL. So, the way you were describing it, it sounds like oh, if I were a SQL developer and I wanted to get at some data on my laptop, I would find it pretty easy to write the SQL to do that. Now, let's say I had a bunch of servers, each with its own database, and I wanted to get data from each database. If I didn't have a scale out database, I would have to figure out physically how to go to each server in the cluster to get it. What I'm hearing for SystemML is it will take that query that I might have written on my one server and it will transparently figure out how to scale that out, although in this case not queries, machine learning algorithms. >> The database analogy is very apt. Just like SQL and query optimization, by allowing you to separate that logical description of what you're looking for from the physical description of how to get at it, lets you have a parallel database with the exact same language as a single machine database. In SystemML, because we have an optimizer that separates that logical description of the machine learning algorithm from the physical implementation, we can target a lot of parallel systems, we can also target a large server and the code, the code that implements the algorithm stays the same. >> Okay, now let's take that a step further.
You refer to matrix math and I think linear algebra and a whole lot of other things that I never quite made it to since I was a humanities major but when we're talking about those things, my understanding is that those are primitives that Spark doesn't really implement so that if you wanted to do neural nets, which relies on some of those constructs for high performance, >> Fred: Yes. >> Then, um, that's not built into Spark. Can you get to that capability using SystemML? >> Yes. SystemML, at its core, provides you with a library, provides you as a user with a library of machine, rather, linear algebra primitives, just like a language like R or a library like NumPy gives you matrices and vectors and all of the operations you can do on top of those primitives. And just to be clear, linear algebra really is the language of machine learning. If you pick up a paper about an advanced machine learning algorithm, chances are the specification for what that algorithm does and how that algorithm works is going to be written in the paper literally in linear algebra and the implementation that was used in that paper is probably written in the language where linear algebra is built in, like R, like NumPy. >> So it sounds to me like Spark has done the work of sort of the blocking and tackling of machine learning to run in parallel. And that's I mean, to be clear, since we haven't really talked about it, that's important when you're handling data at scale and you want to train, you know, models on very, very large data sets. But it sounds like when we want to go to some of the more advanced machine learning capabilities, the ones that today are making all the noise with, you know, speech to text, text to speech, natural language understanding, those neural network based capabilities are not built into the core Spark MLlib, that, would it be fair to say you could start getting at them through SystemML?
>> Yes, SystemML is a much better way to do scalable linear algebra on top of Spark than the very limited linear algebra that's built into Spark. >> So alright, let's take the next step. Can SystemML be grafted onto Spark in some way or would it have to be in an entirely new API that doesn't take, integrate with all the other Spark APIs? In a way, that has differentiated Spark, where each API is sort of accessible from every other. Can you tie SystemML in or do the Spark guys have to build more primitives into their own sort of engine first? >> A lot of the work that we've done with the Spark Technology Center as part of bringing SystemML into the Apache ecosystem has been to build a nice, tight integration with Apache Spark so you can pass Spark DataFrames directly into SystemML, you can get DataFrames back. Your SystemML algorithm, once you've written it, in terms of one of SystemML's main systematic languages it just plugs into Spark like all the algorithms that are built into Spark. >> Okay, so that's, that would keep Spark competitive with more advanced machine learning frameworks for a longer period of time, in other words, it wouldn't hit the wall the way it would if it encountered TensorFlow from Google for Google's way of doing deep learning, Spark wouldn't hit the wall once it needed, like, a TensorFlow as long as it had SystemML so deeply integrated the way you're doing it. >> Right, with a system like SystemML, you can quickly move into new domains of machine learning.
So for example, this afternoon I'm going to give a talk with one of our machine learning developers, Mike Dusenberry, about our recent efforts to implement deep learning in SystemML, like full scale, convolutional neural nets running on a cluster in parallel processing many gigabytes of images, and we implemented that with very little effort because we have this optimizer underneath that takes care of a lot of the details of how you get that data into the processing, how you get the data spread across the cluster, how you get the processing moved to the data or vice versa. All those decisions are taken care of in the optimizer, you just write down the linear algebra parts and let the system take care of it. That let us implement deep learning much more quickly than we would have if we had done it from scratch. >> So it's just this ongoing cadence of basically removing the infrastructure gut management from the data scientists and enabling them to concentrate really where their value is is on the algorithms themselves, so they don't have to worry about how many clusters it's running on, and that configuration kind of typical DevOps that we see on the regular development side, but now you're really bringing that into the machine learning space. >> That's right, Jeff. Personally, I find all the minutia of making a parallel algorithm work really fascinating but a lot of people working in data science really see parallelism as a tool. They want to solve the data science problem and SystemML lets you focus on solving the data science problem because the system takes care of the parallelism. >> You guys could go on in the weeds for probably three hours but we don't have enough coffee and we're going to set up a follow up time because you're both in San Francisco.
But before we let you go, Fred, as you look forward into 2017, kind of the advances that you guys have done there at the IBM Spark Center in the city, what's kind of the next couple great hurdles that you're looking to cross, new challenges that are getting you up every morning that you're excited to come back a year from now and be able to say wow, these are the one or two things that we were able to take down in 2017? >> We're moving forward on several different fronts this year. On one front, we're helping to get the notebook experience with Spark notebooks consistent across the entire IBM product portfolio. We helped a lot with the rollout of notebooks on Data Science Experience on z, for example, and we're working actively with the Data Science Experience and with the Watson Data Platform. On the other hand, we're contributing to Spark 2.2. There are some exciting features, particularly in SQL that we're hoping to get into that release as well as some new improvements to MLlib. We're moving forward with Apache SystemML, we just cut Version 0.13 of that. We're talking right now on the mailing list about getting SystemML out of incubation, making it a full, top level project. And we're also continuing to help with the adoption of Apache Spark technology in the enterprise. Our latest focus has been on deep learning on Spark. >> Well, I think we found him! Smartest guy in the room. (laughter) Thanks for stopping by and good luck on your talk this afternoon. >> Thank you, Jeff. >> Absolutely. Alright, he's Fred Reiss, he's George Gilbert, and I'm Jeff Frick, you're watching the Cube from Big Data SV, part of Big Data Week in San Jose, California. (upbeat music)
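The split described above, one logical description and more than one physical plan, can be illustrated with a toy example. This is a minimal sketch in plain NumPy, not Apache SystemML code: the function names gram_in_memory, gram_streamed and gram, and the memory_budget_bytes parameter, are hypothetical, and a real optimizer weighs far more than a single size threshold. The same quantity, X^T X, is computed either in one shot or block by block, depending on whether the data fits the budget.

```python
import numpy as np


def gram_in_memory(X):
    """Physical plan 1: a single matrix multiply over the whole array."""
    return X.T @ X


def gram_streamed(row_blocks):
    """Physical plan 2: accumulate X^T X one block of rows at a time."""
    total = None
    for block in row_blocks:
        partial = block.T @ block
        total = partial if total is None else total + partial
    return total


def gram(X, memory_budget_bytes):
    """Toy 'optimizer' (hypothetical): pick a plan from the data size alone."""
    if X.nbytes <= memory_budget_bytes:
        return "in_memory", gram_in_memory(X)
    # Stand-in for reading blocks from disk: slice the rows into chunks
    # that each fit inside the budget.
    bytes_per_row = X.shape[1] * X.itemsize
    rows_per_block = max(1, memory_budget_bytes // bytes_per_row)
    blocks = (X[i:i + rows_per_block] for i in range(0, X.shape[0], rows_per_block))
    return "streamed", gram_streamed(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    for budget in (10**9, 4096):
        plan, result = gram(X, memory_budget_bytes=budget)
        print(plan, result.shape)
```

The caller writes gram(X, ...) once; which plan runs is decided underneath, which is the point of the database analogy.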
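As a concrete instance of "linear algebra is the language of machine learning", here is a minimal sketch of linear regression trained by gradient descent, written only in terms of matrix and vector operations. It uses NumPy on a single machine, not SystemML; the function name fit_linear_regression and its default step size and iteration count are illustrative assumptions, chosen for small, well-scaled data.

```python
import numpy as np


def fit_linear_regression(X, y, learning_rate=0.1, iterations=500):
    """Minimise mean squared error ||Xw - y||^2 / (2n) by gradient descent."""
    n_rows, n_cols = X.shape
    w = np.zeros(n_cols)
    for _ in range(iterations):
        residual = X @ w - y                  # prediction error, one entry per row
        gradient = X.T @ residual / n_rows    # gradient of the loss with respect to w
        w = w - learning_rate * gradient
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w
    print(fit_linear_regression(X, y))
```

Every line of the loop is a statement about matrices and vectors, with nothing in it about partitions, memory, or cluster size; those are the concerns the interview says an optimizer can take over.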

Published Date : Mar 15 2017

SUMMARY :

it's the Cube, covering Big Data Silicon Valley 2017. in the back of the Faramount, come on by and say hello, in the San Francisco office specifically. and on the same floor a lot of developers from Watson to all the various verticals in which you guys play. of machine learning, the power of artificial intelligence or does most of the core work for, like, the sequel workload and have the optimizer take care of making those algorithms and I meant that in a good way, is that the models, the algorithms that are in there, and the more general problems it can solve. to that implementation that you have in your toolbox. in the cluster to get it. and the code, the code that implements the algorithm so that if you wanted to do neural nets, Can you get to that capability using System ML? and all of the operations you can do the ones that today are making all the noise with, you know, linear algebra on top of Spark than the very limited So alright, let's take the next step. System ML into the Apache ecosystem has been to build so deeply integrated the way you're doing it. and let the system take care of it. is on the algorithms themselves, so they don't have to worry because the system takes care of the parallelism. into 2017, kind of the advances that you guys have done of Apache Spark technology in the enterprise. Smartest guy in the room. and I'm Jeff Rick, you're watching the Cube cohost of the Cube.

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
Jeff Rick	PERSON	0.99+
George	PERSON	0.99+
Jeff	PERSON	0.99+
Fred Rice	PERSON	0.99+
Mike Dusenberry	PERSON	0.99+
IBM	ORGANIZATION	0.99+
2017	DATE	0.99+
San Francisco	LOCATION	0.99+
John Furrier	PERSON	0.99+
San Jose	LOCATION	0.99+
Rob Thomas	PERSON	0.99+
505 Howard Street	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Frederick Reiss	PERSON	0.99+
Spark Technology Center	ORGANIZATION	0.99+
Fred	PERSON	0.99+
IBM Spark Technology Center	ORGANIZATION	0.99+
one	QUANTITY	0.99+
San Jose, California	LOCATION	0.99+
Spark 2.2	TITLE	0.99+
three hours	QUANTITY	0.99+
Watson	ORGANIZATION	0.99+
UC Berkeley	ORGANIZATION	0.99+
one server	QUANTITY	0.99+
Spark	TITLE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Python	TITLE	0.99+
each server	QUANTITY	0.99+
both	QUANTITY	0.99+
each	QUANTITY	0.99+
each database	QUANTITY	0.98+
Big Data Week	EVENT	0.98+
Pagoda Lounge	LOCATION	0.98+
Strata Hadoop Conference	EVENT	0.98+
System ML	TITLE	0.98+
Big Data SV	EVENT	0.97+
each API	QUANTITY	0.97+
MLlib	TITLE	0.96+
today	DATE	0.96+
Thomas Vehicles	ORGANIZATION	0.96+
Apache System ML	TITLE	0.95+
Big Data	EVENT	0.95+
Apache Spark	TITLE	0.94+
Watson for Marketing	ORGANIZATION	0.94+
Sparking Water	TITLE	0.94+
first	QUANTITY	0.94+
one front	QUANTITY	0.94+
Big Data SV 2016	EVENT	0.94+
IBM Spark Technology Center	ORGANIZATION	0.94+
about 25 designers	QUANTITY	0.93+

Oliver Chiu, Microsoft & Wei Wang, Hortonworks | BigData SV 2017

>> Narrator: Live from San Jose, California, it's the CUBE, covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone, live in Silicon Valley, this is the CUBE coverage of Big Data Week, Big Data Silicon Valley, our event, in conjunction with Strata Hadoop. This is the CUBE for two days of wall-to-wall coverage. I'm John Furrier, with analysts from Wikibon: George Gilbert, our Big Data analyst, as well as Peter Burris, covering all of the angles. And our next guests are Wei Wang, Senior Director of Product Marketing at Hortonworks, a CUBE alum, and Oliver Chiu, Senior Product Marketing Manager for Big Data and Microsoft Cloud at Azure. Guys, welcome to the CUBE, good to see you again. >> Yes. >> John: On the CUBE, appreciate you coming on. >> Thank you very much. >> So Microsoft and Hortonworks, you guys are no strangers. We have covered you guys many times on the CUBE, on HDInsight. You have some stuff happening here, and I was just talking about you guys this morning on another segment, like, saying hey, you know the distros need a Cloud strategy. So you have something happening tomorrow. Blog post going out. >> Wei: Yep. >> What's the news with Microsoft? >> So essentially I think that we are truly adopting CloudFirst. And you know that we have been really acquiring a lot of customers in the Cloud. We announced in our earnings that more than a quarter of our customers actually already have a Cloud strategy. I want to give out a few statistics that Gartner told us, actually, last year. The inquiries from their end users went up 57%, just to talk about Hadoop and Microsoft Azure. So what we're here for is to talk about the next generation. We're putting in our latest and greatest innovation, which comes in the package of the release of HDP 2.6, that's our latest release. I think our last conversation was on 2.5. So 2.6 has great, latest and newest innovations to put on CloudFirst, hence our partner here, Microsoft. We're going to put it on Microsoft HDInsight. 
>> That's super exciting. And, you know, Oliver, one of the things that we've been really fascinated with and covering for multiple years now is the transformation of Microsoft. Even prior to Satya, who's a CUBE alum by the way, been on the CUBE, when we were at XL event at Stanford. So, CEO of Microsoft, CUBE alum, good to have that. But, it's interesting, right? I mean, the Open Compute Project. They donated a boatload of IP into open-source. Heavily now open-source, Brendan Burns works for Microsoft. We're seeing a huge transformation of Microsoft. You've been working with Hortonworks for a while. Now, it's kind of coming together, and one of the things that's interesting is the trend that's teasing out on the CUBE all the time now is integration. We're seeing this flash point where okay, I've got some Hadoop, I've got a bunch of other stuff in the enterprise equation that's kind of coming together. And you know, things like IoT, and AI's all around the corner as well. How are you guys getting this all packaged together? 'Cause this kind of highlights some of the things that are now integrated in with the tools you have. Give us an update. >> Yeah, absolutely. So for sure, just to kind of respond to the trend, Microsoft kind of made that transformation of being CloudFirst, you know, many years ago. And it's been great to partner with someone like Hortonworks, actually for the last four years, bringing HDInsight as a first-party Microsoft Cloud service. And because of that, as we're building other Cloud services around it in Azure, we have over 60 services. Think about that. That's 60 PaaS and IaaS services in Microsoft, part of the Azure ecosystem. All of this is starting to get completely integrated with all of our other services. So HDInsight, as an example, is integrated with all of our relational investments, our BI investments, our machine learning investments, our data science investments. 
And so, it's really just becoming part of the fabric of the Azure Cloud. And so that's a testament to the great partnership that we're having with Hortonworks. >> So the inquiry comment from Gartner, and we're seeing similar things on the Wikibon side on our research team, is that now the legitimacy of, say, seeing how Hadoop fits into the bigger picture, not just Hadoop being the pure-play Big Data platform, which many people were doing. But now they're seeing a bigger picture where I can have Hadoop, and I can have some other stuff all integrating. Is that all kind of where this is going from you guys' perspective? >> So yeah, again, some statistics: we have done TechValidate surveys in which our customers are telling us that 43% of the respondents are actually using that integrated approach, the hybrid. They're using the Cloud. They're using our stuff on-premises to actually provide an integrated end-to-end processing workload. I think a couple of years ago, people probably thought quite a bit about what kind of data they want to put in the Cloud, what kind of workload they want to actually execute in the Cloud versus on their own premises. I think what we see is that line starting to blur a little bit. And given the partnership we have with Microsoft, the kind of enterprise-ready functionalities, and we talked about that for a long time last time I was here. Talk about security, talk about governance, talk about just an integrated layer to manage the entire thing, either on-premises or in the Cloud. I think those are some of the functionalities or some of the innovations that make people a lot more at ease with the idea of putting entire mission-critical applications in the Cloud, and I want to mention that, especially with our blog going out tomorrow, we will actually announce Spark 2.1, for which, in Microsoft Azure HDInsight, we're actually going to guarantee a 99.9% SLA. 
Right, so that's for enterprise customers, which many of us have together. That is truly an assurance, so that people not only feel at ease about their data and where they're going to locate it, either in the Cloud or within their data center, but also about the kind of speed and response and reliability. >> Oliver, I want to cue off something you said which was interesting, that you have 60 services, and that they're increasingly integrated with each other. The idea that Hadoop itself is made up of many projects or services, and I think in some amount of time, we won't look at it as a discrete project or product, but something that's integrated together to make a pipeline, a mix-and-match. I'm curious if you can share with us a vision of how you see Hadoop fitting in with a richer set of Microsoft services, where it might be SQL Server, it might be streaming analytics, what that looks like, and so the issue of sort of a mix-and-match toolkit fades into a more seamless set of services. >> Yeah, absolutely. And you're right, and Wei will definitely reiterate this: Hadoop is a platform, right, and certainly there are multiple different workloads and projects on that platform that do a lot of different things. You have Spark that can do machine learning and streaming and SQL-like queries, and you have Hadoop itself that can do batch, interactive, streaming as well. So, you see kind of a lot of workloads being built on open-source Hadoop. And as you bring it to the Cloud, what we found for customers, and kind of this new Microsoft that is often talked about, is it's all about choice and flexibility for our customers. And so, some customers want to be 100% open-source Apache Hadoop, and if they want that, HDInsight is the right offering, and what we can do is we can surround it with other capabilities that are outside of maybe core Hadoop-type capabilities. 
Like if you want media services, all the way down to, you know, other technologies not related specifically to data and analytics. And so they can combine that with the Hadoop offering, and blend it into a combined offering. And there are some customers that will blend open-source Hadoop with some of our Azure data services as well, because it offers something unique or different. But it's really a choice for our customers. Whatever they're open to, whatever their kind of strategy is for their organization. >> Is there, just to kind of then compare it with other philosophies, do you see that notion that Hadoop now becomes a set of services that might or might not be mixed and matched with native services. Is that different from how Amazon or Google, you know, you perceive them to be integrating Hadoop into their sort of pipelines and services? >> Yeah, it's different because I see Amazon and Google, like, for instance, Google kind of is starting to change their philosophy a little bit with the introduction of Dataproc. But before, you could see them as an organization that was really focused on bringing some of the internal learnings of Google into the marketplace with their own, you can say, proprietary-type services with some of the offerings that they have. But now, they're kind of realizing the value that Hadoop, that the Apache Hadoop ecosystem, brings. And so, with that comes the introduction of their own managed service. And for AWS, their roots are IaaS, so to speak, that's kind of the roots of their Cloud, and they're starting to bring in kind of other systems, very similar to, I would say, Microsoft's strategy. For us, we are all about making things enterprise-ready. So that's the unique differentiator and kind of what you alluded to. And so for Microsoft, all of our data services are backed by a 99.9% service-level agreement, including our relationship with Hortonworks. So that's kind of one, >> Just say that again, one more time. 
>> 99.9% up-time, and if, >> SLA. >> Oliver: SLA, and so that's a guarantee to our customers. So if anything we're, >> John: One more time. >> It's a guarantee to our customers. >> No, this is important. SLA, I mean, Google Next, their Cloud event last week, didn't talk much about it. They talked about speeds and feeds, >> Exactly. >> Not a lot of SLAs. This is a mandate for the enterprise. They care more about SLA, so, not that they don't care about price, but they'd much rather have solid, bulletproof SLAs than the best price. 'Cause of the total cost of ownership. >> Right. And that's really the heritage of where Microsoft comes from: we have been serving our on-premises customers for so long, we understand what they want and need and require for a mission-critical, enterprise-ready deployment. And so, with our relationship with Hortonworks, it's absolutely a 99.9% service-level agreement that we will guarantee to our customers, and across all of the Hadoop workloads, whether it would be Hive, whether it would be Spark, whether it'd be Kafka, any of the workloads that we have on HDInsight is enterprise-ready by virtue, mission-critical, built-in, all that stuff that you would expect. >> Yeah, you guys certainly have a great track record with enterprise. No debate about that, 100%. Um, back to you guys, I want to take a step back and look at some things we've been observing kicking off this week at Strata Hadoop. This is our eighth year covering it. Hadoop World has now evolved into a whole huge thing, with Big Data SV and Big Data NYC that we run as well. The bets that were made. And so, I've been intrigued by HDInsight from day one. >> Yep. >> Especially the relationship with Microsoft. Got our attention right away, because of where we saw the dots connecting, which is kind of where we are now. That's a good bet. We're looking at what bets were made and who's making which bets when, and how they're panning out, so I want to just connect the dots. 
Bets that you guys have made, and the bets that you guys have made that are now paying off, and certainly we've covered before on camera Revolution Analytics. Obviously, now, looking real good, middle of the fairway, as they say. Bets you guys have made that, hey, that was a good call. >> Right, and we think that first and foremost, we have worked to support machine learning; we don't call it AI, but we were probably the first to put Spark, right, in Hadoop. I know that Spark has gained a lot of traction, but I remember that in the early days, we were the ones, as a distro, going out there and not only just verbally talking about support of Spark, but truly putting it in our distribution as one of the components. Now, in the last version, we actually also allow flexibility. You know Spark, how often they change. Every six weeks they have a new version. And that's kind of running into a paradox of what enterprise-ready actually is. Within six weeks, they can't even roll out an entire process, right? If they have a workload, they probably can't even get everyone to adopt that yet, within six weeks. So what we did, actually, in the last version, which we will continue to do, is to essentially support multiple versions of Spark. Right, we've essentially talked about that. And the other bet we have made is about Hive. We truly made that kind of an initiative behind the Stinger initiative, and also have Hive now with LLAP. We made the effort to join in with all the other open-source developers to go behind this project and make sure that SQL is becoming truly available for our customers, right. Not only just affordable, but also having the most comprehensive coverage for SQL, SQL:2011. But also now having that almost sub-second interactive query. So I think that's the kind of bet we made. 
>> Yeah, I guess the compatibility of SQL, then you got the performance advantage going on, and these databases, whether it's in memory or it's SSD, that seems to be the action. >> Wei: Yeah. >> Oliver, you guys made some good bets. So, let's go down the list. >> So let's go down memory lane. I always kind of want to go back to our partnership with Hortonworks. We partnered with Hortonworks really early on, in the early days of Hortonworks' existence. And the reason we made that bet was because of Hortonworks' strategy of being completely open. Right, and so that was a key decision criterion for Microsoft. We wanted to partner with someone whose entire philosophy was open-source, and committing everything back to the Apache ecosystem. And so that was a very strategic bet that we made. >> John: It was bold at the time, too. >> It was very bold at the time, yeah. Because Hortonworks at that time was a much smaller company than they are today. But we kind of understood where the ecosystem was going, and we wanted to partner with people who were committing code back into the ecosystem. So that, I would argue, is definitely one really big bet that was a very successful one and continues to play out even today. Other bets that we've made, and like we've talked about prior, is our acquisition of Revolution Analytics a couple years ago, and that's, >> R just keeps on rolling, it keeps on rolling, rolling, rolling. Awesome. >> Absolutely. Yeah. >> Alright, final words. Why don't we get updated on the data science experiences you guys have. Is there any update there? What's going on? The data science tools are accelerating fast. And, in fact, some are saying that it looks like the software tools of years and years ago. A lot more work to do. So what's happening with the data science experience? >> Yeah, absolutely, and just tying back to that original comment around R, Revolution Analytics. That has become Microsoft R Server. 
And we're offering that on-premises and in the Cloud. So on-premises, it's completely integrated with SQL Server. So all SQL Server customers will now be able to do in-database analytics with R built into the core database. And that we see as a major win for us, and a differentiator in the marketplace. But in the Cloud, in conjunction with our partnership with Hortonworks, we're making Microsoft R Server available as part of our integration with Azure HDInsight. So we're kind of just tying back all that integration that we talked about. And so that's built in, and so any customer can take R and parallelize that across any number of Hadoop and Spark nodes in a managed service within minutes. Clusters will spin up, and they can just run all their data science models and train them across any number of Hadoop and Spark nodes. And so that is, >> John: That takes the heavy lifting away on the cluster management side, so they can focus on their jobs. >> Oliver: Absolutely. >> Awesome. Well guys, thanks for coming on. We really appreciate Wei Wang with Hortonworks, and we have Oliver Chiu from Microsoft. Great to get the update, and tomorrow 10:30, the CloudFirst news hits. CloudFirst, Hortonworks with Azure, great news, congratulations, good Cloud play for Hortonworks. This is the CUBE, I'm John Furrier with George Gilbert. More coverage live in Silicon Valley after this short break.
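
Editor's sketch: the 99.9% figure discussed in the interview is easier to reason about as allowed downtime. The arithmetic below is a minimal illustration only; the function name `allowed_downtime_hours` and the choice of a 365-day year and 30-day month are assumptions, and the actual SLA terms (measurement window, exclusions, credits) are not described in the interview.

```python
# Minimal sketch: convert an availability percentage into allowed downtime.
# Period lengths are simplifying assumptions, not terms of any real SLA.

HOURS_PER_YEAR = 365 * 24   # ignores leap years
HOURS_PER_MONTH = 30 * 24   # a nominal 30-day month

def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in a period at the given availability."""
    return period_hours * (1 - sla_percent / 100)

downtime_year = allowed_downtime_hours(99.9)                    # roughly 8.76 hours
downtime_month = allowed_downtime_hours(99.9, HOURS_PER_MONTH)  # roughly 0.72 hours
downtime_month_minutes = downtime_month * 60                    # roughly 43 minutes
```

Under these assumptions, "three nines" leaves a bit under nine hours a year, or about 43 minutes a month, in which a service can be unavailable while still meeting the guarantee.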

Published Date : Mar 15 2017

SUMMARY :

It's the CUBE, covering all of the angles. and I was just talking about you guys this morning a lot of customers in the Cloud. and one of the things that's interesting that we're having with Hortonworks. is that now the legitimacy of say, And given the partnership we have with Microsoft, and that they're increasingly integrated with each other. all the way down to, you know, other technologies a set of services that might or might not be and kind of what you alluded to. Oliver: SLA and so that's a guarantee to our customers. No, this is important. This is mandate for the enterprise. and across all of the Hadoop workloads, that we run as well. and the bets that you guys have made but I remember that in the early days, Yeah, I guess the compatibility of SQL, So, let's go down the list. And so that was a very strategic bet that we made. and we wanted to partner with people it keeps on rolling, rolling, rolling. Yeah. on the data science experiences you guys have. and in the Cloud. on the cluster management side, and we have Oliver Chiu from Microsoft.

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
George Gilbert	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Oliver	PERSON	0.99+
Google	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Satya	PERSON	0.99+
John Furrier	PERSON	0.99+
Oliver Chiu	PERSON	0.99+
Peter Burris	PERSON	0.99+
43%	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
99.9%	QUANTITY	0.99+
60 services	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
eighth year	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
San Jose, California	LOCATION	0.99+
Hadoop	TITLE	0.99+
CUBE	ORGANIZATION	0.99+
tomorrow 10:30	DATE	0.99+
Brendan Burns	PERSON	0.99+
Hortonworks'	ORGANIZATION	0.99+
last year	DATE	0.99+
last week	DATE	0.99+
SQL	TITLE	0.99+
Spark	TITLE	0.99+
57%	QUANTITY	0.99+
tomorrow	DATE	0.99+
Big Data Week	EVENT	0.99+
two days	QUANTITY	0.99+
Wei Wang	PERSON	0.99+
Big Data	ORGANIZATION	0.99+
Gardner	PERSON	0.98+

Shaun Connolly, Hortonworks - BigDataNYC - #BigDataNYC - #theCUBE

(upbeat electronic music) >> Male Voiceover: Live from New York, it's the Cube, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. >> We're back in the Big Apple. This is the Cube, the worldwide leader in live tech coverage. We're here at Big Data NYC, Big Data Week is part of Strata + Hadoop World. Shaun Connolly is here, the vice president of strategy at Hortonworks, long time friend and Cube alum, great to see you again. >> Thanks for having me, we're back at the same venue as last year, always a pleasure. >> Yeah, it's good, we're growing, I guess the event's growing, we haven't been over there yet, but some of our guys have, but what's it like over there? >> You know, it feels the same, some of the different use cases, I think last year was streaming, we're hearing more machine learning and things like that as far as use cases, so similar vibe. >> Yeah, so things are evolving, right? How's Hortonworks evolving? >> We're continuing to report our quarterly earnings as the only publicly traded company in this space, and things from a business perspective are doing well. Our connected data platforms strategy, which we unveiled at the beginning of this year, which is really data in motion and data at rest and enabling these new-gen transformational applications, continues to play out. The data in motion piece is sort of decoupled and unrelated to a Hadoop platform, it's really about acquiring and handling the FedEx for data delivery type notions, data logistics, secure transmission. That's based on the Apache NiFi tech that was originally built sort of at the NSA over the past eight years, so. Really a nice robust piece of technology that we've pushed out to the edge in our latest release, so you can really skin these down into a secure site-to-site transmission. 
A lot of sophisticated capabilities there, so we're seeing a lot of uptake in that sort of architectural vision, the products are maturing, both on-prem and in the cloud, things are pretty exciting. >> Well, this cloud thing seems pretty real. (Shaun laughing) You can get a lot of traction, right? Everybody kind of knew it was coming, but what are you seeing? >> Yeah, so I guess I started the journey back in 2009, when I was at SpringSource and Paul Maritz was CEO of VMware, and that was sort of pre-cloud at that time. We were talking about this notion of platform as a service, and things like that. And that resonated really well with folks back then, but their main ask was how do you solve the data problem, how do you actually get the data to the apps that need it. Fast forward to 2016, I think there's been a lot of open source innovation, you know, a lot of commercial innovation, the rise of cloud for providing a fast path to value, booting up these use cases, it's a fascinating transition to watch. Many of our customers are, people use the word hybrid. What that means to me is they'll have data center workloads, or multi data center workloads, but they also have cloud workloads, sometimes even multi cloud workloads, and that inherent nature of the beast is why I use sort of the term connected data architecture: you need an architecture that inherently is built to span that fact. And that's just increasing, that's just the world we live in today. >> But the fact is because there are speed of light issues, there's data fidelity issues. >> Shaun: Yup. >> There's other types of things, how are you starting to see those practical and very physical realities start to impact the whole concept of design as it pertains to data, as it pertains to analytics, as it pertains to the infrastructure associated with the two? >> Yup, so at Hadoop Summit that we had last June, there were some really good sessions that were there. 
Folks like Comcast, Ford, Schlumberger talked about this connected data architecture reality, right. I like to use the connected car ecosystem as a good example, 'cause there were insurance providers and others that were sort of speaking on behalf of that, where you have the cars and other data that's inherently born up there, and there's a slug of use cases that are around edge analytics, streaming analytics, time series analytics, and we're seeing that, and I think the cloud lends itself really well for those types of use cases. But we also see manufacturing line data for the cars, where you want to get a 360 degree view of operational issues, and dovetail that with manufacturing line elements, and that's inherently what we've seen your classic sort of on-prem data lake, in quotes, has been used for, so you can get that 360 degree operational intelligence type of analytics to come out of that, right? So that type of use case, whether you apply it to oil and gas and having the sensors on the oil rigs, in the Schlumberger example, that pattern is repeating itself across different industries. British Gas, in Europe, talks about how they're fundamentally changing the nature of the relationship with their customer because of the smart meters and their connectivity in the homes, and they can deliver a better value there. So that's inherently a connected data realm, there's cloud use cases, and there's data center use cases. So I see these use cases, you know, they'll be use case specific applications that are sprinkled across that fabric, if you will. And that's really what we're seeing. >> At our panel last year here in this venue, we talked about a lot of things, one was the market, the sort of ebbs and flows you just mentioned, you guys are the only public player, Talend's joining that crew. >> Shaun: Yeah. Excellent. >> You've seen some. >> Shaun: We need more. 
>> We need more, we've seen some M&A, Platfora taken out, I don't know the specifics of that deal. Might have been an acqui-hire, might not, I don't know. And Datameer did a raise, so you're seeing these rip currents in all directions. What are you seeing in the marketplace? Lot of funding early on, lot of players, lot of innovation, and now it's like, okay, the music at some point's going to stop, but. >> Yeah. >> What's your take? >> So in our last call, and I think we repeated it from our prior earnings call, you know, our focus, and what we put out there in our earnings, and in our Q3 earnings we'll sort of reiterate where we stand, is we basically said Q4 is when we look to go adjusted EBITDA break-even. >> Right. >> And then 2017 we'll go from there. We reiterated that guidance, we had a little over 62 million in billings for the quarter, so the business is pretty robust and growing. We're only five years into this, I mean we're just five years old, so it's a very fast pace of billings growth, right? That's almost a 250 million run rate, right? For exiting that quarter. You know, annual run rate. So we see a lot of the use cases really continuing to move on. I think what our customers ask us is, they're on a digital transformation journey, and they want the industry to start talking about those types of business value drivers, right? So I think we should expect to see a transition from the piece parts, the animals in the zoo, and what's the right open source piece of technology, to more of why should you care, right? As a business, how is this transforming what you do? How does this open up new lines of business? We started seeing that at Hadoop Summit, when I think about two dozen customers were sharing very rich stories, right? So that's where things are. But I think running a company is, you have to run it with a certain sense of rigor, and that was one of the reasons why we chose to go public, right? 
>> So, by the way, we totally agree that customers want to stop talking about digital business in platitudes and start actually identifying specifically what is it about it that's new and different, and find ways of doing it. >> Shaun: Sure. >> Coming back to the issue, however, of how you go about making some of those transformations relevant. There is clearly a knowledge gap about what digital business is, what it isn't, certainly. But there's also a fair amount of skills that have yet to be developed, that are required for a lot of the use cases that companies are pursuing. Not just in terms of implementing the technology appropriately, but actually constructing and conceptualizing the use cases. >> Shaun: Sure. >> So that suggests that there's two paths forward. There's a path forward where we can do a better job of diffusing knowledge through people, and there's a path where we can do a better job of building software that's easier to use. >> Shaun: Mm hmm. >> And there's both. How do you see this playing out over the course of the next few years? >> Yep, and I think in any new area, as technology's emerging, like one of the things I use is the Apache Software Foundation. Literally every other week there's a new data-related Apache project that lands. It can be really confusing, but it's exhilarating from the fact that I participate in that, and I try and figure out which ones we can harness in a consumable platform, whether it's on-prem or cloud or what have you. What use cases can it light up? So I think you have both of those vectors, and it really depends on, I like to use the classic software adoption curve, you have a lot of the left side of the chasm folks, where a lot of this new stuff is going to be sharper edges, and they're always going to be trailblazers, right? But we are also seeing a lot of some of these advanced analytics. 
Some of these new solutions are automating the pipeline, so you can actually let the infrastructure and these engines do more of the thinking for you, so you get your model's output. Even to the point where you run multi-model simulation in parallel, and out pops the best fit. That's where things will head, right? I think it's just a matter of the technology maturing, making sure we address things like security, metadata management, governance, and those "-ilities" that the enterprise expects, and then really forcing ourselves to simplify and automate as much as possible, right. And that was one of the reasons, on that last one, why in October 2011 we basically chose Teradata and Microsoft as key partners. Teradata because in 2011, clearly, right? >> Peter: Teradata. >> They're Teradata, right? Microsoft because it simplifies technologies and brings them to billions of users, right? And so we need to do both, you need to harden it, right? For the most rigorous large enterprises, but you need to simplify it for the meat of the market adopters, right? The early majority and late majority. You have to do both. >> Shaun, you're sitting across from a CEO, and you have to say these are the three things you need to do to enact this digital transformation. >> Shaun: Yup. >> What are the three things you're telling him? >> So, I think they need as a business to identify how they want to leverage data as capital, and what pockets of value they want to go chase, number one. Number two, how is their business being impacted by the fact that you have the rise of IoT and an inherently increasingly connected society and infrastructure. How is that impacting them? And number three is, how do they evolve what they're used to doing, right? You have to align it, exactly. >> Because that's really, in many respects, I like to say there's a difference between invention and innovation. 
Invention is the engineering act, innovation's a social act, it's adopting those new practices >> Shaun: Exactly. >> That actually allow you to enact the invention and generate revenue. >> Exactly. Now in our space, I think we have a very compelling renovate value prop which is a cost savings where you can drive cost out, but the innovate use cases are the ones. Like if all you're going to do is renovate, then you will fail, you will stall, right? Because it's not a balance of cost savings. It's about how do you actually transform your business. And in the case of like the British Gas example, I used that as how they engaged that end consumer is fundamentally changing. So that's the question I put back in those conversations is how do you want to evolve your business and how do you leverage data as capital? Because the beauty of data as capital is you can actually generate multiple lines of interest off of a single data set, 'cause you can derive different insights off of that, so it's not like a dollar, right? And single compound, it's multiple compound annual interest rate on that. But they have to chase the right use cases. >> Although, we've also learned from great design that if you do the right thing better, you get rid of a lot of waste and so coming back to your point, doing the right thing better often leads to cost savings. >> Yes. Exactly. One inherently can drive the other, but if you're just driving it then >> Peter: Just doing cost. >> You're not going to transform your business. >> Peter: You're just going to continue to do the same or wrong things worse. >> Shaun: Exactly. >> Or wrong things cheaper. >> And that's difficult for enterprises. Because there's a certain way to do data management inherently inside in a highly structured manner, but I do think the rise of like IOT, I don't see as a market, I see it as infinite slices of prosciutto, right? (laughter) It's a very thinly sliced set of market opportunities, right? 
But it's forcing people to think about different use cases and how that might impact their business. >> We see those set of capabilities. >> Yup. >> Which leads to the prosciutto. >> Exactly. >> So you, and come up with a really nice sandwich. (laughter) >> It's my Italian. >> Let's keep going. >> I'm loving it. >> I'm getting a little hungry. >> You have always made a big deal out of your partnerships not being barney deals but being deep integration relationships. So you mentioned two here, Teradata and Microsoft. As the cloud becomes more prevalent, as things evolve and machine learning becomes the hot buzzword, et cetera. How have you evolved those relationships specifically in terms of the integration work that you've done? Have you kept up that engineering ethos, or? >> And that was the thing. With Microsoft, we clearly spent a lot of sweat equity on the Azure HDInsight service, but if you look at that ecosystem, they have Azure machine learning, right? They have a whole raft of services, right, that you can apply to the data when it's in the cloud, right? So how that piece integrates with the broader ecosystem of services is a lot of engineering work as well. I've always said, there's work to be done in our green box, but the other half of the work is how it plumbs into the rest. And so if you look at the AWS ecosystem, how do you optimize for S3 as a storage tier, and ephemeral workloads where HDFS is maybe a caching mechanism but it's not your primary storage, right? It brings up really interesting integration modes and how you actually bring your value out into really interesting use cases, right? So I think it's opened up a lot of areas where we can drive a lot more integration, drive the open source tech in a way that's relevant for those use cases. >> Alright, we got to go but, summit in Tokyo, is it next month? >> Yes, end of October. >> End of October. >> It's our first time, so primarily summits have been US and Europe. 
We had Melbourne end of August, and we have Tokyo end of October. I'll be, they're bringing the right hander out of retirement, so I'll be onstage in Tokyo. (laughing) I've usually been behind the scenes. >> Throwing the slurve? (laughter) >> Yeah, exactly. So I'm looking forward to it, it'll be exciting. >> Alright, good, and then 17, you're going to start again in the spring. >> Shaun: Yup. >> You're in Munich. >> Shaun: Yup. Munich. >> You were in Dublin last year, you're moving to Munich this year. >> Shaun: Exactly. >> Hopefully the Cube will be back, in Munich, alright? >> We love you guys, you guys do a good job. >> Let's make it happen, do good stuff in Europe, so thanks again for coming out. >> Shaun: Thanks for having me. >> Always a pleasure. Alright, keep it right there, we'll be back right after this short break. This is the Cube, we're live from New York City. ( upbeat electronic music)

Published Date : Sep 29 2016



Next-Generation Analytics Social Influencer Roundtable - #BigDataNYC 2016 #theCUBE

>> Narrator: Live from New York, it's the Cube, covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors, now here's your host, Dave Vellante. >> Welcome back to New York City, everybody, this is the Cube, the worldwide leader in live tech coverage, and this is a Cube first, we've got a nine person, actually eight person panel of experts, data scientists, all alike. I'm here with my co-host, James Kobielus, who has helped organize this panel of experts. James, welcome. >> Thank you very much, Dave, it's great to be here, and we have some really excellent brain power up there, so I'm going to let them talk. >> Okay, well thank you again-- >> And I'll interject my thoughts now and then, but I want to hear them. >> Okay, great, we know you well, Jim, we know you'll do that, so thank you for that, and appreciate you organizing this. Okay, so what I'm going to do to our panelists is ask you to introduce yourself. I'll introduce you, but tell us a little bit about yourself, and talk a little bit about what data science means to you. A number of you started in the field a long time ago, perhaps data warehouse experts before the term data science was coined. Some of you started probably after Hal Varian said it was the sexiest job in the world. (laughs) So think about how data science has changed and or what it means to you. We're going to start with Gregory Piatetsky, who's from Boston. A Ph.D., KDnuggets, Greg, tell us about yourself and what data science means to you. >> Okay, well thank you Dave and thank you Jim for the invitation. Data science in a sense is the second oldest profession. 
I think people have this built-in need to find patterns and whatever we find we want to organize the data, but we do it well on a small scale, but we don't do it well on a large scale, so really, data science takes our need and helps us organize what we find, the patterns that we find that are really valid and useful and not just random, I think this is a big challenge of data science. I've actually started in this field before the term data science existed. I started as a researcher and organized the first few workshops on data mining and knowledge discovery, and the term data mining became less fashionable, became predictive analytics, now it's data science and it will be something else in a few years. >> Okay, thank you. Yves Mulkers, Yves, I of course know you from Twitter. A lot of people know you as well. Tell us about your experiences and what data science means to you. >> Well, data science to me is if you take the two words, the data and the science, the science it holds a lot of expertise and skills there, it's statistics, it's mathematics, it's understanding the business and putting that together with the digitization of what we have. It's not only the structured data or the unstructured data that you store in the database and try to get out and try to understand what is in there, but even video that is coming in, and then trying to find, like Gregory already said, the patterns in there and bringing value to the business but looking from a technical perspective, but still linking that to the business insights and you can do that on a technical level, but then you don't know yet what you need to find, or what you're looking for. >> Okay great, thank you. Craig Brown, Cube alum. How many people have been on the Cube actually before? >> I have. >> Okay, good. I always like to ask that question. So Craig, tell us a little bit about your background and, you know, data science, how has it changed, what's it all mean to you? 
>> Sure, so I'm Craig Brown, I've been in IT for almost 28 years, and that was obviously before the term data science, but I've evolved from, I started out as a developer. And evolved through the data ranks, as I called it, working with data structures, working with data systems, data technologies, and now we're working with data pure and simple. Data science to me is an individual or team of individuals that dissect the data, understand the data, help folks look at the data differently than just the information that, you know, we usually use in reports, and get more insights on how to utilize it and better leverage it as an asset within an organization. >> Great, thank you Craig, okay, Jennifer Shin? Math is obviously part of being a data scientist. You're good at math I understand. Tell us about yourself. >> Yeah, so I'm a senior principal data scientist at the Nielsen Company. I'm also the founder of 8 Path Solutions, which is a data science, analytics, and technology company, and I'm also on the faculty in the Master of Information and Data Science program at UC Berkeley. So math is part of the IT statistics for data science actually this semester, and I think for me, I consider myself a scientist primarily, and data science is a nice day job to have, right? Something where there's industry need for people with my skill set in the sciences, and data gives us a great way of being able to communicate sort of what we know in science in a way that can be used out there in the real world. I think the best benefit for me is that now that I'm a data scientist, people know what my job is, whereas before, maybe five, ten years ago, no one understood what I did. Now, people don't necessarily understand what I do now, but at least they understand kind of what I do, so it's still an improvement. >> Excellent. Thank you Jennifer. 
Joe Caserta, you're somebody who started in the data warehouse business, and saw that snake swallow a basketball and grow into what we now know as big data, so tell us about yourself. >> So I've been doing data for 30 years now, and I wrote the Data Warehouse ETL Toolkit with Ralph Kimball, which is the best selling book in the industry on preparing data for analytics, and with the big paradigm shift that's happened, you know for me the past seven years has been, instead of preparing data for people to analyze data to make decisions, now we're preparing data for machines to make the decisions, and I think that's the big shift from data analysis to data analytics and data science. >> Great, thank you. Miriam, Miriam Friedel, welcome. >> Thank you. I'm Miriam Friedel, I work for Elder Research, we are a data science consultancy, and I came to data science sort of through a very circuitous route. I started off as a physicist, went to work as a consultant and software engineer, then became a research analyst, and finally came to data science. And I think one of the most interesting things to me about data science is that it's not simply about building an interesting model and doing some interesting mathematics, or maybe wrangling the data, all of which I love to do, but it's really the entire analytics lifecycle, and the value that you can actually extract from data at the end, and that's one of the things that I enjoy most, is seeing a client's eyes light up or a wow, I didn't really know we could look at data that way, that's really interesting. I can actually do something with that, so I think that, to me, is one of the most interesting things about it. >> Great, thank you. Justin Sadeen, welcome. >> Absolutely, thank you, thank you. So my name is Justin Sadeen, I work for Morph EDU, an artificial intelligence company in Atlanta, Georgia, and we develop learning platforms for non-profit and private educational institutions. 
So I'm a Marine Corps veteran turned data enthusiast, and so what I think about data science is the intersection of information, intelligence, and analysis, and I'm really excited about the transition from big data into smart data, and that's what I see data science as. >> Great, and last but not least, Dez Blanchfield, welcome mate. >> Good day. Yeah, I'm the one with the funny accent. So data science for me is probably the funniest job I've ever had to describe to my mom. I've had quite a few different jobs, and she's never understood any of them, and this one she understands the least. I think a fun way to describe what we're trying to do in the world of data science and analytics now is it's the equivalent of high altitude mountain climbing. It's like the extreme sport version of the computer science world, because we have to be this magical unicorn of a human that can understand plain English problems from C-suite down and then translate it into code, either solo or as teams of developers. And so there's this black art that we're expected to be able to transmogrify from something that we just in plain English say I would like to know X, and we have to go and figure it out, so there's this neat extreme sport view I have of rushing down the side of a mountain on a mountain bike and just dodging rocks and trees and things occasionally, because invariably, we do have things that go wrong, and they don't quite give us the answers we want. But I think we're at an interesting point in time now with the explosion in the types of technology that are at our fingertips, and the scale at which we can do things now, once upon a time we would sit at a terminal and write code and just look at data and watch it in columns, and then we ended up with spreadsheet technologies at our fingertips. 
Nowadays it's quite normal to instantiate a small high performance distributed cluster of computers, effectively a super computer in a public cloud, and throw some data at it and see what comes back. And we can do that on a credit card. So I think we're at a really interesting tipping point now where this coinage of data science needs to be slightly better defined, so that we can help organizations who have weird and strange questions that they want to ask, tell them solutions to those questions, and deliver on them in, I guess, a commodity deliverable. I want to know xyz and I want to know it in this time frame and I want to spend this much amount of money to do it, and I don't really care how you're going to do it. And there's so many tools we can choose from and there's so many platforms we can choose from, it's this little black art of computing, if you'd like, we're effectively making it up as we go in many ways, so I think it's one of the most exciting challenges that I've had, and I think I'm pretty sure I speak for most of us in that we're lucky that we get paid to do this amazing job. That we get to make up on a daily basis in some cases. >> Excellent, well okay. So we'll just get right into it. I'm going to go off script-- >> Do they have unicorns down under? I think they have some strange species right? >> Well we put the pointy bit on the back. You guys have it on the front. >> So I was at an IBM event on Friday. It was a chief data officer summit, and I attended what was called the Data Divas' breakfast. It was a women in tech thing, and one of the CDOs, she said that 25% of chief data officers are women, which is much higher than you would normally see in the profile of IT. We happen to have 25% of our panelists are women. Is that common? Miriam and Jennifer, is that common for the data science field? Or is this a higher percentage than you would normally see-- >> James: Or a lower percentage? 
>> I think certainly for us, we have hired a number of additional women in the last year, and they are phenomenal data scientists. I don't know that I would say, I mean I think it's certainly typical that this is still a male-dominated field, but I think like many male-dominated fields, physics, mathematics, computer science, I think that that is slowly changing and evolving, and I think certainly, that's something that we've noticed in our firm over the years at our consultancy, as we're hiring new people. So I don't know if I would say 25% is the right number, but hopefully we can get it closer to 50. Jennifer, I don't know if you have... >> Yeah, so I know at Nielsen we have actually more than 25% of our team is women, at least the team I work with, so there seems to be a lot of women who are going into the field. Which isn't too surprising, because with a lot of the issues that come up in STEM, one of the reasons why a lot of women drop out is because they want real world jobs and they feel like they want to be in the workforce, and so I think this is a great opportunity with data science being so popular for these women to actually have a job where they can still maintain that engineering and science view background that they learned in school. >> Great, well Hillary Mason, I think, was the first data scientist that I ever interviewed, and I asked her what are the sort of skills required and the first question that we wanted to ask, I just threw other women in tech in there, 'cause we love women in tech, is about this notion of the unicorn data scientist, right? It's been put forth that the skill sets required to be a data scientist are so numerous that it's virtually impossible to have a data scientist with all those skills. >> And I love Dez's extreme sports analogy, because that plays into the whole notion of data science, we like to talk about the theme now of data science as a team sport. 
Must it be an extreme sport is what I'm wondering, you know. The unicorns of the world seem to be... Is that realistic now in this new era? >> I mean when automobiles first came out, they were concerned that there wouldn't be enough chauffeurs to drive all the people around. Is there an analogy with data, to be a data-driven company? Do I need a data scientist, and does that data scientist, you know, need to have this unbelievable mixture of skills? Or are we doomed to always have a skill shortage? Open it up. >> I'd like to have a crack at that, so it's interesting, when automobiles were a thing, when they first brought cars out, and before they, sort of, were modernized by the likes of Ford's Model T, when we got away from the horse and carriage, they actually had human beings walking down the street with a flag warning the public that the horseless carriage was coming, and I think data scientists are very much like that. That we're kind of expected to go ahead of the organization and try and take the challenges we're faced with today and see what's going to come around the corner. And so we're like the little flag-bearers, if you'd like, in many ways of this is where we're at today, tell me where I'm going to be tomorrow, and try and predict the day after as well. It is very much becoming a team sport though. But I think the concept of data science being a unicorn has come about because the coinage hasn't been very well defined, you know, if you were to ask 10 people what a data scientist was, you'd get 11 answers, and I think this is a really challenging issue for hiring managers and C-suites when they generally say I want data science, I want big data, I want an analyst. They don't actually really know what they're asking for. Generally, if you ask for a database administrator, it's a well-described job spec, and you can just advertise it and some 20 people will turn up and you interview to decide whether you like the look and feel and smell of 'em. 
When you ask for a data scientist, there's 20 different definitions of what that one data science role could be. So we don't initially know what the job is, we don't know what the deliverable is, and we're still trying to figure that out, so yeah. >> Craig what about you? >> So from my experience, when we talk about data science, we're really talking about a collection of experiences with multiple people I've yet to find, at least from my experience, a data science effort with a lone wolf. So you're talking about a combination of skills, and so you don't have, no one individual needs to have all that makes a data scientist a data scientist, but you definitely have to have the right combination of skills amongst a team in order to accomplish the goals of data science team. So from my experiences and from the clients that I've worked with, we refer to the data science effort as a data science team. And I believe that's very appropriate to the team sport analogy. >> For us, we look at a data scientist as a full stack web developer, a jack of all trades, I mean they need to have a multitude of background coming from a programmer from an analyst. You can't find one subject matter expert, it's very difficult. And if you're able to find a subject matter expert, you know, through the lifecycle of product development, you're going to require that individual to interact with a number of other members from your team who are analysts and then you just end up well training this person to be, again, a jack of all trades, so it comes full circle. >> I own a business that does nothing but data solutions, and we've been in business 15 years, and it's been, the transition over time has been going from being a conventional wisdom run company with a bunch of experts at the top to becoming more of a data-driven company using data warehousing and BI, but now the trend is absolutely analytics driven. 
So if you're not becoming an analytics-driven company, you are going to be behind the curve very very soon, and it's interesting that IBM is now coining the phrase of a cognitive business. I think that is absolutely the future. If you're not a cognitive business from a technology perspective, and an analytics-driven perspective, you're going to be left behind, that's for sure. So in order to stay competitive, you know, you need to really think about data science, think about how you're using your data, and I also see that what's considered the data expert has evolved over time too, where it used to be just someone really good at writing SQL, or someone really good at writing queries in any language, but now it's becoming more of an interdisciplinary action where you need soft skills and you also need the hard skills, and that's why I think there's more females in the industry now than ever. Because you really need to have a really broad width of experiences that really wasn't required in the past. >> Gregory Piatetsky, you have a comment? >> So there are not too many unicorns in nature or as data scientists, so I think organizations that want to hire data scientists have to look for teams, and there are a few unicorns like Hillary Mason or maybe Usama Fayyad, but they generally tend to start companies and it's very hard to retain them as data scientists. What I see is another evolution, automation, and you know, steps like IBM, Watson, the first platform is eventually a great advance for data scientists in the short term, but probably what's likely to happen in the longer term is more and more of those skills becoming subsumed by a machine learning layer within the software. How long will it take, I don't know, but I have a feeling that the paradise for data scientists may not be very long lived. >> Greg, I have a follow up question to what I just heard you say. 
When a data scientist, let's say a unicorn data scientist starts a company, as you've phrased it, and the company's product is built on data science, do they give up becoming a data scientist in the process? It would seem that they become a data scientist of a higher order if they've built a product based on that knowledge. What is your thoughts on that? >> Well, I know a few people like that, so I think maybe they remain data scientists at heart, but they don't really have the time to do the analysis and they really have to focus more on strategic things. For example, today actually is the birthday of Google, 18 years ago, so Larry Page and Sergey Brin wrote a very influential paper back in the '90s About page rank. Have they remained data scientist, perhaps a very very small part, but that's not really what they do, so I think those unicorn data scientists could quickly evolve to have to look for really teams to capture those skills. >> Clearly they come to a point in their career where they build a company based on teams of data scientists and data engineers and so forth, which relates to the topic of team data science. What is the right division of roles and responsibilities for team data science? >> Before we go, Jennifer, did you have a comment on that? >> Yeah, so I guess I would say for me, when data science came out and there was, you know, the Venn Diagram that came out about all the skills you were supposed to have? I took a very different approach than all of the people who I knew who were going into data science. Most people started interviewing immediately, they were like this is great, I'm going to get a job. 
I went and learned how to develop applications, and learned computer science, 'cause I had never taken a computer science course in college, and made sure I trued up that one part where I didn't know these things or had the skills from school, so I went headfirst and just learned it, and then now I have actually a lot of technology patents as a result of that. So to answer Jim's question, actually. I started my company about five years ago. And originally started out as a consulting firm slash data science company, then it evolved, and one of the reasons I went back in the industry and now I'm at Nielsen is because you really can't do the same sort of data science work when you're actually doing product development. It's a very very different sort of world. You know, when you're developing a product you're developing a core feature or functionality that you're going to offer clients and customers, so I think definitely you really don't get to have that wide range of sort of looking at 8 million models and testing things out. That flexibility really isn't there as your product starts getting developed. >> Before we go into the team sport, the hard skills that you have, are you all good at math? Are you all computer science types? How about math? Are you all math? >> What were your GPAs? (laughs) >> David: Anybody not math oriented? Anybody not love math? You don't love math? >> I love math, I think it's required. >> David: So math yes, check. >> You dream in equations, right? You dream. >> Computer science? Do I have to have computer science skills? At least the basic knowledge? >> I don't know that you need to have formal classes in any of these things, but I think certainly as Jennifer was saying, if you have no skills in programming whatsoever and you have no interest in learning how to write SQL queries or R or Python, you're probably going to struggle a little bit. >> James: It would be a challenge. >> So I think yes, I have a Ph.D. 
in physics, I did a lot of math, it's my love language, but I think you don't necessarily need to have formal training in all of these things, but I think you need to have a curiosity and a love of learning, and so if you don't have that, you still want to learn and however you gain that knowledge I think, but yeah, if you have no technical interests whatsoever, and don't want to write a line of code, maybe data science is not the field for you. Even if you don't do it everyday. >> And statistics as well? You would put that in that same general category? How about data hacking? You got to love data hacking, is that fair? Yves, you have a comment? >> Yeah, I think so, while we've been discussing that for me, the most important part is that you have a logical mind and you have the capability to absorb new things and the curiosity you need to dive into that. While I don't have an education in IT or whatever, I have a background in chemistry and those things that I learned there, I apply to information technology as well, and from a part that you say, okay, I'm a tech-savvy guy, I'm interested in the tech part of it, you need to speak that business language and if you can do that crossover and understand what other skill sets or parts of the roles are telling you I think the communication in that aspect is very important. >> I'd like to throw in just something really quickly, and I think there's an interesting thing that happens in IT, particularly around technology. We tend to forget that we've actually solved a lot of these problems in the past. If we look in history, if we look around the second World War, and Bletchley Park in the UK, where you had a very similar experience as humans that we're having currently around the whole issue of data science, so there was an interesting challenge with the Enigma and the Shark code, right? 
And there was a bunch of men put in a room and told, you're mathematicians and you come from universities, and you can crack codes, but they couldn't. And so what they ended up doing was running these ads, and putting challenges, they actually put, I think it was crossword puzzles in the newspaper, and this deluge of women came out of all kinds of different roles without math degrees, without science degrees, but could solve problems, and they were thrown at the challenge of cracking codes, and invariably, they did the heavy lifting. On a daily basis for converting messages from one format to another, so that this very small team at the end could actually get in play with the sexy piece of it. And I think we're going through a similar shift now with what we're refer to as data science in the technology and business world. Where the people who are doing the heavy lifting aren't necessarily what we'd think of as the traditional data scientists, and so, there have been some unicorns and we've championed them, and they're great. But I think the shift's going to be to accountants, actuaries, and statisticians who understand the business, and come from an MBA star background that can learn the relevant pieces of math and models that we need to to apply to get the data science outcome. I think we've already been here, we've solved this problem, we've just got to learn not to try and reinvent the wheel, 'cause the media hypes this whole thing of data science is exciting and new, but we've been here a couple times before, and there's a lot to be learned from that, my view. >> I think we had Joe next. >> Yeah, so I was going to say that, data science is a funny thing. To use the word science is kind of a misnomer, because there is definitely a level of art to it, and I like to use the analogy, when Michelangelo would look at a block of marble, everyone else looked at the block of marble to see a block of marble. 
He looks at a block of marble and he sees a finished sculpture, and then he figures out what tools do I need to actually make my vision? And I think data science is a lot like that. We hear a problem, we see the solution, and then we just need the right tools to do it, and I think, as part of consulting and data science in particular, it's not so much what we know out of the gate, but it's how quickly we learn. And I think everyone here, what makes them brilliant, is how quickly they can learn any tool that they need to see their vision get accomplished. >> David: Justin? >> Yeah, I think you make a really great point, for me, I'm a Marine Corps veteran, and the reason I mentioned that is 'cause I work with two veterans who are problem solvers. And I think that's what data scientists really are, in the long run are problem solvers, and you mentioned a great point that, yeah, I think just problem solving is the key. You don't have to be a subject matter expert, just be able to take the tools and intelligently use them. >> Now when you look at the whole notion of team data science, what is the right mix of roles, like role definitions within a high-quality or a high-performing data science team? Now IBM, with, of course, our announcement of Project DataWorks and so forth, we're splitting the role division in terms of data scientist versus data engineer versus application developer versus business analyst. Is that the right breakdown of roles? Or what would the panelists recommend in terms of understanding what kind of roles make sense within, like I said, a high-performing team that's looking for trying to develop applications that depend on data, machine learning, and so forth? Anybody want to? >> I'll tackle that. 
So the teams that I have created over the years, these data science teams that I brought into customer sites, have a combination of developer capabilities and some of them are IT developers, but some of them were developers of things other than applications. They designed buildings, they did other things with their technical expertise besides building technology. The other piece besides the developer is the analytics, and analytics can be taught as long as they understand how algorithms work and the code behind the analytics, in other words, how are we analyzing things, and from a data science perspective, we are leveraging technology to do the analyzing through the tool sets, so ultimately as long as they understand how tool sets work, then we can train them on the tools. Having that analytic background is an important piece. >> Craig, is it easier to, I'll go to you in a moment Joe, is it easier to cross train a data scientist to be an app developer, than to cross train an app developer to be a data scientist or does it not matter? >> Yes. (laughs) And not the other way around. It depends on the-- >> It's easier to cross train a data scientist to be an app developer than-- >> Yes. >> The other way around. Why is that? >> Developing code can be as difficult as the tool set one uses to develop code. Today's tool sets are very user friendly, whereas it is very difficult to teach a person to think along the lines of developing code when they don't have any idea of the aspects of code, of building something. >> I think it was Joe, or you next, or Jennifer, who was it? >> I would say that one of the reasons for that is data scientists will probably know if the answer's right after you process data, whereas a data engineer might be able to manipulate the data but may not know if the answer's correct. So I think that is one of the reasons why having a data scientist learn the application development skills might be an easier time than the other way around. 
>> I think Miriam had a comment? Sorry. >> I think that what we're advising our clients to do is to not think in the old way: before data science and before analytics became so required by companies to stay competitive, it was more of a waterfall, you have a data engineer build a solution, you know, then you throw it over the fence and the business analyst would have at it, where now, it must be agile, and you must have a scrum team where you have the data scientist and the data engineer and the project manager and the product owner and someone from the chief data office all at the table at the same time and all accomplishing the same goal. Because all of these skills are required, collectively, in order to solve this problem, and it can't be done daisy-chained anymore, it has to be a collaboration. And that's why I think Spark is so awesome, because you know, Spark is a single interface that a data engineer can use, a data analyst can use, and a data scientist can use. And now with what we've learned today, having a data catalog on top so that the chief data office can actually manage it, I think is really going to take Spark to the next level. >> James: Miriam? >> I wanted to comment on your question to Craig about is it harder to teach a data scientist to build an application or vice versa, and one of the things that we have worked on a lot in our data science team is incorporating a lot of best practices from software development, agile, scrum, that sort of thing, and I think particularly with a focus on deploying models that we don't just want to build an interesting data science model, we want to deploy it, and get some value. 
You need to really incorporate these processes from someone who might know how to build applications and that, I think for some data scientists can be a challenge, because one of the fun things about data science is you get to get into the data, and you get your hands dirty, and you build a model, and you get to try all these cool things, but then when the time comes for you to actually deploy something, you need deployment-grade code in order to make sure it can go into production at your client site and be useful for instance, so I think that there's an interesting challenge on both ends, but one of the things I've definitely noticed with some of our data scientists is it's very hard to get them to think in that mindset, which is why you have a team of people, because everyone has different skills and you can mitigate that. >> Dev-ops for data science? >> Yeah, exactly. We call it insight ops, but yeah, I hear what you're saying. Data science is becoming increasingly an operational function as opposed to strictly exploratory or developmental. Did someone else have a, Dez? >> One of the things I was going to mention, one of the things I like to do when someone gives me a new problem is take all the laptops and phones away. And we just end up in a room with a whiteboard. And developers find that challenging sometimes, so I had this one line where I said to them don't write the first line of code until you actually understand the problem you're trying to solve, right? And I think where the data science focus has changed the game for organizations who are trying to get some systematic repeatable process that they can throw data at and just keep getting answers and things, no matter what the industry might be, is that developers will come with a particular mindset on how they're going to codify something without necessarily getting the full spectrum and understanding the problem in the first place. 
What I'm finding is the people that come at data science tend to have more of a hacker ethic. They want to hack the problem, they want to understand the challenge, and they want to be able to get it down to plain English simple phrases, and then apply some algorithms and then build models, and then codify it, and so most of the time we sit in a room with whiteboard markers just trying to build a model in a graphical sense and make sure it's going to work and that it's going to flow, and once we can do that, we can codify it. I think when you come at it from the other angle from the developer ethic, and you're like I'm just going to codify this from day one, I'm going to write code. I'm going to hack this thing out and it's just going to run and compile. Often, you don't truly understand what you're trying to get to at the end point, and you can just spend days writing code and I think someone made the comment that sometimes you don't actually know whether the output is actually accurate in the first place. So I think there's a lot of value being provided from the data science practice over understanding the problem in plain English at a team level, so what am I trying to do from the business consulting point of view? What are the requirements? How do I build this model? How do I test the model? How do I run a sample set through it? Train the thing and then make sure what I'm going to codify actually makes sense in the first place, because otherwise, what are you trying to solve in the first place? >> Wasn't it Einstein who said if I had an hour to solve a problem, I'd spend 55 minutes understanding the problem and five minutes on the solution, right? It's exactly what you're talking about. >> Well I think, I will say, getting back to the question, the thing with building these teams, I think a lot of times people don't talk about is that engineers are actually very very important for data science projects and data science problems. 
For instance, if you were just trying to prototype something or just come up with a model, then data science teams are great, however, if you need to actually put that into production, that code that the data scientist has written may not be optimal, so as we scale out, it may be actually very inefficient. At that point, you kind of want an engineer to step in and actually optimize that code, so I think it depends on what you're building and that kind of dictates what kind of division you want among your teammates, but I do think that a lot of times, the engineering component is really undervalued out there. >> Jennifer, it seems that the data engineering function, data discovery and preparation and so forth is becoming automated to a greater degree, but if I'm listening to you, I don't hear that data engineering as a discipline is becoming extinct in terms of a role that people can be hired into. You're saying that there's a strong ongoing need for data engineers to optimize the entire pipeline to deliver the fruits of data science in production applications, is that correct? So they play that very much operational role as the backbone for... >> So I think a lot of times businesses will go to a data scientist to build a better model, to build a predictive model, but that model may not be something that you really want to implement out there when there's like a million users coming to your website, 'cause it may not be efficient, it may take a very long time, so I think in that sense, it is important to have good engineers, and your whole product may fail, you may build the best model, it may have the best output, but if you can't actually implement it, then really what good is it? >> What about calibrating these models? How do you go about doing that and sort of testing that in the real world? Has that changed over time? Or is it... 
>> So one of the things that I think can happen, and we found with one of our clients is when you build a model, you do it with the data that you have, and you try to use a very robust cross-validation process to make sure that it's robust and it's sturdy, but one thing that can sometimes happen is after you put your model into production, there can be external factors, societal or whatever, things that have nothing to do with the data that you have or the quality of the data or the quality of the model, which can actually erode the model's performance over time. So as an example, we think about cell phone contracts, right? Those have changed a lot over the years, so maybe five years ago, the type of data plan you had might not be the same that it is today, because a totally different type of plan is offered, so if you're building a model on that to say predict who's going to leave and go to a different cell phone carrier, the validity of your model over time is going to completely degrade based on nothing that you have, that you put into the model or the data that was available, so I think you need to have this sort of model management and monitoring process to take these factors into account and then know when it's time to do a refresh. >> Cross-validation, even at one point in time, for example, there was an article in the New York Times recently that they gave the same data set to five different data scientists, this is survey data for the presidential election that's upcoming, and five different data scientists came to five different predictions. They were all high quality data scientists, the cross-validation showed a wide variation about who was on top, whether it was Hillary or whether it was Trump, so that shows you that even at any point in time, cross-validation is essential to understand how robust the predictions might be. Does somebody else have a comment? Joe? 
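[Editor's sketch, not part of the panel: the model management and monitoring process described above, watching a deployed model's performance and knowing when it's time to do a refresh, could look something like the following. This is a minimal illustration under stated assumptions: the function name, the accuracy figures, the 0.05 tolerance, and the three-period window are all hypothetical, and a real monitoring process would depend on the model and the business.]

```python
from statistics import mean


def periods_needing_refresh(accuracy_by_period, baseline, tolerance=0.05, window=3):
    """Return the indices of periods whose rolling mean accuracy has fallen
    more than `tolerance` below the accuracy measured at deployment.

    All names and thresholds here are illustrative assumptions.
    """
    flagged = []
    for i in range(window - 1, len(accuracy_by_period)):
        # Average the most recent `window` periods to smooth out noise.
        rolling = mean(accuracy_by_period[i - window + 1 : i + 1])
        if rolling < baseline - tolerance:
            flagged.append(i)
    return flagged


# Hypothetical monthly accuracy of a churn model after deployment.
monthly_accuracy = [0.82, 0.81, 0.80, 0.78, 0.74, 0.70, 0.66]
print(periods_needing_refresh(monthly_accuracy, baseline=0.82))
```

[With these made-up numbers the rolling average first drops below the threshold in the sixth month, which is the point at which a refresh would be considered.]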
>> I just want to say that this even drives home the fact that having the scrum team for each project and having the engineer and the data scientist, data engineer and data scientist working side by side because it is important that whatever we're building we assume will eventually go into production, and we used to have in the data warehousing world, you'd get the data out of the systems, out of your applications, you do analysis on your data, and the nirvana was maybe that data would go back to the system, but typically it didn't. Nowadays, the applications are dependent on the insight coming from the data science team. With the behavior of the application and the personalization and individual experience for a customer is highly dependent, so it has to be, you said is data science part of the dev-ops team, absolutely now, it has to be. >> Whose job is it to figure out the way in which the data is presented to the business? Where's the sort of presentation, the visualization plan, is that the data scientist role? Does that depend on whether or not you have that gene? Do you need a UI person on your team? Where does that fit? >> Wow, good question. >> Well usually that's the output, I mean, once you get to the point where you're visualizing the data, you've created an algorithm or some sort of code that produces that to be visualized, so at the end of the day that the customers can see what all the fuss is about from a data science perspective. But it's usually post the data science component. >> So do you run into situations where you can see it and it's blatantly obvious, but it doesn't necessarily translate to the business? >> Well there's an interesting challenge with data, and we throw the word data around a lot, and I've got this fun line I like throwing out there. If you torture data long enough, it will talk. So the challenge then is to figure out when to stop torturing it, right? 
And it's the same with models, and so I think in many other parts of organizations, we'll take something, if someone's doing a financial report on performance of the organization and they're doing it in a spreadsheet, they'll get two or three peers to review it, and validate that they've come up with a working model and the answer actually makes sense. And I think we're rushing so quickly at doing analysis on data that comes to us in various formats and high velocity that I think it's very important for us to actually stop and do peer reviews, of the models and the data and the output as well, because otherwise we start making decisions very quickly about things that may or may not be true. It's very easy to get the data to paint any picture you want, and you gave the example of the five different attempts at that thing, and I had this shoot-out thing as well where I'll take in a team, I'll get two different people to do exactly the same thing in completely different rooms, and come back and challenge each other, and it's quite amazing to see the looks on their faces when they're like, oh, I didn't see that, and then go back and do it again until, and then just keep iterating until we get to the point where they both get the same outcome, in fact there's a really interesting anecdote about when the UNIX operating system was being written, and a couple of the authors went away and wrote the same program without realizing that the other was doing it, and when they came back, they actually had line for line, the same piece of C code, 'cause they'd actually gotten to a truth. A perfect version of that program, and I think we need to often look at, when we're building models and playing with data, if we can't come at it from different angles, and get the same answer, then maybe the answer isn't quite true yet, so there's a lot of risk in that. 
And it's the same with presentation, you know, you can paint any picture you want with the dashboard, but who's actually validating when the dashboard's painting the correct picture? >> James: Go ahead, please. >> There is a science actually, behind data visualization, you know if you're doing trending, it's a line graph, if you're doing comparative analysis, it's bar graph, if you're doing percentages, it's a pie chart, like there is a certain science to it, it's not that much of a mystery as the novice thinks there is, but what makes it challenging is that you also, just like any presentation, you have to consider your audience. And your audience, whenever we're delivering a solution, either insight, or just data in a grid, we really have to consider who is the consumer of this data, and actually cater the visual to that person or to that particular audience. And that is part of the art, and that is what makes a great data scientist. >> The consumer may in fact be the source of the data itself, like in a mobile app, so you're tuning their visualization and then their behavior is changing as a result, and then the data on their changed behavior comes back, so it can be a circular process. >> So Jim, at a recent conference, you were tweeting about the citizen data scientist, and you got emasculated by-- >> I spoke there too. >> Okay. >> TWI on that same topic, I got-- >> Kirk Borne I hear came after you. >> Kirk meant-- >> Called foul, flag on the play. >> Kirk meant well. I love Claudia Emahoff too, but yeah, it's a controversial topic. >> So I wonder what our panel thinks of that notion, citizen data scientist. >> Can I respond about citizen data scientists? >> David: Yeah, please. >> I think this term was introduced by Gartner analyst in 2015, and I think it's a very dangerous and misleading term. 
I think definitely we want to democratize the data and have access to more people, not just data scientists, but managers, BI analysts, but there is already a term for such people, we can call them business analysts, because it implies some training, some understanding of the data. If you use the term citizen data scientist, it implies that without any training you take some data and then you find something there, and I think, as Dez mentioned, we've seen many examples, very easy to find completely spurious random correlations in data. So we don't want citizen dentists to treat our teeth or citizen pilots to fly planes, and if data's important, having citizen data scientists is equally dangerous, so I'm hoping that, I think actually Gartner did not use the term citizen data scientist in their 2016 hype cycle, so hopefully they will put this term to rest. >> So Gregory, you apparently are defining citizen to mean incompetent as opposed to simply self-starting. >> Well self-starting is very different, but that's not what I think what was the intention. I think what we see in terms of data democratization, there is a big trend toward automation. There are many tools, for example there are many companies like DataRobot, probably IBM, has interesting machine learning capability towards automation, so I think I recently started a page on KDnuggets for automated data science solutions, and there are already 20 different firms that provide different levels of automation. So one can deliver in full automation maybe some expertise, but it's very dangerous to have part of an automated tool and at some point then ask citizen data scientists to try to take the wheels. >> I want to chime in on that. >> David: Yeah, pile on. >> I totally agree with all of that. 
I think the comment I just want to quickly put out there is that the space we're in is a very young and rapidly changing world, and so what we haven't had yet is this time to stop and take a deep breath and actually define ourselves, so if you look at computer science in general, a lot of the traditional roles have sort of had 10 or 20 years of history, and so through the hiring process, and the development of those spaces, we've actually had time to breathe and define what those jobs are, so we know what a systems programmer is, and we know what a database administrator is, but we haven't yet had a chance as a community to stop and breathe and say, well what do we think these roles are, and so to fill that void, the media creates coinages, and I think this is the risk we've got now that the concept of a data scientist was just a term that was coined to fill a void, because no one quite knew what to call somebody who didn't come from a data science background if they were tinkering around data science, and I think that's something that we need to sort of sit up and pay attention to, because if we don't own that and drive it ourselves, then somebody else is going to fill the void and they'll create these very frustrating concepts like data scientist, which drives us all crazy. >> James: Miriam's next. >> So I wanted to comment, I agree with both of the previous comments, but in terms of a citizen data scientist, and I think whether or not you're a citizen data scientist or an actual data scientist, whatever that means, I think one of the most important things you can have is a sense of skepticism, right? Because you can get spurious correlations and it's like wow, my predictive model is so excellent, you know? And being aware of things like leaks from the future, right? 
This actually isn't predictive at all, it's a result of the thing I'm trying to predict, and so I think one thing I know that we try and do is if something really looks too good, we need to go back in and make sure, did we not look at the data correctly? Is something missing? Did we have a problem with the ETL? And so I think that a healthy sense of skepticism is important to make sure that you're not taking a spurious correlation and trying to derive some significant meaning from it. >> I think there's a Dilbert cartoon that I saw that described that very well. Joe, did you have a comment? >> I think that in order for citizen data scientists to really exist, I think we do need to have more maturity in the tools that they would use. My vision is that the BI tools of today are all going to be replaced with natural language processing and searching, you know, just be able to open up a search bar and say give me sales by region, and to take that one step into the future even further, it should actually say what are my sales going to be next year? And it should trigger a simple linear regression or be able to say which features of the televisions are actually affecting sales and do a clustering algorithm, you know I think hopefully that will be the future, but I don't see anything of that today, and I think in order to have a true citizen data scientist, you would need to have that, and that is pretty sophisticated stuff. >> I think for me, the idea of citizen data scientist I can relate to that, for instance, when I was in graduate school, I started doing some research on FDA data. It was an open source data set about 4.2 million data points. Technically when I graduated, the paper was still not published, and so in some sense, you could think of me as a citizen data scientist, right? 
I wasn't getting funding, I wasn't doing it for school, but I was still continuing my research, so I'd like to hope that with all the new data sources out there that there might be scientists or people who are maybe kept out of a field, people who wanted to be in STEM and for whatever life circumstance couldn't be in it, that they might be encouraged to actually go and look into the data and maybe build better models or validate information that's out there. >> So Justin, I'm sorry you had one comment? >> It seems data science was termed before academia adopted formalized training for data science. But yeah, you can make, like Dez said, you can make data work for whatever problem you're trying to solve, whatever answer you see, you want data to work around it, you can make it happen. And I kind of consider that like in project management, like data creep, so you're so hyper focused on a solution you're trying to find the answer that you create an answer that works for that solution, but it may not be the correct answer, and I think the crossover discussion works well for that case. >> So but the term comes up 'cause there's a frustration I guess, right? That data science skills are not plentiful, and it's potentially a bottleneck in an organization. Supposedly 80% of your time is spent on cleaning data, is that right? Is that fair? So there's a problem. How much of that can be automated and when? >> I'll have a shot at that. So I think there's a shift that's going to come about where we're going to move from centralized data sets to data at the edge of the network, and this is something that's happening very quickly now where we can't just hold everything back to a central spot. When the internet of things actually wakes up. Things like the Boeing Dreamliner 787, that thing's got 6,000 sensors in it, produces half a terabyte of data per flight. There are 87,400 flights per day in domestic airspace in the U.S. 
That's 43.5 petabytes of raw data, now that's about three years worth of disk manufacturing in total, right? We're never going to copy that across one place, we can't process, so I think the challenge we've got ahead of us is looking at how we're going to move the intelligence and the analytics to the edge of the network and pre-cook the data in different tiers, so have a look at the raw material we get, and boil it down to a slightly smaller data set, bring a metadata version of that back, and eventually get to the point where we've only got the very minimum data set and data points we need to make key decisions. Without that, we're already at the point where we have too much data, and we can't munch it fast enough, and we can't spin off enough tin even if we switch the cloud on, and that's just this never ending deluge of noise, right? And you've got that signal versus noise problem so then we're now seeing a shift where people are looking at how do we move the intelligence back to the edge of network which we actually solved some time ago in the security space. You know, spam filtering, if an email hits Google on the west coast of the U.S. and they create a checksum for that spam email, it immediately goes into a database, and nothing gets on the opposite side of the coast, because they already know it's spam. They recognize that email coming in, that's evil, stop it. So we've already fixed it in security with intrusion detection, we've fixed it in spam, so we now need to take that learning, and bring it into business analytics, if you like, and see where we're finding patterns and behavior, and brew that out to the edge of the network, so if I'm seeing a demand over here for tickets on a new sale of a show, I need to be able to see where else I'm going to see that demand and start responding to that before the demand comes about. 
I think that's a shift that we're going to see quickly, because we'll never keep up with the data munching challenge and the volume's just going to explode. >> David: We just have a couple minutes. >> That does sound like a great topic for a future Cube panel which is data science on the edge of the fog. >> I got a hundred questions around that. So we're wrapping up here. Just got a couple minutes. Final thoughts on this conversation or any other pieces that you want to punctuate. >> I think one thing that's been really interesting for me being on this panel is hearing all of my co-panelists talking about common themes and things that we are also experiencing which isn't a surprise, but it's interesting to hear about how ubiquitous some of the challenges are, and also at the announcement earlier today, some of the things that they're talking about and thinking about, we're also talking about and thinking about. So I think it's great to hear we're all in different countries and different places, but we're experiencing a lot of the same challenges, and I think that's been really interesting for me to hear about. >> David: Great, anybody else, final thoughts? >> To echo Dez's thoughts, it's about we're never going to catch up with the amount of data that's produced, so it's about transforming big data into smart data. >> I could just say that with the shift from normal data, small data, to big data, the answer is automate, automate, automate, and we've been talking about advanced algorithms and machine learning for the science for changing the business, but there also needs to be machine learning and advanced algorithms for the backroom where we're actually getting smarter about how we ingest data and how we fix data as it comes in. Because we can actually train the machines to understand data anomalies and what we want to do with them over time. And I think the further upstream we get of data correction, the less work there will be downstream. 
And I also think that the concept of being able to fix data at the source is gone, that's behind us. Right now the data that we're using to analyze to change the business, typically we have no control over. Like Dez said, they're coming from sensors and machines and internet of things and if it's wrong, it's always going to be wrong, so we have to figure out how to do that in our laboratory. >> Eaves, final thoughts? >> I think it's a mind shift being a data scientist if you look back at the time why did you start developing or writing code? Because you like to code, whatever, just for the sake of building a nice algorithm or a piece of software, or whatever, and now I think with the spirit of a data scientist, you're looking at a problem and say this is where I want to go, so you have more the top down approach than the bottom up approach. And have the big picture and that is what you really need as a data scientist, just look across technologies, look across departments, look across everything, and then on top of that, try to apply as much skills as you have available, and that's kind of unicorn that they're trying to look for, because it's pretty hard to find people with that wide vision on everything that is happening within the company, so you need to be aware of technology, you need to be aware of how a business is run, and how it fits within a cultural environment, you have to work with people and all those things together in my belief make it very difficult to find those good data scientists. >> Jim? Your final thought? >> My final thought is this is an awesome panel, and I'm so glad that you've come to New York, and I'm hoping that you all stay, of course, for the IBM Data First launch event that will take place this evening about a block over at Hudson Mercantile, so that's pretty much it. Thank you, I really learned a lot. >> I want to second Jim's thanks, really, great panel. 
Awesome expertise, really appreciate you taking the time, and thanks to the folks at IBM for putting this together. >> And I'm a big fan of most of you, all of you, on this session here, so it's great just to meet you in person, thank you. >> Okay, and I want to thank Jeff Frick for being a human curtain there with the sun setting here in New York City. Well thanks very much for watching, we are going to be across the street at the IBM announcement, we're going to be on the ground. We open up again tomorrow at 9:30 at Big Data NYC, Big Data Week, Strata plus the Hadoop World, thanks for watching everybody, that's a wrap from here. This is the Cube, we're out. (techno music)

Published Date : Sep 28 2016


ENTITIES

Entity	Category	Confidence
Jennifer	PERSON	0.99+
Jennifer Shin	PERSON	0.99+
Miriam Fridell	PERSON	0.99+
Greg Piateski	PERSON	0.99+
Justin	PERSON	0.99+
IBM	ORGANIZATION	0.99+
David	PERSON	0.99+
Jeff Frick	PERSON	0.99+
2015	DATE	0.99+
Joe Caserta	PERSON	0.99+
James Cubelis	PERSON	0.99+
James	PERSON	0.99+
Miriam	PERSON	0.99+
Jim	PERSON	0.99+
Joe	PERSON	0.99+
Claudia Emahoff	PERSON	0.99+
NVIDIA	ORGANIZATION	0.99+
Hillary	PERSON	0.99+
New York	LOCATION	0.99+
Hillary Mason	PERSON	0.99+
Justin Sadeen	PERSON	0.99+
Greg	PERSON	0.99+
Dave	PERSON	0.99+
55 minutes	QUANTITY	0.99+
Trump	PERSON	0.99+
2016	DATE	0.99+
Craig	PERSON	0.99+
Dave Vellante	PERSON	0.99+
George	PERSON	0.99+
Dez Blanchfield	PERSON	0.99+
UK	LOCATION	0.99+
Ford	ORGANIZATION	0.99+
Craig Brown	PERSON	0.99+
10	QUANTITY	0.99+
8 Path Solutions	ORGANIZATION	0.99+
CISCO	ORGANIZATION	0.99+
five minutes	QUANTITY	0.99+
two	QUANTITY	0.99+
30 years	QUANTITY	0.99+
Kirk	PERSON	0.99+
25%	QUANTITY	0.99+
Marine Corp	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
43.5 petabytes	QUANTITY	0.99+
Boston	LOCATION	0.99+
Data Robot	ORGANIZATION	0.99+
10 people	QUANTITY	0.99+
Hal Varian	PERSON	0.99+
Einstein	PERSON	0.99+
New York City	LOCATION	0.99+
Nielsen	ORGANIZATION	0.99+
first question	QUANTITY	0.99+
Friday	DATE	0.99+
Ralph Kimball	PERSON	0.99+
U.S.	LOCATION	0.99+
6,000 sensors	QUANTITY	0.99+
UC Berkeley	ORGANIZATION	0.99+
Sergey Brin	PERSON	0.99+

Rob Bearden, Hortonworks - Executive On-the-Ground #theCUBE

>> Voiceover: On the Ground, presented by The Cube. Here's your host John Furrier. (techno music) >> Hello, everyone. Welcome to a special On the Ground executive interview with Rob Bearden, the CEO of Hortonworks. I'm John Furrier with The Cube. Rob, welcome to this On the Ground. >> Thank you. >> So I got to ask you, you're five years old this year, your company Hortonworks in June, have Hadoop Summit coming up, what a magical run. You guys went public. Give us a quick update on Hortonworks and what's going on. The five-year birthday, any special plans? >> Well, we're going to actually host the 10-year birthday party of Hadoop, which, you know, started at Yahoo! and open-source community. So everyone's invited. Hopefully you'll be able to make it as well. We've accomplished a lot in the last five years. We've grown to over 1000 employees, over 900 customers. This year is our first full year of being a public company, and the street has us at $265 million in billings. So tremendous progress has happened and we've seen the entire data architecture begin to re-platform around Hadoop now. >> CEOs across the globe are facing profound challenges, data, cloud, mobile, obviously this digital transformation. What are you seeing out there as you talk to your customers? >> Well they view that the digital transformation is a massive opportunity for value creation, for that enterprise. And they realize that they can really shift their business models from being very reactive post-transaction to actually being able to consolidate all of the new paradigm data with the existing transaction data and actually get to a very pro-active model pre-transaction. And so they understand their customers' patterns. They understand the kinds of things that their customers want to buy before they ever engage in the procurement process. 
And they can make better and more compelling offers at better price points and be able to serve their customers better, and that's really the transformation that's happening and they realize the value of that creation between them and their customer. >> And one of the exciting things about The Cube is we go to all these different industry events and you were speaking last week at an event where data is at the center of the value proposition around digital transformation, and that's really been the key trend that we've been seeing consistently, that buzz word digital transformation. What does that mean to you? Because this is coming up over and over again around this digital platform, digital weathers, digital media or digital engagement. It's all around data. What's your thoughts and what is from your perspective digital transformation? >> Well, it's about being able to derive value from your data and be able to take that value back to your customers under your supply chain, and to be able to create a completely new engagement with how you're managing your interaction with your customers and your supply chain from the data that they're generating and the data that you have about them. >> When you talk to CEOs and people in the business out in the field, how much of this digital transformation do you see as real in terms of progress, real progress? In terms of total transitions, or is it just being talked about now? What's your progress bar meter? How would you peg this trend? >> I would say we're at four and I believe we'll be at six by the end of 2016. 
And it's one of the biggest movements I've seen since the '90s and ERP, because it's so transformational into the business model by being able to transform the data that we have about our collective entity and our collective customer and collective supply chain, and be able to apply predictive and real-time interactions against that data as events and occurrences are happening, and to be able to quickly offer products and services, and the velocity that that creates to modernization and the value creation back is at a pace that's never been able to happen. And they've really understood the importance of doing that or being disintermediated in their existing spaces. >> You mention ERP, it kind of shows our age, but I'll ask the question. Back in the '90s ERP, CRM, these were processes that were well known, that people automated with technology which was at that time unknown. You got the rise of client-server technology, local area networking, TCP/IP was emerging, so you got some unknown technology stuff happening, but known processes that were being automated and hence saw that boom. Now you mention today, it's interesting because Peter Burris at Wikibon's thesis says today the processes are unknown and the technology's known, so there's now a new dynamic. It's almost flipped upside-down where this digital transformation is exact opposite. IoT is a great use case where all these unknown things are coming into the enterprise that are value opportunities. The technology's known, so now the challenge is how to use technology, to deploy it, and be agile to capture and automate these future and/or real-time unknown processes. Your thoughts on that premise. >> The answers are buried in the data, is the great news, and so the technology as you said is there, and you have these new, unknown processes through Internet of Things, the new paradigm data sets with sensors and clickstream and mobile data. 
And the good news is they generate the data and we can apply technology to the data through AI and machine learning to really make sure that we understand how to transform the value out of that, out of those data sets. >> So how does IT deal with this? 'Cause going back 30 years IT was a clear line of sight, again, automating those known processes. Now you have unknown opportunities, but you have to be in a position for that. Call that cloud, call that DevOps, call that data driven, whatever the metaphor is. People are being agile, be ready for it. How is that different now and what is the future of data in that paradigm? And how does a customer come to grips and rationalize this notion of I need a clear line of sight of the value, not knowing what the processes is about data. What should they be doing? >> Well, we don't know the processes necessarily, per se, but we do know what the data is telling us because we can bring all that data under management. We can apply the right kind of algorithms, the right kind of tools on it, to give us the outcomes that we want and have the ability to monetize and unlock that value very quickly. >> Hortonworks architecture is kind of designed now at the last Hadoop Summit in Dublin. We heard about the platform. Your architecture's going beyond Hadoop, and it says Hadoop Summit and Hadoop was the key to big data. Going beyond Hadoop means other things. What does that mean for the customer? Because now they're seeing these challenges. How does Hortonworks describe that and what value do you bring to those customers? >> Big data was about data at rest and being able to drive the transformation that it has, being able to consolidate all the transactional platforms into central data architecture. 
Being able to bring all the new paradigm data sets to the mobile, the clickstream, the IoT data, and bring that together and be able to really transition from being reactive post-transaction to be able to be predictive and interactive pre-transaction. And that's a very, very powerful value proposition and you create a lot of value doing that, but what's really learned through that process is in the digital transformation journey, that actually the further upstream that we can get to engaging with the data, even if we can get to it at the point of origination at the furthest edge, at the point of sensor, at the actual time of clickstream and we can engage with that data as those events and occurrences are happening and we can process against those events as they're happening, it creates higher levels of value. So from the Hortonworks platform we have the ability to manage data at rest with Hadoop, as well as data in motion with the Hortonworks data flow platform. And our view is that we must be able to engage with all the data all the time. And so we bring the platforms to bring data under management from the point of origination all the way through as it's in motion, and to the point it comes at rest and be able to aggregate those interactions through the entire process. >> It's interesting, you mention real-time, and one of the ideas of Hadoop was it was always going to be a data warehouse killer, 'cause it makes a lot of sense. You can store the data. It's unstructured data and you can blend in structured on top of that and build on top of that. Has that happened? And does real-time kind of change that equation? Because there's still a role for a data warehouse. If someone has an investment are they being modernized? Clear that up for me because I just can't kind of rationalize that yet. Data warehouses are old, the older ones, but they're not going away any time soon from what we're hearing. Your thoughts as Hadoop as the data warehouse killer. 
>> Yeah, well, our strategy from day one has never been to go in and disintermediate any of the existing platforms or any of the existing applications or services. In fact, to the contrary. What we wanted to do and have done from day one is be able to leverage Hadoop as an extension of those data platforms. The DW architecture has limitations to it in terms of how much data pragmatically and economically is really viable to go into the data warehouse. And so our model says let's bring more data under management as an extension to the existing data warehouses and give the existing data warehouses the ability to have a more holistic view of data. Now I think the next generation of evolution is happening right now and the enterprise is saying that's great. We're able to get more value longer from our existing data warehouse and tools investment by bringing more data under management, leveraging a combined architecture of Hadoop and data warehouse. But now they're trying to redefine really what does the data warehouse of the future look like, and it's really about how we make decisions, right? And at what point do we make decisions because in the world of DW today it assumes that data's aggregated post-transaction, right? In the new world of data architecture that's across the IT landscape, it says we want to engage with data from the point it's originated, and we want to be able to process and make decisions as events and as occurrences and as opportunities arise before that transaction potentially ever happens. And so the data warehouse of the future is much different in terms of how and when a decision's made and when that data's processed. And in many cases it's pre-transaction versus post-transaction. >> Well also I would just add, and I want to get your thoughts on this, real-time, 'cause now in the moment at the transaction we now have cloud resources and potentially other resources that could become available. Why even go to the data warehouses? 
So how has real-time changed the game? 'Cause data in motion kind of implies real-time whether it's IoT or some sort of bank transaction or something else. How has real-time changed the game? >> Well, it's at what point can we engage with the customer, but what it really has established is the data has to be able to be processed whether it be on-prem, in the cloud, or in a hybrid architecture. And we can't be constrained by where the data's processed. We need to be able to take the processing to the data versus having to wait for the data to come to the processing. And I think that's the very powerful part of cloud, the on-prem, and software-defined networking, and when you bring all of those platforms together, you get the ability to have a very powerful and elastic processing capability at any point in the life cycle of the data. And we've never been able to put all those pieces together on an economically viable model. >> So I got to ask you, you guys are five years old in June, Hadoop's only 10 years old. Still young, still kind of in the early days, but yet you guys are public company. How are you guys looking at the growth strategy for you guys? 'Cause the trend is for people to go private. You guys went public. You're out in the open. Certainly your competitor Cloudera is private, but people can get that they're kind of behind the curtain. Some say public with a $3 billion valuation, but for the most part you're public. So the question is how are you guys going to sustain the growth? What is the growth strategy? What's your innovation strategy? 
>> Well if you look at the companies that are going private, those are the companies that are the older platforms, the older technologies, in a very mature market that have not been able to innovate those core platforms and they sort of reached their maturity cycle, and I think going private gives them the ability to do that innovation, maybe change their licensing model, the subscription, and make some of the transformations they need to make. I have no doubt they'll be very successful doing that. Our situation's much different. As the modern IT landscape is re-architecting itself almost across every layer. If you look at what's happening in the networking layer going to SDN. Certainly in our space with data and it's moving away from just transactional siloed environments to central data architectures and next generation data platforms. And being able to go all the way out to the edge and bring data under management through the entire movement cycle. We're in a market that we're able to innovate rapidly. Not only in terms of the architecture of the data platform being able to bring batch, real-time applications together simultaneously on a central data set and consolidate all of the data, but also then be able to move out and do the data in motion and be able to control an entire life cycle. There's a tremendous amount of innovation that's going to happen there, and these are significant growth markets. Both the data in motion and the data at rest market. The data at rest market's a $50 billion dollar marketplace. The data in motion market is a $1 trillion dollar TAM. So when you look at the massive opportunity to create value in these high growth markets, in the ability to innovate and create the next generation data platforms, there's a lot of room for growth and a lot of room for scale. 
And that's exactly why you should be public when you're going through these large growth markets in a space that's re-platforming, because the CIO wants to understand and have transparent visibility into their platform partners. They want to know how you're doing. Are you executing the plan? Or are you hiding behind a facade of one perception or another. >> Or pivoting or some sort of re-architecture. >> Right, so I think it's very appropriate in a high growth, high innovation market where the IT platforms are going through a re-architecture that you actually are public going through that growth phase. Now it forces discipline around how you operationalize the business and how you run the business, but I think that's very healthy for both the tech and the company. >> Michael Dell told me he wanted to go private mainly because he had to do some work essentially behind the curtain. Didn't want the 90-day shot clock, the demands of Wall Street. Other companies do it because they can't stand alone. They don't have a platform and they're constantly pivoting internally to try to grope and find that groove swing, if you will. You're saying that you guys have your groove swing and as Dave Vellante always says, always get behind a growing total addressable market or TAM, you're saying that. Okay, I buy that. So the TAM's growing. What are you guys doing on the platform side that's enabling your customers to re-platform and take advantage of their current data situation as well as the upcoming IoT boom that's being forecasted? >> Well, the first thing is the genesis of which we started the company around, which is we transformed Hadoop from being a batch architecture, single data set, single application, to being able to actually manage a central data architecture where all data comes under management and be able to drive and evolve from batch to batch interactive and real-time simultaneously over that central data set. 
And then making sure that it's truly an enterprise viable, enterprise ready platform to manage mission critical workloads at scale. And those are the areas where we're continuing to innovate around security, around data governance, around life cycle management, the operations and the management consoles. But then we want to expand the markets that we operate in and be world class and best tech on planet Earth for that data at rest and our core Hadoop business. But as we then see the opportunities to go out to the edge and from the point of origination truly manage and bring that data under management through its entire life cycle, through the movement process and create value. And so we want to continue to extend the reach of when we have data under management and the value we bring to the data through its entire life cycle. And then what's next is you have that data in its life cycle. You then move into the modern data applications, and if you look at what we've done with cyber security and some of the offerings that we've engaged in the cyber security space, that was our first entry. And that's proven to be a significant game changer for us and our customers both. >> Cyber security certainly a big data problem. Also a cloud opportunity with the horsepower you can get with computing. Give us the update. What are you seeing there from a traction standpoint? What's some of the level of engagements you're having with enterprises outside of the NSA and the big government stuff, which I'm sure their customers don't have to disclose that, but for the most part a normal enterprise are constantly planning as if they are already attacked and they're having different schemes that they're deploying. How are they using your platform for that right now? >> Well, the nature of attacks has changed. 
And it's evolved from just trying to find the hole in the firewall or where we get into the gateway, to how we find a way through a back door and just hang out in your network and watch for patterns and watch for the ability to aggregate relationships and then pose as a known entity that you can then cascade in. And in the world of cyber security you have to be able to understand those anomalies and be able to detect those anomalies that sit there and watch for their patterns to change. And as you go through a whole life cycle of data management between a cloud, on-prem, and a hybrid architecture, it opens up many, many opportunities for the bad guys to get in and have very new schemes. And our cyber security models give the ability to really track how those anomalies are attaching, where the patterns are emerging, and to be able to detect that in real-time and we're seeing the major enterprises shift to these new models, and it's become a very big part of our growth. >> So I got to change gears and ask you about open-source. You've been an open-source really from the beginning, I would call first generation commercial. But it was not a tier one citizen at that time. It was an alternative to other proprietary platforms, whether you look at the network stack or certainly from software. Now today it's tier one. Still we hear business people kind of like, well, open-source. Why should a business executive care about open-source now? And what would you say to that person who's watching about the benefits of open-source and some of the new models that could help them. >> Well, open-source in general's going to give a number of things. One, it's going to probably provide the best tech, the most innovation in a space, whether that be at the network layer or whether that be at the middleware layer, the tools layer or certainly the data layer. And you're going to see more innovation typically happen on those platforms much faster and you've got transparent visibility into it. 
And it brings an ecosystem with it and I think that's really one of the fundamental issues that someone should be concerned with is what does the ecosystem around my tech look like? And open-source really draws forward a very big ecosystem in terms of innovators of the tech, but also enablers of the tech and adopters of the tech in terms of incremental applications, incremental tool sets. And what it does and the benefit to the end customer is the best tech, the most innovation, and typically operating models that don't generate lock in for 'em, and it gives them optionality to use the tech in the most appropriate architecture in the best economic model without being locked in to a proprietary path that they end up with no optionality. >> So talk about the do-it-yourself mentality. In IT that's always been frowned upon because it's been expensive, time-consuming, yet now with organic open-source and now with cloud, you saw that first generation do-it-yourself, standing up stuff on Amazon, whatnot, is being very viable. It funded shadow IT and a variety of other great things around virtualization, visualization, and so on. Today we're seeing that same pattern swing back to do-it-yourself, is good for organic innovation but causes some complexities. So I want to get your thoughts on this because this seems to be a common thread on our Cube interviews and at Hadoop Summit and at Big Data SV as part of Big Data Week when we were in town. We heard from customers and we heard the following: It's still complex and the total cost of ownership's still too high. That seems to be the common theme for slowing down the rapid acceleration of Hadoop and its ecosystem in general. One, do you agree with that? And two, if so, or what would be the answer to make that go faster? >> Well, I think you're seeing it accelerate. 
I think you're seeing the complexities dwindle away through both innovation and the tech and the maturing of the tech, as well as just new tool sets and applications that are leveraging it, that take away any complexity that was there. But what I think has been acknowledged is, the value that it creates and that it's worth the do-it-yourself and bringing together the disparate techs because the innovation that it brings, the new architectures and the value that it creates as these platforms move into the different use cases that they're enabling. >> So I got to ask you this question. I know you're not going to like it and all the people always say, well John, why does everyone always ask that same question? You guys have a radically different approach than Cloudera. It's the number one question. I get ask them about Cloudera. Cloudera, ask them about Hortonworks. You guys have been battling. They were first. You guys came right fast followers second. With the Yahoo! thing we've been following you guys since day one. Explain the difference between Cloudera, because now a couple things have changed over the past few years. One is, Hadoop wasn't the be all end all for big data. There's been a lot of other things certainly Spark and some other stuff happening, but yet now enterprises are adopting and coexisting with other stuff. So we've seen Cloudera make some pivots. They certainly got some good technology, but they've had some good right answers and some wrong answers. How've you guys been managing it because you're now public, so we can see all the numbers. We know what the business is doing. But relative to the industry, how are you guys compared to Cloudera? What's the differences? And what are you guys doing differently that makes Hortonworks a better vendor than Cloudera? >> I can't speak to all the Cloudera models and strategies. What I'll tell you is the foundation of our model and strategy is based on. 
When we founded the company we were as you mentioned, three or four years post Cloudera's founding. We felt like we needed to evolve Hadoop in terms of the architecture, and we didn't want to adopt the batch-oriented architecture. Instead we took the core Hadoop platform and through YARN enabled it to bring a central data architecture together as well as be able to be generating batch interactive in real-time applications, leveraging YARN as the data operating system for Hadoop. And then the real strategy behind that was to open up the data sets, open up the different types of use cases, be able to do it on a central data architecture. But then as other processing engines emerged, whether it be a Spark as you brought up or some of the other ones that we see coming down the pipe, we can then integrate those engines through YARN onto the central data platform. And we open up the number of opportunities, and that's the core basis. I think that's different than some of the other competitors' technology architecture. >> Looking back now five years, are there moves that you were going to make that others have made, that you look back and say I'm glad we didn't do that given today's landscape? >> What I'm glad we did do is open up to the most use cases and workloads and data sets as possible through YARN, and that's proven to be a very, very, fundamentally differentiation of our model and strategy for anybody in the Hadoop space certainly. 
And I'm also very happy that we saw the opportunity about a year ago that it needed to be more than just about data at rest on Hadoop, and that actually to truly be the next generation data architecture, that you've got to be able to provide the platforms for data at rest and data in motion and our acquisition of Onyara, to be able to get the NiFi technology so that we're truly capturing the data from the point of origination all the way through the movement cycle until it comes at rest has given us now the ability to do a complete life cycle management for an entire data supply chain. And those decisions have proven to be very, very differentiation between us and any of our other competitors and it's opened up some very, very big markets. More importantly, it's accelerated the time to value that our customers get in the use cases that they're enabling through us. >> How would you talk about the scenario that people are saying about Hadoop not being the end all be all industry? At the same time, 'cause big data, as Arun Murthy said on theCUBE in Dublin. It's bigger than Hadoop now, but Hadoop has become synonymous with big data generally. Where's the leadership coming from in your mind? Because we're certainly not seeing it on the data warehouse side, 'cause those guys still have the old technology, trying to co-exist and re-platform for the future. So question is, is Hortonworks viewing Hadoop as still leading generically as a big data industry or has it become a sidebar of the big data industry? >> Of Hadoop? Hadoop is the platform, and we believe ground zero for big data. But we believe it's bigger than that. It's about all data and being able to manage the entire life cycle of all data, and that starts from the point of origination, until it comes at rest, and be able to continue to drive that entire life cycle. Hadoop certainly is the underpinning of the platform for big data, but it's really got to be about all data. 
Data at rest, data in motion, and what you'll see is the next leg in this is, the modern data applications that then emerge from that. >> How has the ecosystem in the Hadoop industry, I would agree with by the way the Hadoop players are leading big data in general in terms of innovation. The ecosystem's been a big part of it. You guys have invested in it. Certainly a lot of developers and open-source. How has the ecosystem changed given the current situation from where it was? And where do you see the ecosystem going? With the re-platforming not everyone can have a platform. There's a ton of guys out there that have tools, that are looking for a home, they're trying to figure out the chessboard on what's going on with the ecosystem. What's your thoughts of the current situation and how it will evolve in your view? >> Well, I think one of the strongest statements from day one is whether it's EDW or BI or relational, none of the traditional platform players say the way you solve your big data problem is with my platform. They to a company have a Hadoop platform strategy of some form to bring all of that huge volume of big data under management, and it fits our model very well in that we're not trying to disintermediate, but extend those platforms by leveraging HDP as an extension of their platform. And what that's done is it's created pool markets. It's brought Hadoop into the enterprise with a very specific value proposition in use case, bringing more data under management for that tool, that application, or that platform. And then the enterprises has realized there's other opportunities beyond that. And new use cases and new data sets, we can also gain more leverage from. And that's what's really accelerated-- >> So you see growth in the ecosystem? >> We're actually seeing exponential acceleration of the growth around the ecosystem. 
Not only in terms of the existing platform and tools and applications for either adopting Hadoop, but now new start-up companies building completely from scratch applications just for the big data sets. >> Let's talk about startups. We were talking before we sat down about the challenges being an entrepreneur. You mentioned the exponential acceleration of entrepreneurs coming into the ecosystem. That's a safe harbor right now. It seems to be across the board. And a lot of the big platforms have robust, growing ecosystems. What's the current landscape of startups? I know you're an active investor yourself and you're involved in a lot of different start-up conversations and advisor. What's your view of the current landscape right now? Series A, B, C, growth. Stalling. What needs to be in place for these companies to be successful? What are some of the things that you're seeing? >> You have to be surgically focused right now or on a very particular problem set, maybe even by industry. And understand how to solve the problem and have an absolute correlation to a value proposition and a very well defined and clear model of how you're going to go solve that problem, monetize it, and scale. Or you have to have an incredibly well-financed and deep war chest to go after a platform play that's going after a very large TAM that is enabling a re-platforming at one of the levels in the new IT landscape. >> So laser focus in a stack or vertical, and/or huge cash funding from Benchmark or other VCs, tier one VCs, to have a differentiator. They have to have some sort of enabler. >> To enable a next generation platform and something that's very transformational as a platform that really evolves the IT stack. >> What strategies would you advise entrepreneurs in terms of either white spaces to attack and/or their orientation to this new data layer? Because if this plays out as we were talking about, you're going to have a horizontal data layer where you need interoperability. 
Need to have data in motion, but data aware. Smart data you integrate into disparate systems. Breaking down the siloed concept. How should an entrepreneur develop or look at that? Is there a certain model you've seen work successfully? Is there a certain open-source group they can jump into? What thoughts would you share? 'Cause this seems to be the toughest nut to crack for entrepreneurs. >> Right now you're seeing a massive shift in the IT data architecture, is one example. You're seeing another massive shift in the network architecture. For example, the SDN, right? You're seeing I think a big shift in the kinds of applications getting away from application functionality to data enabled applications. And I think it's important for the entrepreneur to understand where in the landscape do they really want to position? Where do they bring intellectual capital that can be monetized? Some of the areas that I think you'll see emerge very quickly in the next four, six, eight quarters are the new optimization engines, and so things around AI and machine learning. And now that we have all of the data under management through its entire life cycle, how do I now optimize both where that data's processed, in the cloud or on-prem, or as it's in motion. And there's a massive opportunity through software defined networking to actually come in and now optimize at the purest price point and/or efficiency where that data's managed, where that data's stored, and let it continue to reap the benefits. Just as Amazon's done in retail, if you like this, you should look at that. Just as Yahoo! did, I'll point out with Hadoop, its advertising models and strategies of being able to put specific content in front of you. Those kinds of opportunities are now available for the processing and storage of data through the entire life cycle across any architectural strategy. >> Are you seeing data from a developer's standpoint being instrumental in their use cases? 
Meaning as I'm developing on top of data platforms like Hortonworks or others, where there's disparate data, what's their interaction? What's their relationship to the data? How are they using it? What do they need to know? Where's the line in terms of their involvement in the data? >> Well, what we're seeing is very big movement with the developer community that they now want to be able to just let the data tell them where the application service needs to be. Because in the new world of data they understand what the entity relationships are with their customers and the patterns that are happening with their customers. They now can highly optimize when their customers are about to cross over from one event to the other, and what that typically means and therefore what the inverted action should be to create the best experience with their customer, to create a higher level of service, to be able to create a better packaged price point at a better margin. They also have the ability to understand it in real-time based on what the data trend is flowing, how well their product's performing. Any obstacles or issues that are happening with their product. So they don't want to have to have application logic that then they run a report on three days, three weeks after some events happened. They now are taking the data and as that data and events are happening in the data and it's telling them what to do and they're able to prescriptively act on whatever event or circumstance unfolds from that. >> So they want the data now. They want real-time data embedded in the apps as a front line developer. >> And they want to optimize what that data is doing as it's unfolding through its natural life cycle. >> Let's talk with your customer base and what their expectations are. What questions should a customer or potential customer ask to their big data vendor as they look at the future? What are the key questions they should ask? 
>> They should really be comparing what is your architectural strategy, first and foremost. For managing data. And what kinds of data can I manage? What are the limitations in your architecture? What workloads and data sets can't I manage? What are the latency issues that your architecture would create for me? What's your business model that's associated with us engaging together? How much of the life cycle can you enable of my data? How secure are you making my data? What kind of long tail of visibility and chain of custody can I have around the governance? What kind of governance standards are you applying to the data? How much of my governance standards can you help me automate? How easy is it to operate and how intuitive is it? How big is your ecosystem? What's your road map and your strategy? What's next in your application stack? >> So enterprises are looking at simplicity. They're looking for total cost of ownership. How is big data innovation going to solve that problem? Because with IoT, again, a lot of new stuff's happening really, really fast. How do they get their arms around this simplicity question in this total cost of ownership? How should they be thinking about it? >> Well, what the Hadoop platforms have to do and the data in motion platforms have to do is to be able to bring the data under management and bring all of the enterprise services that they have in their existing data platforms, in the areas of security, in the areas of management, in the areas of data governance, so they can truly run mission critical workloads at scale with all the same levels of predictability that they have in isolation, in their existing proprietary platforms. And be able to do it in a way that's very intuitive for their existing platforms to be able to access it, very intuitive for their operations teams to be able to manage it, and very clean and easy for their existing tools and platforms investments to leverage it. 
>> On the industry landscape right now what are you seeing of a consolidation? Some are saying we're seeing some consolidation. Lot of companies going private. You're seeing people buckle down. It's almost a line. If you weren't born before a certain date for the company, you might have the wrong architecture. Certainly enterprises re-platform, I would agree with that, but as a supplier to customers, you're one of the young guys. You were born in the cloud. You were born in open-source, Hortonworks. Not everyone else is like that, and certainly Oracle's like one of the big guys that keep on doing well. IBM's been around. But they're all changing, as well. And certainly a lot of these growth companies pre-IPO are kind of being sold off. What's your take on the current situation with the bubble, the softening, whatever people are calling it. What's your thoughts? >> I think you see some companies who got caught up and if we sort of unpack that to the ones who are going private now, those are the companies that have operated in a very mature market space. They were able to not innovate as much as they would probably have liked to, they're probably locked into a proprietary technology in a non-subscription model of some sort. Maybe a perpetual license model. And those are very different models than the enterprise wants to adopt today and their ability to innovate and grow because the market shrank, forced them to go into very constrained environments. And ultimately, they can be great companies. They have great value propositions, but they need to go through transformations that don't include a 90-day shot clock in the public market. 
In the markets where there's maybe, I was in the B round or the C round and I was focused on providing a niche offering into one of those mature spaces that's becoming disintermediated or evolved quickly because an open-source company has come into the space or that section of IT stack has morphed into more of a cloud-centric or SaaS-centric or an open-source centric environment. They got cut short. Their market's gone away. Their market shrunk. They can't innovate their way out of it. And they then ultimately have to find a different approach, and they may or may not be able to get the financing to do that. We're in a much different position. >> Certainly the down round. We're seeing down rounds from the high valuations. That's the first sign of trouble. >> That's the first sign. I've gotten three calls this week from companies that are liquidating and have two weeks to find a new home. >> Great, we'll look for some furniture for our new growing SiliconANGLE office. >> I think you'll have some good values. >> You personally, looking back over five years now in this journey, what an incredible run you guys have had and fun to watch you guys. What's the biggest thing that surprised you and what's the biggest thing that's happened? If you can talk about those two things 'cause again, a lot's happened. The market's changed significantly. You guys went public. You got a big office here. What surprised you and what was the biggest thing that you think was the catalyst of the current trajectory? >> How quickly the market grew. We saw from day one when we started the company that this was a billion dollar opportunity, and that was the bar for starting whatever we did. We were looking for new opportunities. We had to see a billion dollar opportunity. How quickly we have seen the growth and the formation of the market in general. 
And then how quickly some of the new opportunities have opened up, in particular around streaming, Internet of Things, the new paradigm data sets, and how quickly the enterprises have seen the ability to create a next generation data architecture and the aggressiveness with which they're moving to do that with Hadoop. And then how quickly in the last year it swung to also being able to want to bring data in motion under management, as well. >> If you could talk to a customer right here, right now, and they asked you the following question, Rob, look around the corner five years out. Tell me something that someone else can't see that you see, that I should be aware of in my business. And why should I go with Hortonworks? >> It's going to be a table stake requirement to be able to understand from whether it be your customer or your supply chain from the point they begin to engage and the first step towards engaging with your product or your service, what they're trying to accomplish, and to be able to interact with them from that first inception point. It's also going to be table stakes to be able to monitor your product in real-time, and be able to understand how well it's performing, down to the component level so that you can make real-time corrections, improvements, and be able to do that on the fly. The other thing that you're going to see is that it's going to be a table stake requirement to be able to aggregate the data that's happened in that life cycle and give your customer the ability to monetize the data about them. But you as the enterprise will be responsible for creating anonymity, confidentiality and security of the data. But you're going to have to be able to provide the data about your customers and give them the ability, if they choose, to monetize the data about them. >> So if I get that correct, you're basically saying 100% digital. >> Oh, it's by far, within the next five years, absolutely. 
If you do not have a full digital model, in most industries you'll be disintermediated. >> Final question. What's the big bet that you're making right now at Hortonworks? That you say we're pinning the company on blank, fill in the blank. >> It's not about big data. It's about all data under management. >> Rob, thanks so much for spending the time here On the Ground. Rob Bearden, CEO of Hortonworks here for an executive On the Ground. I'm John for The Cube. Thanks for watching. (techno music)

Published Date : Jun 24 2016

ENTITIES

Entity	Category	Confidence
Rob Bearden	PERSON	0.99+
Dave Velanti	PERSON	0.99+
Peter Burris	PERSON	0.99+
Rob	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Michael Dell	PERSON	0.99+
$3 billion	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
two weeks	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
John	PERSON	0.99+
100%	QUANTITY	0.99+
Aroon Merkey	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
90-day	QUANTITY	0.99+
three days	QUANTITY	0.99+
June	DATE	0.99+
two things	QUANTITY	0.99+
TAM	ORGANIZATION	0.99+
first sign	QUANTITY	0.99+
first entry	QUANTITY	0.99+
five years	QUANTITY	0.99+
last week	DATE	0.99+
one	QUANTITY	0.99+
Dublin	LOCATION	0.99+
both	QUANTITY	0.99+
$1 trillion dollar	QUANTITY	0.99+
over 900 customers	QUANTITY	0.99+
two	QUANTITY	0.99+
today	DATE	0.99+
over 1000 employees	QUANTITY	0.99+
$50 billion dollar	QUANTITY	0.99+
three calls	QUANTITY	0.99+
first	QUANTITY	0.99+
Hadoop	TITLE	0.99+
last year	DATE	0.99+
six	QUANTITY	0.99+
$265 million dollars	QUANTITY	0.98+
Big Data Week	EVENT	0.98+
three weeks	QUANTITY	0.98+
one example	QUANTITY	0.98+
Series A	OTHER	0.98+
Keblan Dublin	ORGANIZATION	0.98+
this week	DATE	0.98+
first step	QUANTITY	0.98+
Hadoop Summit	EVENT	0.98+
Yahoo!	ORGANIZATION	0.98+
Both	QUANTITY	0.97+
first generation	QUANTITY	0.97+
this year	DATE	0.97+
One	QUANTITY	0.97+
four	QUANTITY	0.96+
This year	DATE	0.96+
Today	DATE	0.96+
10-year birthday	QUANTITY	0.96+
Hadoop	ORGANIZATION	0.95+
end of 2016	DATE	0.95+

Ritika Gunnar & David Richards - #BigDataSV 2016 - #theCUBE

>> Narrator: From San Jose, in the heart of Silicon Valley, it's The Cube, covering Big Data SV 2016. Now your hosts, John Furrier and Peter Burris. >> Okay, welcome back everyone. We are here live in Silicon Valley for Big Data Week, Big Data SV Strata Hadoop. This is The Cube, SiliconANGLE's flagship program. We go out to the events and extract the signals from the noise. I'm John Furrier, my co-host is Peter Burris. Our next guest is Ritika Gunnar, VP of Data and Analytics at IBM and David Richards is the CEO of WANdisco. Welcome to The Cube, welcome back. >> Thank you. >> It's a pleasure to be here. >> So, okay, IBM and WANdisco, why are you guys here? What are you guys talking about? Obviously, partnership. What's the story? >> So, you know what WANdisco does, right? Data replication, active-active replication of data. For the past twelve months, we've been realigning our products to a market that we could see rapidly evolving. So if you had asked me twelve months ago what we did, we were talking about replicating just Hadoop, but we think the market is going to be a lot more than that. I think Mike Olson famously said that this Hadoop was going to disappear and he was kind of right because the ecosystem is evolving to be a much greater stack that involves applications, cloud, completely heterogeneous storage environment, and as that happens the partnerships that we would need have to move on from just being, you know, the sort of Hadoop-specific distribution vendors to actually something that can deliver a complete solution to the marketplace. And very clearly, IBM has a massive advantage in the number of people, the services, ecosystem, infrastructure, in order to deliver a complete solution to customers, so that's really why we're here. >> If you could talk about the stack comment, because this is something that we're seeing. Mike Olson's kind of being political when he says make it invisible, but the reality is there is more to big data than Hadoop. 
There's a lot of other stuff going on. Call it stack, call it ecosystem. A lot of great things are growing, we just had Gaurav on from SnapLogic said, "everyone's winning." I mean, I just love that's totally true, but it's not just Hadoop. >> It's about Alldata and it's about all insight on that data. So when you think about Alldata, Alldata is a very powerful thing. If you look at what clients have been trying to do thus far, they've actually been confined to the data that may be in their operational systems. With the advent of Hadoop, they're starting to bring in some structured and unstructured data, but with the advent of IOT systems, systems of engagement, systems of records and trying to make sense of all of that, Alldata is a pretty powerful thing. When I think of Alldata, I think of three things. I think of data that is not only on premises, which is where a lot of data resides today, but data that's in the cloud, where data is being generated today and where a majority of the growth is. When I think of Alldata, I think of structured data, that is in your traditional operational systems, unstructured and semi-structured data from IOT systems et cetera, and when I think of Alldata, I think of not just data that's on premises for a lot of our clients, but actually external data. Data where we can correlate data with, for example, an acquisition that we just did within IBM with The Weather Company or augmenting with partnerships like Twitter, et cetera, to be able to extract insight from not just the data that resides within the walls of your organization, but external data as well. >> The old expression is if you want to go fast, do it alone, if you want to go deeper and broader and more comprehensive, do it as a team. >> That's right. >> That expression can be applied to data. And you look at The Weather data, you think, hmmm, that's an outlier type acquisition, but when you think about the diversity of data, that becomes a really big deal. 
And the question I want to ask you guys is, and Ritika, we'll start with you, there's always a few pressure points we've seen in big data. When that pressure is relieved, you've seen growth, and one was big data analytics kind of stalled a little bit, the winds kind of shifted, eye of the storm, whatever you want to call it, then cloud comes in. Cloud is kind of enabling that to go faster. Now, a new pressure point that we're seeing is go faster with digital transformation. So Alldata kind of brings us to all digital. And I know IBM is all about digitizing everything and that's kind of the vision. So you now have the pressure of I want all digital, I need data driven at the center of it, and I've got the cloud resource, so kind of the perfect storm. What's your thoughts on that? Do you see that similar picture? And then does that put the pressure on, say, WANdisco, say hey, I need replication, so now you're under the hood? Is that kind of where this is coming together? >> Absolutely. When I think about it, it's about giving trusted data and insights to everyone within the organization, at the speed in which they need it. So when you think about that last comment of, "At the speed in which they need it," that is the pressure point of what it means to have a digitally transformed business. That means being able to make insights and decisions immediately and when we look at what our objective is from an IBM perspective, it's to be able to enable our clients to be able to generate those immediate insights, to be able to transform their business models and to be able to provide the tooling and the skills necessary, whether we have it organically, inorganically, or through partnerships, like with WANdisco to be able to do that. And so with WANdisco, we believe we really wanted to be able to activate where that data resides. 
When I talk about Alldata and activation of that data, WANdisco provided to us complementary capabilities to be able to activate that data where it resides with a lot of the capabilities that they're providing through their fusion. So, being able to have and enable our end-users to have that digitally infused set of reactive type of applications is absolutely something... >> It's like David, we talk about, and maybe I'm oversimplifying your value proposition, but I always look at WANdisco as kind of the five nines of data, right? You guys make stuff work, and that's the theme here this year, people just want it to work, right? They don't want to have it down, right? >> Yeah, we're seeing, certainly, an uptick in understanding about what high availability, what continuous availability means in the context of Hadoop, and I'm sure we'll be announcing some pretty big deals moving forward. But we've only just got going with IBM. I would, the market should expect a number of announcements moving forward as we get going with this, but here's the very interesting question associated with cloud. And just to give you a couple of quick examples, we are seeing an increasing number of Global 1,000 companies, Fortune 100 companies move to cloud. And that's really important. If you would have asked me 12 months ago, how is the market going to shape up, I'd have said, well, most CIO's want to move to cloud. It's already happening. So, FINRA, the major financial regulator in the United States is moving to cloud, publicly announced it. The FCA in the UK publicly announced they are moving 100% to cloud. So this creates kind of a microcosm of a problem that we solve, which is how do you move transactional data from on-premise to cloud and create a sort of hybrid environment. Because with the migration, you have to build a hybrid cloud in order to do that anyway. So, if it's just archive systems, you can package it on a disk drive and post it, right? 
If we're talking about transactional data, i.e., stuff that you want to use, so for example, a big travel company can't stop booking flights while they move their data into the cloud, right? They would take six months to move petabyte scale data into cloud. We solve that problem. We enable companies to move transactional data from on-premise into cloud, without any interruption to services. >> So not six months? >> No, not six months. >> Six hours? >> And you can keep on using the data while it is in transit. So we've been looking for a really simplistic problem, right, to explain this really complex algorithm that we've got that you know does this active-active replication stuff. That's it, right? It's so simple, and nobody else can do it. >> So no downtime, no disruption to their business? >> No, and you can use the cloud or you can use the on-prem applications while the data is in transit. >> So when you say all cloud, now we're on a theme, Alldata, all digital, all cloud, there's a nuance there because most, and we had Gaurav from SnapLogic talk about it, there's always going to be an on-prem component. I mean, probably not going to see 100% everyone move to the cloud, public cloud, but cloud, you mean hybrid cloud essentially, with some on-prem component. I'm sure you guys see that with Bluemix as well, that you've got some dabbling in the public cloud, but ultimately, it's one resource pool. That's essentially what you're saying. >> Yeah, exactly. >> And I think it's really important. One of the things that's very attractive about the WANdisco solution is that it does provide that hybridness from the on-premises to cloud and that being able to activate that data where it resides, but being able to do that in a heterogeneous fashion. Architectures are very different in the cloud than they are on premises. 
When you look at it, your data lake may be as simple as a Swift object store or S3, and you may be using elements of Hadoop in there, but the architectures are changing. So the notion of being able to handle hybrid solutions both on-premises and cloud with the heterogeneous capability in a non-invasive way that provides continuous data is something that is not easily achieved, but it's something that every enterprise needs to take into account. >> So Ritika, talk about the why the WANdisco partnership, and specifically, what are some of the conversations you have with customers? Because, obviously there's, it sounds like, the need to go faster and have some of this replication active-active and kind of, five nines if you will, of making stuff not go down or non-disruptive operations or whatever the buzzword is, but you know, what's the motivation from your standpoint? Because IBM is very customer-centric. What are some of the conversations and then how does WANdisco fit into those conversations? >> So when you look at the top three use cases that most clients use for even Hadoop environments or just what's going on in the market today, the top three use cases are you know, can I build a logical data warehouse? Can I build areas for discovery or analytical discovery? Can I build areas to be able to have data archiving? And those top three solutions in a hybrid heterogeneous environment, you need to be able to have active-active access to the data where that data resides. And therefore, we believe, from an IBM perspective, that we want to be able to provide the best of breed regardless of where that resides. And so we believe from a WANdisco perspective, that WANdisco has those capabilities that are very complementary to what we need for that broader skills and tooling ecosystem and hence why we have formed this partnership. 
>> Unbelievably, in the market, we're also seeing and it feels like the Hadoop market's just got going, but we're seeing migrations from distributions like Cloudera into cloud. So you know, those sort of lab environments, the small clusters that were being set up. I know this is slightly controversial, and I'll probably get darts thrown at me by Mike Olson, but we are seeing pretty large-scale migration from those sort of labs that were set up initially. And as they progress, and as it becomes mission-critical, they're going to go to companies like IBM, really, aren't they, in order to scale up their infrastructure? They're going to move the data into cloud to get hyperscale. For some of these cases that Ritika was just talking about so we are seeing a lot of those migrations. >> So basically, Hadoop, there's some silo deployments of POCs that need to be integrated in. Is that what you're referring to? I mean, why would someone do that? They would say okay, probably integration costs, probably other solutions, data. >> If you do a roll-your-own approach, where you go and get some open-source software, you've got to go and buy servers, you've got to go and train staff. We've just seen one of our customers, a big bank, two years later get servers. Two years to get servers, to get server infrastructure. That's a pretty big barrier, a practical barrier to entry. Versus, you know, I can throw something up in Bluemix in 30 minutes. >> David, you bring up a good point, and I want to just expand on that because you have a unique history. We know each other, we go way back. You were on The Cube when, I think we first started seven years ago at Hadoop World. You've seen the evolution and heck, you had your own distribution at one point. So you know, you've successfully navigated the waters of this ecosystem and you had great IP and then you kind of found your swim lanes and you guys are doing great, but I want to get your perspective on this because you mentioned Cloudera. 
You've seen how it's evolving as it goes mainstream, as you know, Peter says, "The big guys are coming in and with power." I mean, IBM's got a huge Spark investment and it's not just you know, lip service, they're actually donating a ton of code and actually building stuff so, you've got an evolutionary change happening within the industry. What's your take on the upstarts like Cloudera and Hortonworks and the distro game? Because that now becomes an interesting dynamic because it has to integrate well. >> I think there will always be a market for the distribution of open-source software. As that sort of, that layer in the stack, you know, certainly Cloudera, Hortonworks, et cetera, are doing a pretty decent job of providing a distribution. The Hadoop marketplace, and Ritika laid this on pretty thick as well, is not Hadoop. Hadoop is a component of it, but in cloud we talk about object store technology, we talk about Swift, we talk about S3. We talk about Spark, which can be run stand-alone, you don't necessarily need Hadoop underneath it. So the marketplace is being stretched to such a point that if you were to look at the percentage of the revenue that's generated from Hadoop, it's probably less than one percent. I talked 12 months ago with you about the whale season, the whales are coming. >> Yeah, they're here. >> And they're here right now, I mean... >> (laughs) They're mating out in the water, deals are getting done. >> I'm not going to deal with that visual right now, but you're quite right. And I love the Peter Drucker quote which is, "Strategy is a commodity, execution is an art." We're now moving into the execution phase. You need a big company in order to do that. You can't be a five hundred or a thousand person... >> Is Cloudera holding onto dogma with Hadoop or do they realize that the ecosystem is building around them? >> I think they do because they're focused on the application layer, but there's a lot of competition in the application layer. 
There's a little company called IBM, there's a little company called Microsoft, and a little company called Amazon that are kind of focused on that as well, so that's a pretty competitive environment, and your ability to execute is really determined by the size of the organization, to be quite frank. >> Awesome, well, so we have Hadoop Summit coming up in Dublin. We're going to be in Ireland next month for Hadoop Summit with more and more coverage there. Guys, thanks for the insight. Congratulations on the relationship and again, WANdisco, we know you guys and know what you guys have done. This seems like a prime time for you right now. And IBM, we just covered you guys at InterConnect. Great event. Love The Weather Company data, as a weather geek, but also the Apple announcement was really significant. Having Apple up on stage with IBM, I think that is really, really compelling. And that was not just a Barney deal, that was real. And the fact that Apple was on stage was a real testament to the direction you guys are going, so congratulations. This is The Cube, bringing you all the action, live here in Silicon Valley for Big Data Week, BigData SV, and Strata Hadoop. We'll be right back with more after this short break.

Published Date : Mar 30 2016

Jack Norris | Strata-Hadoop World 2012

>>Okay. We're back here, live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Hadoop World, Strata plus Hadoop World, a big event, Big Data Week. And we just wrote a blog post on siliconangle.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, packed, an amazing event. And we're excited. This is siliconangle.com. I'm the founder, John Furrier. I'm joined by co-host Dave >>Vellante of Wikibon.org, where people go for free research and peers collaborate to solve problems. And we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to The Cube. >>Thank you, Dave. >>I'm going to hand it to you. You know, we met quite a while ago now. It was well over a year ago and we were pushing at you guys and saying, well, you know, open source, and you'd say, look, we're solving problems for customers. We've got the right model. We think, you know, this is our strategy. We're sticking to it. Watch what happens. And like I said, I have to hand it to you. You guys really have some great traction in the market and you're doing what you said. And so congratulations on that. I know you've got a lot more work to do, but >>Yeah, and actually the topic of openness is one that's pretty interesting. If you look at the different options out there, all of them are combining open source with some proprietary. Now in the case of some distributions, it's very small, like an ODBC driver with a proprietary driver. But I think it represents that any solution combining to make it more open is important. So what we've done is make innovations, but where we've made those innovations, we've opened up and provided APIs.
It's like NFS for standard access, like REST, like ODBC drivers, et cetera. >>So it's a spectrum. I mean, actually we were at Oracle OpenWorld a few weeks ago, and you listen to Larry Ellison talk about the Oracle public cloud, and he makes actually a very strong case that it's open. You can move data, it's all Java. So it's all about standards. Yeah. And, yeah, it's from the opposite end, but it was really all about the business value. That's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I both were very impressed with essentially what he described as your philosophy of, we have customers when we announce a product, and, you know, that's impressive. >>And he was also giving some good feedback to startup entrepreneurs out there, since obviously there's a lot of action going on with the startup community. And he basically said the same thing: get customers. Yeah. And that's it, that's all. Use your tech, but don't be so locked into the tech. Get the customers, understand the needs, and then deliver that. So you guys have done great. And I want to talk about the show here. Okay. Because you guys have a big booth and a big presence here at the show. What are you guys learning? How's the positioning, how's the news hitting? Give us a quick update. >>So, a lot of news. It first started on Tuesday, where we announced the M7 edition. And yeah, I brought a demo here for you all, because the big thing about M7 is what we don't have. So we're not demoing region servers, we're not demoing compactions, we're not demoing a lot of manual administrative tasks. So what that really means is that we took this stack. And if you look at HBase, HBase today has about half of Hadoop users adopting HBase.
So there's a lot of momentum in the market, and, you know, it's used for everything from real-time analytics to kind of lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM that stores its data in the Hadoop Distributed File System that sits on a JVM that stores its data in a Linux file system that writes to disk. >>And so a lot of the complexity is that stack. And so as an administrator, you have to worry about how data gets, you know, kind of basically written across that. And you've got region servers to keep up. When you're doing writes, you have things called compactions, which increase response time. So it's a complex environment, and we've spent quite a bit of time collapsing that infrastructure. With the M7 edition, you've got files and tables together in the same layer, writing directly to disk. So there's no region servers, there's no compactions to deal with. There's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler. >>Let's talk about some of your customers in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, hey, we can solve some of those problems. Is that right? Can you talk about that a little bit? >>Fair characterization. I think part of it is when you're in the evaluation process and when you first hear about Hadoop, it's kind of like the Gartner hype curve, right? And, you know, this stuff, it does everything. And of course you've got data protection, because you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes and so forth.
Well, once you start using it, you realize that yes, I've got data replicated across the cluster, but if I accidentally delete something, or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to, you know, what it was five minutes before. Performance, where you can get the most out of your hardware. Ease of administration, where I can cut this up into logical volumes and have policies at that whole level instead of at an individual file. >>So there's a bunch of features that really resonate with users after they've had some experience. And those tend to be our, you know, kind of key customers. There's another phase too, which is when you're testing Hadoop, you're looking at what's possible with this platform. What type of analytics can I do? When you go into production, now all of a sudden you're looking at, how does this fit in with my SLAs? How does this fit in with my data protection policies? How do I integrate with my different data sources? And can I leverage existing code? You know, we had one customer, a large systems integrator for the federal government. They have a million lines of code that they were told to rewrite to run with other distributions, that they could use just out of the box with MapR. >>So let's talk about some of those customers. Can you name some names and get >>Sure. So actually, we had a keynote today and we had this beautiful customer video. They had to cut it because of time. It's running in our booth and it's streaming on our website. And I think we've got, actually, some of it in the bumper here that we kind of inserted. But I want to shout out to those because they ended up on the cutting room floor. Running it here. Yeah.
So one was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number one ad reach, as mentioned by comScore, and got a lot of press on that. I particularly liked the headline that mentioned those three companies, because it was measured by comScore, and comScore's a customer too, a MapR customer. And Google's a key partner. >>And yesterday we announced a world record for the Hadoop TeraSort, running on Google. So M7 for Rubicon, it allows them to address and replace different point solutions that were running alongside of Hadoop. And, you know, it potentially simplifies their architecture, because now they have more things done with a single platform. It increases performance, simplifies administration. Another customer is Ancestry.com, who, you know, maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing to help with family services and genealogy and figure out, you know, family backgrounds. One of the things they do is DNA testing. So for an internet service to do that advanced technology is pretty impressive. And, you know, it's $99, I believe, and they'll send you a DNA kit. You spit in the tube, you send it back, and then they process that and match and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. So, you know, in Melinda Graham's words, it's simpler because they're just not there, those components. >>Jack, I want to ask you about enterprise-grade Hadoop, and then Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company.
Also on the management team. We had your CEO on, we've interviewed M.C. Srivas at Google I/O, and we were on a panel together. So we know your team, solid team. So let's talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? I mean, obviously you guys were very successful at first. Again, we were skeptics at first, but now your traction and your performance have proven there is a market for that kind of platform. What does that mean now, at this event today, as this is evolving, as the Hadoop ecosystem is not just Hadoop anymore? It's other things. >>Yeah, there's three dimensions to enterprise grade. The first is ease of use, and ease of use from an administrator standpoint: how easily does it integrate into an existing environment? How easily does it fit into my IT policies? You know, do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension. A key to that is, you know, complete NFS support, so it functions like standard storage. A second dimension is dependability, reliability. So it's not just, you know, do you have a checkbox HA feature. It's, do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures and, you know, automated recovery? So, you know, in a lights-out data center, can you actually go there once a week and then just, you know, replace drives? And a great example of that is one of our customers had a test cluster with MapR. It was a POC. They went on and did other things. They had a power failure, they came back a week later, and the cluster was up and running and they hadn't done any manual tasks there.
And they were just blown away. The recovery process for the other distributions is a long laundry list of >>So I've got to ask you this, the third >>one. What's the third one? >>The third one is performance, and performance is, you know, kind of raw speed. It's also, how do you leverage the infrastructure? Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? And it's really about sharing a cluster for different use cases and different users. And there's a lot of features there. It's not just raw >>The existing IT infrastructure policies, that whole, what happens when something goes wrong. Can you automate that? And then, >>And it's easy, dependable, fast, and it's the same thing: making HBase easy, dependable, fast. >>So the talk of the show right now, you had the keynote this morning, is that MapR marketing has dropped the big data term and is going with Datacosm. Is that true? So Joe Hellerstein just had a tweet. Joe, a famous Cal Berkeley computer science professor, now is CEO of a startup, Trifacta, and he had a couple of epic tweets this week. So shout-out to Joe Hellerstein. But Joe Hellerstein's tweet says MapR marketing has decided to drop the term big data and go with Datacosm, with a shout-out to George Gilder. So it's kind of like middlebrow intellectual kind of humor. So what's your response to that? Is it true? What's happening? What is your take, as the VP of marketing? >>Well, if you look at the big data term, I think, you know, there's a lot of big data washing going on, where, you know, architectures that have been out there for 30 years are, you know, all about big data. So I think there's the need for a more descriptive term.
The purpose of Datacosm was not to try to coin something or try to, you know, change a big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And, you know, with a shout-out to George Gilder, acknowledging, you know, he recognized what the impact of making compute available meant. He recognized with Telecosm what bandwidth would mean. And if you look at the combination, we've got all this compute efficiency and bandwidth. Now Datacosm is basically taking those resources and unleashing them and changing the way we do things. >>And I think one of the ways to look at that is the new things that will be possible. And there's been a lot of focus on, you know, SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly and having automated operational analytics that can respond in a very fast time to change how you do business: either how you're communicating with customers, how you're responding to different risk factors in the environment, for fraud, et cetera, or just improving your response time to kind of cost events. >>What was earlier called actionable insight. Then he said, assigning intent, you'll be able to respond. It's interesting that you talk about George Gilder, because we like to kind of riff and get into the abstract concepts, but he also was very big in supply-side economics. And so if you look at the business value conversation, one of the things we pointed out yesterday and this morning, in our opening review, was, you know, the top conversations: insight and analytics, you know, as a killer app. Right now, the app market has not developed.
And that's why we like companies like Continuuity, and what you guys are doing under the hood is being worked on right now at many levels, performance, you know, those three things. But analytics is a no-brainer, insight, and the other one's business value. So when you look at that kind of Datacosm, I can see where you're going with that. >>And that's kind of what people want, because it's not so much like I'm Republican because he's Republican. George Gilder, he bought The American Spectator. Everyone knows that. So obviously he's a Republican, but politics aside, the business side of what big data is implementing is massive. Now, I guess that's a Republican concept, but not really. I mean, business is all parties. So relative to Datacosm, I mean, no one talks about e-business anymore. We were talking to IBM at the IBM conference and they were saying, hey, that was a great marketing campaign, but no one says, hey, are you an e-business today? So we think that big data is going to have the same effect, which is, hey, do you have big data? No, it's just assumed. Yeah. So that's what you're basically trying to establish, that it's not just about big. >>Yeah. Let me give you one small example from a business value standpoint. Ted Dunning, you mentioned Ted earlier, chief application architect and one of the coauthors of the Mahout book, which deals with machine learning, he dealt with one of our large financial services companies. And, you know, one of the techniques on Hadoop is clustering, you know, k-nearest neighbors, different algorithms. And they looked at a particular process and they sped up that process by 30,000 times. So there's a blog post that's on our website. You can find out additional information on that.
And I, >>There's one >>point on this, one point. But I think, you know, to your point about business value and, you know, what does Datacosm really mean? That's an incredible speedup in terms of performance, and it changes how companies can react in real time. It changes how they can do pattern recognition. And Google did a really interesting paper called The Unreasonable Effectiveness of Data. And in there they say simple algorithms on massive amounts of data beat a complex model every time. And so I think what we'll see is a movement away from data sampling and trying to do an 80/20, to looking at all your data and identifying where are the exceptions that we want to increase because they're, you know, revenue exceptions, or that we want to address because it's a cost or a fraud. >>Well, I would give a shout-out to the guys at Digital Reasoning. Tim Estes plugged Ted. He idolized him in terms of his work. Obviously his work is awesome, but too, he brought up this concept of the understanding gap, and he showed an interesting chart in his keynote, which was the data explosion, you know, it's up, straight up, right? It's a massive amount of data, 64% unstructured by his calculation. Then he showed a flat line called attention. So as data's been exploding over time, going up, attention, meaning user attention, is flat, with some uptick maybe. So users and humans, they can't expand their minds fast enough. So machine learning technologies have to bridge that gap. That's analytics, that's insight. >>Yeah. There's a big conversation now going on about more data, better models, people trying to squint through some of the comments that Google made and say, all right, does that mean we just throw out >>the models, and data trumps algorithms? >>Data trumps algorithms. But the question I have is, do you think, and are your customers talking about, okay, well, now they have more data.
Can I actually develop better algorithms that are simpler? And is it a virtuous cycle? >>Yeah, I think, I mean, there's a lot of debate here, a lot of information, but I think one of the interesting things is, given the compute cycles, given, you know, that compute efficiency that we have, and given the bandwidth, you can take a model and then iterate very quickly on it and kind of arrive at insight. And in the past, it was just that amount of data and that amount of time to process. That could take you 40 days to get to the point that you can reach now in hours. Right. >>Right. So, I mean, the great example is fraud detection, right? So we used to sample, and six months later, hey, your credit card might've been hacked. And now it's, you know, you get a phone call, or you can't use your credit card, or whatever it is. But there's still a lot of use cases where, you know, weather is an example, where modeling and better modeling would be very helpful. Excellent. So, Datacosm, are you planning other marketing initiatives around that? Or is this sort of tongue-in-cheek fun? Throw it out there, a little red meat, chum in the waters? >>You know, what really motivated us was, you know, The Cube's here talking for the whole day. What could we possibly do to help give them a topic of conversation? >>Okay. Datacosm. Now of course, we found that on our proprietary HBase tools. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great. We've been following you and continue to follow you. You've been a great supporter of The Cube. Want to thank you personally while we're here. MapR has been a generous underwriter, supportive of our great independent editorial. We want to recognize you guys. Thanks for your support. And we continue to look forward to watching you guys grow and kick ass. So thanks for all your support.
And we'll be right back with our next guest after this short break. >>Thank you. >>10 years ago, the video news business believed the internet was a fad. The science is settled. We all know the internet is here to stay. Bubbles and busts come and go, but the industry deserves a news team that goes the distance. Coming up on Social Angle are some interesting new metrics for measuring the worth of a customer on the web. Every morning, we're on the air to bring you the most up-to-date information on the tech industry, with scrutiny on releases of the day and news of industry-wide trends. We're here daily with breaking analysis from the best minds in the business. Join me, Kristin Feledy, daily at the news desk on SiliconANGLE TV, your reference point for tech innovation. 18 months.

Published Date : Oct 25 2012


Search Results for Big Data Week: